API compatibility FAQ
What does API compatibility exactly mean?
API compatibility refers to the fact that the database APIs offered by YugabyteDB servers implement the same wire protocol and modeling/query language as that of an existing database. Since client drivers, command line shells, IDE integrations and other ecosystem integrations of the existing database rely on this wire protocol and modeling/query language, they are expected to work with YugabyteDB without major modifications.
NoteThe YSQL API is compatible with PostgreSQL. This means PostgreSQL client drivers, psql command line shell, IDE integrations such as TablePlus and DBWeaver and more can be used with YugabyteDB. The same concept applies to YCQL in the context of the Apache Cassandra Query Language.
Why are YugabyteDB APIs compatible with popular DB languages?
YugabyteDB’s API compatibility is aimed at accelerating developer onboarding. By integrating well with the existing ecosystem, YugabyteDB ensures that developers can get started easily using a language they are already comfortable with.
YugabyteDB’s API compatibility is not aimed at lift-and-shift porting of existing applications written for the original language. This is because existing applications are not written to take advantage of the distributed, strongly-consistent storage architecture that YugabyteDB provides. For such existing applications, developers should expect to modify their previously monolithic PostgreSQL and/or non-transactional Cassandra data access logic as they look to migrate to YugabyteDB.
YSQL compatibility with PostgreSQL
What is the extent of compatibility with PostgreSQL?
As highlighted in Distributed PostgreSQL on a Google Spanner Architecture – Query Layer, YSQL reuses open source PostgreSQL’s query layer (written in C) as much as possible and as a result is wire-compatible with PostgreSQL dialect and client drivers. Specifically, YSQL is based on PostgreSQL v11.2. Following are some of the currently supported features:
- DDL statements: CREATE, DROP, and TRUNCATE tables
- Data types: All primitive types including numeric types (integers and floats), text data types, byte arrays, date-time types, UUID, SERIAL, as well as JSONB
- DML statements: INSERT, UPDATE, SELECT and DELETE. Bulk of core SQL including JOINs, WHERE clauses, GROUP BY, ORDER BY, LIMIT, OFFSET, and SEQUENCES
- Transactions: ABORT, ROLLBACK, BEGIN, END, and COMMIT
- Expressions: Rich set of PostgreSQL built-in functions and operators
- Other Features: VIEWs, EXPLAIN, PREPARE-BIND-EXECUTE, and JDBC support
YugabyteDB’s goal is to remain as compatible with PostgreSQL as much as possible. If you see a feature currently missing, please file a GitHub issue for us.
Can I insert data using YCQL, but read using YSQL, or vice versa?
The YugabyteDB APIs are currently isolated and independent from one another. Data inserted or managed by one API cannot be queried by the other API. Additionally, Yugabyte does not provide a way to access the data across the APIs. An external framework, such as Presto, might be useful for simple use cases. For an example that joins YCQL and YSQL data, see the blog post about Presto on YugabyteDB: Interactive OLAP SQL Queries Made Easy.
Allowing YCQL tables to be accessed from the PostgreSQL-compatible YSQL API as foreign tables using foreign data wrappers (FDW) is on the roadmap. You can comment or increase the priority of the associated GitHub issue.
When should I pick YCQL over YSQL?
You should pick YCQL over YSQL if your application:
* Does not require fully-relational data modeling constructs, such as foreign keys and JOINs. Note that strongly-consistent secondary indexes and unique constraints are supported by YCQL. * Requires storing large amounts of data (for example, 10TB or more). * Needs to serve low-latency (sub-millisecond) queries. * Needs TTL-driven automatic data expiration. * Needs to integrate with stream processors, such as Apache Spark and KSQL.
If you have a specific use case in mind, share it in our Slack community and the community can help you decide the best approach.
YCQL compatibility with Apache Cassandra QL
Features where YCQL goes beyond Apache Cassandra QL
- Following are the features that are present in YCQL but not present in Apache Cassandra QL.
- JSONB column type for modeling document data
- Distributed transactions for multi-row ACID transactions
- Official Jepsen tests to prove correctness under extreme failure conditions
- Following are the features that are present in both YCQL and Apache Cassandra QL but YCQL provides stricter guarantees.
- Secondary indexes are by default strongly consistent since internally they use distributed transactions.
- INTEGER and COUNTER data types are equivalent and both can be incremented without any lightweight transactions.
- Following are the features that are either unnecessary or disallowed in YCQL.
- Lightweight transactions for compare-and-swap operations (such as incrementing integers) are unnecessary because YCQL achieves single row linearizability by default.
- Tunable write consistency is disallowed in YCQL because writes are committed at quorum using Raft replication protocol.
This blog goes into the details of YCQL vs Apache Cassandra architecture and is recommended for further reading.
Do INSERTs do “upserts” by default? How do I insert data only if it is absent?
By default, inserts overwrite data on primary key collisions. So INSERTs do an upsert. This an intended CQL feature. In order to insert data only if the primary key is not already present, add a clause “IF NOT EXISTS” to the INSERT statement. This will cause the INSERT fail if the row exists.
Here is an example from CQL:
INSERT INTO mycompany.users (id, lastname, firstname) VALUES (100, ‘Smith’, ‘John’) IF NOT EXISTS;
Can I have collection data types in the partition key? Will I be able to do partial matches on that collection data type?
Yes, you can have collection data types as primary keys as long as they are marked FROZEN. Collection types that are marked
FROZEN do not support partial matches.
What is the difference between a
COUNTER data type and
INTEGER data type?
Unlike Apache Cassandra, YugabyteDB COUNTER type is almost the same as INTEGER types. There is no need of lightweight transactions requiring 4 round trips to perform increments in YugabyteDB - these are efficiently performed with just one round trip.
How is ‘USING TIMESTAMP’ different in YugabyteDB?
In Apache Cassandra, the highest timestamp provided always wins. Example:
INSERT with timestamp far in the future.
> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 3) USING TIMESTAMP 1607681258727447; > SELECT * FROM table;
c1 | c2 | c3 ----+----+---- 1 | 2 | 3
INSERT at the current timestamp does not overwrite previous value which was written at a higher timestamp.
> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 4); > SELECT * FROM table;
c1 | c2 | c3 ----+----+---- 1 | 2 | 3
On the other hand in Yugabyte, for efficiency purposes INSERTs and UPDATEs without the
TIMESTAMP clause always overwrite the older values. On the other hand, if we have the
TIMESTAMP clause, then appropriate timestamp ordering is performed. Example:
> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 3) USING TIMESTAMP 1000; > SELECT * FROM table;
c1 | c2 | c3 ----+----+---- 1 | 2 | 3
INSERT with timestamp far in the future, this would overwrite old value.
> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 4) USING TIMESTAMP 1607681258727447; > SELECT * FROM table;
c1 | c2 | c3 ----+----+---- 1 | 2 | 4
INSERT without ‘USING TIMESTAMP’ will always overwrite.
> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 5); > SELECT * FROM table;
c1 | c2 | c3 ----+----+---- 1 | 2 | 5