API FAQ

Answers to common questions about YugabyteDB APIs and compatibility

Contents

General
API compatibility
YSQL compatibility with PostgreSQL
YCQL compatibility with Apache Cassandra QL

General

What client APIs are supported by YugabyteDB?

YugabyteDB supports two flavors of distributed SQL.

Yugabyte SQL (YSQL)

YSQL is a fully-relational SQL API that is wire compatible with the SQL language in PostgreSQL. It is best fit for RDBMS workloads that need horizontal write scalability and global data distribution while also using relational modeling features such as JOINs, distributed transactions and referential integrity (such as foreign keys). Get started by exploring YSQL features.

Yugabyte Cloud QL (YCQL)

YCQL is a semi-relational SQL API that is best fit for internet-scale OLTP and HTAP applications needing massive data ingestion and blazing-fast queries. It supports distributed transactions, strongly consistent secondary indexes and a native JSON column type. YCQL has its roots in the Cassandra Query Language. Get started by exploring YCQL features.

Note

The YugabyteDB APIs are isolated and independent from one another today. This means that the data inserted or managed by one API cannot be queried by the other API. Additionally, there is no common way to access the data across the APIs (external frameworks such as Presto can help for basic cases).

The net impact is that you need to select an API first before undertaking detailed database schema/query design and implementation.

When should I pick YCQL over YSQL?

You should pick YCQL over YSQL if your application:

  • Does not require fully-relational data modeling constructs, such as foreign keys and JOINs. Note that strongly-consistent secondary indexes and unique constraints are supported by YCQL.
  • Needs to serve low-latency (sub-millisecond) queries.
  • Needs TTL-driven automatic data expiration.
  • Needs to integrate with stream processors, such as Apache Spark and KSQL.

If you have a specific use case in mind, share it in our Slack community and the community can help you decide the best approach.

What is the difference between ysqlsh and psql?

The YSQL shell (ysqlsh) is functionally similar to PostgreSQL's psql , but uses different default values for some variables (for example, the default user, default database, and the path to TLS certificates). This is done for the user's convenience. In the Yugabyte bin directory, the deprecated psql alias opens the ysqlsh CLI. For more details, see ysqlsh.

What is the status of the YEDIS API?

In the near-term, Yugabyte is not actively working on new feature or driver enhancements to the YEDIS API other than bug fixes and stability improvements. Current focus is on YSQL and YCQL.

For key-value workloads that need persistence, elasticity and fault-tolerance, YCQL (with the notion of keyspaces, tables, role-based access control, and more) is often a great fit, especially if the application is new rather than an existing one already written in Redis. The YCQL drivers are also more clustering aware, and hence YCQL is expected to perform better than YEDIS for equivalent scenarios. In general, our new feature development (support for data types, built-ins, TLS, backups, and more), correctness testing (using Jepsen), and performance optimization is in the YSQL and YCQL areas.

API compatibility

What does API compatibility mean exactly?

API compatibility refers to the fact that the database APIs offered by YugabyteDB servers implement the same wire protocol and modeling/query language as that of an existing database. Because client drivers, command line shells, IDE integrations, and other ecosystem integrations of the existing database rely on this wire protocol and modeling/query language, they are expected to work with YugabyteDB without major modifications.

Note

The YSQL API is compatible with PostgreSQL. This means PostgreSQL client drivers, psql command line shell, IDE integrations such as TablePlus and DBWeaver, and more can be used with YugabyteDB. The same concept applies to YCQL in the context of the Apache Cassandra Query Language.
  • YugabyteDB's API compatibility is aimed at accelerating developer onboarding. By integrating well with the existing ecosystem, YugabyteDB ensures that developers can get started quickly using a language they are already comfortable with.

  • YugabyteDB's API compatibility is not aimed at lift-and-shift porting of existing applications written for the original language. This is because existing applications are not written to take advantage of the distributed, strongly-consistent storage architecture that YugabyteDB provides. For such existing applications, developers should expect to modify their previously monolithic PostgreSQL and/or non-transactional Cassandra data access logic as they look to migrate to YugabyteDB.

YSQL compatibility with PostgreSQL

What is the extent of compatibility with PostgreSQL?

As highlighted in Distributed PostgreSQL on a Google Spanner Architecture – Query Layer, YSQL reuses the open source PostgreSQL query layer (written in C) as much as possible and as a result is wire-compatible with PostgreSQL dialect and client drivers. Specifically, YSQL is based on PostgreSQL v11.2. Following are some of the currently supported features:

  • DDL statements: CREATE, DROP, and TRUNCATE tables
  • Data types: All primitive types including numeric types (integers and floats), text data types, byte arrays, date-time types, UUID, SERIAL, as well as JSONB
  • DML statements: INSERT, UPDATE, SELECT, and DELETE. Bulk of core SQL including JOINs, WHERE clauses, GROUP BY, ORDER BY, LIMIT, OFFSET, and SEQUENCES
  • Transactions: ABORT, ROLLBACK, BEGIN, END, and COMMIT
  • Expressions: Rich set of PostgreSQL built-in functions and operators
  • Other Features: VIEWs, EXPLAIN, PREPARE-BIND-EXECUTE, and JDBC support

For more information, refer to SQL features.

YugabyteDB's goal is to remain as compatible with PostgreSQL as much as possible. If you see a feature currently missing, please file a GitHub issue for us.

Can I insert data using YCQL, but read using YSQL, or vice versa?

The YugabyteDB APIs are currently isolated and independent from one another. Data inserted or managed by one API cannot be queried by the other API. Additionally, Yugabyte does not provide a way to access the data across the APIs. An external framework, such as Presto, might be helpful for basic use cases. For an example that joins YCQL and YSQL data, see the blog post about Presto on YugabyteDB: Interactive OLAP SQL Queries Made Easy.

Allowing YCQL tables to be accessed from the PostgreSQL-compatible YSQL API as foreign tables using foreign data wrappers (FDW) is on the roadmap. You can comment or increase the priority of the associated GitHub issue.

YCQL compatibility with Apache Cassandra QL

YCQL is compatible with v3.4 of Apache Cassandra QL (CQL). Following questions highlight how YCQL differs from CQL.

Features present in YCQL but not present in CQL

  1. Strongly-consistent reads and writes for a single row as an absolute guarantee. This is because YugabyteDB is a Consistent & Partition-tolerant (CP) database as opposed to Apache Cassandra which is an Available & Partition-tolerant (AP) database. Official Jepsen tests prove this correctness aspect under extreme failure conditions.
  2. JSONB column type for modeling document data
  3. Distributed transactions for multi-row ACID transactions.

Features present in both YCQL and CQL but YCQL provides stricter guarantees

  1. Secondary indexes are by default strongly consistent because internally they use distributed transactions.
  2. INTEGER and COUNTER data types are equivalent and both can be incremented without any lightweight transactions.
  3. Timeline-consistent tunably-stale reads that maintain ordering guarantees from either a follower replica in the primary cluster or a observer replica in the read replica cluster.

CQL features that are either unnecessary or disallowed in YCQL

  1. Lightweight transactions for compare-and-swap operations (such as incrementing integers) are unnecessary because YCQL achieves single row linearizability by default.
  2. Tunable write consistency is disallowed in YCQL because writes are committed at quorum using Raft replication protocol.

This blog goes into the details of YCQL vs Apache Cassandra architecture and is recommended for further reading.

Do INSERTs do "upserts" by default? How do I insert data only if it is absent?

By default, inserts overwrite data on primary key collisions. So INSERTs do an upsert. This an intended CQL feature. In order to insert data only if the primary key is not already present, add a clause "IF NOT EXISTS" to the INSERT statement. This will cause the INSERT fail if the row exists.

Here is an example from CQL:

INSERT INTO mycompany.users (id, lastname, firstname)
   VALUES (100, 'Smith', 'John')
IF NOT EXISTS;

Can I have collection data types in the partition key? Will I be able to do partial matches on that collection data type?

Yes, you can have collection data types as primary keys as long as they are marked FROZEN. Collection types that are marked FROZEN do not support partial matches.

What is the difference between a COUNTER data type and INTEGER data type?

Unlike Apache Cassandra, YugabyteDB COUNTER type is almost the same as INTEGER types. There is no need of lightweight transactions requiring 4 round trips to perform increments in YugabyteDB - these are efficiently performed with just one round trip.

How is 'USING TIMESTAMP' different in YugabyteDB?

In Apache Cassandra, the highest timestamp provided always wins. For example:

INSERT with timestamp far in the future:

> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 3) USING TIMESTAMP 1607681258727447;
> SELECT * FROM table;
 c1 | c2 | c3
----+----+----
  1 |  2 |  3

INSERT at the current timestamp does not overwrite previous value which was written at a higher timestamp:

> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 4);
> SELECT * FROM table;
 c1 | c2 | c3
----+----+----
  1 |  2 |  3

On the other hand in Yugabyte, for efficiency purposes INSERTs and UPDATEs without the USING TIMESTAMP clause always overwrite the older values. On the other hand, if you have the USING TIMESTAMP clause, then appropriate timestamp ordering is performed. Example:

> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 3) USING TIMESTAMP 1000;
> SELECT * FROM table;
 c1 | c2 | c3
----+----+----
  1 |  2 |  3

INSERT with timestamp far in the future, this would overwrite old value.

> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 4) USING TIMESTAMP 1607681258727447;
> SELECT * FROM table;
 c1 | c2 | c3
----+----+----
  1 |  2 |  4

INSERT without 'USING TIMESTAMP' will always overwrite.

> INSERT INTO table (c1, c2, c3) VALUES (1, 2, 5);
> SELECT * FROM table;
 c1 | c2 | c3
----+----+----
  1 |  2 |  5