Best practices

Tips and tricks to build YSQL applications for high performance and availability

Use application patterns

Running applications in multiple data centers with data split across them is not a trivial task. When designing global applications, choose a suitable design pattern for your application from a suite of battle-tested design paradigms, including Global database, Multi-master, Standby cluster, Duplicate indexes, and more. You can also combine these patterns as per your needs.

For more details, see Build global applications.

Colocation

Colocated tables optimize latency and performance for data access by reducing the need for additional trips across the network for small tables. Additionally, it reduces the overhead of creating a tablet for every relation (tables, indexes, and so on) and their storage per node.

For more details, see colocation.

Faster reads with covering indexes

When a query uses an index to look up rows faster, the columns that are not present in the index are fetched from the original table. This results in additional round trips to the main table leading to increased latency.

Use covering indexes to store all the required columns needed for your queries in the index. Indexing converts a standard Index-Scan to an Index-Only-Scan.

Faster writes with partial indexes

A partial index is an index that is built on a subset of a table and includes only rows that satisfy the condition specified in the WHERE clause. This speeds up any writes to the table and reduces the size of the index, thereby improving speed for read queries that use the index.

For more details, see Partial indexes.

Distinct keys with unique indexes

If you need values in some of the columns to be unique, you can specify your index as UNIQUE.

When a unique index is applied to two or more columns, the combined values in these columns can't be duplicated in multiple rows. Note that because a NULL value is treated as a distinct value, you can have multiple NULL values in a column with a unique index.

For more details, see Unique indexes.

Faster sequences with server-level caching

Sequences in databases automatically generate incrementing numbers, perfect for generating unique values like order numbers, user IDs, check numbers, and so on. They prevent multiple application instances from concurrently generating duplicate values. However, generating sequences on a database that is spread across regions could have a latency impact on your applications.

Enable server-level caching to improve the speed of sequences, and also avoid discarding many sequence values when an application disconnects.

For a demo, see the YugabyteDB Friday Tech Talk on Scaling sequences with server-level caching.

Fast single-row transactions

Common scenarios of updating rows and fetching the results in multiple statements can lead to multiple round-trips between the application and server. In many cases, rewriting these statements as single statements using the RETURNING clause will lead to lower latencies as YugabyteDB has optimizations to make single statements faster. For example, the following statements:

SELECT v FROM txndemo WHERE k=1 FOR UPDATE;
UPDATE txndemo SET v = v + 3 WHERE k=1;
SELECT v FROM txndemo WHERE k=1;

can be re-written as follows:

UPDATE txndemo SET v = v + 3 WHERE k=1 RETURNING v;

For more details, see Fast single-row transactions.

Delete older data quickly with partitioning

Use table partitioning to split your data into multiple partitions according to date so that you can quickly delete older data by dropping the partition.

For more details, see Partition data by time.

Load balance and failover using smart drivers

YugabyteDB smart drivers provide advanced cluster-aware load-balancing capabilities that enables your applications to send requests to multiple nodes in the cluster just by connecting to one node. You can also set a fallback hierarchy by assigning priority to specific regions and ensuring that connections are made to the region with the highest priority, and then fall back to the region with the next priority in case the high-priority region fails.

For more details, see Load balancing with smart drivers.

Scale your application with connection pools

Set up different pools with different load balancing policies as needed for your application to scale by using popular pooling solutions such as HikariCP and Tomcat along with YugabyteDB smart drivers.

For more details, see Connection pooling.

Use YSQL Connection Manager

YugabyteDB includes a built-in connection pooler, YSQL Connection Manager, which provides the same connection pooling advantages as other external pooling solutions, but without many of their limitations. As the manager is bundled with the product, it is convenient to manage, monitor, and configure the server connections.

Re-use query plans with prepared statements

Whenever possible, use prepared statements to ensure that YugabyteDB can re-use the same query plan and eliminate the need for a server to parse the query on each operation.

For more details, see Prepared statements in PL/pgSQL.

Large scans and batch jobs

Use BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE for batch or long-running jobs, which need a consistent snapshot of the database without interfering, or being interfered with by other transactions.

For more details, see Large scans and batch jobs.

Paralleling across tablets

For large or batch SELECTs or DELETEs that have to scan all tablets, you can parallelize your operation by creating queries that affect only a specific part of the tablet using the yb_hash_code function.

For more details, see Distributed parallel queries.

Single availability zone (AZ) deployments

In single AZ deployments, you need to set the yb-tserver flag --durable_wal_write=true to not lose data if the whole datacenter goes down (For example, power failure).

Row size limit

Big columns add up when you select full or multiple rows. For consistent latency or performance, it is recommended keeping the size under 10MB or less, and a maximum of 32MB.

Column size limit

For consistent latency or performance, it is recommended to size columns in the 2MB range or less even though an individual column or row limit is supported till 32MB.

TRUNCATE tables instead of DELETE

TRUNCATE deletes the database files that store the table and is much faster than DELETE which inserts a delete marker for each row in transactions that are later removed from storage during compaction runs.

Settings for CI and CD integration tests

You can set certain gflags to increase performance using YugabyteDB in CI and CD automated test scenarios as follows:

  • Point the gflags --fs_data_dirs, and --fs_wal_dirs to a RAMDisk directory to make DML, DDL, cluster creation and deletion faster, ensuring that data is not written to disk. This will make DML,DDL and create,destroy a cluster faster because data will not be written to disk
  • Set the flag --yb_num_shards_per_tserver=1. Reducing the number of shards lowers overhead when creating or dropping YCQL tables, and writing or reading small amounts of data. Reducing the number of shards lowers overhead when creating,dropping YCQL tables and writing,reading small amounts of data
  • Use colocated databases in YSQL. Colocation lowers overhead when creating or dropping YSQL tables, and writing or reading small amounts of data. Colocation will lower overhead when creating/dropping YSQL tables and writing,reading small amounts of data
  • Set the flag --replication_factor=1 for test scenarios, as keeping the data three way replicated (default) is not necessary. Reducing that to 1 reduces space usage and increases performance. For these testing scenarios, perhaps the default of keeping the data 3-way replicated is not necessary. Reducing that down to 1 cuts space usage and increases perf.
  • Use TRUNCATE table1,table2,table3..tablen; instead of CREATE TABLE, and DROP TABLE between test cases.