DocDB, YugabyteDB's underlying distributed document store, uses a heavily customized version of RocksDB for node-local persistence. It has been engineered from the ground up to deliver high performance at massive scale. Several features have been built into DocDB to support this design goal, including the following:
- Scan-resistant global block cache
- Bloom/index data splitting
- Global memstore limit
- Separate compaction queues to reduce read amplification
- Smart load balancing across disks
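Several of these behaviors are tunable at the server level. As a hedged sketch (flag names and values here are illustrative examples of yb-tserver tunables, not a recommended production configuration), the global block cache and global memstore limit might be sized like this:

```shell
# Illustrative yb-tserver settings (example values only):
# - give the shared, scan-resistant block cache a fixed share of RAM
# - cap total memstore usage across all tablets on the node
yb-tserver \
  --db_block_cache_size_percentage=50 \
  --global_memstore_size_percentage=10 \
  ...other required flags...
```

Because both the block cache and the memstore limit are global to the node rather than per-tablet, memory usage stays bounded even as the number of tablets grows.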
These features enable YugabyteDB to handle massive datasets with ease. The following scenario loads 1 billion rows into a 3-node cluster.
1 billion rows
This experiment uses the YCSB benchmark with the standard JDBC binding to load 1 billion rows. The expected dataset at the end of the data load is 1TB on each node, or a 3TB total dataset size for the cluster.
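A load of this shape can be driven with the YCSB `load` phase against the JDBC binding. The sketch below is illustrative: the host, port, credentials, and thread count are placeholder assumptions, not the exact invocation used in this experiment.

```shell
# Illustrative YCSB load invocation (placeholder connection details).
# YSQL speaks the PostgreSQL wire protocol, so the standard
# PostgreSQL JDBC driver is used via the JDBC binding.
bin/ycsb load jdbc -s \
  -P workloads/workloada \
  -p db.driver=org.postgresql.Driver \
  -p db.url=jdbc:postgresql://<yb-node-ip>:5433/ycsb \
  -p db.user=yugabyte \
  -p recordcount=1000000000 \
  -p threadcount=32
```

The `recordcount` property controls how many rows the load phase inserts; the thread count would be tuned to the client and cluster hardware.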
| Storage | 2 x 5TB SSDs (gp2 EBS) |
| Sharding | Range and Hash |
Results at a glance
The 1B row data load completed successfully for YSQL (using a range-sharded table) in about 26 hours. The following graphs show the throughput and average latency during the load phase of the YCSB benchmark against the range-sharded table.
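In YSQL, whether a table is range- or hash-sharded is determined by its primary key definition. A minimal sketch of a range-sharded YCSB-style table (column names follow the standard YCSB `usertable` schema; the exact DDL used in this experiment is not shown in the source):

```sql
-- ASC in the primary key makes this table range-sharded;
-- HASH (the YSQL default for the first key column) would make it hash-sharded.
CREATE TABLE usertable (
    ycsb_key VARCHAR(255),
    field0 TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT,
    field5 TEXT, field6 TEXT, field7 TEXT, field8 TEXT, field9 TEXT,
    PRIMARY KEY (ycsb_key ASC)
);
```

Range sharding keeps adjacent keys in the same tablet, which favors range scans; hash sharding spreads keys evenly across tablets, which favors uniform write distribution.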
The total dataset was 3.2TB across the three nodes, with each node holding just over 1TB of data.