Auto sharding

  • 1. Create a universe
  • 2. Create a table
  • 3. Explore tablets
  • 4. Insert and query a table
  • 5. Automatic sharding when adding nodes
  • 6. [Optional] Clean up

YugabyteDB automatically splits user tables into multiple shards, called tablets. The primary key for each row in the table uniquely determines the tablet the row lives in. For data distribution purposes, a hash-based partitioning scheme is used. Read more about how sharding works in YugabyteDB.

By default, YugabyteDB creates eight tablets per node in the cluster for each table and automatically distributes the data across the various tablets, which in turn are distributed evenly across the nodes. In this tutorial, you will explore how automatic sharding is done internally for tables. The system Redis table works in an identical manner.
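
To build some intuition for a hash-based split, here is a rough, illustrative sketch only; it is not YugabyteDB's actual hash function or its exact tablet boundaries. Each tablet owns a roughly equal slice of the hash space, and a row is stored in whichever tablet covers the hash of its primary key. With the 12 tablets used later in this tutorial (4 per node across 3 nodes), the slices would look roughly like this:

$ awk 'BEGIN {
    n = 12;              # 4 tablets per node x 3 nodes, as configured below
    hash_space = 65536;  # illustrative 16-bit hash space: 0x0000 .. 0xFFFF
    for (i = 0; i < n; i++)
      printf "tablet %2d covers hash values [%5d, %5d)\n",
             i + 1, int(i * hash_space / n), int((i + 1) * hash_space / n);
  }'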

We will explore automatic sharding inside YugabyteDB as follows:

  • Use a replication factor (RF) of 1. This makes it easier to understand how automatic sharding is achieved, independent of data replication.
  • Insert entries one by one, and examine how the data gets distributed across the various nodes.

This tutorial uses the yb-ctl local cluster management utility.

1. Create a universe

If you have a currently running local universe, destroy it using the following command.

$ ./bin/yb-ctl destroy

Start a new local universe with a replication factor of 1 (rf=1). We are passing the following flags:

  • --rf 1 This creates a universe with a replication factor of 1.
  • --num_shards_per_tserver 4 This flag controls the number of tablets (or partitions) created per tablet server for each new table. Setting the value to 4 means 12 tablets will be created for the table on a 3-node cluster.
  • --tserver_flags "memstore_size_mb=1" This sets the total size of memstores on the tablet servers to 1MB. Writing a value larger than 1MB will force a flush of the data to disk, so that you can observe which tablets the data is written to.

You can do this as shown below.

$ ./bin/yb-ctl --rf 1 --num_shards_per_tserver 4 create \
             --tserver_flags "memstore_size_mb=1"

This command creates a universe with one node. Now, let's add two more nodes to make this a 3-node, rf=1 universe. You need to pass the memstore size flag to each of the added YB-TServer servers, which you can do by running the following:

$ ./bin/yb-ctl add_node --tserver_flags "memstore_size_mb=1"
$ ./bin/yb-ctl add_node --tserver_flags "memstore_size_mb=1"

Check the status of the cluster to confirm that you now have three YB-TServer servers.

$ ./bin/yb-ctl status
----------------------------------------------------------------------------------------------------
| Node Count: 3 | Replication Factor: 1                                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL                : ./bin/ysqlsh                                                               |
| YCQL                : ./bin/ycqlsh 127.0.0.1 9042                                                 |
| YEDIS               : ./bin/redis-cli -h 127.0.0.1 -p 6379                                       |
| Web UI              : http://127.0.0.1:7000/                                                     |
| Cluster Data        : /Users/schoudhury/yugabyte-data                                            |
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Node 1: yb-tserver (pid 72053), yb-master (pid 72050)                                            |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL                : ./bin/ysqlsh                                                               |
| YCQL                : ./bin/ycqlsh 127.0.0.1 9042                                                 |
| YEDIS               : ./bin/redis-cli -h 127.0.0.1 -p 6379                                       |
| data-dir[0]         : /Users/schoudhury/yugabyte-data/node-1/disk-1/yb-data                      |
| TServer Logs        : /Users/schoudhury/yugabyte-data/node-1/disk-1/yb-data/tserver/logs         |
| Master Logs         : /Users/schoudhury/yugabyte-data/node-1/disk-1/yb-data/master/logs          |
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Node 2: yb-tserver (pid 72128)                                                                   |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL                : ./bin/ysqlsh                                                               |
| YCQL                : ./bin/ycqlsh 127.0.0.2 9042                                                 |
| YEDIS               : ./bin/redis-cli -h 127.0.0.2 -p 6379                                       |
| data-dir[0]         : /Users/schoudhury/yugabyte-data/node-2/disk-1/yb-data                      |
| TServer Logs        : /Users/schoudhury/yugabyte-data/node-2/disk-1/yb-data/tserver/logs         |
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Node 3: yb-tserver (pid 72166)                                                                   |
----------------------------------------------------------------------------------------------------
| JDBC                : jdbc:postgresql://127.0.0.1:5433/yugabyte                                  |
| YSQL                : ./bin/ysqlsh -U postgres -h 127.0.0.3 -p 5433                              |
| YCQL                : ./bin/ycqlsh 127.0.0.3 9042                                                 |
| YEDIS               : ./bin/redis-cli -h 127.0.0.3 -p 6379                                       |
| data-dir[0]         : /Users/schoudhury/yugabyte-data/node-3/disk-1/yb-data                      |
| TServer Logs        : /Users/schoudhury/yugabyte-data/node-3/disk-1/yb-data/tserver/logs         |
----------------------------------------------------------------------------------------------------

2. Create a table

Create a YCQL table. Since you will be using the YugabyteDB workload generator application to write data into this table, the keyspace and table name below must be created exactly as shown.

$ ./bin/ycqlsh
ycqlsh> CREATE KEYSPACE ybdemo_keyspace;
ycqlsh> CREATE TABLE ybdemo_keyspace.cassandrakeyvalue (k text PRIMARY KEY, v blob);

For each table, you have instructed YugabyteDB to create four shards for each YB-TServer in this universe. Because you have three nodes, you expect 12 tablets for the ybdemo_keyspace.cassandrakeyvalue table.
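
Optionally, you can confirm that the keyspace and table were created as expected from the ycqlsh shell, using the standard DESCRIBE command that ycqlsh inherits from cqlsh:

$ ./bin/ycqlsh
ycqlsh> DESCRIBE TABLE ybdemo_keyspace.cassandrakeyvalue;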

3. Explore tablets

  • The tablets are evenly balanced across the various nodes.

You can see the number of tablets per node on the Tablet Servers page of the master Admin UI (linked from the Web UI address shown in the cluster status above). The page should look something like the image below.

Number of tablets in the table

Each node has 4 tablets, and the total number of tablets is 12, as expected.

  • The table has 12 tablets, each owning a range of the keyspace.

Let us navigate to the table details page to examine the various tablets. This page should look as follows.

Tablet details of the table

What you see here is that there are 12 tablets as expected, and the key ranges owned by each tablet are shown. This page also shows which node is currently hosting (and is the leader for) each of these tablets. Note that tablet balancing across nodes happens on a per-table basis, so that each table is scaled out to an appropriate number of nodes.
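
If you prefer the command line to the Admin UI, the yb-admin list_tablets command reports similar information (tablet IDs, key ranges, and the current leader of each tablet). The following is a sketch that assumes the default yb-ctl master RPC address of 127.0.0.1:7100; the exact argument format can vary by version, so check the yb-admin reference if it differs:

$ ./bin/yb-admin -master_addresses 127.0.0.1:7100 list_tablets ybdemo_keyspace cassandrakeyvalue 12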

  • Each tablet has a separate directory dedicated to it for data.

Let us list out all the tablet directories and see their sizes. This can be done as follows.

$ du -hs /yugabyte-data/node*/disk*/yb-data/tserver/data/rocksdb/table*/* | grep -v '0B'
 20K    /yugabyte-data/node-1/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-439ae3bde90049d6812e198e76ad29a4
 20K    /yugabyte-data/node-1/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-eecd01f0a7cd4537ba571bdb85d0c094
 20K    /yugabyte-data/node-1/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-4ea334056a3845518cc6614baef96966
 20K    /yugabyte-data/node-1/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-52642a3a9d7b4d38a103dff5dd77a3c6
 20K    /yugabyte-data/node-2/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-4e31e26b3b204e34a1e0cfd6f7500525
 20K    /yugabyte-data/node-2/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-b7ac08a983aa45a3843ab92b1719799a
 20K    /yugabyte-data/node-2/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-22c349a07afb48e3844b570c24455431
 20K    /yugabyte-data/node-2/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-8955db9e1ec841f3a30535b77d707586
 20K    /yugabyte-data/node-3/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-adac9f92466b4d288a4ae346aaad3880
 20K    /yugabyte-data/node-3/disk-1/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-f04a6d5113a74ba79a04f01c80423ef5
 20K    /yugabyte-data/node-3/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-1c472c1204fe40afbc7948dadce22be8
 20K    /yugabyte-data/node-3/disk-2/yb-data/tserver/data/rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-5aaeb96381044aa2b09ed9973830bb27

4. Insert and query a table

Let us insert a key-value entry, with the value size around 10MB. Since the memstores are configured to be 1MB, this will cause the data to flush to disk immediately. Note that the key flags you pass to the sample app are:

  • --num_unique_keys 1 - Write exactly one key. Keys are numbers converted to text, and typically start from 0.
  • --num_threads_read 0 - Do not perform any reads (hence 0 read threads).
  • --num_threads_write 1 - Since you are not writing a lot of data, a single writer thread is sufficient.
  • --value_size 10000000 - Generate the value being written as a random byte string of around 10MB size.
  • --nouuid - Do not prefix a UUID to the key. A UUID allows multiple instances of the load tester to run without interfering with each other.

Download the YugabyteDB workload generator JAR file (yb-sample-apps.jar) by running the following command.

$ wget https://github.com/yugabyte/yb-sample-apps/releases/download/1.3.1/yb-sample-apps.jar?raw=true -O yb-sample-apps.jar

Run the CassandraKeyValue workload application.

$ java -jar ./yb-sample-apps.jar --workload CassandraKeyValue \
                                    --nodes 127.0.0.1:9042 \
                                    --nouuid \
                                    --num_unique_keys 1 \
                                    --num_writes 2 \
                                    --num_threads_read 0 \
                                    --num_threads_write 1 \
                                    --value_size 10000000
2018-02-05 07:33:33,525 [INFO|...] Num unique keys to insert: 1
...
2018-02-05 07:33:36,899 [INFO|...] The sample app has finished

Let us check what we have inserted using ycqlsh.

$ ./bin/ycqlsh
ycqlsh> SELECT k FROM ybdemo_keyspace.cassandrakeyvalue;
 k
-------
 key:0

(1 rows)

Now let us check the sizes of the various tablets:

$ du -hs /yugabyte-data/node*/disk*/yb-data/tserver/data/rocksdb/table*/* | grep -v '0B'
 20K    .../rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-439ae3bde90049d6812e198e76ad29a4
9.6M    .../rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-eecd01f0a7cd4537ba571bdb85d0c094
 20K    .../rocksdb/table-9987797012ce4c1c91782c25e7608c34/tablet-4ea334056a3845518cc6614baef96966
 ...

We see that the key has been written to one of the tablets; in this experiment, that tablet's UUID is eecd01f0a7cd4537ba571bdb85d0c094. From the table details page, you can find out which node this tablet belongs to, which is node-1 in this case. Here is the relevant screenshot.

Tablet ownership with auto-sharding
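
You can also cross-check this from the command line. Assuming the data directory layout shown in the earlier listing (adjust the base path if your cluster data directory differs), searching for the tablet's directory shows that it lives under node-1:

$ ls -d /yugabyte-data/node-*/disk-*/yb-data/tserver/data/rocksdb/table-*/tablet-eecd01f0a7cd4537ba571bdb85d0c094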

We can also easily confirm that node-1 indeed has about 10MB of storage being used.

Inserting values with auto-sharding

5. Automatic sharding when adding nodes

Let us add one more node to the universe for a total of 4 nodes, by running the following command.

$ ./bin/yb-ctl add_node --tserver_flags "memstore_size_mb=1"

By looking at the tablet servers page, you find that the tablets are redistributed evenly among the 4 nodes, and each node now has 3 tablets.

Auto-sharding when adding one node

Next, let us add 2 more nodes to the universe, making it a total of 6 nodes. We can do this by running the following.

$ ./bin/yb-ctl add_node --tserver_flags "memstore_size_mb=1"
$ ./bin/yb-ctl add_node --tserver_flags "memstore_size_mb=1"

We can verify that the tablets are evenly distributed across the 6 nodes. Each node now has 2 tablets.

Auto-sharding when adding three nodes
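
As a rough command-line check (again assuming the data directory layout shown earlier), you can count the tablet directories on each node; once rebalancing completes, each of the 6 nodes should report 2:

$ for node in /yugabyte-data/node-*; do \
    echo "$node: $(ls -d $node/disk-*/yb-data/tserver/data/rocksdb/table-*/tablet-* 2>/dev/null | wc -l) tablet directories"; \
  done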

6. [Optional] Clean up

Optionally, you can shut down the local cluster created in Step 1.

$ ./bin/yb-ctl destroy