Join us on
Star us on
Get Started
Slack
GitHub
Get Started
v2.0 (latest version) v1.3 (earlier version) v1.2 (earlier version) v1.1 (earlier version) v1.0 (earlier version)
  • GET STARTED
    • Quick start
      • 1. Install YugabyteDB
      • 2. Create a local cluster
      • 3. Explore YSQL
      • 4. Build an application
        • Java
        • NodeJS
        • Go
        • Python
        • Ruby
        • C#
        • PHP
        • C++
        • C
    • Introduction
    • Explore core
      • Linear scalability
      • Fault tolerance
      • Global distribution
      • Auto-sharding
      • Tunable reads
      • Observability
      • Two data center deployment
  • USER GUIDES
    • Develop
      • Learn app development
        • 1. SQL vs NoSQL
        • 2. Data modeling
        • 3. Data types
        • 4. ACID transactions
        • 5. Aggregations
        • 6. Batch operations
        • 7. Date and time
        • 8. Strings and text
      • Ecosystem integrations
        • Apache Kafka
        • Apache Spark
        • JanusGraph
        • KairosDB
        • Presto
        • Metabase
      • Real-world examples
        • E-Commerce App
        • IoT Fleet Management
        • Retail Analytics
      • Work with GraphQL
        • Hasura
        • Prisma
      • Explore sample applications
    • Deploy
      • Deployment checklist
      • Manual deployment
        • 1. System configuration
        • 2. Install software
        • 3. Start YB-Masters
        • 4. Start YB-TServers
        • 5. Verify deployment
      • Kubernetes
        • Helm chart
        • Helm configuration
        • Local SSD
        • Rook operator
        • YugabyteDB operator
      • Docker
      • Public clouds
        • Amazon Web Services
        • Google Cloud Platform
        • Microsoft Azure
      • Change data capture (CDC)
        • Using the Yugabyte CDC connector
        • CDC to Kafka
        • CDC to stdout
      • Two data center (2DC)
      • Pivotal Cloud Foundry
      • Yugabyte Platform
        • 1. Prepare cloud environment
        • 2. Install Admin Console
        • 3. Configure Admin Console
        • 4. Configure cloud providers
    • Benchmark
      • Performance
      • YCSB
      • Large datasets
    • Secure
      • Security checklist
      • Authentication
        • Authentication
        • Fine-grained authentication
      • Authorization
        • RBAC model
        • Create roles
        • Grant privileges
      • Encryption in transit
        • 1. Prepare nodes
        • 2. Server-server encryption
        • 3. Client-server encryption
        • 4. Connect to cluster
      • Encryption at rest
    • Manage
      • Backup and restore
        • Backing up data
        • Restoring data
        • Backing Up Data using Snapshots
      • Data migration
        • Bulk import
        • Bulk export
      • Change cluster configuration
      • Diagnostics reporting
      • Upgrade deployment
      • Yugabyte Platform
        • Create universe - Multi-zone
        • Create universe - Multi-region
        • Edit universe
        • Edit configuration options
        • Health checking and alerts
        • Create and edit instance tags
        • Node status and actions
        • Read replicas
        • Back up and restore
        • Upgrade universe
        • Delete universe
    • Troubleshoot
      • Troubleshooting
      • Cluster level issues
        • YCQL connection issues
        • YEDIS connection Issues
        • Recover tserver/master
      • Node level issues
        • Check processes
        • Inspect logs
        • System statistics
      • Yugabyte Platform
        • Troubleshoot universes
  • REFERENCE
    • APIs
      • YSQL
        • Statements
          • ABORT
          • ALTER DATABASE
          • ALTER DEFAULT PRIVILEGES
          • ALTER DOMAIN
          • ALTER GROUP
          • ALTER POLICY
          • ALTER ROLE
          • ALTER SEQUENCE
          • ALTER TABLE
          • ALTER USER
          • BEGIN
          • COMMIT
          • COPY
          • CREATE AGGREGATE
          • CREATE CAST
          • CREATE DATABASE
          • CREATE DOMAIN
          • CREATE FUNCTION
          • CREATE GROUP
          • CREATE INDEX
          • CREATE OPERATOR
          • CREATE OPERATOR CLASS
          • CREATE POLICY
          • CREATE PROCEDURE
          • CREATE ROLE
          • CREATE RULE
          • CREATE SCHEMA
          • CREATE SEQUENCE
          • CREATE TABLE
          • CREATE TABLE AS
          • CREATE TRIGGER
          • CREATE TYPE
          • CREATE USER
          • CREATE VIEW
          • DEALLOCATE
          • DELETE
          • DO
          • DROP AGGREGATE
          • DROP CAST
          • DROP DATABASE
          • DROP DOMAIN
          • DROP FUNCTION
          • DROP GROUP
          • DROP OPERATOR
          • DROP OPERATOR CLASS
          • DROP OWNED
          • DROP POLICY
          • DROP PROCEDURE
          • DROP ROLE
          • DROP RULE
          • DROP SEQUENCE
          • DROP TABLE
          • DROP TRIGGER
          • DROP TYPE
          • DROP USER
          • END
          • EXECUTE
          • EXPLAIN
          • GRANT
          • INSERT
          • LOCK
          • PREPARE
          • REASSIGN OWNED
          • RESET
          • REVOKE
          • ROLLBACK
          • SELECT
          • SET
          • SET CONSTRAINTS
          • SET ROLE
          • SET SESSION AUTHORIZATION
          • SET TRANSACTION
          • SHOW
          • SHOW TRANSACTION
          • TRUNCATE
          • UPDATE
        • Data types
          • Binary
          • Boolean
          • Character
          • Date-time
          • Json
          • Money
          • Numeric
          • Serial
          • UUID
        • Expressions
          • currval()
          • lastval()
          • nextval()
        • Extensions
        • Keywords
        • Reserved names
      • YCQL
        • Quick start YCQL
        • ALTER KEYSPACE
        • ALTER ROLE
        • ALTER TABLE
        • CREATE INDEX
        • CREATE KEYSPACE
        • CREATE ROLE
        • CREATE TABLE
        • CREATE TYPE
        • DROP INDEX
        • DROP KEYSPACE
        • DROP ROLE
        • DROP TABLE
        • DROP TYPE
        • GRANT PERMISSION
        • GRANT ROLE
        • REVOKE PERMISSION
        • REVOKE ROLE
        • USE
        • INSERT
        • SELECT
        • UPDATE
        • DELETE
        • TRANSACTION
        • TRUNCATE
        • Simple Value
        • Subscript
        • Function Call
        • Operator Call
        • BLOB
        • BOOLEAN
        • MAP, SET, LIST
        • FROZEN
        • INET
        • Integer & Counter
        • Non-Integer
        • TEXT
        • Date & Time Types
        • UUID & TIMEUUID
        • JSONB
        • Date and time Functions
    • CLIs
      • yb-ctl
      • yb-docker-ctl
      • ysqlsh
      • cqlsh
      • yb-admin
    • Configuration
      • YB-TServer
      • YB-Master
      • Default ports
    • Tools
      • DBeaver
      • pgAdmin
      • SQL Workbench/J
      • TablePlus
      • Visual Studio Code
    • Sample data
      • Chinook
      • Northwind
      • PgExercises
      • SportsDB
    • Connectors and drivers
      • PostgreSQL JDBC Driver
      • YugabyteDB JDBC Driver
      • Kafka Connect YugabyteDB
      • Spring Data YugabyteDB
  • RELEASES
    • Release history
      • v2.0.7
      • v2.0.6
      • v2.0.5
      • v2.0.3
      • v2.0.1
      • v2.0.0
      • v1.3.1
      • v1.3.0
      • v1.2.12
      • v1.2.11
      • v1.2.10
      • v1.2.9
      • v1.2.8
      • v1.2.6
      • v1.2.5
      • v1.2.4
  • CONCEPTS
    • Architecture
      • Design goals
      • Key concepts
        • Universe
        • YB-TServer
        • YB-Master
        • Acknowledgements
      • Core functions
        • Universe creation
        • Table creation
        • Write IO path
        • Read IO path
        • High availability
      • Layered architecture
      • Query layer
        • Overview
      • DocDB store
        • Sharding
        • Replication
        • Persistence
        • Performance
      • DocDB transactions
        • Isolation Levels
        • Single row transactions
        • Distributed transactions
        • Transactional IO path
      • Change data capture (CDC)
      • Two data center (2DC) deployments
  • FAQ
    • Comparisons
      • CockroachDB
      • Google Cloud Spanner
      • MongoDB
      • FoundationDB
      • Amazon DynamoDB
      • Azure Cosmos DB
      • Apache Cassandra
      • Redis in-memory store
      • Apache HBase
    • Other FAQs
      • Product
      • Architecture
      • Yugabyte Platform
      • API compatibility
  • CONTRIBUTE
    • Get involved
    • Core database
      • Checklist
      • Building the source
      • Running the Tests
  • Misc
    • YEDIS
      • Quick start
      • Develop
        • Client drivers
          • C
          • C++
          • C#
          • Go
          • Java
          • NodeJS
          • Python
      • API reference
        • APPEND
        • AUTH
        • CONFIG
        • CREATEDB
        • DELETEDB
        • LISTDB
        • SELECT
        • DEL
        • ECHO
        • EXISTS
        • EXPIRE
        • EXPIREAT
        • FLUSHALL
        • FLUSHDB
        • GET
        • GETRANGE
        • GETSET
        • HDEL
        • HEXISTS
        • HGET
        • HGETALL
        • HINCRBY
        • HKEYS
        • HLEN
        • HMGET
        • HMSET
        • HSET
        • HSTRLEN
        • HVALS
        • INCR
        • INCRBY
        • KEYS
        • MONITOR
        • PEXPIRE
        • PEXPIREAT
        • PTTL
        • ROLE
        • SADD
        • SCARD
        • RENAME
        • SET
        • SETEX
        • PSETEX
        • SETRANGE
        • SISMEMBER
        • SMEMBERS
        • SREM
        • STRLEN
        • ZRANGE
        • TSADD
        • TSCARD
        • TSGET
        • TSLASTN
        • TSRANGEBYTIME
        • TSREM
        • TSREVRANGEBYTIME
        • TTL
        • ZADD
        • ZCARD
        • ZRANGEBYSCORE
        • ZREM
        • ZREVRANGE
        • ZSCORE
        • PUBSUB
        • PUBLISH
        • SUBSCRIBE
        • UNSUBSCRIBE
        • PSUBSCRIBE
        • PUNSUBSCRIBE
> Manage > Data migration >

Bulk export


This page documents the options for exporting data out of YugabyteDB.

  • YCQL

This page documents bulk export for YugabyteDB’s Cassandra compatible YCQL API. To export data from a YugabyteDB (or even an Apache Cassandra) table, you can use the cassandra-unloader tool.

We will first create a source YugabyteDB table and populate it with data. Then we will export the data out using the cassandra-unloader tool. We will use a generic gaming user profile use case as a running example to illustrate the export process.

Create Source Table

Following is the schema of the destination YugabyteDB table.

CREATE KEYSPACE load;
USE load;

CREATE TABLE users(
	user_id varchar, 
	score1 double, 
	score2 double,
	points int, 
	object_id varchar,
   PRIMARY KEY (user_id));

Generate Sample Data

# sample usage:
#  To generate a 10GB (10240 MB) file.
#  % python gen_csv.py <outfile_name> <outfile_size_MB>
#  % python gen_csv.py file01.csv 10240
#
import numpy as np
import uuid
import csv
import os
import sys
outfile    = sys.argv[1] # output file name
outsize_mb = int(sys.argv[2])
print("Outfile = " + outfile)
print("Outfile Size (MB) = " + str(outsize_mb))
chunksize = 10000
with open(outfile, 'ab') as csvfile:
    while (os.path.getsize(outfile)//1024**2) < outsize_mb:
        data = [[uuid.uuid4() for i in range(chunksize)],
                np.random.random(chunksize)*1000,
                np.random.random(chunksize)*50,
                np.random.randint(1000000, size=(chunksize,)),
                [uuid.uuid4() for i in range(chunksize)]]
        csvfile.writelines(['%s,%.6f,%.6f,%i,%s\n' % row for row in zip(*data)])

Sample rows generated by script would like the following.

$ head file00.csv
3399bebc-d2cc-40c6-89d4-26102e08ff61,622.491927,40.262305,658257,44d73f8c-1d3c-424e-8fd2-d316c56b8454
4f362eac-f79f-45f6-b6b1-bd5a81f931dc,141.344278,3.024717,694290,7768b010-8411-490a-b523-88cc3ec53cb5
a24a6587-eea4-4907-ac7f-9f99dcac8f82,345.110599,3.869150,510943,5765d1d3-2855-4dbe-9f11-bb3b8631789f
...

To generate 5 CSV files of about 5 GB each, run the following commands.

python ./gen_csv.py file00.csv 5120 &
python ./gen_csv.py file01.csv 5120 &
python ./gen_csv.py file02.csv 5120 &
python ./gen_csv.py file03.csv 5120 &
python ./gen_csv.py file04.csv 5120 &

Load Sample Data

cassandra-loader is a general purpose bulk loader for CQL that supports various types of delimited files (particularly csv files). For more details, review the README of the YugabyteDB cassandra-loader fork. Note that cassandra-loader requires quotes for collection types (e.g. “[1,2,3]” rather than [1,2,3] for lists).

Install cassandra-loader

You can do this as shown below.

$ wget https://github.com/yugabyte/cassandra-loader/releases/download/v0.0.27-yb-2/cassandra-loader
$ chmod a+x cassandra-loader

Run cassandra-loader

The files can be queued up for upload one at a time. Sample invocation:

./cassandra-loader \
    -schema "load.users(user_id, score1, score2, points, object_id)" \
    -boolStyle 1_0 \
    -numFutures 1000 \
    -rate 10000 \
    -queryTimeout 65 \
    -numRetries 10 \
    -progressRate 200000 \
    -host <clusterNodeIP> \
    -f file01.csv

For additional options to cassandra-loader, see here.

Export Data

Install cassandra-unloader

$ wget https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.27/cassandra-unloader
$ chmod a+x cassandra-unloader

Run cassandra-unloader

./cassandra-unloader \
   -schema "load.users(user_id, score1, score2, points, object_id)" \
   -boolStyle 1_0 \
   -host <clusterNodeIP> \
   -f outfile.csv

For additional options to cassandra-unloader, see here.

Manage
Bulk import
Manage
Change cluster configuration
Talk to Community
  • Slack
  • Github
  • Forum
  • StackOverflow
Yugabyte
Contact Us
Copyright © 2017-2019 Yugabyte, Inc. All rights reserved.