Join us on
Star us on
Get Started
Slack
GitHub
Get Started
v2.5 (latest) v2.2 (stable) v2.1 (earlier version) v2.0 (earlier version) v1.3 (earlier version)
  • GET STARTED
    • Quick start
      • 1. Install YugabyteDB
      • 2. Create a local cluster
      • 3. Explore YSQL
      • 4. Build an application
        • Java
        • NodeJS
        • Go
        • Python
        • Ruby
        • C#
        • PHP
        • C++
        • C
    • Introduction
    • Explore core
      • 1. Linear scalability
      • 2. Fault tolerance
      • 3. Global distribution
      • 4. Auto sharding
      • 5. Tunable reads
      • 6. Observability
  • USER GUIDES
    • Develop
      • Learn app development
        • 1. SQL vs NoSQL
        • 2. Data modeling
        • 3. Data types
        • 4. ACID transactions
        • 5. Aggregations
        • 6. Batch operations
        • 7. Date and time
        • 8. Strings and text
      • Ecosystem integrations
        • Apache Kafka
        • Apache Spark
        • JanusGraph
        • KairosDB
        • Presto
        • Metabase
      • Real-world examples
        • E-Commerce App
        • IoT Fleet Management
        • Retail Analytics
      • Explore sample applications
    • Deploy
      • Checklist
      • Manual deployment
        • 1. System configuration
        • 2. Install software
        • 3. Start YB-Masters
        • 4. Start YB-TServers
        • 5. Verify deployment
      • Kubernetes
        • Helm Chart
        • Helm configuration
        • Local SSD
      • Docker
      • Public clouds
        • Amazon Web Services
        • Google Cloud Platform
        • Microsoft Azure
      • Pivotal Cloud Foundry
      • Yugabyte Platform
        • 1. Prepare cloud environment
        • 2. Install Admin Console
        • 3. Configure Admin Console
        • 4. Configure Cloud Providers
    • Benchmark
      • Performance
      • YCSB
      • Large datasets
    • Secure
      • Security checklist
      • Authentication
      • Authorization
        • 1. RBAC Model
        • 2. Create Roles
        • 3. Grant permissions
      • TLS encryption
        • 1. Prepare nodes
        • 2. Server-server encryption
        • 3. Client-server encryption
        • 4. Connect to cluster
      • Encryption at Rest
    • Manage
      • Backup and restore
        • Backing up data
        • Restoring data
      • Data migration
        • Bulk import
        • Bulk export
      • Change cluster config
      • Upgrade deployment
      • Diagnostics reporting
      • Yugabyte Platform
        • Create universe - Multi-zone
        • Create universe - Multi-region
        • Edit universe
        • Edit config flags
        • Health checking and alerts
        • Create and edit instance tags
        • Node status and actions
        • Read replicas
        • Back up and restore
        • Upgrade universe
        • Delete universe
    • Troubleshoot
      • Troubleshooting overview
      • Cluster level issues
        • YCQL connection issues
        • YEDIS connection Issues
      • Node level issues
        • Check processes
        • Inspect logs
        • System statistics
      • Yugabyte Platform
        • Troubleshoot universes
  • REFERENCE
    • APIs
      • YSQL
        • Statements
          • ABORT
          • ALTER DATABASE
          • ALTER DOMAIN
          • ALTER TABLE
          • BEGIN
          • COMMENT
          • COMMIT
          • COPY
          • CREATE DATABASE
          • CREATE DOMAIN
          • CREATE INDEX
          • CREATE SCHEMA
          • CREATE SEQUENCE
          • CREATE TABLE
          • CREATE TABLE AS
          • CREATE TYPE
          • CREATE USER
          • CREATE VIEW
          • DEALLOCATE
          • DELETE
          • DROP DATABASE
          • DROP DOMAIN
          • DROP SEQUENCE
          • DROP TABLE
          • DROP TYPE
          • END
          • EXECUTE
          • EXPLAIN
          • GRANT
          • INSERT
          • LOCK
          • PREPARE
          • RESET
          • REVOKE
          • ROLLBACK
          • SELECT
          • SET
          • SET CONSTRAINTS
          • SET TRANSACTION
          • SHOW
          • SHOW TRANSACTION
          • TRUNCATE
          • UPDATE
        • Data types
          • Binary
          • Boolean
          • Character
          • Date-time
          • Json
          • Money
          • Numeric
          • Serial
          • UUID
        • Expressions
          • currval()
          • lastval()
          • nextval()
        • Keywords
        • Reserved Names
      • YCQL
        • Quick Start YCQL
        • ALTER KEYSPACE
        • ALTER ROLE
        • ALTER TABLE
        • CREATE INDEX
        • CREATE KEYSPACE
        • CREATE ROLE
        • CREATE TABLE
        • CREATE TYPE
        • DROP INDEX
        • DROP KEYSPACE
        • DROP ROLE
        • DROP TABLE
        • DROP TYPE
        • GRANT PERMISSION
        • GRANT ROLE
        • REVOKE PERMISSION
        • REVOKE ROLE
        • USE
        • INSERT
        • SELECT
        • UPDATE
        • DELETE
        • TRANSACTION
        • TRUNCATE
        • Simple Value
        • Subscript
        • Function Call
        • Operator Call
        • BLOB
        • BOOLEAN
        • MAP, SET, LIST
        • FROZEN
        • INET
        • Integer & Counter
        • Non-Integer
        • TEXT
        • Date & Time Types
        • UUID & TIMEUUID
        • JSONB
        • Date and time functions
    • CLIs
      • yb-ctl
      • yb-docker-ctl
      • yb-master
      • yb-tserver
      • ysqlsh
      • cqlsh
    • Sample data
      • Chinook
      • Northwind
      • PgExercises
      • SportsDB
    • Tools
      • TablePlus
  • RELEASES
    • Release history
      • v1.3.1
      • v1.3.0
      • v1.2.12
      • v1.2.11
      • v1.2.10
      • v1.2.9
      • v1.2.8
      • v1.2.6
      • v1.2.5
      • v1.2.4
  • CONCEPTS
    • Architecture
      • Design goals
      • Layered architecture
      • Basic concepts
        • Universe
        • YB-TServer
        • YB-Master
        • Acknowledgements
      • Query layer
        • Overview
      • DocDB store
        • Sharding
        • Replication
        • Persistence
        • Performance
      • DocDB transactions
        • Isolation Levels
        • Single row transactions
        • Distributed transactions
        • Transactional IO path
  • FAQ
    • Comparisons
      • CockroachDB
      • Google Cloud Spanner
      • MongoDB
      • FoundationDB
      • Amazon DynamoDB
      • Azure Cosmos DB
      • Apache Cassandra
      • Redis in-memory store
      • Apache HBase
    • Other FAQs
      • Product
      • Architecture
      • Yugabyte Platform
      • API compatibility
  • CONTRIBUTOR GUIDES
    • Get involved
  • Misc
    • YEDIS
      • Quick start
      • Develop
        • Client drivers
          • C
          • C++
          • C#
          • Go
          • Java
          • NodeJS
          • Python
      • API reference
        • APPEND
        • AUTH
        • CONFIG
        • CREATEDB
        • DELETEDB
        • LISTDB
        • SELECT
        • DEL
        • ECHO
        • EXISTS
        • EXPIRE
        • EXPIREAT
        • FLUSHALL
        • FLUSHDB
        • GET
        • GETRANGE
        • GETSET
        • HDEL
        • HEXISTS
        • HGET
        • HGETALL
        • HINCRBY
        • HKEYS
        • HLEN
        • HMGET
        • HMSET
        • HSET
        • HSTRLEN
        • HVALS
        • INCR
        • INCRBY
        • KEYS
        • MONITOR
        • PEXPIRE
        • PEXPIREAT
        • PTTL
        • ROLE
        • SADD
        • SCARD
        • RENAME
        • SET
        • SETEX
        • PSETEX
        • SETRANGE
        • SISMEMBER
        • SMEMBERS
        • SREM
        • STRLEN
        • ZRANGE
        • TSADD
        • TSCARD
        • TSGET
        • TSLASTN
        • TSRANGEBYTIME
        • TSREM
        • TSREVRANGEBYTIME
        • TTL
        • ZADD
        • ZCARD
        • ZRANGEBYSCORE
        • ZREM
        • ZREVRANGE
        • ZSCORE
        • PUBSUB
        • PUBLISH
        • SUBSCRIBE
        • UNSUBSCRIBE
        • PSUBSCRIBE
        • PUNSUBSCRIBE
> Manage > Data migration >

Bulk import

Attention

This page documents an earlier version. Go to the latest (v2.3) version.

Depending on the data volume imported, various bulk import tools can be used to bring data into YugabyteDB.

  • YCQL

This page documents bulk import for YugabyteDB’s Cassandra compatible YCQL API.

We will first export data from existing Apache Cassandra and MySQL tables. Thereafter, we will import the data using the various bulk load options supported by YugabyteDB. We will use a generic IoT timeseries data use case as a running example to illustrate the import process.

Create Destination Table

Following is the schema of the destination YugabyteDB table.

CREATE KEYSPACE example;
USE EXAMPLE;

CREATE TABLE SensorData ( customer_name text, device_id int, ts timestamp, sensor_data map<text, double>, PRIMARY KEY((customer_name, device_id), ts) );

Prepare Source Data

We will prepare a csv (comma-separated values) file where each row of entries must match with the column types declared in the table schema above. Concretely, each comma-separated value must be a valid Cassandra Query Language (CQL) literal for its corresponding type, except the top-level quotes are not needed (e.g. foo rather than 'foo' for strings).

Generate Sample Data

If you do not have the data already available in a database table, you can create sample data for the import using the instructions below.

#!/bin/bash

# Usage: ./generate_data.sh <number_of_rows> <output_filename>
# Example ./generate_data.sh 1000 sample.csv

if [ "$#" -ne 2 ] then echo "Usage: ./generate_data.sh <number_of_rows> <output_filename>" Echo "Example ./generate_data.sh 1000 sample.csv" exit 1 fi

> $2 # clearing file for i in </span>seq <span style="color:#666">1</span> <span style="color:#19177c">$1</span><span style="color:#ba2121"> do echo customer$((i%10)),$((i%3)),2017-11-11 12:30:$((i%60)).000000+0000,&#34;{temp:$i, humidity:$i}&#34; >> $2 done

customer1,1,2017-11-11 12:32:1.000000+0000,"{temp:1, humidity:1}"
customer2,2,2017-11-11 12:32:2.000000+0000,"{temp:2, humidity:2}"
customer3,0,2017-11-11 12:32:3.000000+0000,"{temp:3, humidity:3}"
customer4,1,2017-11-11 12:32:4.000000+0000,"{temp:4, humidity:4}"
customer5,2,2017-11-11 12:32:5.000000+0000,"{temp:5, humidity:5}"
customer6,0,2017-11-11 12:32:6.000000+0000,"{temp:6, humidity:6}"

Export from Apache Cassandra

If you already had the data in an Apache Cassandra table, then use the following command to create a csv file with the data.

cqlsh> COPY example.SensorData TO '/path/to/sample.csv';

Export from MySQL

If you already had the data in a MySQL table named SensorData, then use the following command to create a csv file with the data.

SELECT customer_name, device_id, ts, sensor_data
FROM SensorData 
INTO OUTFILE '/path/to/sample.csv' FIELDS TERMINATED BY ',';

Import Data

These instructions are organized by the size of the input datasets, ranging from small (few MB of data), to medium (GB) to large (TB or larger) datasets.

Small Datasets (MBs)

Cassandra’s CQL Shell provides the COPY FROM (see also COPY TO) command which allows importing data from csv files.

cqlsh> COPY example.SensorData FROM '/path/to/sample.csv';

Medium Datasets (GBs)

cassandra-loader is a general purpose bulk loader for CQL that supports various types of delimited files (particularly csv files). For more details, review the README of the YugabyteDB cassandra-loader fork. Note that cassandra-loader requires quotes for collection types (e.g. “[1,2,3]” rather than [1,2,3] for lists).

Install cassandra-loader

You can do this as shown below.

$ wget https://github.com/yugabyte/cassandra-loader/releases/download/v0.0.27-yb-2/cassandra-loader
$ chmod a+x cassandra-loader

Run cassandra-loader

time ./cassandra-loader \
	-dateFormat 'yyyy-MM-dd HH:mm:ss.SSSSSSX' \
	-f sample.csv \
	-host <clusterNodeIP> \
	-schema "example.SensorData(customer_name, device_id, ts, sensor_data)"

For additional options, refer to the cassandra-loader options.

Large datasets (TBs or larger)

For large datasets that are in the order of terabytes, YugabyteDB's bulk-importer is the tool to be used. Currently, it is supported only for AWS based deployments. Further documentation on this topic will be added soon. Meanwhile, reach out to [email protected] for more details.

Ask our community
  • Slack
  • Github
  • Forum
  • StackOverflow
Yugabyte
Contact Us
Copyright © 2017-2020 Yugabyte, Inc. All rights reserved.