import data file reference

yb-voyager import data file

Load data from files in CSV or text format directly to the YugabyteDB database. These data files can be located either on a local filesystem, an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files.

Syntax

Usage: yb-voyager import data file [ <arguments> ... ]

Arguments

The following table lists the valid CLI flags and parameters for import data file command.

When run at the same time, flags take precedence over configuration flag settings.

CLI flag
Config file parameter Description
--batch-size
import-data-file:
  batch-size:
Size of batches in the number of rows generated for ingestion during import data.
Default: 20000 rows
--data-dir
import-data-file:
  data-dir:
Path to the location of the data files to import; this can be a local directory or a URL for a cloud storage location such as an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files.
--delimiter
import-data-file:
  delimiter:
Character used as a delimiter to separate column values in rows of the datafile(s).
Default: comma ',' for CSV file format and tab '\t' for TEXT file format.
Example: yb-voyager import data file .... --delimiter ','
--disable-pb
import-data-file:
  disable-pb:
Use this argument to disable progress bar or statistics during data import.
Default: false
Accepted parameters: true, false, yes, no, 0, 1
--enable-adaptive-parallelism
import-data-file:
  enable-adaptive-parallelism:
Adapt parallelism based on the resource usage (CPU, memory) of the target YugabyteDB cluster.
Default: true
Accepted parameters: true, false, yes, no, 0, 1
--adaptive-parallelism-max
import-data-file:
  adaptive-parallelism-max:
Number of maximum parallel jobs to use while importing data when adaptive parallelism is enabled. By default, voyager tries to determine the total number of cores N and use N/2 as the maximum parallel jobs.
--enable-upsert
import-data-file:
  enable-upsert:
Enable UPSERT mode on target tables while importing data.
Note: Ensure that tables on the target YugabyteDB database do not have secondary indexes. If a table has secondary indexes, setting this flag to true may lead to corruption of the indexes.
Default: false
Usage for disabling the mode: yb-voyager import data status ... --enable-upsert false
--escape-char
import-data-file:
  escape-char:
Escape character
Default: double quotes '"'
Example: yb-voyager import data file ... --escape-char '"'
--null-string
import-data-file:
  null-string:
String that represents null values in the datafile.
Default: "" (empty string) for CSV, and '\N' for text.
Example: yb-voyager import data file ... --null-string 'NULL'
--file-table-map
import-data-file:
  file-table-map:
Comma-separated mapping between the files in --data-dir argument to the corresponding table in the database. You can import multiple files in one table either by providing one <fileName>:<tableName> entry for each file OR by passing a glob expression in place of the file name.
Example: --file-table-map 'fileName1:tableName,fileName2:tableName' OR --file-table-map 'fileName*:tableName'.
--format
import-data-file:
  format:
Format of the data file. One of csv or text.
Default: csv
Example: yb-voyager import data file ... --format text
--has-header
import-data-file:
  has-header:
For csv datafiles, use this argument if the datafile has a header with column names for the table.
Default: false
Example: yb-voyager import data file ... --format csv --has-header true
Accepted parameters: true, false, yes, no, 0, 1
--parallel-jobs
import-data-file:
  parallel-jobs:
Number of parallel COPY commands issued to the target database. Depending on the YugabyteDB database configuration, the value of --parallel-jobs should be tweaked such that at most 50% of target cores are utilised.
Default: If yb-voyager can determine the total number of cores N in the YugabyteDB database cluster, it uses N/2 as the default. Otherwise, it defaults to twice the number of nodes in the cluster.
--quote-char
import-data-file:
  quote-char:
Character used to quote the values.
Default: double quotes '"'
Example: yb-voyager import data file ... --quote-char '"'
--use-public-ip
import-data-file:
  use-public-ip:
Use the node public IP addresses to distribute --parallel-jobs uniformly on data import.
Default: false
Note that you may need to configure the YugabyteDB cluster with public IP addresses by setting server-broadcast-addresses.
Example: yb-voyager import data status ... --use-public-ip true
Accepted parameters: true, false, yes, no, 0, 1
--target-endpoints
import-data-file:
  target-endpoints:
Comma-separated list of node endpoints to use for parallel import of data.
Default: Use all the nodes in the cluster. For example: "host1:port1,host2:port2" or "host1,host2". Note: use-public-ip flag is ignored if this is used.
-e, --export-dir
  export-dir:
Path to the export directory. This directory is a workspace used to store exported schema DDL files, export data files, migration state, and a log file.
--target-db-name
target:
  db-name:
Target database name.
--target-db-password
target:
  db-password:
Target database password. Alternatively, you can also specify the password by setting the environment variable TARGET_DB_PASSWORD. If you don't provide a password via the CLI during any migration phase, yb-voyager will prompt you at runtime for a password. If the password contains special characters that are interpreted by the shell (for example, # and $), enclose the password in single quotes.
--target-db-port
target:
  db-port:
Port number of the target database machine.
Default: 5433
--target-db-schema
target:
  db-schema:
Schema name of the target database. MySQL and Oracle migrations only.
--target-db-user
target:
  db-user:
Username of the target database.
--target-db-host
target:
  db-host:
Domain name or IP address of the machine on which the target database server is running.
Default: "127.0.0.1"
--target-ssl-cert
target:
  ssl-cert:
Path to a file containing the certificate which is part of the SSL <cert,key> pair.
--target-ssl-key
target:
  ssl-key:
Path to a file containing the key which is part of the SSL <cert,key> pair.
--target-ssl-crl
target:
  ssl-crl:
Path to a file containing the SSL certificate revocation list (CRL).
--target-ssl-mode
target:
  ssl-mode:
Specify the SSL mode for the target database as one of disable, allow, prefer (default), require, verify-ca, or verify-full.
--target-ssl-root-cert
target:
  ssl-root-cert:
Path to a file containing SSL certificate authority (CA) certificate(s).
--file-opts [Deprecated] Comma-separated string options for CSV file format.
Options:
  • escape_char - escape character
  • quote_char - character used to quote the values
Default: double quotes (") for both escape and quote characters
Note that escape_char and quote_char are only valid and required for CSV file format.
Example: --file-opts "escape_char \",quote_char \"" or --file-opts 'escape_char ",quote_char "'
--skip-replication-checks
import-data-file:
  skip-replication-checks:
It is NOT recommended to have any form of replication (CDC/xCluster) running on the target YugabyteDB cluster during data import as it may lead to a rapid increase in disk use. If detected, data import is aborted. Use this flag to turn off the checks and continue importing data.
Default: false
Accepted parameters: true, false, yes, no, 0, 1
--send-diagnostics
send-diagnostics:
Enable or disable sending diagnostics information to Yugabyte.
Default: true
Accepted parameters: true, false, yes, no, 0, 1
--truncate-tables
import-data-file:
  truncate-tables:
Truncate tables on target YugabyteDB database before importing data. This option is only valid if --start-clean is set to true.
Default: false
--start-clean Starts a fresh import with data files present in the data directory.
If there's any non-empty table on the target YugabyteDB database, you get a prompt whether to continue the import without truncating those tables.
Note that for cases where a table doesn't have a primary key, it may lead to insertion of duplicate data. In that case, you can avoid the duplication by excluding the table from the --file-table-map, or truncating those tables manually before using the start-clean flag.
Default: false
Accepted parameters: true, false, yes, no, 0, 1
-h, --help Command line help.
-y, --yes Answer yes to all prompts during the export schema operation.
Default: false

Examples

Configuration file

yb-voyager import data file --config-file <path-to-config-file>

You can specify additional parameters in the import data file section of the configuration file. For more details, refer to the bulk-data-load.yaml template.

CLI

The following examples use the --data-dir argument along with --file-table-map argument and contextually differ based on whether the data is imported from local disk or cloud storage options.

Import data from CSV files using import data file by providing the argument --format csv in the command.

Import data file from local disk

The --data-dir argument is a path to the local directory where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database (you can mention case-sensitive table names as well, for example, --file-table-map 'foo.csv:"Foo"'); each file has a header, delimiter as ',' (default), and escape and quote character as '"' (default).

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
        --format csv \
        --has-header true

Import data file from AWS S3

The --data-dir argument is the AWS S3 URL of the data directory on the S3 bucket where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database, where each file has '\t' (tab character) as the delimiter, escape character as '\' and quote character as "'" with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir s3://abc-mart/data \
        --file-table-map 'orders.csv:"Orders",products.csv:"Products",users.csv:"Users",order-items.csv:"Order_items",payments.csv:"Payments",reviews.csv:"Reviews",categories.csv:"Categories"' \
        --format csv \
        --delimiter '\t' \
        --escape-char '\' \
        --quote-char "'"

Import data file from GCS

The --data-dir argument is the GCS URL of the data directory on the GCS bucket where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping of each CSV file in --data-dir to the corresponding table in the database, where each file has delimiter as ' ' (white space), escape character as '^', quote character as '"', and null string for null values as 'NULL' with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir gs://abc-bank/data \
        --file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
        --format csv \
        --delimiter ' ' \
        --null-string 'NULL' \
        --escape-char '^' \
        --quote-char '"'

Import data file from Azure blob

The --data-dir argument is the Azure blob URL of the data directory on the Azure blob container where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping of each CSV file in --data-dir to the corresponding table in the database, where each file has delimiter '#', escape character as '%', quote character as '"', and null string for null values as 'null' with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir https://admin.blob.core.windows.net/air-world/data \
        --file-table-map 'airlines.csv:airlines,airports.csv:airports,flights.csv:flights,passengers.csv:passengers,bookings.csv:booking' \
        --format csv \
        --delimiter '#' \
        --null-string 'null' \
        --escape-char '%' \
        --quote-char '"'

Load multiple files to the same table

Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo or foo*.csv:foo).

The --data-dir argument is a path to the local directory where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database, where each file has a header, delimiter as ',' (default), and escape and quote character as '"'(default).

Example for each file entry in --file-table-map (foo1.csv:foo,foo2.csv:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions1.csv:transactions,transactions2.csv:transactions' \
        --format csv \
        --has-header true

Example for glob expression of files in --file-table-map (foo*.csv:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions*.csv:transactions' \
        --format csv \
        --has-header true

Import data from text files using import data file by providing the argument --format text in the command.

Import data file from local disk

The --data-dir argument is a path to the local directory where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database, where each file has '\t' (default) as a delimiter and '\N' (default) as a null string.

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.txt:accounts,transactions.txt:transactions' \
        --format text

Import data file from AWS S3

The --data-dir argument is the AWS S3 URL of the data directory on the S3 bucket where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database, where each file has ',' (comma) as the delimiter, and null string as 'NULL' for null values as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir s3://social-media/data \
        --file-table-map 'posts.txt:post,comments.txt:comments,profiles.txt:profiles,likes.txt:likes,messages.txt:messages,followers.txt:followers' \
        --format text \
        --delimiter ',' \
        --null-string 'NULL'

Import data file from GCS

The --data-dir argument is the GCS URL of the data directory on the GCS bucket where all the text files are present, and the --file-table-map argument provides a comma-separated mapping of each text file in --data-dir to the corresponding table in the database (you can mention case-sensitive table names as well), where each file has delimiter as '-' (hyphen) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir gs://xyz-hospital/data \
        --file-table-map 'patients.txt:"Patients",doctors.txt:"Doctors",appointments.txt:"Appointments"' \
        --format text \
        --delimiter '-'

Import data file from Azure blob

The --data-dir argument is the Azure blob URL of the data directory on the Azure blob container where all the text files are present, and the --file-table-map argument provides a comma-separated mapping of each text file in --data-dir to the corresponding table in the database, where each file has delimiter '#' (hash character), and null string as 'null' for null values as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir https://admin.blob.core.windows.net/weather-forecast/data \
        --file-table-map 'locations.txt:locations,weather.txt:weather,data.txt:data,forecasts.txt:forecasts' \
        --format text \
        --delimiter '#' \
        --null-string 'null'

Load multiple files to the same table

Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo or foo*.csv:foo).

The --data-dir argument is a path to the local directory where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database where each file has a header, delimiter as '|' (pipe character).

Example for each file entry in --file-table-map (foo1.txt:foo,foo2.txt:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/academy-data \
        --file-table-map 'students1.txt:students,students2.txt:students,students3.txt:students,courses.txt:courses,instructors1.txt:instructors,instructors2.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
        --format text \
        --delimiter '|'

Example for glob expression of files in --file-table-map (foo*.txt:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/academy-data \
        --file-table-map 'students*.txt:students,courses.txt:courses,instructors*.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
        --format text \
        --delimiter '|'