import data file reference
Load data from files in CSV or text format directly to the YugabyteDB database. These data files can be located either on a local filesystem, an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files.
Syntax
Usage: yb-voyager import data file [ <arguments> ... ]
Arguments
The valid arguments for import data file are described in the following table:
Argument | Description/valid options |
---|---|
--batch-size | Size of batches in the number of rows generated for ingestion during import data. Default: 20000 rows |
--data-dir | Path to the location of the data files to import; this can be a local directory or a URL for a cloud storage location such as an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files. |
--delimiter | Character used as a delimiter to separate column values in rows of the datafile(s). Default: comma ',' for CSV file format and tab '\t' for TEXT file format.Example: yb-voyager import data file .... --delimiter ',' |
--disable-pb | Use this argument to disable progress bar or statistics during data import. Default: false Accepted parameters: true, false, yes, no, 0, 1 |
--enable-upsert | Enable UPSERT mode on target tables while importing data. Note: Ensure that tables on the target YugabyteDB database do not have secondary indexes. If a table has secondary indexes, setting this flag to true may lead to corruption of the indexes. Default: false Usage for disabling the mode: yb-voyager import data status ... --enable-upsert false |
--escape-char | Escape character Default: double quotes '"' Example: yb-voyager import data file ... --escape-char '"' |
--file-opts | [Deprecated] Comma-separated string options for CSV file format. Options:
Note that escape_char and quote_char are only valid and required for CSV file format. Example: --file-opts "escape_char \",quote_char \"" or --file-opts 'escape_char ",quote_char "' |
--null-string | String that represents null values in the datafile. Default: "" (empty string) for CSV, and '\N' for text.Example: yb-voyager import data file ... --null-string 'NULL' |
--file-table-map | Comma-separated mapping between the files in --data-dir argument to the corresponding table in the database. You can import multiple files in one table either by providing one <fileName>:<tableName> entry for each file OR by passing a glob expression in place of the file name.Example: --file-table-map 'fileName1:tableName,fileName2:tableName' OR --file-table-map 'fileName*:tableName' . |
--format | Format of the data file. One of csv or text . Default: csv Example: yb-voyager import data file ... --format text |
--has-header | For csv datafiles, use this argument if the datafile has a header with column names for the table. Default: false Example: yb-voyager import data file ... --format csv --has-header true Accepted parameters: true, false, yes, no, 0, 1 |
-e, --export-dir | Path to the export directory. This directory is a workspace used to store exported schema DDL files, export data files, migration state, and a log file. |
-h, --help | Command line help. |
--parallel-jobs | Number of parallel COPY commands issued to the target database. Depending on the YugabyteDB database configuration, the value of --parallel-jobs should be tweaked such that at most 50% of target cores are utilised. Default: If yb-voyager can determine the total number of cores N in the YugabyteDB database cluster, it uses N/2 as the default. Otherwise, it defaults to twice the number of nodes in the cluster. |
--quote-char | Character used to quote the values. Default: double quotes '"' Example: yb-voyager import data file ... --quote-char '"' |
--send‑diagnostics | Enable or disable sending diagnostics information to Yugabyte. Default: true Accepted parameters: true, false, yes, no, 0, 1 |
--start-clean | Starts a fresh import with data files present in the data directory.If there's any non-empty table on the target YugabyteDB database, you get a prompt whether to continue the import without truncating those tables; if you go ahead without truncating, then yb-voyager starts ingesting the data present in the data files with upsert mode. Note that for cases where a table doesn't have a primary key, it may lead to insertion of duplicate data. In that case, you can avoid the duplication by excluding the table from the --file-table-map , or truncating those tables manually before using the start-clean flag.Default: false Accepted parameters: true, false, yes, no, 0, 1 |
--target-db-name | Target database name. |
--target‑db‑password | Target database password. Alternatively, you can also specify the password by setting the environment variable TARGET_DB_PASSWORD . If you don't provide a password via the CLI during any migration phase, yb-voyager will prompt you at runtime for a password. If the password contains special characters that are interpreted by the shell (for example, # and $), enclose the password in single quotes. |
--target-db-port | Port number of the target database machine. Default: 5433 |
--target-db-schema | Schema name of the target database. MySQL and Oracle migrations only. |
--target-db-user | Username of the target database. |
--target-db-host | Domain name or IP address of the machine on which the target database server is running. Default: "127.0.0.1" |
--target-endpoints | Comma-separated list of node endpoints to use for parallel import of data. Default: Use all the nodes in the cluster. For example: "host1:port1,host2:port2" or "host1,host2". Note: use-public-ip flag is ignored if this is used. |
--target-ssl-cert | Path to a file containing the certificate which is part of the SSL <cert,key> pair. |
--target-ssl-key | Path to a file containing the key which is part of the SSL <cert,key> pair. |
--target-ssl-crl | Path to a file containing the SSL certificate revocation list (CRL). |
--target-ssl-mode | Specify the SSL mode for the target database as one of disable , allow , prefer (default), require , verify-ca , or verify-full . |
--target-ssl-root-cert |
Path to a file containing SSL certificate authority (CA) certificate(s). |
--use-public-ip | Use the node public IP addresses to distribute --parallel-jobs uniformly on data import. Default: false Note that you may need to configure the YugabyteDB cluster with public IP addresses by setting server-broadcast-addresses. Example: yb-voyager import data status ... --use-public-ip true Accepted parameters: true, false, yes, no, 0, 1 |
-y, --yes | Answer yes to all prompts during the export schema operation. |
Examples
The following examples use the --data-dir
argument along with --file-table-map
argument and contextually differ based on whether the data is imported from local disk or cloud storage options.
Import data from CSV files using import data file
by providing the argument --format csv
in the command.
Import data file from local disk
The --data-dir
argument is a path to the local directory where all the CSV files are present, and the --file-table-map
argument provides a comma-separated mapping between each CSV file in --data-dir
to the corresponding table in the database (you can mention case-sensitive table names as well, for example, --file-table-map 'foo.csv:"Foo"'
); each file has a header, delimiter as ','
(default), and escape and quote character as '"'
(default).
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/data-dir \
--file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
--format csv \
--has-header true
Import data file from AWS S3
The --data-dir
argument is the AWS S3 URL of the data directory on the S3 bucket where all the CSV files are present, and the --file-table-map
argument provides a comma-separated mapping between each CSV file in --data-dir
to the corresponding table in the database, where each file has '\t'
(tab character) as the delimiter, escape character as '\'
and quote character as "'"
with no header (default) as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir s3://abc-mart/data \
--file-table-map 'orders.csv:"Orders",products.csv:"Products",users.csv:"Users",order-items.csv:"Order_items",payments.csv:"Payments",reviews.csv:"Reviews",categories.csv:"Categories"' \
--format csv \
--delimiter '\t' \
--escape-char '\' \
--quote-char "'"
Import data file from GCS
The --data-dir
argument is the GCS URL of the data directory on the GCS bucket where all the CSV files are present, and the --file-table-map
argument provides a comma-separated mapping of each CSV file in --data-dir
to the corresponding table in the database, where each file has delimiter as ' '
(white space), escape character as '^'
, quote character as '"'
, and null string for null values as 'NULL'
with no header (default) as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir gs://abc-bank/data \
--file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
--format csv \
--delimiter ' ' \
--null-string 'NULL' \
--escape-char '^' \
--quote-char '"'
Import data file from Azure blob
The --data-dir
argument is the Azure blob URL of the data directory on the Azure blob container where all the CSV files are present, and the --file-table-map
argument provides a comma-separated mapping of each CSV file in --data-dir
to the corresponding table in the database, where each file has delimiter '#'
, escape character as '%'
, quote character as '"'
, and null string for null values as 'null' with no header (default) as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir https://admin.blob.core.windows.net/air-world/data \
--file-table-map 'airlines.csv:airlines,airports.csv:airports,flights.csv:flights,passengers.csv:passengers,bookings.csv:booking' \
--format csv \
--delimiter '#' \
--null-string 'null' \
--escape-char '%' \
--quote-char '"'
Load multiple files to the same table
Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo
or foo*.csv:foo
).
The --data-dir
argument is a path to the local directory where all the CSV files are present, and the --file-table-map
argument provides a comma-separated mapping between each CSV file in --data-dir
to the corresponding table in the database, where each file has a header, delimiter as ','
(default), and escape and quote character as '"'
(default).
Example for each file entry in --file-table-map (foo1.csv:foo,foo2.csv:foo
) is as follows:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/data-dir \
--file-table-map 'accounts.csv:accounts,transactions1.csv:transactions,transactions2.csv:transactions' \
--format csv \
--has-header true
Example for glob expression of files in --file-table-map (foo*.csv:foo
) is as follows:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/data-dir \
--file-table-map 'accounts.csv:accounts,transactions*.csv:transactions' \
--format csv \
--has-header true
Import data from text files using import data file
by providing the argument --format text
in the command.
Import data file from local disk
The --data-dir
argument is a path to the local directory where all the text files are present, and the --file-table-map
argument provides a comma-separated mapping between each text file in --data-dir
to the corresponding table in the database, where each file has '\t'
(default) as a delimiter and '\N'
(default) as a null string.
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/data-dir \
--file-table-map 'accounts.txt:accounts,transactions.txt:transactions' \
--format text
Import data file from AWS S3
The --data-dir
argument is the AWS S3 URL of the data directory on the S3 bucket where all the text files are present, and the --file-table-map
argument provides a comma-separated mapping between each text file in --data-dir
to the corresponding table in the database, where each file has ','
(comma) as the delimiter, and null string as 'NULL'
for null values as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir s3://social-media/data \
--file-table-map 'posts.txt:post,comments.txt:comments,profiles.txt:profiles,likes.txt:likes,messages.txt:messages,followers.txt:followers' \
--format text \
--delimiter ',' \
--null-string 'NULL'
Import data file from GCS
The --data-dir
argument is the GCS URL of the data directory on the GCS bucket where all the text files are present, and the --file-table-map
argument provides a comma-separated mapping of each text file in --data-dir
to the corresponding table in the database (you can mention case-sensitive table names as well), where each file has delimiter as '-'
(hyphen) as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir gs://xyz-hospital/data \
--file-table-map 'patients.txt:"Patients",doctors.txt:"Doctors",appointments.txt:"Appointments"' \
--format text \
--delimiter '-'
Import data file from Azure blob
The --data-dir
argument is the Azure blob URL of the data directory on the Azure blob container where all the text files are present, and the --file-table-map
argument provides a comma-separated mapping of each text file in --data-dir
to the corresponding table in the database, where each file has delimiter '#'
(hash character), and null string as 'null'
for null values as demonstrated in the following command:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir https://admin.blob.core.windows.net/weather-forecast/data \
--file-table-map 'locations.txt:locations,weather.txt:weather,data.txt:data,forecasts.txt:forecasts' \
--format text \
--delimiter '#' \
--null-string 'null'
Load multiple files to the same table
Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo
or foo*.csv:foo
).
The --data-dir
argument is a path to the local directory where all the text files are present, and the --file-table-map
argument provides a comma-separated mapping between each text file in --data-dir
to the corresponding table in the database where each file has a header, delimiter as '|'
(pipe character).
Example for each file entry in --file-table-map (foo1.txt:foo,foo2.txt:foo
) is as follows:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/academy-data \
--file-table-map 'students1.txt:students,students2.txt:students,students3.txt:students,courses.txt:courses,instructors1.txt:instructors,instructors2.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
--format text \
--delimiter '|'
Example for glob expression of files in --file-table-map (foo*.txt:foo
) is as follows:
yb-voyager import data file --export-dir /dir/export-dir \
--target-db-host 127.0.0.1 \
--target-db-user ybvoyager \
--target-db-password 'password' \
--target-db-name target_db \
--target-db-schema target_schema \
--data-dir /dir/academy-data \
--file-table-map 'students*.txt:students,courses.txt:courses,instructors*.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
--format text \
--delimiter '|'