import data file reference

yb-voyager import data file

Load data from files in CSV or text format directly to the YugabyteDB database. These data files can be located either on a local filesystem, an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files.

Syntax

Usage: yb-voyager import data file [ <arguments> ... ]

Arguments

The following table lists the valid CLI flags and parameters for import data file command.

When run at the same time, flags take precedence over configuration flag settings.

CLI flag	Config file parameter	Description
--batch-size	`import-data-file: batch-size:`	Size of batches in the number of rows generated for ingestion during import data. Default: 20000 rows
--data-dir	`import-data-file: data-dir:`	Path to the location of the data files to import; this can be a local directory or a URL for a cloud storage location such as an AWS S3 bucket, GCS bucket, or an Azure blob. For more details, see Bulk data load from files.
--delimiter	`import-data-file: delimiter:`	Character used as a delimiter to separate column values in rows of the datafile(s). Default: comma `','` for CSV file format and tab `'\t'` for TEXT file format. Example: `yb-voyager import data file .... --delimiter ','`
--disable-pb	`import-data-file: disable-pb:`	Use this argument to disable progress bar or statistics during data import. Default: false Accepted parameters: true, false, yes, no, 0, 1
--enable-adaptive-parallelism	`import-data-file: enable-adaptive-parallelism:`	Adapt parallelism based on the resource usage (CPU, memory) of the target YugabyteDB cluster. Default: true Accepted parameters: true, false, yes, no, 0, 1
--adaptive-parallelism-max	`import-data-file: adaptive-parallelism-max:`	Number of maximum parallel jobs to use while importing data when adaptive parallelism is enabled. By default, voyager tries to determine the total number of cores `N` and use `N/2` as the maximum parallel jobs.
--enable-upsert	`import-data-file: enable-upsert:`	Enable UPSERT mode on target tables while importing data. Note: Ensure that tables on the target YugabyteDB database do not have secondary indexes. If a table has secondary indexes, setting this flag to true may lead to corruption of the indexes. Default: false Usage for disabling the mode: `yb-voyager import data status ... --enable-upsert false`
--escape-char	`import-data-file: escape-char:`	Escape character Default: double quotes `'"'` Example: `yb-voyager import data file ... --escape-char '"'`
--null-string	`import-data-file: null-string:`	String that represents null values in the datafile. Default: `""` (empty string) for CSV, and `'\N'` for text. Example: `yb-voyager import data file ... --null-string 'NULL'`
--file-table-map	`import-data-file: file-table-map:`	Comma-separated mapping between the files in `--data-dir` argument to the corresponding table in the database. You can import multiple files in one table either by providing one `<fileName>:<tableName>` entry for each file OR by passing a glob expression in place of the file name. Example: `--file-table-map 'fileName1:tableName,fileName2:tableName'` OR `--file-table-map 'fileName*:tableName'`.
--format	`import-data-file: format:`	Format of the data file. One of `csv` or `text`. Default: csv Example: `yb-voyager import data file ... --format text`
--has-header	`import-data-file: has-header:`	For `csv` datafiles, use this argument if the datafile has a header with column names for the table. Default: false Example: `yb-voyager import data file ... --format csv --has-header true` Accepted parameters: true, false, yes, no, 0, 1
--parallel-jobs	`import-data-file: parallel-jobs:`	Number of parallel COPY commands issued to the target database. Depending on the YugabyteDB database configuration, the value of `--parallel-jobs` should be tweaked such that at most 50% of target cores are utilised. Default: If yb-voyager can determine the total number of cores N in the YugabyteDB database cluster, it uses N/2 as the default. Otherwise, it defaults to twice the number of nodes in the cluster.
--quote-char	`import-data-file: quote-char:`	Character used to quote the values. Default: double quotes `'"'` Example: `yb-voyager import data file ... --quote-char '"'`
--use-public-ip	`import-data-file: use-public-ip:`	Use the node public IP addresses to distribute `--parallel-jobs` uniformly on data import. Default: false Note that you may need to configure the YugabyteDB cluster with public IP addresses by setting server-broadcast-addresses. Example: `yb-voyager import data status ... --use-public-ip true` Accepted parameters: true, false, yes, no, 0, 1
--target-endpoints	`import-data-file: target-endpoints:`	Comma-separated list of node endpoints to use for parallel import of data. Default: Use all the nodes in the cluster. For example: "host1:port1,host2:port2" or "host1,host2". Note: use-public-ip flag is ignored if this is used.
--on-primary-key-conflict	`import-data-file: on-primary-key-conflict:`	Action to take on primary key conflict during data import. Accepted parameters: `ERROR` (Default): Import in this mode fails if any primary key conflict is encountered, indicating such conflicts are unexpected. `IGNORE`: Skips rows that have an existing primary key allowing the import to continue for the remaining data.
-e, --export-dir	`export-dir:`	Path to the export directory. This directory is a workspace used to store exported schema DDL files, export data files, migration state, and a log file.
--target-db-name	`target: db-name:`	Target database name.
--target-db-password	`target: db-password:`	Target database password. Alternatively, you can also specify the password by setting the environment variable `TARGET_DB_PASSWORD`. If you don't provide a password via the CLI during any migration phase, yb-voyager will prompt you at runtime for a password. If the password contains special characters that are interpreted by the shell (for example, # and $), enclose the password in single quotes.
--target-db-port	`target: db-port:`	Port number of the target database machine. Default: 5433
--target-db-schema	`target: db-schema:`	Schema name of the target database. MySQL and Oracle migrations only.
--target-db-user	`target: db-user:`	Username of the target database.
--target-db-host	`target: db-host:`	Domain name or IP address of the machine on which the target database server is running. Default: "127.0.0.1"
--target-ssl-cert	`target: ssl-cert:`	Path to a file containing the certificate which is part of the SSL `<cert,key>` pair.
--target-ssl-key	`target: ssl-key:`	Path to a file containing the key which is part of the SSL `<cert,key>` pair.
--target-ssl-crl	`target: ssl-crl:`	Path to a file containing the SSL certificate revocation list (CRL).
--target-ssl-mode	`target: ssl-mode:`	Specify the SSL mode for the target database as one of `disable`, `allow`, `prefer` (default), `require`, `verify-ca`, or `verify-full`.
--target-ssl-root-cert	`target: ssl-root-cert:`	Path to a file containing SSL certificate authority (CA) certificate(s).
--file-opts	—	[Deprecated] Comma-separated string options for CSV file format. Options: `escape_char` - escape character `quote_char` - character used to quote the values Default: double quotes (") for both escape and quote characters Note that escape_char and quote_char are only valid and required for CSV file format. Example: `--file-opts "escape_char \",quote_char \""` or `--file-opts 'escape_char ",quote_char "'`
--skip-replication-checks	`import-data-file: skip-replication-checks:`	It is NOT recommended to have any form of replication (CDC/xCluster) running on the target YugabyteDB cluster during data import as it may lead to a rapid increase in disk use. If detected, data import is aborted. Use this flag to turn off the checks and continue importing data. Default: false Accepted parameters: true, false, yes, no, 0, 1
--send-diagnostics	`send-diagnostics:`	Enable or disable sending diagnostics information to Yugabyte. Default: true Accepted parameters: true, false, yes, no, 0, 1
--truncate-tables	`import-data-file: truncate-tables:`	Truncate tables on target YugabyteDB database before importing data. This option is only valid if `--start-clean` is set to true. Default: false
--error-policy	`import-data-file: error-policy:`	Specifies how to handle errors when processing and importing rows to the target YugabyteDB database. Errors can arise from reading data from file, transforming rows, or ingesting them into YugabyteDB. Accepted parameters: `abort` (Default) - Immediately aborts the process. `stash-and-continue` - Stashes the errored rows to a file and continues the import.
--start-clean	—	Starts a fresh import with data files present in the `data` directory. If there's any non-empty table on the target YugabyteDB database, you get a prompt whether to continue the import without truncating those tables. Note that for cases where a table doesn't have a primary key, it may lead to insertion of duplicate data. In that case, you can avoid the duplication by excluding the table from the `--file-table-map`, or truncating those tables manually before using the `start-clean` flag. Default: false Accepted parameters: true, false, yes, no, 0, 1
-h, --help	—	Command line help.
-y, --yes	—	Answer yes to all prompts during the export schema operation. Default: false

Examples

Configuration file

yb-voyager import data file --config-file <path-to-config-file>

You can specify additional parameters in the import data file section of the configuration file. For more details, refer to the bulk-data-load.yaml template.

CLI

The following examples use the --data-dir argument along with --file-table-map argument and contextually differ based on whether the data is imported from local disk or cloud storage options.

Import data from CSV files using import data file by providing the argument --format csv in the command.

Import data file from local disk

The --data-dir argument is a path to the local directory where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database (you can mention case-sensitive table names as well, for example, --file-table-map 'foo.csv:"Foo"'); each file has a header, delimiter as ',' (default), and escape and quote character as '"' (default).

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
        --format csv \
        --has-header true

Import data file from AWS S3

The --data-dir argument is the AWS S3 URL of the data directory on the S3 bucket where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database, where each file has '\t' (tab character) as the delimiter, escape character as '\' and quote character as "'" with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir s3://abc-mart/data \
        --file-table-map 'orders.csv:"Orders",products.csv:"Products",users.csv:"Users",order-items.csv:"Order_items",payments.csv:"Payments",reviews.csv:"Reviews",categories.csv:"Categories"' \
        --format csv \
        --delimiter '\t' \
        --escape-char '\' \
        --quote-char "'"

Import data file from GCS

The --data-dir argument is the GCS URL of the data directory on the GCS bucket where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping of each CSV file in --data-dir to the corresponding table in the database, where each file has delimiter as ' ' (white space), escape character as '^', quote character as '"', and null string for null values as 'NULL' with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir gs://abc-bank/data \
        --file-table-map 'accounts.csv:accounts,transactions.csv:transactions' \
        --format csv \
        --delimiter ' ' \
        --null-string 'NULL' \
        --escape-char '^' \
        --quote-char '"'

Import data file from Azure blob

The --data-dir argument is the Azure blob URL of the data directory on the Azure blob container where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping of each CSV file in --data-dir to the corresponding table in the database, where each file has delimiter '#', escape character as '%', quote character as '"', and null string for null values as 'null' with no header (default) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir https://admin.blob.core.windows.net/air-world/data \
        --file-table-map 'airlines.csv:airlines,airports.csv:airports,flights.csv:flights,passengers.csv:passengers,bookings.csv:booking' \
        --format csv \
        --delimiter '#' \
        --null-string 'null' \
        --escape-char '%' \
        --quote-char '"'

Load multiple files to the same table

Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo or foo*.csv:foo).

The --data-dir argument is a path to the local directory where all the CSV files are present, and the --file-table-map argument provides a comma-separated mapping between each CSV file in --data-dir to the corresponding table in the database, where each file has a header, delimiter as ',' (default), and escape and quote character as '"'(default).

Example for each file entry in --file-table-map (foo1.csv:foo,foo2.csv:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions1.csv:transactions,transactions2.csv:transactions' \
        --format csv \
        --has-header true

Example for glob expression of files in --file-table-map (foo*.csv:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.csv:accounts,transactions*.csv:transactions' \
        --format csv \
        --has-header true

Import data from text files using import data file by providing the argument --format text in the command.

Import data file from local disk

The --data-dir argument is a path to the local directory where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database, where each file has '\t' (default) as a delimiter and '\N' (default) as a null string.

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/data-dir \
        --file-table-map 'accounts.txt:accounts,transactions.txt:transactions' \
        --format text

Import data file from AWS S3

The --data-dir argument is the AWS S3 URL of the data directory on the S3 bucket where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database, where each file has ',' (comma) as the delimiter, and null string as 'NULL' for null values as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir s3://social-media/data \
        --file-table-map 'posts.txt:post,comments.txt:comments,profiles.txt:profiles,likes.txt:likes,messages.txt:messages,followers.txt:followers' \
        --format text \
        --delimiter ',' \
        --null-string 'NULL'

Import data file from GCS

The --data-dir argument is the GCS URL of the data directory on the GCS bucket where all the text files are present, and the --file-table-map argument provides a comma-separated mapping of each text file in --data-dir to the corresponding table in the database (you can mention case-sensitive table names as well), where each file has delimiter as '-' (hyphen) as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir gs://xyz-hospital/data \
        --file-table-map 'patients.txt:"Patients",doctors.txt:"Doctors",appointments.txt:"Appointments"' \
        --format text \
        --delimiter '-'

Import data file from Azure blob

The --data-dir argument is the Azure blob URL of the data directory on the Azure blob container where all the text files are present, and the --file-table-map argument provides a comma-separated mapping of each text file in --data-dir to the corresponding table in the database, where each file has delimiter '#' (hash character), and null string as 'null' for null values as demonstrated in the following command:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir https://admin.blob.core.windows.net/weather-forecast/data \
        --file-table-map 'locations.txt:locations,weather.txt:weather,data.txt:data,forecasts.txt:forecasts' \
        --format text \
        --delimiter '#' \
        --null-string 'null'

Load multiple files to the same table

Multiple files can be imported in one table (for example, foo1.csv:foo,foo2.csv:foo or foo*.csv:foo).

The --data-dir argument is a path to the local directory where all the text files are present, and the --file-table-map argument provides a comma-separated mapping between each text file in --data-dir to the corresponding table in the database where each file has a header, delimiter as '|' (pipe character).

Example for each file entry in --file-table-map (foo1.txt:foo,foo2.txt:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/academy-data \
        --file-table-map 'students1.txt:students,students2.txt:students,students3.txt:students,courses.txt:courses,instructors1.txt:instructors,instructors2.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
        --format text \
        --delimiter '|'

Example for glob expression of files in --file-table-map (foo*.txt:foo) is as follows:

yb-voyager import data file --export-dir /dir/export-dir \
        --target-db-host 127.0.0.1 \
        --target-db-user ybvoyager \
        --target-db-password 'password' \
        --target-db-name target_db \
        --target-db-schema target_schema \
        --data-dir /dir/academy-data \
        --file-table-map 'students*.txt:students,courses.txt:courses,instructors*.txt:instructors,enrollments.txt:enrollments,grades.txt:grades' \
        --format text \
        --delimiter '|'