Verify migration YSQL
This page documents the preview version (v2.23). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.
What follows are some steps that you can use to verify that your migration was successful.
Verify database objects
- Verify that all the tables and indexes have been created in YugabyteDB.
- Ensure that triggers and constraints are migrated and are working as expected.
Verify row counts for tables
Run a COUNT(*)
command to verify that the total number of rows match between the source database and YugabyteDB.
Use a PLPGSQL function to do the following:
-
Create the following function to print the number of rows in a single table:
create function cnt_rows(schema text, tablename text) returns integer as $body$ declare result integer; query varchar; begin query := 'SELECT count(1) FROM ' || schema || '.' || tablename; execute query into result; return result; end; $body$ language plpgsql;
-
Run the following command to print the sizes of all tables in the database.
SELECT cnt_rows(table_schema, table_name) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema') AND table_type='BASE TABLE' ORDER BY 3 DESC;
The following example shows the output of running the previous example on the Northwind database.
example=# SELECT cnt_rows(table_schema, table_name)
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
AND table_type='BASE TABLE'
ORDER BY 3 DESC;
table_schema | table_name | cnt_rows
--------------+------------------------+----------
public | order_details | 2155
public | orders | 830
public | customers | 91
public | products | 77
public | territories | 53
public | us_states | 51
public | employee_territories | 49
public | suppliers | 29
public | employees | 9
public | categories | 8
public | shippers | 6
public | region | 4
public | customer_customer_demo | 0
public | customer_demographics | 0
(14 rows)
Timeouts
The COUNT(*)
query may time out in case of large tables. The following two options are recommended for such use cases.
Create a function
Create a function and execute the query using the function which uses an implicit cursor.
CREATE OR REPLACE FUNCTION row_count(tbl regclass)
RETURNS setof int AS
$func$
DECLARE
_id int;
BEGIN
FOR _id IN
EXECUTE 'SELECT 1 FROM ' || tbl
LOOP
RETURN NEXT _id;
END LOOP;
END
$func$ LANGUAGE plpgsql;
In this case, the query would be:
select count(*) from row_count('tablename');
This query may take some time to complete. You can increase the client-side timeout to something higher, such as 10 minutes, using the YB-TServer flag --client_read_write_timeout_ms=600000
.
The following example is another workaround for running COUNT(*)
in ysqlsh:
create table test (id int primary key, fname text);
insert into test select i, 'jon' || i from generate_series(1, 1000000) as i;
create table dual (test int);
insert into dual values (1);
explain select count(*) from test cross join dual;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=15202.50..15202.51 rows=1 width=8)
-> Nested Loop (cost=0.00..12702.50 rows=1000000 width=0)
-> Seq Scan on test (cost=0.00..100.00 rows=1000 width=0)
-> Materialize (cost=0.00..105.00 rows=1000 width=0)
-> Seq Scan on dual (cost=0.00..100.00 rows=1000 width=0)
yb_hash_code function
Use yb_hash_code()
to run different queries that work on different parts of the table and control the parallelism at the application level.
Refer to Distributed parallel queries for additional information on running COUNT(*)
on tables using yb_hash_code()
.