Verify migration YSQL

This page documents the preview version (v2.23). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.

What follows are some steps that you can use to verify that your migration was successful.

Verify database objects

  • Verify that all the tables and indexes have been created in YugabyteDB.
  • Ensure that triggers and constraints are migrated and are working as expected.

Verify row counts for tables

Run a COUNT(*) command to verify that the total number of rows match between the source database and YugabyteDB.

Use a PLPGSQL function to do the following:

  1. Create the following function to print the number of rows in a single table:

    create function
    cnt_rows(schema text, tablename text) returns integer
    as
    $body$
    declare
      result integer;
      query varchar;
    begin
      query := 'SELECT count(1) FROM ' || schema || '.' || tablename;
      execute query into result;
      return result;
    end;
    $body$
    language plpgsql;
    
  2. Run the following command to print the sizes of all tables in the database.

    SELECT cnt_rows(table_schema, table_name)
        FROM information_schema.tables
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
        AND table_type='BASE TABLE'
        ORDER BY 3 DESC;
    

The following example shows the output of running the previous example on the Northwind database.

example=# SELECT cnt_rows(table_schema, table_name)
    FROM information_schema.tables
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
    AND table_type='BASE TABLE'
    ORDER BY 3 DESC;
 table_schema |       table_name       | cnt_rows
--------------+------------------------+----------
 public       | order_details          |     2155
 public       | orders                 |      830
 public       | customers              |       91
 public       | products               |       77
 public       | territories            |       53
 public       | us_states              |       51
 public       | employee_territories   |       49
 public       | suppliers              |       29
 public       | employees              |        9
 public       | categories             |        8
 public       | shippers               |        6
 public       | region                 |        4
 public       | customer_customer_demo |        0
 public       | customer_demographics  |        0
(14 rows)

Timeouts

The COUNT(*) query may time out in case of large tables. The following two options are recommended for such use cases.

Create a function

Create a function and execute the query using the function which uses an implicit cursor.

CREATE OR REPLACE FUNCTION row_count(tbl regclass)
    RETURNS setof int AS
$func$
DECLARE
    _id int;
BEGIN
    FOR _id IN
        EXECUTE 'SELECT 1 FROM ' || tbl
    LOOP
        RETURN NEXT _id;
    END LOOP;
END
$func$ LANGUAGE plpgsql;

In this case, the query would be:

select count(*) from row_count('tablename');

This query may take some time to complete. You can increase the client-side timeout to something higher, such as 10 minutes, using the YB-TServer flag --client_read_write_timeout_ms=600000.

The following example is another workaround for running COUNT(*) in ysqlsh:

create table test (id int primary key, fname text);
insert into test select i, 'jon' || i from generate_series(1, 1000000) as i;
create table dual (test int);
insert into dual values (1);
explain select count(*) from test cross join dual;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=15202.50..15202.51 rows=1 width=8)
   ->  Nested Loop  (cost=0.00..12702.50 rows=1000000 width=0)
         ->  Seq Scan on test  (cost=0.00..100.00 rows=1000 width=0)
         ->  Materialize  (cost=0.00..105.00 rows=1000 width=0)
               ->  Seq Scan on dual  (cost=0.00..100.00 rows=1000 width=0)

yb_hash_code function

Use yb_hash_code() to run different queries that work on different parts of the table and control the parallelism at the application level.

Refer to Distributed parallel queries for additional information on running COUNT(*) on tables using yb_hash_code().