Create indexes and track the progress

Understand how YugabyteDB creates indexes without affecting ongoing queries

You can add a new index to an existing table using the YSQL CREATE INDEX statement. YugabyteDB supports online index backfill, which is enabled by default. Using this feature, you can build indexes on tables that already have data, without affecting other concurrent writes. YugabyteDB also supports the CREATE INDEX NONCONCURRENTLY statement to disable online index backfill.

Tracking index creation

The current state of an index backfill can be viewed by executing the pg_stat_progress_create_index view to report the progress of the CREATE INDEX command execution. The view contains one row for each backend connection that is currently running a CREATE INDEX command, and the row entry is cleared after the completion of the command execution.

The pg_stat_progress_create_index view can provide the following details:

  • Number of rows processed during an index backfill.
  • The current phase of the command is either initializing or backfilling.
  • Index progress report for all the different configurations of an index or index build such as non-concurrent index builds, GIN indexes, partial indexes, and include indexes.

Columns such as lockers_total, lockers_done, current_locker_pid, blocks_total, and blocks_done in the pg_stat_progress_create_index view do not apply to YugabyteDB and always have null values.

Example

The following example demonstrates the possible phases (initializing, backfilling) for the CREATE INDEX operation using the pg_stat_progress_create_index view.

Before you start

The examples will run on any YugabyteDB universe.
To create a universe, see Set up YugabyteDB universe.

  1. From your local YugabyteDB installation directory, connect to the YSQL shell, and create an index on an existing table as follows:

    CREATE TABLE test(id int);
    
  2. Populate the table with some data.

    INSERT INTO test(id) SELECT n FROM generate_series(1,1000000) AS n;
    
  3. On a separate parallel YSQL connection on the same node, select from the view to see the progress of the command in a repeated fashion with \watch 1 as follows:

    SELECT tbl.relname as tblname, idx.relname as indexname, command, phase, tuples_total, tuples_done
    FROM pg_stat_progress_create_index
    INNER JOIN pg_class as tbl on tbl.oid = relid
    INNER JOIN pg_class as idx on idx.oid = index_relid
    where tbl.relname = 'test';
    \watch 1
    

    Initially, you will see an empty output like this.

       tblname | indexname | command | phase | tuples_total | tuples_done
      ---------+-----------+---------+-------+--------------+-------------
    
  4. Now create an index on the id column.

    CREATE INDEX idx_id ON test(id);
    
  5. On the other terminal, you will see the progress on the index creation first with the first phase, initializing as:

     tblname | indexname |          command          |    phase     | tuples_total | tuples_done
    ---------+-----------+---------------------------+--------------+--------------+-------------
     test    | idx_id    | CREATE INDEX CONCURRENTLY | initializing |            0 |           0
    

    And then you will see the index backfilling happen as:

      tblname | indexname |          command          |    phase    | tuples_total | tuples_done
     ---------+-----------+---------------------------+-------------+--------------+-------------
      test    | idx_id    | CREATE INDEX CONCURRENTLY | backfilling |            0 |           0
    

    You will see the tuples_done count increasing as the backfilling progresses. When the backfilling is done, you will see the tuples_done count to be updated correctly.

      tblname | indexname |          command          |    phase    | tuples_total | tuples_done
     ---------+-----------+---------------------------+-------------+--------------+-------------
      test    | idx_id    | CREATE INDEX CONCURRENTLY | backfilling |            0 |     1000000
    

Memory usage

Backfilling consumes some amount of memory. The memory consumption is directly proportional to the data size per-row, the number of write operations batched together, and the number of parallel backfills. You can view the approximate memory usage by executing the following SQL statement via ysqlsh.

SELECT
 indexname, tablet, now()-backend_start started, pg_size_pretty(allocated_mem_bytes) mem, pg_size_pretty(rss_mem_bytes) rss
 FROM (
  SELECT
   allocated_mem_bytes, rss_mem_bytes, backend_start, query_start, query
   ,regexp_replace(query,'(BACKFILL INDEX) ([0-9]*) .* PARTITION (.*);','\2')::oid  AS indexoid
   ,regexp_replace(query,'(BACKFILL INDEX) ([0-9]*) .* PARTITION (.*);','\3')::text AS tablet
  FROM pg_stat_activity
  WHERE query LIKE 'BACKFILL INDEX %'
 ) pg_stat_activity
 LEFT OUTER JOIN ( SELECT
  oid AS indexoid, relname AS indexname FROM pg_class
 ) pg_class
 USING(indexoid);

This should give you an output similar to the following when an index is being backfilled.

 indexname | tablet  |     started     |  mem  |  rss
-----------+---------+-----------------+-------+-------
 idx_id    | x'aaaa' | 00:00:04.895382 | 27 MB | 25 MB

Caveats

  • In YugabyteDB, the pg_stat_progress_create_index view is local; it only has entries for CREATE INDEX commands issued by local YSQL clients.

  • In PostgreSQL, tuples_done and tuples_total refer to the tuples of the index. However, in YugabyteDB, these fields refer to the tuples of the indexed table. This discrepancy is only observed for partial indexes, where the reported progress is less than the actual progress. tuples_total is an estimate that is retrieved from pg_class.reltuples.

  • In YugabyteDB, tuples_done and tuples_total are not displayed (set to null) for temporary indexes.

Learn more