Primary keys
This page documents the preview version (v2.23). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.
The Primary Key constraint is a means to uniquely identify a specific row in a table via one or more columns. To define a primary key, you create a constraint that is, functionally, a unique index applied to the table columns. It is crucial to choose and design the primary key of the table for several reasons.
- Uniqueness. The Primary key is a column or a set of columns that act as the unique identifier for the rows of the table. This is essential to identify a row uniquely across the different nodes in the cluster.
- Data distribution. In YugabyteDB, data is distributed based on the primary key. In hash sharding, the data is distributed based on the hash of the Primary key. In range sharding, it is based on the actual value of the primary key.
- Data ordering. The table data is internally ordered based on the primary key of the table. In hash sharding, the data is ordered based on the hash of the Primary key. In range sharding, it is ordered on the actual value of the primary key.
Definition of the primary key
Primary keys can be defined when the table is defined using the PRIMARY KEY (columns)
clause. They can also be added after table creation using the ALTER TABLE statement, but this is not recommended as adding a primary key after data has been loaded could be an expensive operation to re-order and re-distribute the data.
In hash sharding the primary key definition has the following format:
PRIMARY KEY ((columns), columns)
-- [SHARDING] [CLUSTERING]
The first set of columns, typically referred to as sharding columns, is used for the distribution of the rows. The second set of columns, referred to as Clustering columns, defines the ordering of rows with the same sharding values.
In range sharding, the primary key has the following format:
PRIMARY KEY (columns)
-- [CLUSTERING]
The order of the keys matters a lot in range sharding. The data is distributed and ordered based on the first column, and for rows with the same first column, the rows are ordered on the second column, and so on.
Syntax and examples
Before you start
The examples will run on any YugabyteDB universe.
To create a universe, see Set up YugabyteDB universe.
Primary key for a single column
Most commonly, the primary key is added to the table when the table is created, as demonstrated by the following syntax:
CREATE TABLE (
column1 data_type PRIMARY KEY,
column2 data_type,
column3 data_type,
…
);
The following example creates the employees
table with employee_no
as the primary key, which uniquely identifies an employee.
CREATE TABLE employees (
employee_no integer PRIMARY KEY,
name text,
department text
);
Primary key over multiple columns
The following syntax can be used to define a primary key for more than one column:
CREATE TABLE (
column1 data_type,
column2 data_type,
column3 data_type,
…
PRIMARY KEY (column1, column2)
);
The following example creates the employees
table in which the primary key is a combination of employee_no
and name
columns:
CREATE TABLE employees (
employee_no integer,
name text,
department text,
PRIMARY KEY (employee_no, name)
);
CONSTRAINT
YSQL assigns a default name, in the format tablename_pkey
, to the primary key constraint. For example, the default name is employees_pkey
for the employees
table. If you need a different name, you can specify it using the CONSTRAINT
clause, as per the following syntax:
CONSTRAINT constraint_name PRIMARY KEY(column1, column2, ...);
The following example demonstrates the use of CONSTRAINT
to change the default name:
CONSTRAINT employee_no_pkey PRIMARY KEY(employee_no);
ALTER TABLE
Use the ALTER TABLE
statement to create a primary key on an existing table with following syntax:
ALTER TABLE table_name ADD PRIMARY KEY (column1, column2);
The following example creates the employees
table first and then alters it to add a primary key on the employee_no
column:
CREATE TABLE employees (
employee_no integer,
name text,
department text
);
ALTER TABLE employees ADD PRIMARY KEY (employee_no);
The following example allows you to add an auto-incremented primary key to a new column on an existing table using the GENERATED ALWAYS AS IDENTITY
property:
CREATE TABLE sample(c1 text, c2 text);
ALTER TABLE sample ADD COLUMN ID INTEGER;
ALTER TABLE sample ALTER COLUMN ID set NOT NULL;
ALTER TABLE sample ALTER COLUMN ID ADD GENERATED ALWAYS AS IDENTITY;
ALTER TABLE sample ADD CONSTRAINT sample_id_pk PRIMARY KEY (ID);
Insert values into the sample
table and check the contents.
yb_demo=# INSERT INTO sample(id, c1, c2)
VALUES (1, 'cat' , 'kitten'),
(2, 'dog' , 'puppy'),
(3, 'duck' , 'duckling');
yb_demo=# SELECT * FROM sample;
c1 | c2 | id
--------+-----------+----
cat | kitten | 1
dog | puppy | 2
duck | duckling | 3
(3 rows)
Trying to insert values for id
into the sample
results in the following error, as the auto-increment property is now set.
yb_demo=# INSERT INTO sample(id, c1, c2)
VALUES (4, 'cow' , 'calf'),
(5, 'lion', 'cub');
ERROR: cannot insert into column "id"
DETAIL: Column "id" is an identity column defined as GENERATED ALWAYS.
HINT: Use OVERRIDING SYSTEM VALUE to override.