Locality-optimized geo-partitioning

Geo partition for compliance in Multi-Active global applications

Data residency laws (such as the GDPR) require data of citizens or residents to be collected, processed, and stored inside the country. Multi-national businesses must operate under local data regulations that dictate how the data of a nation's residents must be stored inside its borders. Re-architecting your storage layer and applications to support these regulations could be a very daunting task.

Suppose you want to store data of users from different countries (for example, US, Germany, India) in the same table. To comply with national data-protection laws, or to reduce latency for the users in those countries, you can store the rows in their respective regions, similar to the following illustration.

Application instances are active in all regions, do consistent reads, and data is partitioned but applications operate on the entire dataset.

User data stored within their country's boundaries

To do this, YugabyteDB supports Row-level geo-partitioning. This combines two well-known PostgreSQL concepts, partitioning, and tablespaces.

Setup

To set up a local universe, refer to Set up a local YugabyteDB universe.

Setup

To set up a cluster, refer to Set up a YugabyteDB Aeon cluster.

Setup

To create a universe, refer to Create a multi-zone universe.

Suppose you want to store the data of users from the US and Europe on their respective continents. For this, you set up a replication factor 3 cluster spread across 2 geographies, US and Europe. To make sure each partition acts like a global database, you opt for 3 regions in each of the geographies.

RF3 cluster spanning 2 regions

Partition your data

For this example, create a table of users that you are going to partition by the geo field.

CREATE TABLE users (
    id INTEGER NOT NULL,
    geo VARCHAR,
) PARTITION BY LIST (geo);

Partition your data for users in the US and Europe to ensure that the application in us-east uses the us partition, and the application in eu-west uses the europe partition.

The tablespace definitions are discussed in the next section.

--  US partition table
CREATE TABLE us PARTITION OF users (
   id, geo, PRIMARY KEY (id HASH, geo)
) FOR VALUES IN ('us') TABLESPACE us;

--  Europe partition table
CREATE TABLE eu PARTITION OF users (
   id, geo, PRIMARY KEY (id HASH, geo)
) FOR VALUES IN ('eu') TABLESPACE eu;

Partition your database

Replica placement

Configure your us partition leader preference to place the leader in us-east-1, one replica in us-east-1 (nearby region) and the other replica in us-east-2. Placing one replica in the same region as the leader ensures that the local replica is up-to-date and will be quickly promoted to leader in case the leader fails. This way , all the replicas of the US users are located in the US regions.

--  tablespace for us data
CREATE TABLESPACE us WITH (
    replica_placement='{"num_replicas": 3,
    "placement_blocks":[
        {"cloud":"aws","region":"us-east-1","zone":"us-east-1a","min_num_replicas":1,"leader_preference":1},
        {"cloud":"aws","region":"us-east-2","zone":"us-east-1c","min_num_replicas":1,"leader_preference":2},
        {"cloud":"aws","region":"us-central-1","zone":"us-east-2b","min_num_replicas":1,"leader_preference":3}
        ]}'
);

Place US replicas

Similarly, set up your europe partitions in eu-west-1 and eu-west-2.

--  tablespace for Europe data
CREATE TABLESPACE eu WITH (
    replica_placement='{"num_replicas": 3,
    "placement_blocks":[
        {"cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","min_num_replicas":1,"leader_preference":1},
        {"cloud":"aws","region":"eu-west-2","zone":"eu-west-1c","min_num_replicas":1,"leader_preference":2},
        {"cloud":"aws","region":"eu-central-1","zone":"eu-west-2a","min_num_replicas":1,"leader_preference":3}
    ]}'
);

Place EU replicas

This ensures all the EU user data is located in Europe.

Low latency

Consider the US application. As you have placed the us partition leader in us-east-1, it has a low read latency of 2ms. As this partition has a replica in a nearby region, the write latency is also low (<10ms).

US application

Similarly, the Europe application also has low read and write latencies.

Europe application

This pattern helps applications running in different regions to have low read and write latency, as they are reading and writing data to nearby partitions. At the same time, you are complying with local data protection laws by keeping the citizen data inside the country boundaries.

Global tables

You can add other tables without partitioning to be present across both geographies. This way, you can have both global tables and partitioned tables in your setup.

Failover

This pattern is resilient to region failure. When any of the regions hosting one of the partition leaders fails, the partition follower in another zone would immediately be promoted to leader and the application can continue without any data loss.

Failover

Locality-optimized geo-partitioning

Setup

Partition your data

Replica placement

Low latency

Global tables

Failover

Learn more