Key concepts

Glossary of key concepts

ACID

ACID stands for Atomicity, Consistency, Isolation, and Durability. These are a set of properties that guarantee that database transactions are processed reliably.

  • Atomicity: All the work in a transaction is treated as a single atomic unit - either all of it is performed or none of it is.
  • Consistency: A completed transaction leaves the database in a consistent internal state. This can either be all the operations in the transactions succeeding or none of them succeeding.
  • Isolation: This property determines how and when changes made by one transaction become visible to the other. For example, a serializable isolation level guarantees that two concurrent transactions appear as if one executed after the other (that is, as if they occur in a completely isolated fashion).
  • Durability: The results of the transaction are permanently stored in the system. The modifications must persist even in the instance of power loss or system failures.

YugabyteDB provides ACID guarantees for all transactions.

CDC - Change data capture

CDC is a software design pattern used in database systems to capture and propagate data changes from one database to another in real-time or near real-time. YugabyteDB supports transactional CDC guaranteeing changes across tables are captured together. This enables use cases like real-time analytics, data warehousing, operational data replication, and event-driven architectures.

Cluster

A cluster is a group of nodes on which YugabyteDB is deployed. The table data is distributed across the various nodes in the cluster. Typically used as Primary cluster and Read replica cluster.

Sometimes the term cluster is used interchangeably with the term universe. However, the two are not always equivalent, as described in Universe.

DocDB

DocDB is the underlying document storage engine of YugabyteDB and is built on top of a highly customized and optimized verison of RocksDB.

Fault domain

A fault domain is a potential point of failure. Examples of fault domains would be nodes, racks, zones, or entire regions.

Hybrid time

Hybrid time/timestamp is a monotonically increasing timestamp derived using Hybrid Logical clock. Multiple aspects of YugabyteDB's transaction model are based on hybrid time.

Isolation levels

Transaction isolation levels define the degree to which transactions are isolated from each other. Isolation levels determine how changes made by one transaction become visible to other concurrent transactions.

YugabyteDB offers 3 isolation levels, Serializable, Snapshot and Read committed in the YSQL API and one isolation level, Snapshot in the YCQL API.

Leader balancing

YugabyteDB tries to keep the number of leaders evenly distributed across the nodes in a cluster to ensure an even distribution of load.

Leader election

Amongst the tablet replicas, one tablet is elected leader as per the Raft protocol.

Master server

The YB-Master service is responsible for keeping system metadata, coordinating system-wide operations, such as creating, altering, and dropping tables, as well as initiating maintenance operations such as load balancing.

The master server is also typically referred as just master.

MVCC

MVCC stands for Multi-version Concurrency Control. It is a concurrency control method used by YugabyteDB to provide access to data in a way that allows concurrent queries and updates without causing conflicts.

Namespace

A namespace refers to a logical grouping or container for related database objects, such as tables, views, indexes, and other database constructs. Namespaces help organize and separate these objects, preventing naming conflicts and providing a way to control access and permissions.

A namespace in YSQL is referred to as a database and is logically identical to a namespace in other RDBMS (such as PostgreSQL).

A namespace in YCQL is referred to as a keyspace and is logically identical to a keyspace in Apache Cassandra's CQL.

Node

A node is a virtual machine, physical machine, or container on which YugabyteDB is deployed.

OID

Object Identifier (OID) is a unique identifier assigned to each database object, such as tables, indexes, views, functions, and other system objects. They are assigned automatically and sequentially by the system when new objects are created.

While OIDs are an integral part of PostgreSQL's internal architecture, they are not always visible or exposed to users. In most cases, users interact with database objects using their names rather than their OIDs. However, there are cases where OIDs become relevant, such as when querying system catalogs or when dealing with low-level database operations.

OIDs are unique only within the context of a specific universe and are not guaranteed to be unique across different universes.

Primary cluster

A primary cluster can perform both writes and reads, unlike a read replica cluster, which can only serve reads. A universe can have only one primary cluster. Replication between nodes in a primary cluster is performed synchronously.

Raft

Raft stands for Replication for availability and fault tolerance. This is the algorithm that YugabyteDB uses for replication guaranteeing consistency.

Read replica cluster

Read replica clusters are optional clusters that can be set up in conjunction with a primary cluster to perform only reads; writes sent to read replica clusters get automatically rerouted to the primary cluster of the universe. These clusters enable reads in regions that are far away from the primary cluster with timeline-consistent data. This ensures low latency reads for geo-distributed applications.

Data is brought into the read replica clusters through asynchronous replication from the primary cluster. In other words, nodes in a read replica cluster act as Raft observers that do not participate in the write path involving the Raft leader and Raft followers present in the primary cluster.

Rebalancing

Rebalancing is the process of keeping an even distribution of tablets across the nodes in a cluster.

Region

A region refers to a defined geographical area or location where a cloud provider's data centers and infrastructure are physically located. Typically a region consists of one or more zones. Examples of regions include us-east-1 (Northern Virginia), eu-west-1 (Ireland), and us-central1 (Iowa).

Replication factor (RF)

The number of copies of data in a YugabyteDB universe. YugabyteDB replicates data across zones (or fault domains) in order to tolerate faults. Fault tolerance (FT) and RF are correlated. To achieve a FT of k nodes, the universe has to be configured with a RF of (2k + 1).

The RF should be an odd number to ensure majority consensus can be established during failures.

Sharding

Sharding is the process of mapping a table row to a tablet. YugabyteDB supports 2 types of sharding, Hash and Range.

Tablet

YugabyteDB splits a table into multiple small pieces called tablets for data distribution. The word "tablet" finds its origins in ancient history, when civilizations utilized flat slabs made of clay or stone as surfaces for writing and maintaining records.

Tablets are also referred as shards.

Tablet leader

In a cluster, each tablet is replicated as per the replication factor for high availability. Amongst these tablet replicas one tablet is elected as the leader and is responsible for handling writes and consistent reads. The other replicas as termed followers.

Tablet splitting

When a tablet reaches a threshold size, it splits into 2 new tablets. This is a very quick operation.

Transaction

A transaction is a sequence of operations performed as a single logical unit of work. YugabyteDB provides ACID guarantees for transactions.

TServer

The YB-TServer service is responsible for maintaining and managing table data in the form of tablets, as well as dealing with all the queries.

Universe

A YugabyteDB universe comprises one primary cluster and zero or more read replica clusters that collectively function as a resilient and scalable distributed database.

Sometimes the terms universe and cluster are used interchangeably. However, the two are not always equivalent, as a universe can contain one or more clusters.

xCluster

xCluster is a type of deployment where data is replicated asynchronously between two universes - a primary and a standby. The standby can be used for disaster recovery. YugabyteDB supports transactional xCluster .

YCQL

Semi-relational SQL API that is best fit for internet-scale OLTP and HTAP apps needing massive write scalability as well as blazing-fast queries. It supports distributed transactions, strongly consistent secondary indexes, and a native JSON column type. YCQL has its roots in the Cassandra Query Language.

YQL

The YugabyteDB Query Layer (YQL) is the primary layer that provides interfaces for applications to interact with using client drivers. This layer deals with the API-specific aspects such as query/command compilation and the run-time (data type representations, built-in operations, and more).

YSQL

Fully-relational SQL API that is wire compatible with the SQL language in PostgreSQL. It is best fit for RDBMS workloads that need horizontal write scalability and global data distribution while also using relational modeling features such as JOINs, distributed transactions, and referential integrity (such as foreign keys). Note that YSQL reuses the native query layer of the PostgreSQL open source project.

Zone

Typically referred as Availability Zones or just AZ, a zone is a datacenter or a group of colocated datacenters. Zone is the default fault domain in YugabyteDB.