Observability in YugabyteDB

Monitoring, alerting, and analyzing metrics

Observability refers to the extent to which the internal state and behavior of a system can be understood, monitored, and analyzed from the outside, typically by developers and DevOps engineers. It focuses on providing insight into how a system is performing, what is happening inside it, and how it is interacting with its environment.

The goal of observability is to make it easier to diagnose and resolve issues, optimize performance, and gain insights into the system's behavior. It is especially important in modern, complex, and distributed systems, where understanding the interactions between different services and components can be challenging. DevOps Research and Assessment (DORA) research shows that a comprehensive monitoring and observability solution, along with several other technical practices, positively contributes to the management of software infrastructure.

YugabyteDB provides several components and features that you can use to actively monitor your system and diagnose issues quickly.

Metrics

Use metrics to track trends, identify performance issues, and manage the system's performance and reliability.

YugabyteDB exports various metrics, which are quantitative measurements of the cluster's performance and behavior. These metrics include details on latency, connections, cache, consensus, replication, response times, resource usage, and more.
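As a rough illustration of consuming exported metrics, the following sketch parses text in the Prometheus exposition format into Python tuples. It assumes the metrics are served over HTTP in that format; the URL in the usage note (port 9000 and the `/prometheus-metrics` path) and the helper names `fetch_metrics` and `parse_metrics` are assumptions made for this example, so check them against your deployment.

```python
import re
import urllib.request

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.0 [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics text from a (hypothetical) metrics URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition-format text into a list of (name, labels, value).

    Comment lines (# HELP, # TYPE) and lines that do not look like samples
    are skipped. Escape sequences inside label values are left as-is.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For example, `parse_metrics(fetch_metrics("http://127.0.0.1:9000/prometheus-metrics"))` would return the samples from a local node, if that endpoint is reachable in your setup.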

Alerting and monitoring

Monitoring involves continuously checking the system's health and performance and notifying stakeholders if any issues arise. For this, you can set up automated alerts based on predefined thresholds or conditions. All metrics exposed by YugabyteDB are exportable to third-party monitoring tools like Prometheus and Grafana, which provide industry-standard alerting functionality.
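To make the idea of threshold-based alerts concrete, here is a minimal sketch of evaluating alert rules against the latest metric values. It is not how Prometheus or YugabyteDB Anywhere implement alerting; the `AlertRule` class, the `evaluate` function, and the metric names in the usage note are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical alert rule: fire when `metric <op> threshold` holds."""
    name: str
    metric: str
    op: str
    threshold: float


def evaluate(rules, values):
    """Return the names of the rules whose condition holds for `values`.

    `values` maps metric names to their latest numeric value. Rules whose
    metric is missing from `values` are skipped rather than treated as firing.
    """
    firing = []
    for rule in rules:
        value = values.get(rule.metric)
        if value is None:
            continue
        if _OPS[rule.op](value, rule.threshold):
            firing.append(rule.name)
    return firing
```

With a rule such as `AlertRule("HighLatency", "p99_latency_ms", ">", 100.0)`, calling `evaluate([rule], {"p99_latency_ms": 250.0})` reports that the rule is firing. A production setup would normally delegate this logic to the alerting features of the monitoring tool.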

Both YugabyteDB Anywhere and YugabyteDB Managed provide a full suite of alerting capabilities for monitoring.

Visualization and analysis

YugabyteDB provides dashboards that include charts, graphs, and other visual representations of the system's state and performance. yugabyted starts a web UI on port 15433 that displays charts for various metrics.

You can also export the metrics provided by YugabyteDB to third-party visualization tools like Prometheus and Grafana, depending on the needs of your organization.

Both YugabyteDB Anywhere and YugabyteDB Managed come with a full suite of visualizations to help you monitor your cluster and troubleshoot issues.

Logging

Logs from different services, such as the YB-TServer and YB-Master, provide a historical record of what has happened and can be very helpful in debugging and troubleshooting. These logs are rotated regularly, based on their configured size. See Logs management.

Query-level metrics

The following table describes views in YSQL you can use to monitor and tune query performance.

| View | Description |
| --- | --- |
| pg_stat_activity | View and analyze live queries |
| yb_terminated_queries | Identify terminated queries |
| pg_stat_progress_copy | Get the status of a COPY command execution |
| pg_locks | Get information on locks held by a transaction |
| pg_stat_statements | Get query statistics (such as the time spent by a query) |
| yb_local_tablets | Get YSQL/YCQL and tablet metadata details |

To get more details about the various steps of a query execution, use the EXPLAIN ANALYZE command.
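As an example of using one of these views from application code, the sketch below lists currently active queries from pg_stat_activity, longest-running first. It assumes a Python DB-API cursor connected to YSQL through a PostgreSQL-compatible driver; the function name `active_queries` and the choice of columns are illustrative only.

```python
# Columns used here (pid, state, query_start, query) are standard
# pg_stat_activity columns; the selection is just one reasonable choice.
ACTIVE_QUERIES_SQL = """
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC
"""


def active_queries(cursor):
    """Run the query on a DB-API cursor and return rows as dictionaries."""
    cursor.execute(ACTIVE_QUERIES_SQL)
    # cursor.description holds one sequence per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

With a driver such as psycopg2, you would typically obtain the cursor from a connection to the YSQL port (5433 by default) and pass it to `active_queries`. The same pattern applies to the other views in the table.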

Active Session History

Active Session History (ASH) offers insight into current and past system activity by periodically sampling session behavior in the database. ASH functionality extends to YSQL, YCQL, and YB-TServer processes, and helps you to conduct analytical queries, perform aggregations, and troubleshoot performance issues.
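The kind of aggregation ASH enables can be sketched as counting how often each wait event appears among the sampled sessions. The example below works on samples that have already been fetched; the record shape (a dictionary with a `wait_event` key), the function name `top_wait_events`, and the event names in the usage note are assumptions for illustration, not the actual ASH schema.

```python
from collections import Counter


def top_wait_events(samples, limit=3):
    """Return the `limit` most frequent wait events as (event, count) pairs.

    Because ASH samples session state periodically, the number of samples
    showing a given wait event approximates the time spent in that event.
    """
    counts = Counter(sample["wait_event"] for sample in samples)
    return counts.most_common(limit)
```

For instance, given three samples with a hypothetical event `"EventA"` and one with `"EventB"`, `top_wait_events(samples, limit=1)` returns `[("EventA", 3)]`, pointing at where sessions spent most of their sampled time.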
