Logging Performance-Information in a Distributed Database

TL;DR:

We store the yb_active_session_history (ASH) data in a table, whereby every node inserts its own data. To see data from other nodes, we query the table. We tried to minimize inter-host communication and avoid additional mechanisms like FDW or dedicated RPCs. And we keep the data (history) for possible future use. The main disadvantages are that (1) there is "insert activity", (2) there is a possible delay due to the sampling interval, and (3) this concept depends on the correct working of most of the components on a node.


Background: Keep History.

When using Yugabyte (link), a distributed database, you may want to keep track of the workload history, e.g. the performance data. Postgres and Yugabyte make internal data available via views, and notably YB added the view yb_active_session_history. This data can be used to find bottlenecks and hotspots in your system.

You can find the (first version of) ASH documentation here: https://docs.yugabyte.com/preview/explore/observability/active-session-history/

We wanted to keep some of it for later use, to be able to look back at data from yesterday, and to keep data over server-reboots.


Per-node data needs to be unified.

Yugabyte is a distributed database, but the data in these views is still "local" to each node. We wanted a more unified view, combining data from all the nodes.

Franck Pachot already addressed this by using a Foreign Data Wrapper (FDW) and a UNION ALL to combine the data in a single view, which is explained and demonstrated here:

https://dev.to/yugabyte/active-session-history-ash-in-yugabytedb-39ic

He then followed it up with a very cool visualization using Grafana here:

https://dev.to/yugabyte/find-hotspots-with-yugabyte-active-session-history-45db

The ASH data, and the concepts used by Franck, will help you find bottlenecks and hotspots in your system. There are some great possibilities here, and I hope others can build on his ideas to create all sorts of cool diagnostics tools.

But we still wanted a little more. For example, Franck's solution still needs the FDW, and it has to call out to all nodes whenever someone queries the GV$ (Global Views). And the GV views throw an error when one of the nodes is unreachable or simply down.

And we wanted to store data in tables for "historic" investigations.


Storing data in tables, readable anywhere.

Instead of using FDW to query other nodes, we let each node store its own data. We insert data from the view yb_active_session_history into a table and add the nodename or hostname, the result of: select host from yb_servers();
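As a minimal sketch of the idea (the table name ash_history and the column subset are our own choices; the actual view exposes more columns than shown here):

```sql
-- Hypothetical per-node history table: a subset of the
-- yb_active_session_history columns, plus the capturing host.
create table if not exists ash_history (
    host                 text not null,      -- from yb_servers()
    sample_time          timestamptz not null,
    root_request_id      uuid,
    wait_event_component text,
    wait_event_class     text,
    wait_event           text,
    query_id             bigint,
    sample_weight        float
);
```

Each node then inserts only its own samples, tagged with its own host.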

Sessions on other hosts can see this data after commit, and it only needs the (Postgres) SQL interface to access this data globally from the whole cluster, with any query tool of your choice (psql or ysqlsh in our case).
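Assuming a history table such as ash_history (a name we made up) with host, sample_time and wait_event columns, a quick cross-node overview is a plain SQL query from any client:

```sql
-- Top wait events per node over the last hour, visible from
-- any connection in the cluster (psql, ysqlsh, ...):
select host, wait_event, count(*) as samples
from ash_history
where sample_time > now() - interval '1 hour'
group by host, wait_event
order by samples desc
limit 10;
```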

There is the ASH data, plus three other views for which we like to keep the history:

yb_active_session_history

yb_local_tablets

pg_stat_statements

pg_stat_activity

Each of these warrants a discussion of its own, but that is out of scope for the moment. First, let us describe the concept in more detail. (Later posts can elaborate and point to a repository with example code...)

To capture the data, we created functions that query and insert the data from the views, together with sample_time and nodename (hostname). The functions run at a "sample interval", and we check for existing data to avoid duplicates. From the moment the data is inserted (committed), it is visible to any connection with access to the tables.
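A capture function could look roughly like this (the function name ash_snapshot, the ash_history table, and deduplication via the last stored sample_time per host are our assumptions; the real code may differ):

```sql
-- Sketch: copy new ASH samples from the view into the history table.
create or replace function ash_snapshot() returns void
language sql as $$
  insert into ash_history (host, sample_time, root_request_id,
                           wait_event_component, wait_event_class,
                           wait_event, query_id, sample_weight)
  select inet_server_addr()::text,          -- this node's own address
         a.sample_time, a.root_request_id,
         a.wait_event_component, a.wait_event_class,
         a.wait_event, a.query_id, a.sample_weight
  from yb_active_session_history a
  where a.sample_time > coalesce(
          (select max(h.sample_time)
             from ash_history h
            where h.host = inet_server_addr()::text),
          'epoch'::timestamptz);            -- only rows newer than the last capture
$$;
```

Deduplicating on sample_time per host keeps each run cheap: only rows newer than the last committed capture are inserted.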

The sample interval is important because it determines the maximum delay for data to become visible. A short interval causes more overhead, but a long interval hampers immediate monitoring. The execution of the insert functions was mostly below 1 sec, and we found intervals of 30s, 60s or 180s workable, depending on the situation.
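How the functions are scheduled is an open choice: an OS-level cron job calling ysqlsh on each node works, and where the pg_cron extension is available (an assumption on our part, not something the setup requires), the capture can also be scheduled from inside the database:

```sql
-- Hypothetical: schedule the capture function (ash_snapshot, our
-- placeholder name) every minute with pg_cron. Note that pg_cron runs
-- jobs on the node hosting its scheduler, so a per-node OS cron job
-- calling ysqlsh is an alternative when every node must capture itself.
select cron.schedule('ash-snapshot', '* * * * *', 'select ash_snapshot()');
```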


Pros and Cons

Pro: Simple code, use of existing tools. The big advantage is that there is no need for FDW, no mappings, no imports of remote objects.

Pro: Robust execution, local to each node. There is no dependency on other nodes, and no impact when nodes leave or join the cluster.

Pro: Creation of history, in tables. Data is now saved over longer periods of time. The amount of history is no longer constrained by buffer sizes, and data survives node outages. This also allows for better post-mortem investigation of problems.

Con: Inserts needed before data is visible. The additional table-insert activity should be kept to a minimum. We replaced the FDW calls with insert and select queries on tables. We have not yet tested this on a system with large nodes or under heavy load, other than some pg_bench or self-made artificial million-insert type of jobs. Collecting the data was generally in the 1-sec-per-node range.

Con: The sample interval is the maximum time before data becomes available for inspection on other nodes. When looking at a live problem, an interval of 30 sec or more may be considered too much. But alternatively, a high-frequency query via FDW or other mechanisms may also be a problem.

Con: Dependency on working node(s). This concept needs the full stack of Postgres and the tserver to be working, both for the logging itself and for the inspection of results. For more urgent troubleshooting, other approaches may be needed.

note1: I would expect others to come up with more pros and cons.

note2: At some point, we would consider adding some network monitoring, heartbeat-time logs, node-outage logging, or other connectivity monitoring, but that is way out of scope for the moment.


Closing Remarks.

Expect some discussion. 

Work in progress... will try to create a repository with (readable) demo code ASAP.





