Logging Performance-Information in a Distributed Database
TL;DR: We store the yb_Active_Session_History (ASH) data in a table, whereby every node inserts its own data. To see data from other nodes, we query the table. We tried to minimize the inter-host communiction and avoid additional mechanisms like FDW or dedicated RPCs. And we keep the data (history) for possible future usage. The main disadvantages are that (1) there is "insert activity", (2) that here is a possible delay due to the sampling-interval, and (3) this concept depends on the correct working of most of the components on a node. Background: Keep History. When using Yugabyte (link), a distributed database, and trying to keep track of the workload-history, e.g. the performance-data, Postgres and Yugabyte make internal data available via views, and notably YB added the view yb_active_session_history. This data can be used to find bottlenecks and hotspots in your system. You can find the (first version of) ASH documentatin here: https://docs.yugabyte.com/preview/explor...