Monitoring a Distributed Database (notes)

[not published yet - needs more work on "locally available" and "need to combine"]

TL;DR: Keeping an eye on a database that runs on multiple servers (nodes, containers...), possibly in different locations, is challenging. An SQL database holding the logging data will help. I'll describe two possible mechanisms, with some pros and cons.


Background: The distributed database, multiple nodes.

Distributed: the database runs on multiple servers (or containers, or whatever entity can provide the compute + memory + storage you need). The processing of calls and queries is also distributed.

And any given SQL statement, even if it concerns just one record, is likely to touch several nodes: Query, Catalog, Storage, Consensus, Master, Replicas... And a query that uses joins and filters, and thus touches multiple tables and many records, is bound to need calls (RPCs) to several nodes. This is illustrated by the fact that the "explain-plans" of both Yugabyte and Cockroach are already more complicated than those of plain Postgres.

To monitor and diagnose such a system, you will need to combine data from several nodes. But the view, the information available on each node, is almost always limited to the "local info". Combining data from multiple nodes means more RPCs, or transport of data. That is something distributed databases are good at, but they generally don't want to create more "overhead".

The collected data will typically be used for 3 things: a) live-monitoring, b) trend-analysis, and c) drill down to troubleshoot.


Partial solution: pg_stats and similar mechanisms.

The existing Postgres tooling (pg_stat_activity) is a good start. This "tooling" is able to look inside the pg-engine and report the ongoing activity. But these tools can generally not see outside the server they run on. The pg_stat views on Yugabyte, and the Yugabyte "active_session_history", are both limited to the local node only. Hence, combining with data from other nodes requires extra work.
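To illustrate, a minimal sketch of looking at local activity via the standard pg_stat_activity view (also present in the YSQL layer of Yugabyte); the columns and filter shown are just one possible choice:

```sql
-- Ongoing (non-idle) sessions, as seen by the local node only:
SELECT pid,
       usename,
       state,
       now() - query_start AS running_for,
       query
FROM   pg_stat_activity
WHERE  state <> 'idle'
ORDER  BY query_start;
```

On a distributed database, remember: this result shows only what the local node knows about.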


Combining: You almost always need data from Multiple Nodes.

Activity in distributed systems is... distributed. Hence an action, or an event, often has its root cause, its top-level event, located on another node. And similarly, a query that takes a long time may have a whole tree of RPCs to other nodes below it. To get a good view of things, you need to combine information from several nodes.


Solution 1: External: some system calls (polls) each node.

An external system does RPCs to collect data from each node in turn, and stores the data on a separate system, not part of the main database or cluster. This can be done by running a (small, Postgres) database that is equipped to call every node in the cluster in turn, to collect, store and present the data.

Typically: a Postgres database with FDWs (Foreign Data Wrappers) could do this job, using scheduled jobs (calls) to each node in turn, and insert the data into tables for keeping. The central database can then query the data it wants to see, combining information from multiple nodes.
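A sketch of that setup, assuming postgres_fdw on the central monitoring database; the server name, host, credentials, the local activity_history table and the scheduling mechanism (e.g. pg_cron) are all placeholders:

```sql
-- On the central (monitoring) Postgres database:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per cluster node (node1 and its host are placeholders;
-- 5433 is the default YSQL port on Yugabyte):
CREATE SERVER node1 FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'node1.example.com', port '5433', dbname 'yugabyte');
CREATE USER MAPPING FOR CURRENT_USER SERVER node1
  OPTIONS (user 'monitor', password 'secret');

-- Expose the remote activity view as a local foreign table:
CREATE FOREIGN TABLE node1_activity (
  pid int, usename text, state text, query_start timestamptz, query text
) SERVER node1
  OPTIONS (schema_name 'pg_catalog', table_name 'pg_stat_activity');

-- Poll a node and keep the data (run on a schedule, e.g. via pg_cron);
-- activity_history is a hypothetical local table for collected samples:
INSERT INTO activity_history (node, sampled_at, pid, state, query)
SELECT 'node1', now(), pid, state, query
FROM   node1_activity
WHERE  state <> 'idle';
```

Repeating the SERVER and FOREIGN TABLE definitions per node then lets the central database join or union the per-node data in one query.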


Solution 2: Internal: each node saves to the database itself.

Each node saves the locally available data to tables in the database itself. Interested parties can pick it up for display using SQL connections to the database itself. As a refinement, the tables could be defined with affinity to specific nodes, to isolate the monitoring data from the nodes doing real work.

Typically: Each node would schedule a stored procedure to run insert-into statements to store the locally available information in tables. In a distributed database, that data would be instantly visible on commit, and retrievable for all nodes in the cluster. Any node can query and combine the data it wants to see.
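A sketch of this internal variant, assuming a shared history table and some in-database scheduler to call the procedure on each node; all names are illustrative:

```sql
-- Shared table; once a row is committed it is visible cluster-wide:
CREATE TABLE IF NOT EXISTS activity_history (
  node        text,
  sampled_at  timestamptz DEFAULT now(),
  pid         int,
  state       text,
  query       text
);

-- Each node runs this locally on a schedule, passing its own name:
CREATE OR REPLACE PROCEDURE save_local_activity(p_node text)
LANGUAGE sql
AS $$
  INSERT INTO activity_history (node, pid, state, query)
  SELECT p_node, pid, state, query
  FROM   pg_stat_activity
  WHERE  state <> 'idle';
$$;

-- Any node can then combine the data from all nodes with plain SQL:
SELECT node, count(*) AS samples, max(sampled_at) AS last_sample
FROM   activity_history
GROUP  BY node;
```

The combining step is just SQL: no extra tooling, at the cost of some write activity inside the database you are monitoring.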


What is the best way? It depends.

Depending on...

Do you want to maintain a separate component?

Do you prefer a stand-alone system that can stay up when your main database is down?

How much time do you want to spend on setup and operation?

Would you rather (lazy, like me) just use what is available and deploy a few stored procs to start immediately?


Which method is best in general, I don't know yet. Being lazy, coming from a mostly single-node background, and having grown up with Oracle Statspack, I would start with method 2: have each node store its own data, and combine only as needed once the data is committed.

Others may opt to create a "monitor node", or a "system-management console", possibly just a container, that integrates a database, the needed FDWs, and one or more graphical tools (Prometheus + Grafana!). The main benefit is that such a stand-alone system can be used to monitor multiple systems, and will likely remain "available" even if your main database is in trouble.


And there are of course many other possibilities than just the two I described above. 

Interesting to see what will develop.
