[Draft] Postgres - Yugabyte: logging of sessions, and (active-) session-history
[DRAFT - for discussion only]
TL;DR: After examining the possibilities of pg_stat_... and combining it with yb_active_session_history (ASH), I have some ideas and suggestions to help with finding hotspots and heavy statements.
Background:
My unstoppable urge to investigate databases via SQL, and a focus on "knowing what happens" with connections and queries.
Philosophy: Queries are the root of all (user-)activity and (user-)problems. Queries come from application-connections or sessions. And in a Distributed Database (DD), the sessions and their queries can cause work on any node in the cluster. Hence you need to be able to "track them down".
In PostgreSQL we have pg_stat (_activity, _statements) for that.
Yugabyte added a feature called Active Session History (ASH), and I am combining that with information from pg_stat_activity and pg_stat_statements to try and identify "heavy queries" and relate them back to session, and thus to users and applications.
Capturing the Data : Sessions and ASH...
For starters, I re-used some of my PG knowledge to Identify and capture a Session.
To define Session: The process that connect an application to the database. The Session is the "cause of all activity" (excluding backgrounds for a moment...). But in YBDB, sessions can appear on any tserver (-node, -host) in a cluster.
The identifying attributes of a Session are (my guess) tserver, pid and backend_start.
The application connects "at a host" which generates pid for the postgreSQL process (local to the host or the tserver). But because pid can be re-used, I we need the backend_start to identify the process precisely. Other attributes, such as client_addr and client_port are (I think) dependent on host+pid+backend_start. This would define the entity/table which I called "session master": ybx_sess_mst.
The "Session" must now be connected to "active_session_history" to identify ASH-records "caused by this session". This can be done via client_add and client_port, which are contained in what is currently the client_node_ip (text).
An ASH record is further connected to
- Tserver: on which it "runs", which is often different from the tserver were the session originates. This is were we start tracing activity from one session into the multiple tservers involved in processing the queries.
- Query (via query_id or root-req). An ASH records seems to only have one (or none) query-ids that "cause the activity", possibly connected via Root-Request.
- Tablet (via wait_ev_aux). An ASH record may act on one (or none) tablet, supposedly related on the same tserver as where the ASH-record is found.
If I create entities (tables) representing session, ash, and tablets, I end up with approximately the following ERD:
Note: My actual tables are slightly more complicated and also include logging per-query (pg_stat_stmnts), per-database (pg_stat_database), per session, and per master and tserver (a.k.a. per host or node, information that has to be scraped from stdout by the yb-admin tool..).
Note also that I have ignored "root request id" (RR) for the moment. In future versions I suppose RR is the connecting entity that links ASH to a Single Query. But I am not 100% sure of that, yet.
Note: A table of type _mst is master-record and contains data that is less bound to a log-time. A table of type _log contains "polled" data. The more elaborate datamodel contains additional log-tables for most of the "master" tables: ybx_sess_log, ybx_tsrv_log etc...
The log-records from any mst will allow the building of a history (and a nice graph...).
Reporting
From this point on, I can devise several "reports", for example:
1) Find the ASH-records for a given session (host, pid, start), in chronological order, for as far as the data is captured by the polling of ASH....
2) Find the various sessions (and their app/origin) that work on a given tablet or table.
3) Identify the queries that seem to cause most load, for further examination, and find out which sessions (hosts, app) they originate from.
A lot, maybe all, of this information can also be generated from clever queries on yb_act_sess_hist, but having the separate entities defined helps both the thinking-process and allows for more simple logging and reporting. Hence I try to capture the data from ASH and yb_local_tablets, pg_stat_activity and pg_stat_query into tables.
That information can help me to identify "heavy load" or problem-queries. Furthermore, by identifying those entities and attributes, I can devise better graphs (grafana!) and build a logging system to capture the available data and present it later for analysis.
Illustration, just from queries on (logged-)ASH data:
Note: Include here some illustrating Examples!
Concluding Remarks (This is all still a Draft!):
I like the ASH capability of YBDB. Despite some drawbacks (polling, unclear ID of sessions, unclear attribution of activity from request-0000), it allows me to dig into the history of sessions and to locate "effort spent" by various application-connections.
Questions (several... and more):
Q1: Could the field client_node_ip be split into client_addr + port?
Q2: Is Root_Request meant to link Session, Query and ASH? Should RR be a separate Entity? Would it merit its own "view" so SQL can interrogate it ?
Q3: How to handle sessions that connect + disconnect in between polling and thus never get noticed ?
Q4: yb_local_tablets is very local to the tserver.
Q4a: It will help to have datid and oid of the table available with the yb_local_tablet to better link it to a table (if the table is in the same datid!)
Q4b: Similarly, .. are pg_stat_stmnts relatable to datid ?
More To Be Done...
- Extract, Examine and log the data for a "root_request_id". Currently I don't have a separate table(s) for that yet. I am debating if it needs one. My current understanding is: RR represents the execution of a single SQL from a Session.
- Better, pre-built, scripts to dig into a given query-id or Root-Request. For example to present the chronological "work done" by the session of RR.
- The 00000s: Better examination of the work done by background processes, the ones represented by RR and Top_Level_Nodes of "zeros". How much of the work do those represent, and is there any way to measure it, break it down, and maybe control it ?
- Log more session-data: We could take a copy of (some of) the data in pg_stat_activity to build a history of the session. Suggested table: ybx_sess_log. Possible relevant data to be taken from pg_stat_activity: allocated_mem, rss_mem, wait_event, wait_event_type. The "downside" of sampling this data is, that at the moment, I can do "only sampling". We miss sessions that connect/disconnect inside a polling interval. Any option to save session-info on disconn ? Overhead!!!
- More data to examine on surrounding entities: Tablets (tblt_uuid, using yb_local_tablets), Tserver (ts_uuid), master (ms_uuid). From Tablets, I am notably asking Yugabyte to add colums to yb_local_tablets to identify Leader/Follower and to identify "state". the "tombstoned" tablets seem to come up and confuse the results of queries on yb_local_tablets.
Comments
Post a Comment