yugabyte IO monitoring and load-balance verification.

TL;DR: I've experimented with running a "Distributed Database", and concluded that the load indeed gets distributed over several nodes. I'm also discovering how to use the tools, and I "Double Check" as much of the information as I can. There are always some Surprises, and Questions...


Background

The nature of Distributed is to Spread the load. I'm going to try and observe some of that distributed load. I'm currently still mainly in the Explore and Verify stage of my discovery. I'm deliberately going to cause what is probably an unbalanced load, and see if I can find + fix that.

Note that my setup is still primitive: my nodes are docker-containers running the template-downloadable image on a macbook (described here). I am currently aiming to explore and verify the Concepts and Principles behind yugabyte, rather than doing real-world testing.

 

Tooling and monitoring: yb-admin, yugatool, dsar

I built a primitive inserter to connect and store 1 record per second in a table called "t". By counting the records (group by timestamp), I can spot crude anomalies in the functioning of the database. More on that possibly in later blogs...
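Below is a minimal sketch of such an inserter, the way it could be run from a shell. The host, user and the created_at column are placeholders for my docker setup, not the exact script I used:

# minimal sketch: insert 1 row per second into table "t"
# (host, user and column names are placeholders, adjust to your own setup)
while true; do
  ysqlsh -h 172.20.0.3 -U yugabyte -d yugabyte \
    -c "INSERT INTO t (created_at) VALUES (now());"
  sleep 1
done

# and the crude check: count rows per minute, to spot gaps or hiccups
ysqlsh -h 172.20.0.3 -U yugabyte -d yugabyte \
  -c "SELECT date_trunc('minute', created_at), count(*) FROM t GROUP BY 1 ORDER BY 1;"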

To monitor the up/down status of components, I use yb-admin and yugatool.

yb-admin: The Swiss Army Knife of yugabyte (link). Check out the doc-page and run it with option --help to see the (literally) 100 operations it is capable of.

yugatool: A separate downloadable tool from yugabyte (link), which notably provides a nicely viewable screen with an overview of a cluster:


Check the yugatool screen above: it shows 3 master nodes and 7 tserver-nodes. Note that the master-nodes also appear in the tserver-list: those nodes run both the master-process (to coordinate activity) and a tserver-process (to process DML, i.e. ins/upd/del, and Query-work). Notice this cluster is very quiet, mostly zeros in the read/write columns. I'd like to change that...
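For reference, this is roughly how that overview is produced. The cluster_info sub-command is the one I use throughout this blog; the exact spelling of the master-list flag is an assumption on my part, so check yugatool --help on your install:

# sketch: print the cluster overview (masters, tservers, per-tserver read/write stats)
# the --master_addresses flag-name is an assumption; verify with: yugatool --help
yugatool --master_addresses $MASTERS cluster_info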

And I started using a utility called dsar by Frits Hoogland. dsar gives me an impression of how busy the nodes in the cluster are. The name dsar probably stands for: Distributed-System Activity Report, inspired by the classic unix sar and sadc from Sebastien Godard.

Example of dsar:

In the screenshot above I can see, from the right-most column, that node3, node4 and node8 are busy doing some IOPS, which matches my inserting of 10K records into a table called t. More on that later.


Verify... Trust no1.

So just to re-verify: a Table in yugabyte is kept (split) into 1 or more Tablets, and each Tablet is replicated to 3 different TServers (in case of the default RF=3).

Good Excuse to put an X-Files reference in...


Now.... I can recommend that anyone try something similar to what I did:

1. Create a cluster with >3 nodes. I created 7 nodes, but this demo will work with 4 nodes as well.

2. Create a table with option "split into 1 Tablets". This single tablet, at RF=3, will be replicated to three of the nodes.

3. Now insert a load of data, say 10K records or more. Pick some insert or update that will run for at least 15 seconds on your systems, to generate some significant CPU+IO. And observe the CPU-activity, the IO or the disk-space consumption on all nodes (a sketch of steps 2 and 3 follows below).
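A minimal sketch of steps 2 and 3, the way I would do it via ysqlsh. The column names and the payload are just examples; the essential bit is the SPLIT INTO 1 TABLETS clause:

# step 2: a table that deliberately lives in one single tablet
ysqlsh -h 172.20.0.3 -U yugabyte -d yugabyte -c "
  CREATE TABLE t (id bigint PRIMARY KEY, payload text, created_at timestamp DEFAULT now())
  SPLIT INTO 1 TABLETS;"

# step 3: push in 10K rows in one go, enough to generate some visible CPU + IO
ysqlsh -h 172.20.0.3 -U yugabyte -d yugabyte -c "
  INSERT INTO t (id, payload)
  SELECT g, repeat('x', 100) FROM generate_series(1, 10000) g;"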

4. yugatool with option cluster_info will show "writes" on 1 of the nodes. In my case node8 (right-most column):


We can now see 240 Writes on an otherwise quiet cluster. Funnily enough, yugatool only reports write-activity on 1 of the nodes; I had expected 3 of the nodes to get busy. RF=3, so I should be getting 3 copies, right...?

5. If you also use any of the tools du, iostat or dsar, they will actually show IO-activity on Three of the nodes: in my case node3, node4 and node8. Check the dsar output above, where 3 of the 7 nodes got busy...

From dsar I learned that there are actually 3 nodes getting busy during my large insert. And you can confirm this with iostat, or du, or any other tool (a quick sketch below). This "big insert" actually generated more or less equal activity on 3 of the nodes. Those are probably the nodes where the tablet-replicas are held.
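This is the kind of cross-check I mean. The container names and the data-directory are placeholders for my docker setup; adjust them to wherever your tservers keep their data:

# disk-space consumption on a given node (repeat per container / node)
docker exec -it node3 du -sh /path/to/tserver/data

# or the IO-rates on a node: 3 samples of 5 seconds each
# (iostat requires the sysstat package inside the container)
docker exec -it node3 iostat -x 5 3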

6. Now to confirm: I can use yb-admin to find out which tablet is below table t, using the command list_tablets. And once I have the tablet uuid, I can use that to run yb-admin with the command list_tablet_servers, to find the leader + followers for the given Tablet. This is how I ran both commands:
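In plain command-line form, the two calls look roughly like this, assuming table t lives in the default YSQL database "yugabyte" and $MASTERS holds the comma-separated host:port list of the masters:

# find the tablet(s) behind table t
yb-admin --master_addresses $MASTERS list_tablets ysql.yugabyte t

# then, with the tablet uuid from the output above, list leader + followers
yb-admin --master_addresses $MASTERS list_tablet_servers <tablet-uuid>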


You now see that Table t is held in a (single) Tablet with uuid=995e044....

And that this Tablet, uuid=995e044.... is replicated over nodes with IPs 172.20.0.4, -8, and -3, which correspond to: node4, node8 and node3. It also states that node8 is the leader for this tablet, which explains why yugatool reported the IO on node8. Why didn't yugatool also report the IO on the follower-nodes, node3 and node4? Something to ask yugabyte...

You can also verify this information on Table + Tablet + TServers using the web-guis at ports 7000 (master) and 9000 (tserver), but on a larger database this takes a bit of Control-F and scrolling.

7. More Verification: When I repeated this with the Table t created by default, i.e. split into 7 Tablets, one tablet per available node, the write-load reported by yugatool and the IO-activity reported by dsar were nicely divided over all 7 nodes...
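If you prefer to force the spread explicitly rather than rely on the defaults, the same create-statement with a higher tablet-count does the trick (t7 is just an illustrative name):

# one tablet per node, in my 7-node cluster
ysqlsh -h 172.20.0.3 -U yugabyte -d yugabyte -c "
  CREATE TABLE t7 (id bigint PRIMARY KEY, payload text)
  SPLIT INTO 7 TABLETS;"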

And That is how you want a Distributed Database to look: Distributed.


Learnings so far

Firstly, I would encourage anyone to repeat this, and do similar checking on other systems. Please verify this on Your System in Your Environment, and with Your Data... 

In a previous blog, I stated that in some cases I didn't like having too many tablets (Too much of a Good thing), but here I demonstrated that too-few tablets can lead to un-balanced load. Large Tables should probably be well-sharded, distributed.

Furthermore, even though yugabyte is very "Distributed", you should still check for un-balanced activity. A node that appears Idle may not actually be quiet, because some of the replication activity seems to happen under the radar. Especially if you try to "place tablets" by using non-default options (such as Split Into...).

I think in the demo above, I demonstrated both an unbalanced load (caused very deliberately by me), and some non-reported activity (found +/- by accident).

I was a little surprised when yugatool reported IO on just a single TServer (node8), when I know the tablet should be replicated over 3 nodes, and I had earlier used du (the unix disk usage tool) to verify that indeed, disk-usage is increasing and data gets added on 3 of the servers.

And the yb-admin tool also seems to report mainly the "Leader-IO". Here is a screenshot of yb-admin while I'm inserting 10K records in a Table t which is held in just a single Tablet, where node8 is the leader:

yb-admin lists approximately the same data as yugatool when using the command list_tablet_servers. We see the "writes" column go up for node8, and it also seems not to report the IO activity generated by the follower-tablets.


More...

There is always more to say about tooling and observations, but I'll keep that for later. First I want to understand the basics. 

Geo-partitioning is also something I should look into. Spreading/distributing data (and load) over far-out datacentres will probably come with some interesting challenges. But again, I want to get the basics first.

Also, I can't wait to remove + replace some of the components: this "Distributed Database" is supposed to be Resilient, and has kept up very well with my "abuse". It does not seem to have too many SPOFs (Single Points of Failure).

And I'm curious what happens if I keep creating tables with just 1 or very few tablets: will all tablets end up on the same 3 nodes, or will they get spread out to try and balance the load? Something to Test.

On the tooling, I'm slightly annoyed by the very long command-lines needed. I tend to define aliases and set env-variables, such as $MASTERS to hold my master-list. But if I abbreviate a long yb-admin command and options into alias "ybadm", I hide a lot of the details and screenshots become more cryptic.
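For the record, this is roughly what those shortcuts look like; the addresses are from my docker network and are only an example:

# the master-list, used by almost every yb-admin invocation (default master RPC port is 7100)
export MASTERS="172.20.0.2:7100,172.20.0.3:7100,172.20.0.4:7100"

# shorthand, at the cost of hiding detail in screenshots
alias ybadm='yb-admin --master_addresses $MASTERS'
ybadm list_all_masters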


The whole concept of a Distributed Database looks promising. 

I'm going to try to find time to explore further...


------- End of Blog - possibly some notes and questions below -------


Questions below, possibly for the Yugabyters:


Q: Is it correct that yb-admin and yugatool only report IO from the "Leader" tablets on their TServers? And if so, why do yb-admin and yugatool seem to under-report or not report the IO from Follower-Tablets on the following TServers? When using just 1 tablet (possibly with colocation=true), this gives the impression of an un-balanced load, while in the background 2 or 4 additional tablets are being used?

Illustration: writing to a table "split into 1 tablets" shows the write-IO concentrated on host 172.20.0.8, whereas the activity on nodes 3 and 4 (the followers) is also happening (see the yugatool and yb-admin screenshots above).


Q: Would it be sensible to report the Masters and TServers ordered by ip-address or hostname? Or even by uptime?

