Yugabyte distributed database: Some Resilience testing

TL;DR: Yugabyte is a Distributed Database. I destroy some nodes to see what happens. The Database Survives.

Later: Master-nodes are particularly Important. 


Background

Failure of nodes, or failure of connectivity to nodes, is probably the most common problem in a distributed system. I'm going to try a few simple things like taking out a node, and trying to replace a node.

Note that my setup is still primitive: my nodes are docker containers running the downloadable template image on a macbook (described here). I am currently looking to find and test the Concepts and Principles behind yugabyte, rather than doing real-world testing.

So what happens if I remove 1 or 2 nodes ... ? 


Overview and Setup

The creation of my cluster is now smoothly scripted (link to script?), and by using yugatool, I can see that my cluster looks like this:

We see 3 Masters (because RF=3), and a total of 7 TServers. On fresh creation of the cluster, the top node is the leader of the masters, IP=172.20.0.3. In docker it is known as node3, and its hostname is set to node3. I made sure the last digit of the IP equals the node number, to easily recognise nodes.

Also, in the same docker network, I created an additional node: nodeX. This node allows me to connect to the other nodes, using yugatool and ysqlsh to inspect the cluster from "outside". NodeX serves as my client and monitoring platform inside the network.

In my database, I have created 2 tables. Table t7 was created with no options and got split (sharded) into 7 Tablets, with one tablet-leader at every node. And for every tablet there are 2 replicas, tablet-followers, at various other nodes. I've seen that happen and documented it in earlier blogs (link). If I insert a large amount of data into t7, I observe CPU and IO activity on all nodes: it can be a Nicely Distributed Database!

I also created Table t, which I deliberately (un)sharded using the option "split into 1 tablets". This means the replicas of that single Tablet will live on only 3 of my nodes (RF=3).
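For reference, the two create-statements boil down to something like this (a sketch, run from nodeX; the column names id and payload are just my placeholders, the interesting part is the presence or absence of the SPLIT INTO clause):

# table t7: no options, the cluster decides the number of tablets (7, on my 7 tservers)
ysqlsh -h node2 -c "create table t7 (id bigint primary key, payload text);"

# table t: deliberately kept to a single tablet
ysqlsh -h node2 -c "create table t (id bigint primary key, payload text) split into 1 tablets;"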

Let's inspect with yb-admin:


The first command shows a "list_tablets" for t7, and yes, it has 7 tablets (shards), with a different Leader-node for each tablet (I wouldn't mind seeing that list ordered by host or by tablet-uuid...).

Now notice that Table t has only 1 Tablet. And when I ask yb-admin for "list_tablet_servers uuid..." for that Tablet, it shows me the leader (node3) and the followers (node8 and node7). So I now know on which nodes the replicas of t are kept. And I could verify that by inserting 10K records into t, during which I noticed the IO using both yugatool and dsar (as demonstrated in an earlier blog).

This single-tablet table allows me to specifically target the three TServer nodes that hold the replicas of that table's single shard.
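For reference, the yb-admin commands behind those screenshots are roughly the following (run from nodeX; the master address list is my guess for this cluster, and the tablet-uuid has to be copied from the list_tablets output):

MASTERS=node2:7100,node3:7100,node4:7100   # my assumption of the master addresses, adjust to your cluster

# list the tablets (shards) of t7 and of t, in the ysql database "yugabyte"
yb-admin -master_addresses $MASTERS list_tablets ysql.yugabyte t7
yb-admin -master_addresses $MASTERS list_tablets ysql.yugabyte t

# show leader and followers of one specific tablet (uuid copied from the output above)
yb-admin -master_addresses $MASTERS list_tablet_servers <tablet-uuid>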

Gratuitous remark about this setup: "Don't do this at Work". The whole setup runs on an old MacBook Pro (i7, 2013). I do not pretend this is a professional setup at all. It is for exploration purposes only (link to earlier blog). I'm currently looking for Understanding, not for Performance yet.


Some running program to use the Database...

To have some semblance of a simulated connection, I have a script that can run on nodeX or on the hosting mac to insert records into t. This way I have a running test-connection. In my case, I run an insert-query into Table t once per second. This gives me a crude measurement of "availability".

I don't use a load-balancer or anything like pgbouncer (yet), so I'm using a connection URI with multiple hosts to allow the ysqlsh client to find any node that is alive. This works reasonably well:

ysqlsh postgresql://yugabyte@node2:5433,node3:5433,node4:5433 ...

In a shell-script, I let it loop: connect once per second, and insert a record into t. Primitive, but sufficient for the moment.
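The loop is roughly this (a sketch, re-using the placeholder columns from the create-statement above; a real test-harness would of course log more carefully):

#!/bin/bash
# crude availability-probe: one connect + one insert per second, with a timestamp per attempt
URI="postgresql://yugabyte@node2:5433,node3:5433,node4:5433"

while true; do
  if ysqlsh "$URI" -qtAc "insert into t (id, payload) values (extract(epoch from now())::bigint, 'probe');" ; then
    echo "insert ok     $(date +%H:%M:%S)"
  else
    echo "insert FAILED $(date +%H:%M:%S)"
  fi
  sleep 1
done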


Stopping a node: Resilience !

First thing I tried: stop some nodes.

First I stopped node3. I know that node3 is the "leader" for the single Tablet where Table t is stored, and I know that node3 is part of the Master-list.

docker stop node3

Results: Very Little... Because this is a Distributed Database, and it seems to behave fairly Resiliently.

Actually, yugatool, yb-admin, and the insert-loop did notice. yugatool now shows node3 as missing from the Master-list, and with "alive=false" in the TServer list:



And the insert-loop that was trying to insert once per second has been hiccuping: there is a gap from 09:13:01 to 09:13:24:


For about 23 seconds the inserts failed (I actually saw the terminal with the loop hanging...), but at 09:13:24 the inserts started looping again. Without me Interfering. It just Continued; no intervention was needed here.
(In practice, I would expect a resilient application to have some timeout-mechanisms and error-detection, and probably failover and/or retry mechanisms.)
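For example, the probe above could be wrapped in a simple retry, something along these lines (connect_timeout is a standard libpq URI parameter; the 5 attempts and the 2-second timeout are arbitrary choices of mine):

# retry one insert up to 5 times, limiting each connection attempt to 2 seconds
URI="postgresql://yugabyte@node2:5433,node3:5433,node4:5433/yugabyte?connect_timeout=2"

for attempt in 1 2 3 4 5 ; do
  if ysqlsh "$URI" -qtAc "insert into t (id, payload) values (extract(epoch from now())::bigint, 'probe');" ; then
    break                                   # success, stop retrying
  fi
  echo "attempt $attempt failed, retrying..."
  sleep 1
done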

Let's see what happened to our Table, which supposedly "lost a tablet-leader". yb-admin can tell us something:



Notice how the single Tablet under Table t now has a new Leader: node8 was elected. Node3, which is down/dead/broken, is now listed as a follower, and node7 is still following.

So far, Fairly Resilient, I'd say.

But there is more: after about 15 minutes of me trying to write this report, the Tablet shows a different config: node4 has replaced node3 in the list of followers. Check the new output from yb-admin "list_tablet_servers" again:


It seems the coordinating mechanisms, the Masters or the Tablet's Raft group, have decided that node3 won't come back and have re-replicated the tablet to node4. This restores the desired RF=3 for this Tablet. (I need to RTFM on this; I recall a timeout of 15 minutes before a TServer is considered gone for good.)
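If I recall correctly, the knob involved is the yb-master flag follower_unresponsive_timeout_ms, which defaults to 900000 ms (15 minutes). It should be visible via the master web UI, something like this (assuming the default master web port 7000):

# dump the gflags of a running master and look for the follower-unresponsive timeout
curl -s http://node2:7000/varz | grep follower_unresponsive_timeout_ms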

We notice: Resilience, and some Automatic-Repair.

Incidentally, I tried this as well: if you stop a TServer-node that has no business with Table t, you won't notice a thing, not even a glitch in the insert-statements.
Here is the "cluster_info" after I kicked out node3 (documented above) plus node5 and node6. The insert loop is still running, no glitch. Try it for yourself!


Check the alive=false at nodes 3, 5, 6. 
The Database still works, but there is a lot to experiment with here.
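For anyone who wants to reproduce this, the experiment is essentially the following (I used yugatool for the screenshot above, but yb-admin can report similar liveness information; master addresses are again those of my setup):

# stop two more nodes that hold no replica of t, then check the tablet servers from nodeX
docker stop node5 node6
yb-admin -master_addresses node2:7100,node3:7100,node4:7100 list_all_tablet_servers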



Verdict so far: Resilient against node-failure.

Caveat: "Resilient, so far". Because I will keep trying to break it..

I would encourage others to try this, and do more of this testing. What happens if you rapidly break 2 of the 3 nodes that support a Tablet? What if a majority (quorum) of the masters breaks (I can guess...)? What kinds of timeouts and repair-mechanisms are in place? The RTFM will tell me, I need to read...


Questions and More

Next attempts will probably be breaking the Master nodes. I have a suspicion that those are the more vulnerable part: there are only 3, and whereas TServers seem to automatically take over each other's work when killed, the Master-list looks more "fixed".

----------- End of this Blog-post - Possible Questions and remarks below --------


Q: what mechanism moves the tablet-replicas to available nodes (probably an RTFM I need to look for).

Q: why does one of the tservers not pick up master role if masters < 3 ? 

Q: startup 3 masters, kill 1, then test and start node4.. does it become master ? 

Q: what is master_uuid in yugabyted ? where does it point ?

Q: how come master-nodes seem to alternate as leader ? what makes the leader-role flip from 1 node to the next ? and is it round-robin ? 

Q: yugatool seems to fail for a small interval when the master-leader falls away. Correct? It seems yugatool rather displays <error, nothing> than report dubious data until a clear master-leader is elected?

Q: stopping tservers.. sometimes the re-start of a t-server fails.. but re-trying a few times generally fixes that. Funny ? 

Q: when creating several tables "split into 1 tablets", they all seem to end up on the same leader-TS and follower-TSs.. 1) verify, and 2) is this intended? default?

Q: node2, the first-master-node on starting the cluster, has to run for tservers to join??? Test! But sometimes the Master-Leader role seems to shift with no apparent cause?

Q: [local problem on my nodes?] after stop+start of nodes one by one, some nodes do not accept ysqlsh connections: postgres didn't start? possibly disk full. Which logs to check? Fixed by 1) removing a large table, and 2) re-starting nodes. Suspect disk full.

Q: re: IPs.. It is not clear to me whether YB always identifies a node by UUID (can it?), by hostname (probably not), or by IP (v4) address.. I would prefer to use hostnames consistently, but a lot of the examples use IPs. (And the tools yb-admin and yugatool don't order by uuid or by IP, so I end up "searching" a lot through the lists - minor niggle, but yeah...)

Q: when (re) starting my nodes, in order, not all nodes seem to get postgres up and listening at their port. Anything I can do or check to investigate this ? 

  --- end --- 

