Yugabyte - Testing: a 7 node cluster can survive on 2 nodes.
TL;DR: Just playing: I reduced a 7-node cluster down to 2 nodes. It still runs!
Background:
For previous experiments, I had set up my 7-node test system and did some careful checking of what happens when nodes are brought down (link). At the end of that test, 4 of the 7 nodes were still running, and yugatool cluster_info looked like this, with nodes 3, 5, and 6 no longer alive:
After writing up the previous test, I had "forgotten" that my other 4 nodes were still running. When I noticed the terminal window with the yugatool display, I just wanted to see how far I could get...
Just Playing: removing nodes...
So I had a cluster with 4 remaining nodes. I also knew that my critical test table was held in a single tablet, replicated over node4, node7 and node8. This is confirmed by a screenshot from the previous blog post, showing where the (single) tablet of table t is replicated:
My suspicion was that nodes 2 and 4 had to stay up to keep a majority (quorum) of yb-master processes: at least 2 out of 3 in an RF=3 cluster. Node7 and node8, on the other hand, were only there to run yb-tserver processes.
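To make the quorum arithmetic explicit, here is a minimal Python sketch. It assumes the usual Raft-style majority rule (strictly more than half of the configured replicas must be alive); the function names are made up for illustration and are not part of any YugabyteDB tool.

```python
def raft_majority(replication_factor: int) -> int:
    """Smallest number of replicas that is strictly more than half."""
    return replication_factor // 2 + 1


def has_quorum(alive: int, replication_factor: int) -> bool:
    """True if enough replicas are alive to form a majority."""
    return alive >= raft_majority(replication_factor)


# RF=3: two live replicas are enough, one is not.
print(raft_majority(3))   # 2
print(has_quorum(2, 3))   # True
print(has_quorum(1, 3))   # False
```

The same rule applies separately to the group of masters and to the replicas of each tablet, which is why it matters which nodes remain, not just how many.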
I started by sending a stop to node8: this would cause a hiccup in the inserts and hopefully prompt the tablet to elect a new leader.
That worked. Nothing broke. Inserts kept looping at about 1/sec, as designed.
That left me with nodes 2, 4, and 7. Luckily, node4 had become leader for the tablet of table t. I got a coffee and waited until node2 had also been made a follower for the tablet, so that the tablet would still have two live copies after the next stop. Then I stopped node7, the last remaining node that wasn't in the master list.
The cluster then looked like this:
The two remaining nodes were both part of the master list, i.e. both were running yb-master processes. And the yb-tserver processes on the two remaining nodes were each holding one copy of the tablet under t1:
The test-insert-loop from the observer-node was still running.
My cluster survived even when cut back to just 2 nodes.
Lesson: Surprisingly Resilient.
Even with a minimal number of nodes, this cluster had survived. However, I'm not so sure the cluster would be able to handle this abuse if I killed the nodes in rapid succession, but that is something for a future test.
Also note: this only worked because the two nodes I kept running were both in the master list. I'm fairly sure that as soon as the remaining nodes in the master list lose quorum, the cluster will stop. Will test later.
------- end of blogpost ---- afterthought from previous blogpost, still nice to see. -----
Q: Can I get the listings from yb-admin and yugatool ordered by hostname? It confuses me to have to search for each IP every time to check which nodes are up or not. When I select from yb_servers() I can at least use ORDER BY, but the CLI tools don't have that option (or not that I know of).
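Until someone points me to a built-in option, a small filter script could do the ordering. The sketch below is only an idea under stated assumptions: it assumes each server line in the listing contains an IPv4 address somewhere, and it uses a hand-made IP-to-hostname table. The addresses, the sample lines, and the function names are all invented for illustration; they are not real yb-admin or yugatool output.

```python
import re

# Matches the first IPv4 address on a line (a trailing ":port" is ignored).
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical mapping; replace with the real addresses of your nodes.
HOSTS = {"10.0.0.2": "node2", "10.0.0.4": "node4", "10.0.0.10": "node10"}


def natural_key(name):
    """Sort key that orders node2 before node10."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def sort_by_hostname(lines, hosts):
    """Keep lines without an IP (headers) on top, sort the rest by hostname."""
    headers, rows = [], []
    for line in lines:
        match = IP_RE.search(line)
        if match is None:
            headers.append(line)
        else:
            ip = match.group(0)
            name = hosts.get(ip, ip)  # fall back to the IP if unknown
            rows.append((natural_key(name), f"{name:<8} {line}"))
    rows.sort(key=lambda row: row[0])
    return headers + [text for _, text in rows]


# Invented sample listing, only to show the effect.
sample = [
    "UUID  ADDRESS         STATE",
    "aaa   10.0.0.10:9100  ALIVE",
    "bbb   10.0.0.4:9100   ALIVE",
    "ccc   10.0.0.2:9100   ALIVE",
]
for out_line in sort_by_hostname(sample, HOSTS):
    print(out_line)
```

To use it as a filter, the list of lines could come from the captured output of the CLI tool instead of the invented sample.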