Yugabyte - Testing: a 7-node cluster can survive on 2 nodes.

 TL;DR: Just playing: I reduced a 7-node cluster back to 2 nodes. It still runs!


Background: 

For previous experiments, I had set up my 7-node test system and done some careful checking by bringing down nodes (link). At the end of that test, 4 of the 7 nodes were still running, and yugatool cluster_info looked like this, with nodes 3, 6, and 5 no longer alive:
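
For reference, that output comes from a yugatool call along these lines (a sketch: the master addresses node2 and node4 are from my test setup and illustrative only):

  # show masters, tablet servers and their status for the whole cluster
  yugatool -m node2:7100,node4:7100 cluster_info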

After writing up the previous test, I had "forgotten" that my other 4 nodes were still running. When I noticed the terminal window with the yugatool display, I just wanted to see how far I could get...


Just Playing: removing nodes...

So I had a cluster with 4 remaining nodes. I also knew that my critical test table was held in a single tablet, replicated over node4, node8 and node7. This was re-confirmed by this screenshot from the previous blog, showing where the (single) tablet of table t is replicated:


From what I suspected, nodes 2 and 4 had to stay up to keep a majority (quorum) of yb-master processes: in an RF=3 cluster that means a minimum of 2, since a majority of 3 is floor(3/2) + 1 = 2. Node7 and node8, on the other hand, were only there to run yb-tserver processes.
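
To double-check which nodes were actually running yb-master processes, yb-admin can list them (a sketch, again assuming node2 and node4 as the reachable master addresses):

  # list all masters with their UUID, RPC address and Raft role (LEADER/FOLLOWER)
  yb-admin -master_addresses node2:7100,node4:7100 list_all_masters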

I started by sending a stop-command to node8: this would cause a hiccup in the inserts and hopefully prompt the tablet to elect a new leader.
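
In a setup like this, the stop could look roughly as follows (a sketch, assuming the node runs under yugabyted; with bare yb-tserver processes or systemd units the command will differ):

  # on node8: gracefully stop the yugabyted-managed processes on this node
  # (the --base_dir path is hypothetical; use the directory the node was started with)
  yugabyted stop --base_dir=/home/yugabyte/yb-data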

That worked. Nothing broke. Inserts kept looping at about 1/sec, as designed.
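
For context, the insert loop on the observer node is roughly this sketch (the column list of t and the connection details are hypothetical; the real table is defined in the previous post):

  #!/bin/bash
  # insert one row per second, forever; a leader election shows up
  # as a visible pause or a failed insert in this loop's output
  while true; do
    ysqlsh -h node2 -c "insert into t(id, ts) values (floor(random()*1000000)::int, now());"
    sleep 1
  done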

That left me with nodes 2, 4, and 7. Luckily, node4 had become leader for the tablet of table t. I got a coffee, waited until node2 had also been made a follower for the tablet, and then stopped node7, the last remaining node that wasn't in the master list.
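
Checking which node is leader for the tablet can be done from the command line as well (a sketch; ysql.yugabyte is the default database, adjust if table t lives elsewhere):

  # list the tablet(s) of table t, including the current leader of each
  yb-admin -master_addresses node2:7100,node4:7100 list_tablets ysql.yugabyte t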

The cluster then looked like this:

And the two remaining nodes were both part of the master list, i.e. they were running yb-master processes. The yb-tserver processes on those two remaining nodes each held one copy of the tablet of table t:

The test insert loop from the observer node was still running.

My cluster survived, even when reduced to just 2 nodes.


Lesson: Surprisingly Resilient.

Even with a minimal number of nodes, this cluster survived. However, I'm not so sure the cluster would be able to handle this abuse if I killed the nodes in rapid succession; that is something for a future test.

Also note: this only worked because the two nodes I kept running were both in the master list. I'm fairly sure that whenever the remaining nodes in the master list lose quorum, the cluster will stop. Will test later.


------- end of blogpost ---- afterthought from previous blogpost, still nice to see. -----


Q: Can I get the listings from yb-admin and yugatool ordered by hostname? It confuses me to have to search for each IP every time to check which nodes are up or not. When I select from yb_servers() I can at least use ORDER BY, but the CLI tools don't have that option (or not that I know of).
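
For my own future reference, the SQL route and a crude workaround for the CLI tools (hostnames and addresses below are from my setup and illustrative only):

  # inside ysqlsh, ORDER BY does the job on the yb_servers() function
  ysqlsh -h node2 -c "select host, node_type, cloud, region, zone from yb_servers() order by host;"

  # for the CLI tools, piping through sort is the best workaround I know of
  # (it sorts the header lines too, so the result is cosmetic at best)
  yb-admin -master_addresses node2:7100,node4:7100 list_all_tablet_servers | sort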


