Yugabyte Major Upgrade part-3: setup and preparation
[DRAFT] TL;DR: I managed to upgrade yugabyte from pg11 to pg15, using just the container images. This page describes the system and the preparations.
Spoiler: Yes, it worked. But feel free to use / comment / improve on this.
Background.
As described in earlier blogposts, I am in the process of upgrading a yugabyte RDBMS from one Major Version to the next. Insert link p1, and link p2...
Previous posts describe the summary of steps (p1), and the background to some of my choices (p2).
Now let me describe how this system was built and prepared for upgrade. The next blog (p4) will contain the report of the actual upgrade.
Setup: create the containers.
Normally I try not to modify a given container-image. Depending on how "strict" you are about containerisation, I do break some of the rules: I took a container image from the official repository (link), but I modified the run-command.
My command notably allows me to issue a "yugabyted stop" without the container coming down. This way I have more liberty to "shell into the container" and run other commands from there. I also often add a few of my custom scripts to the containers, and notably edit the .bash_profile and add some environment variables (the scripts for sadc and housekeeping, and the .bash_profile edits, are out of scope here; maybe later).
Here is my command to start the first container for YBDB, call it node2:
docker run -d --network yb_net --hostname node2 --name node2 \
-p5432:5433 -p7002:7000 -p9002:9000 \
-p12002:12000 -p13002:13000 -p13432:13433 -p15432:15433 \
-v /Users/pdvbv/yb_data/node2:/root/var \
-v /Users/pdvbv/yb_data/sa:/var/log/sa \
yugabytedb/yugabyte:2024.2.2.3-b1 tail -f /dev/null
Notice how I use the same value for container-name and hostname.
The mountpoint /root/var is mapped to a data-dir on the host, and each node will have its own directory with data.
And notice how the container's YSQL port 5433 is mapped to host port 5432, and container port 7000 is mapped to host port 7002 (7000 + the node nr). This allows me to create a number of containers (nodes, hosts) in the same docker-host or docker-desktop.
For node3 and node4, the ports map slightly differently:
docker run -d --network yb_net --hostname node3 --name node3 \
-p5433:5433 -p7003:7000 -p9003:9000 \
-p12003:12000 -p13003:13000 -p13433:13433 -p15433:15433 \
-v /Users/pdvbv/yb_data/node3:/root/var \
-v /Users/pdvbv/yb_data/sa:/var/log/sa \
yugabytedb/yugabyte:2024.2.2.3-b1 \
tail -f /dev/null
docker run -d --network yb_net --hostname node4 --name node4 \
-p5434:5433 -p7004:7000 -p9004:9000 \
-p12004:12000 -p13004:13000 -p13434:13433 -p15434:15433 \
-v /Users/pdvbv/yb_data/node4:/root/var \
-v /Users/pdvbv/yb_data/sa:/var/log/sa \
yugabytedb/yugabyte:2024.2.2.3-b1 \
tail -f /dev/null
This allows me to create a number of containers (nodes, hosts) in the same docker-host or docker-desktop, each with its own set of ports and mountpoints.
The volume-mount for /var/log/sa is a special case; out of scope for now.
In reality, I have a script that generates and executes these commands, using the variables $nodenr and $hname as parameters, to eliminate the risk of typos.
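A minimal sketch of such a generator script (the name mk_node.sh is hypothetical; the port arithmetic and paths simply mirror the commands above, and my real script does a bit more):

#!/bin/bash
# create one YBDB node-container; usage: ./mk_node.sh <nodenr>   e.g. ./mk_node.sh 2
nodenr=$1
hname=node${nodenr}
docker run -d --network yb_net --hostname ${hname} --name ${hname} \
 -p$((5430 + nodenr)):5433 -p$((7000 + nodenr)):7000 -p$((9000 + nodenr)):9000 \
 -p$((12000 + nodenr)):12000 -p$((13000 + nodenr)):13000 \
 -p$((13430 + nodenr)):13433 -p$((15430 + nodenr)):15433 \
 -v /Users/pdvbv/yb_data/${hname}:/root/var \
 -v /Users/pdvbv/yb_data/sa:/var/log/sa \
 yugabytedb/yugabyte:2024.2.2.3-b1 tail -f /dev/null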
Think of these commands as starting "a cluster of small servers" with the YBDB software pre-installed and ready to run.
Setup: Start the Database.
Once the containers are created I can use them to start a YBDB cluster. Before I do that I put down two flagfiles, and add some stuff to the .bash_profile (something for a later blog, but not that interesting).
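For orientation: those flagfiles are plain gflags files with one --flag=value per line. A sketch of the idea (the flag shown is only an illustration, not my actual content; the major-upgrade parameter discussed further down also ends up in these files):

cat /home/yugabyte/yb_mast_flags.conf
--callhome_enabled=false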
Then I do the first start of the database, in this example with 3 nodes and the default RF=3:
docker exec node2 yugabyted start --advertise_address=node2 \
--tserver_flags=flagfile=/home/yugabyte/yb_tsrv_flags.conf \
--master_flags=flagfile=/home/yugabyte/yb_mast_flags.conf
docker exec node3 yugabyted start --advertise_address=node3 --join=node2 \
--tserver_flags=flagfile=/home/yugabyte/yb_tsrv_flags.conf \
--master_flags=flagfile=/home/yugabyte/yb_mast_flags.conf
docker exec node4 yugabyted start --advertise_address=node4 --join=node2 \
--tserver_flags=flagfile=/home/yugabyte/yb_tsrv_flags.conf \
--master_flags=flagfile=/home/yugabyte/yb_mast_flags.conf
Notice I use the (optional) flagfiles to pass parameters. Yugabyted saves the configuration in a file, yugabyted.conf, which means that on subsequent starts I can just issue a plain yugabyted start (without options), which makes it great to run ad hoc from the command-line or via docker exec.
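So, after this first start, a later stop/start of a node boils down to:

docker exec node2 yugabyted stop
docker exec node2 yugabyted start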
The result is a running RDBMS that I can use for OLTP. For testing I tend to run a cronjob (every x minutes) and some loop-scripts to simulate a light load. These running jobs tell me if my Zero-Downtime goal is really achieved (the cronjob and scripts etc. are out of scope for now... maybe later).
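To give an idea of the kind of loop-script I mean, a minimal sketch; the query, interval, log path and the ysqlsh defaults of the official image are assumptions here, not my actual test harness:

# probe the cluster every 30 seconds; any error hints at lost availability
while true; do
  docker exec node2 /home/yugabyte/bin/ysqlsh -h node2 -c "select now();" > /dev/null \
    || echo "$(date) : probe FAILED" >> /tmp/yb_probe.log
  sleep 30
done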
Of course I can generate the commands for more nodes if I need to. Hence creating a 6-node cluster was easy.
Now to prepare the major-upgrade....
Configuration: use blacklist to isolate masters.
Because the Masters, all 3 of them, have to be upgraded before any yb-tservers are upgraded, I had to isolate the masters.
For the upgrade I created an additional 3 containers to form a cluster of 6. This allowed me to choose three of them to run "master-only", and three or more different containers to run yb-tserver.
At the start, my cluster looked like this:
Notice the 3 masters, and 6 storage nodes. The SST files, and thus the storage-load, are evenly distributed over all nodes.
Next, I blacklist the first 3 nodes:
yb-admin -master_addresses $MASTERS change_blacklist ADD node2
Repeat for nodes 3 and 4 (sketched below), then wait a few minutes for the storage to re-distribute... And after some time, the overview of yb-tservers looks like this:
Notice how nodes 2-3-4 no longer contain any stored data? Removing the yb-tservers on those nodes will not cause under-replication or a storage outage, hence those nodes can now run master-only.
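For completeness, the repeat plus a drain-check, sketched; $MASTERS is the master-address list used above (e.g. node2:7100,node3:7100,node4:7100 with the default ports), and whether ADD wants a bare hostname or host:port (tserver RPC port, default 9100) depends on your yb-admin version:

yb-admin -master_addresses $MASTERS change_blacklist ADD node3
yb-admin -master_addresses $MASTERS change_blacklist ADD node4
# poll until the data move away from the blacklisted tservers reports 100%
yb-admin -master_addresses $MASTERS get_load_move_completion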
Scripts: to start master-only.
I generated the script to start the master-nodes, and stored it on the mounted volume, next to the yugabyted.conf file. This way it will survive the replacement of the container-images and be mounted on the new containers, ready for use.
ps -ef | grep master > /root/var/conf/runm.sh
This saves a rather long commandline (plus some ps columns to clean up) into a scriptfile. I edited the script to add nohup, so it looks like:
nohup <looong commandline> &
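A variant of the capture step I would consider (a sketch; it assumes the ps in the image supports -C and -o args=, and it avoids hand-editing the ps columns out of the file):

printf 'nohup %s &\n' "$(ps -C yb-master -o args=)" > /root/var/conf/runm.sh
chmod +x /root/var/conf/runm.sh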
Repeat this step for node3 and node4 (and test!). You will need the master-start script there to restart the master when the container/software is upgraded.
To test this script, stop the software on node2:
yugabyted stop
Then start the master (old version) using the script and check everything to see if the master has actually re-joined the cluster:
Notice: we again have 3 masters, one of which was recently started (manually via the script).
Notice: we have 1 tserver with status DEAD, because we didn't re-start the tserver on node2.
Notice: we still have node3 and node4 with no SST files. Because nodes 2, 3 and 4 are blacklisted (and verified!), those nodes do not contain tablets, thus we can afford to have all three of those tservers down and not have under-replication.
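To do these checks from the command line rather than from the web-UI, something like the following works (a sketch, with $MASTERS as before):

yb-admin -master_addresses $MASTERS list_all_masters
yb-admin -master_addresses $MASTERS list_all_tablet_servers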
Ideally, we also stop yugabyted on node3 and node4 and start their master-processes via the script, just as will happen once the software/container for those nodes is replaced (and just to know the script works...).
I would also check that the parameter for the major upgrade (contained in the flagfiles!) is correctly set in all components, for example by inspecting the flags of every running master and tserver.
I would normally have a loop to check this on all nodes and all components (sketched below). But notice, for example, that the parameter gets picked up automatically on every (re)start because it is saved in the flagfile, and all components are always (re)started using this flagfile. For those components that do not have the parameter set, you can also set/insert the parameter by using yb-ts-cli set_flag.
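A sketch of such a check-loop; the node numbers, the /varz page on the web-UI ports (7000 for masters, 9000 for tservers) and the placeholder flag name are assumptions on my side:

FLAG="the_major_upgrade_flag_name"    # the parameter from the flagfiles; name left out on purpose
for n in 2 3 4 5 6 7; do              # adjust to your node numbers
  for port in 7000 9000; do
    echo "== node${n}:${port} =="
    curl -s http://node${n}:${port}/varz | grep -i "${FLAG}"
  done
done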
(Yes, I do check / monitor / test / verify a lot: In DBA-sysadmin-world, the paranoid survive... ).
If all of the above is correct, and if your RDBMS is still processing SQL (you checked, right? - but it won't do DDL), you are ready to upgrade...
OK - Let's Go.
Next, we do the upgrade by replacing the containers, one by one ... (blog p4)
Appendix: some additional suggestions.
From reading and scripting this upgrade, I have a few suggestions.
S1: Extra master nodes, to reduce data-redistribution.
If you have a cluster with data stored over N nodes, you may want to add just 3 nodes and use blacklist and change_master_config to designate those nodes as masters. This will save on the time it takes to re-distribute the storage-load away from the master-only nodes. After the upgrade, you can migrate the master-processes away from the additional nodes, and remove them from the cluster. To me, this seemed more complicated, as the ADD/REMOVE of master-nodes involves quite some manual work with yb-admin. I'm lazy.
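For reference, the yb-admin work meant here looks roughly like this (a sketch only; the node name, the default master RPC port 7100 and the order of operations are my assumptions, and moving masters safely needs more care than two lines):

yb-admin -master_addresses $MASTERS change_master_config ADD_SERVER node5 7100
yb-admin -master_addresses $MASTERS change_master_config REMOVE_SERVER node2 7100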
S2: yugabyted start --master_only
I didn't like the long command-line needed to start the master-processes. Hence I would like yugabyted (the py script) to have an option to start just a master. It would avoid the need for a separate script to start the master-process, and by using the options stored in yugabyted.conf it would reduce the risk of typos in parameters. However, the yugabyted script was intended for "simple use" and already has sufficient options + complications as it is.
S3: list_all_tablet_servers: Order by host.
The result of the command is sometimes confusing, as the order of the tablet_servers seems to be random. Hence node2 can be at the top, in the middle or at the end of the list, and its position is not consistent.
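My workaround is a simple sort; a sketch that assumes the host:port is the second column and that you want to keep the header line on top:

yb-admin -master_addresses $MASTERS list_all_tablet_servers \
  | { IFS= read -r header; echo "$header"; sort -k2; }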
S4: Promote the use of flagfiles over command-line parameters.
I found it safer to put the major_version parameter in the flagfile, as my regular start-command (yugabyted start) would not automatically "keep" the parameter over restarts. Similarly, using the flagfile for starting the master also gave me that safety. The injection via yb-ts-cli set_flag tended to get overlooked: the CLI option is good to have, but more tricky in daily (manual) usage.
In general, I think readable parameter-files are a better way to set and keep parameters than to use long (scripted) CLI-commandlines.
-- definite end of blogpost p3 ---