Posts

Showing posts from September, 2023

Distributed - but when do we over-react

[DRAFT!]  TL;DR: Distributing a Database helps with Sharding and Resilience. But do I really want my data spread out over 10s or 100s of "nodes"?  Background. Been investigating Distributed Databases. Finding: Scale out means Fragmentation of Data. Adding nodes to a cluster can help to improve both Resilience, by having at least 2 spare copies and a voting mechanism, and Scalability, by adding more workers, more CPU, more memory. But the downside that I see (or imagine) is that my data gets too fragmented, too spread out.  Solution: Shared Disks?  For Resilience, I would prefer to keep 3, 5, or maybe even 7 copies of my data. But not more. Too many copies mean too much overhead on writing and voting. For Sharding, I want the data spread, but not fragmented over separate storage components any more than necessary. For some "scale out" situations, I may want more CPU and more Memory, but maybe not a further fragmentation of my data ...

Testing CockroachDB, more on Resilience ...

TL;DR: I'm simulating the failure of more than 1 node and find some weird behaviour. I'm learning how to operate a cluster. Repair is not always completely automatic? Spoiler: The cluster can remain Resilient, but you need to keep monitoring your cluster closely. There are some things I cannot quite explain, yet.  Do I need to RTFM more?  => Edit: Yes. See answer to question below. Background: Distributed Databases are Resilient. I'm experimenting with a distributed database, where every data-item is replicated 3x (replication factor of 3) over a 3-node cluster. Data remains "available" as long as every object has a quorum, in this case 2 votes out of 3.  From previous experiments (link), I already know that an outage of 1 node is not a problem, but an outage of 2 nodes out of 3 is fatal. In a previous experiment I showed how I could "fix" the loss of a single node by adding another, fourth, node to the cluster to bring the cluster back to 3-nodes a...

Testing Resilience of CockroachDB - Replacing 1 node.

TL;DR: I'm using a multi-node cluster to see what happens if I remove and add nodes. Spoiler: it is Resilient, but within limits. The Replication-Factor determines how many nodes you can lose without impacting availability. And there is a more or less long period of vulnerability after losing a node: Some ranges will run without their "3rd replica" for a while, which makes the system vulnerable until all the under-replicated ranges are fixed. Background: A distributed database uses Replication of data to protect against failure. In the case of cockroach, the default replication factor is 3 and it uses the raft-protocol to determine who is the "leaseholder" and whether a large enough number (a majority) of replicas is available to continue, the Quorum. Cockroach also features a more or less "automatic repair" to cover situations where a node disappears. Testing: 3 nodes, a ha-proxy, and a running insert. Nodes are called roach1 ... roach3, and additional nodes...
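The majority rule behind the quorum can be sketched in a few lines. This is only an illustration of the arithmetic, not CockroachDB code; the function names `quorum` and `tolerated_failures` are made up for this sketch.

```python
def quorum(replicas: int) -> int:
    """Smallest number of votes that forms a strict majority of `replicas`."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum(replicas)


if __name__ == "__main__":
    # Replication factors mentioned in these posts: 3, 5 and 7.
    for rf in (3, 5, 7):
        print(
            f"replication factor {rf}: quorum {quorum(rf)}, "
            f"tolerates {tolerated_failures(rf)} lost replica(s)"
        )
```

With a replication factor of 3 this gives a quorum of 2, which matches the observation that losing 1 node is survivable and losing 2 out of 3 is not.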

Checking Cockroach : Distributed, Replicated, Sharded.

TL;DR: Some simple testing of data in a table to see how it is "replicated" and "sharded". Spoiler: cockroachDB replicates from the start, but it only shards (splits) when necessary. Hence on default settings you need to add about 512M to a table before any sharding (splitting) occurs. It seems to work Fine! Background: Distributed Databases, Replication and Sharding. The two different aspects covered by a Distributed Database are 1) Replication and 2) Distribution/Sharding. Replication: Offering resilience against outage (storage-, server- or network-failure). Replication means there are multiple copies of the data in different storage-areas, or even in different physical locations. Sharding: Offering scalability to more-users and/or parallel processing of large sets. By having data in more manageable chunks (shards, partitions), it can be handled by multiple processes in parallel. Distribution or Sharding (can) offer both higher multi-user capacity...
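To get a feel for the "about 512M before any split" observation, here is a rough sketch of the arithmetic. It assumes a range is split once it grows past a maximum size, and it reads "512M" as 512 MiB; both are assumptions for illustration. The name `min_ranges` is hypothetical, and the result is only a lower bound, since a real cluster may split earlier or for other reasons.

```python
import math

# Assumption: "about 512M" from the post, read as 512 MiB.
ASSUMED_RANGE_MAX_BYTES = 512 * 1024 * 1024


def min_ranges(table_bytes: int, range_max_bytes: int = ASSUMED_RANGE_MAX_BYTES) -> int:
    """Lower bound on the number of ranges needed if no range may exceed range_max_bytes."""
    if table_bytes <= 0:
        return 1  # even an empty table occupies one range in this simplified model
    return max(1, math.ceil(table_bytes / range_max_bytes))


if __name__ == "__main__":
    mib = 1024 * 1024
    for size in (100 * mib, 512 * mib, 600 * mib, 2048 * mib):
        print(f"{size // mib} MiB -> at least {min_ranges(size)} range(s)")
```

In this model a table stays in a single range until it passes the threshold, which is why small test tables show replication but no splitting.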

Exploring Cockroach Distributed Database - 7 nodes, 9 nodes even, in containers.

TL;DR: I managed to create a 7-node, and even a 9-node playground with a CockroachDB cluster running on my old macbook. There is a Lot to Explore but it looks Very Interesting. Background: "Distributed Databases". For a good while now we have been looking at "serverless" databases. Either some DBaaS running at a (cloud-) provider, or a locally created cluster, which mostly runs in k8s or docker. Note: Nobody seems to realise that for Truly Performant database access, the fewer layers, and the less chattiness, The Better. But then again, to obtain Resilience and Scale-Out: you need multi-node deployment and some form of replication, redundancy, or Sharding. To be able to properly Guide Customers in Database Deployments, I want to know how these systems "Tick". I want to know what the strong and weak points of "distributed" are. For this reason, I'm experimenting with some of the systems out there.  Setup, using containers in Docker. Note: My primitiv...

CockroachDB : Cleverly implemented Explain.

TL;DR: While playing with CockroachDB, I came across this very neat feature. There is a built-in Advisor, and it can even provide you with a diagram. Not sure how good this will be on really difficult queries/problems, but the Concept is Very Neat.   Background: Still playing around with "serverless" and "distributed" databases. While I was re-editing some of my test-scripts I came across one Really Neat Trick from the CRDB explain command (cockroach doesn't accept pl-pgsql, hence the need to bring everything down to straight SQL, quite a bit of rework..). Just thought I'd show the neat part of the explain. More in-depth investigation has to wait until I know (and understand) a little more of the concepts behind cockroachDB. TestCase: Explain on a very Simple Query. Because I simply want to see if I can really understand the basic concepts and terminology, I have some very simple query-tests to start with. Mostly just to see how any RDBMS explain-facility will reac...

CockroachDB as database. First Look.

TL;DR: CockroachDB was Easy to get up and running. It has a lot of the "features" that a mature RDBMS should have, and it seems quite Resilient, and smooth at that. But it doesn't quite feel like PostgreSQL. Background: Because I'm exploring database options for "serverless" and "cloud native" environments, I've done some playing around with CockroachDB (link). To keep things simple, I deployed in docker containers. Connecting was tried via the cockroach-sql client and via psql and DBeaver from my macbook. Resilience testing so far only consisted of killing nodes. Inspection of data and system was mainly via SQL and CLI.  Three Nodes, running in Docker: I used this set of instructions to start a 3-node cluster using my docker-desktop installation.  What strikes me immediately: The CR container is quite small, at 500M it is about the size of the official pg container. The containers of yugabyte and oracle are much bigger: 2G and 10G respectively....