Posts

Showing posts with the label Distributed Databases

Checking Cockroach: Distributed, Replicated, Sharded.

TL;DR: Some simple testing of data in a table to see how it is "replicated" and "sharded". Spoiler: CockroachDB replicates from the start, but it only shards (splits) when necessary. Hence, on default settings, you need to add about 512 MB of data to a table before any sharding (splitting) occurs. It seems to work Fine! Background: Distributed Databases, Replication and Sharding. The two different aspects covered by a Distributed Database are 1) Replication and 2) Distribution/Sharding. Replication: offering resilience against outages (storage-, server- or network-failure). Replication means there are multiple copies of the data in different storage areas, or even in different physical locations. Sharding: offering scalability to more users and/or parallel processing of large sets. By having data in more manageable chunks (shards, partitions), it can be handled by multiple processes in parallel. Distribution or Sharding (can) offer both higher multi-user capacity...
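
A minimal way to verify this yourself on CockroachDB, sketched with a placeholder table name t (not from the post); note that older versions spell the zone-configuration statement with FROM instead of FOR:

-- How many ranges (shards) does the table occupy, and where do the replicas live?
SHOW RANGES FROM TABLE t;

-- Inspect the zone configuration: range_max_bytes defaults to 536870912 (512 MiB),
-- which is why a small table stays in a single range until it grows past that size.
SHOW ZONE CONFIGURATION FOR TABLE t;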

yugabyte IO monitoring and load-balance verification.

TL;DR: I've experimented with the running of a "Distributed Database", and concluded that the load indeed gets distributed over several nodes. I'm also discovering how to use the tools. And I "Double Check" as much of the information as I can. There are always some Surprises, and Questions... Background The nature of Distributed is to Spread the load. I'm going to try and observe some of that distributed load. Currently I am still mainly in the Explore and Verify stage of my discovery. I'm deliberately going to cause what is probably an unbalanced load, and see if I can find + fix that. Note that my setup is still primitive: my nodes are docker containers running the template-downloadable image on a MacBook (described here). I am currently looking to find and test the Concepts and Principles behind yugabyte, rather than doing real-world testing. Tooling and monitoring: yb-admin, yugatool, dsar: I built a primitive inserter to connect and store 1 re...
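
For reference, a "primitive inserter" of this kind can be approximated with plain SQL fired from ysqlsh against any one node; a minimal sketch, where the table name ins_test and its columns are my own choices, not taken from the post:

-- throwaway table to absorb the insert load
CREATE TABLE ins_test (
  id      bigserial PRIMARY KEY,
  payload text,
  created timestamptz DEFAULT now()
);

-- one batch of rows; run it repeatedly, and from different nodes,
-- to generate load whose distribution can then be compared per node
INSERT INTO ins_test (payload)
SELECT md5(random()::text)
FROM   generate_series(1, 100000);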

yugabyte: Finding the data in a truly Distributed Database

TL;DR: I dig deeper into yugabyte to find/verify how the various components work, and where the data is kept. I find, with a little digging, that the data is Indeed Distributed and Replicated. Nice! Background In a previous post, I did some RTFM and identified the various components that make up a running yugabyte "universe" or cluster. Now it is time to find the actual data, and explore some of the tools. Hidden agenda, wishlist: I would really like to be able to "query" this data straight from psql or cql, but that would require additions by yugabyte, similar to how Postgres exposes its catalog. In a distributed environment, this is more of a challenge than in a "monolith" where everything is nicely in one place. Setup and Tools My "cluster" consists of 4 to 6 nodes running in docker. The setup is described in an earlier blogpost, and was a good starting point. But by now I begin to see I might want to use specific nodes for master and for tserve...
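
One small piece of cluster information that is already queryable straight from psql, assuming the yb_servers() function (the one the cluster-aware drivers use) is available in your YSQL version:

-- list the servers the cluster reports to YSQL clients:
-- host, port, zone/region info, etc.; handy to double-check
-- which nodes the "universe" actually knows about
SELECT * FROM yb_servers();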

Distributed databases: how many shards, and where are they?

[+/- Draft! Beware: Work In Progress] TL;DR: Trying to find out how to shard data on Yugabyte (link). I find a lot of "moving parts" in the YB-Universe, and try to explain and simplify. For Deep-Down-Engineers and YB-ers: check the questions at the bottom. Background The Future of Databases is ... Serverless, possibly sharded. But Sharding is something for Very Large sets. An average (serverless) database that comes from "the real world" doesn't need 1000s of shards... IMHO, it needs "ACID" and Relational Capabilities First. Yugabyte does this, and is potentially "serverless", to make the database more Resilient, more Scalable (on-demand scaling?) and overall Easier to Operate. By experimenting with a 6-node database, I try to observe the sharding, and might try to draw some conclusions or "good practices" from what I see. My "cluster" is running in docker containers, hence K8s or other container systems will also work. After the first e...
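
To make the number of shards per table explicit rather than implicit, YSQL accepts a tablet count at CREATE time; a small sketch, with table and column names of my own invention:

-- hash-sharded table, explicitly split into 3 tablets instead of the default
-- (the default count is derived from the number of tservers and the shards-per-tserver setting)
CREATE TABLE shard_demo (
  id      bigint,
  payload text,
  PRIMARY KEY (id HASH)
) SPLIT INTO 3 TABLETS;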

Distributed data(base), some simple experiments.

TL;DR: In distributed databases (Example: Yugabyte), it helps to know how to define your tables. The default behaviour is Optimised (sharded, distributed) for Very Large Tables. But Small tables also need attention. Too Much of a Good Thing... Background Distributed Databases are The Future. That is why I began to experiment with Yugabyte. I managed to create a 6-node (yes, Six Nodes) cluster in no time. And because Yugabyte is largely Postgres Compatible, my good-old pg-scripts work straight away. From the install-story, I found that by default all my tables seem sharded over 6 tablets, and that was something I wanted to investigate further. So, Let's Play.... The Demo I needed some demo-tables first. With YB comes the "northwind" demo (link). This demo was promptly installed from the command-prompt on any of the nodes (regardless of which node: in my case they are all equal). I shell-ed into the container and typed "yugabyted demo connect", and there it was. I also us...
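
For the small-table case, a sketch of two ways to avoid spreading a tiny table over 6 tablets, with illustrative names and assuming the SPLIT INTO clause accepts a count of 1 in your version:

-- option 1: keep hash sharding, but ask for a single tablet explicitly
CREATE TABLE lookup_hash (
  code  text,
  descr text,
  PRIMARY KEY (code HASH)
) SPLIT INTO 1 TABLETS;

-- option 2: use range sharding (ASC key); a range-sharded table
-- starts out as a single tablet unless split points are given
CREATE TABLE lookup_range (
  code  text,
  descr text,
  PRIMARY KEY (code ASC)
);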