Distributed - but when do we over-react?
[DRAFT!]

TL;DR: Distributing a Database helps with Sharding and Resilience. But do I really want my data spread out over tens or hundreds of "nodes"?

Background. I have been investigating Distributed Databases.

Finding: Scale-out means Fragmentation of Data. Adding nodes to a cluster can improve both Resilience, by having at least 2 spare copies and a voting mechanism, and Scalability, by adding more workers, more CPU, more memory. But the downside that I see (or imagine) is that my data gets too fragmented, too spread out.

Solution: Shared Disks? For Resilience, I would prefer to keep 3, 5, or maybe even 7 copies of my data, but not more. Too many copies mean too much overhead on writing and voting. For Sharding, I want the data spread, but not fragmented over separate storage components any more than necessary. For some "scale out" situations, I may want more CPU and more Memory, but maybe not a further fragmentation of my data ...
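To put rough numbers on the "3, 5, or maybe 7 copies" preference, here is a minimal sketch. It assumes the voting mechanism is a simple strict-majority quorum (as in Raft- or Paxos-style systems); the post does not name a specific protocol, and the function names `majority` and `tolerated_failures` are made up for illustration.

```python
def majority(copies: int) -> int:
    """Smallest number of votes that forms a strict majority (assumed quorum rule)."""
    return copies // 2 + 1


def tolerated_failures(copies: int) -> int:
    """How many copies can be lost while a majority can still vote."""
    return copies - majority(copies)


# Compare the copy counts mentioned in the text.
for n in (3, 5, 7):
    print(f"{n} copies: quorum of {majority(n)}, survives {tolerated_failures(n)} failures")
```

Under that assumption, every write has to wait for a majority of copies to acknowledge it, so each extra pair of copies buys one more tolerated failure at the cost of one more acknowledgement per write. It also hints at why odd counts are the usual choice: 4 copies tolerate only 1 failure, the same as 3.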