CockroachDB : Cleverly implemented Explain.
TL;DR: While playing with CockroachDB, I came across this very neat feature. There is a built-in Advisor, and it can even provide you with a diagram.
Not sure how good this will be on really difficult queries/problems, but the Concept is Very Neat.
Background:
Still playing around with "serverless" and "distributed" databases. While I was re-editing some of my test-scripts I came across one Really Neat Trick from the CRDB explain command (cockroach doesnt accept pl-pgsql, hence need to bring everything down to straight SQL, quite a bit of rework..).
Just thought I'd show the neat-part of the explain. More in-dept investigation has to wait until I know (and understand) a little more of the concepts behind cockroachDB.
TestCase: Explain on a very Simple Query.
Because I want to simple to see if I can really understand the basic concept and terminology, I have some very simple query-tests to start with. Mostly just to see how any RDBMS explain-facility will react to basic problems.
Table t01 looks like this:
The table has an id (sequenced) PK, and some columns: date, numeric, text, that I can use for simple searches, possibly add some indexes to test the reactions. Very Basic. I tend to put 1, 2, 10 or 100 or 1000s of rows in and then just see what explain tells me.
In this case, I did select with a few where clauses and asked for "Explain" to see what I could learn from the work that supposedly goes on in the back of the engine(s).
Check the explain output for a date-range condition:
When selecting on the date-field, dt, the first thing the "explain" told me was that it did a "FULL SCAN", and that I could add an index (as if I didnt know...). It was even aware of the amount: adding amount to the index would make it a covering index.
I took the advice, but only some of it.
I also had the good-fortune that an old colleague told me to to RTFM the explain-syntax from the cockroach site. And my next attempt was:
1) Add the index, but not quite the one it had advised, and
2) Ask for analyze-distsql (distributed sql, e.g. what a distributed database is supposed to do). That gave me a heap of interesting information:
Impressive. We notice the SQL-engine now tells us the following:
1- It does a scan of the index
2- It does an index join to the table (I deliberately didn't include the amount in the index to see what would happen).
3- It groups the data (e.g. calculate the sum)
All that was more or less expected. But the surprise was in nr 4...
4-it produces an URL called "Diagram".
A URL is course, wants to go on the internet. So.. I bravely cp-paste the URL (link here?) into a browser and Voila:
A simple diagram and some more information about my SQL. This looks useful, both to help my simple query and to allow me a small peek inside the "distributed" part of cockroach.
Quite a nice Feature. I need to try out with a few more complicated joins...
And I wonder what else I will find when I get the occasion to dig into crdb. Maybe the typing and tweaking to convert my test-scripts to cockroach might be worth it..
Update: It gets Better: I managed to enter about a million rows in the table (patience, macbook...), and at sizes of 512M the CRDB engine decides to "split ranges". After two such splits (e.g. I now have 3 ranges, 3 "shards" if you like) the data is nicely divided over my multi-node cluster, and a Full Scan (count query, no index), suddenly looks like this:
The CRDB engine (and its query-optimiser) have decided they should divide this work over 3 nodes, they probably call it "pushing down the predicate", and the explain-plan shows it. This is beginning to look like a "distributed" system, and the diagram shows this. Very Neat!
Lesson: Keep Testing, Tweaking, and RTFM.
Glad I took the time to RTFM. There is always a next Cool Thing you dont know yet... And this is one thing I would recommend to everyone, and to other RDBMS-vendors and builders.
I can just imagine some PostgreSQL DBAs (greybeards and rookies alike) ding a pg_dump, read some of the data into cockroach, and explain a series of SQLs just to see what comes out.
Caveat1: I havent tried any really complicated SQL yet. And I suspect, from what I know of "distributed databases" that some of the advice may be biased towards either large-sets or sharded-tables and overloaded (covering) indexes.
Caveat2: By giving that URL to the internet, you Will be Sending some of your "information" out into the world-wide-web. I can imagine cockroach or others just sitting there, laughing at "Look what ridiculous SQL we got today".
But so far, Kuddos to cockroach for this neat feature.
---- end of Blogpost --- notes, questions below ----
Maybe time for one of my favorite memes..
Comments
Post a Comment