CockroachDB : Cleverly implemented Explain.

TL;DR: While playing with CockroachDB, I came across this very neat feature. There is a built-in Advisor, and it can even provide you with a diagram.

Not sure how good this will be on really difficult queries/problems, but the Concept is Very Neat.  


Background:

Still playing around with "serverless" and "distributed" databases. While I was re-editing some of my test-scripts I came across one Really Neat Trick from the CRDB explain command (cockroach doesnt accept pl-pgsql, hence need to bring everything down to straight SQL, quite a bit of rework..).

Just thought I'd show the neat-part of the explain. More in-dept investigation has to wait until I know (and understand) a little more of the concepts behind cockroachDB.


TestCase: Explain on a very Simple Query.

Because I want to simple to see if I can really understand the basic concept and terminology, I have some very simple query-tests to start with. Mostly just to see how any RDBMS explain-facility will react to basic problems.

Table t01 looks like this:

The table has an id (sequenced) PK, and some columns: date, numeric, text, that I can use for simple searches, possibly add some indexes to test the reactions. Very Basic. I tend to put 1, 2, 10 or 100 or 1000s of rows in and then just see what explain tells me. 

In this case, I did select with a few where clauses and asked for "Explain" to see what I could learn from the work that supposedly goes on in the back of the engine(s).

Check the explain output for a date-range condition:

When selecting on the date-field, dt, the first thing the "explain" told me was that it did a "FULL SCAN", and that I could add an index (as if I didnt know...). It was even aware of the amount: adding amount to the index would make it a covering index.

I took the advice, but only some of it. 

I also had the good-fortune that an old colleague told me to to RTFM the explain-syntax from the cockroach site. And my next attempt was:

1) Add the index, but not quite the one it had advised, and 

2) Ask for analyze-distsql (distributed sql, e.g. what a distributed database is supposed to do). That gave me a heap of interesting information:

Impressive. We notice the SQL-engine now tells us the following: 

1- It does a scan of the index

2- It does an index join to the table (I deliberately didn't include the amount in the index to see what would happen).

3- It groups the data (e.g. calculate the sum)

All that was more or less expected. But the surprise was in nr 4...


4-it produces an URL called "Diagram".

A URL is course, wants to go on the internet. So.. I bravely cp-paste the URL (link here?) into a browser and Voila:


A simple diagram and some more information about my SQL. This looks useful, both to help my simple query and to allow me a small peek inside the "distributed" part of cockroach.

Quite a nice Feature. I need to try out with a few more complicated joins...

And I wonder what else I will find when I get the occasion to dig into crdb. Maybe the typing and tweaking to convert my test-scripts to cockroach might be worth it.. 


Update: It gets Better: I managed to enter about a million rows in the table (patience, macbook...), and at sizes of 512M the CRDB engine decides to "split ranges". After two such splits (e.g. I now have 3 ranges, 3 "shards" if you like) the data is nicely divided over my multi-node cluster, and a Full Scan (count query, no index), suddenly looks like this:

The CRDB engine (and its query-optimiser) have decided they should divide this work over 3 nodes,  they probably call it "pushing down the predicate", and the explain-plan shows it. This is beginning to look like a "distributed" system, and the diagram shows this. Very Neat!


Lesson:  Keep Testing, Tweaking, and RTFM.

Glad I took the time to RTFM. There is always a next Cool Thing you dont know yet... And this is one thing I would recommend to everyone, and to other RDBMS-vendors and builders. 

I can just imagine some PostgreSQL DBAs (greybeards and rookies alike) ding a pg_dump, read some of the data into cockroach, and explain a series of SQLs just to see what comes out.

Caveat1: I havent tried any really complicated SQL yet. And I suspect, from what I know of "distributed databases" that some of the advice may be biased towards either large-sets or sharded-tables and overloaded (covering) indexes. 

Caveat2: By giving that URL to the internet, you Will be Sending some of your "information" out into the world-wide-web. I can imagine cockroach or others just sitting there, laughing at "Look what ridiculous SQL we got today".

(What Database Folks do when they see Dev-written SQL...)

But so far, Kuddos to cockroach for this neat feature.


---- end of Blogpost --- notes, questions below ---- 


Maybe time for one of my favorite memes..











Comments

Popular posts from this blog

yugabyte : Finding the data, in a truly Distributed Database

Testing Resilience of CockroachDB - Replacing 1 node.

yugabyte terminology and related information