Query optimizer not using compound index

I have a fairly simple table with three fields, all three of which are part of a compound NOTUNIQUE SBTREE index. For sake of example, let’s say these fields are x, y, and t, where x and y are STRINGs and t is a LONG.

This statement will instantly return a record (it appears to be the first matching record, but since I’m not supplying an ORDER BY clause, I believe I can’t depend on that):

SELECT FROM table WHERE x = 'a' AND y = 'b' LIMIT 1

Doing an EXPLAIN shows that it’s using the index and only reading one record.

But if I do this instead:

SELECT FROM table WHERE x = 'a' AND y = 'b' ORDER BY t LIMIT 1

The query takes forever and reads all 600,000+ records in the table (that match x and y).

My real objective is this (fetching the “last” record based on the value of t):

SELECT FROM table WHERE x = 'a' AND y = 'b' ORDER BY t DESC LIMIT 1

But that behaves the same way as the first ORDER BY query. Adding the DESC does indeed return the last record that matches x and y, but it does so by doing a table scan.

I’ve also tried selecting directly from the index and I’ve tried using MATCH, but neither of those approaches were successful, because there does not appear to be a way to control the ORDER, so as to get the last record instead of the first.

My ultimate objective is to quickly fetch the last record in this table, knowing x and y, but not knowing t. I’ve found several queries that produce that result, but all of them are essentially doing a table scan, which at several minutes in duration, I can’t afford.

Am I missing something here, or is there no way to quickly find the last record in a compound index?

Hi @eric24

Can you please provide a sample SQL script (schema and data creation) to reproduce the problem?

Thanks

Luigi

@luigidellaquila - Sorry for the delay, as I was traveling last week. Here is the SQL to create and test this issue.

SQL batch to create the test table, index, and a few records:

DROP CLASS xTestTable IF EXISTS UNSAFE;
CREATE CLASS xTestTable CLUSTERS 2;
CREATE PROPERTY xTestTable.x STRING (MANDATORY TRUE);
CREATE PROPERTY xTestTable.y STRING (MANDATORY TRUE);
CREATE PROPERTY xTestTable.t LONG (MANDATORY TRUE);
CREATE INDEX xIndex ON xTestTable (x, y, t) NOTUNIQUE;
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 101};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 102};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 103};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 104};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 105};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 106};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 107};
INSERT INTO xTestTable CONTENT {"x": "a", "y":"b", "t": 108};

First query:

SELECT FROM xTestTable WHERE x = 'a' AND y = 'b' LIMIT 1

Result:

{
            "@type": "d",
            "@version": 0,
            "documentReads": 1,
            "fullySortedByIndex": false,
            "documentAnalyzedCompatibleClass": 1,
            "recordReads": 1,
            "fetchingFromTargetElapsed": 0,
            "indexIsUsedInOrderBy": false,
            "compositeIndexUsed": 1,
            "current": "#61:0",
            "involvedIndexes": [
                "xIndex"
            ],
            "limit": 1,
            "evaluated": 1,
            "user": "#5:0",
            "elapsed": 0.132182,
            "resultType": "collection",
            "resultSize": 1,
            "@fieldTypes": "documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,fetchingFromTargetElapsed=l,compositeIndexUsed=l,current=x,involvedIndexes=e,evaluated=l,user=x,elapsed=f"
        }

Second query:

SELECT FROM xTestTable WHERE x = 'a' AND y = 'b' ORDER BY t LIMIT 1

Result:

{
            "@type": "d",
            "@version": 0,
            "documentReads": 8,
            "fullySortedByIndex": false,
            "documentAnalyzedCompatibleClass": 8,
            "recordReads": 8,
            "fetchingFromTargetElapsed": 0,
            "indexIsUsedInOrderBy": false,
            "compositeIndexUsed": 1,
            "current": "#64:3",
            "involvedIndexes": [
                "xIndex"
            ],
            "limit": 1,
            "orderByElapsed": 0,
            "evaluated": 8,
            "user": "#5:0",
            "elapsed": 0.214341,
            "resultType": "collection",
            "resultSize": 1,
            "@fieldTypes": "documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,fetchingFromTargetElapsed=l,compositeIndexUsed=l,current=x,involvedIndexes=e,orderByElapsed=l,evaluated=l,user=x,elapsed=f"
        }

Notice that with the addition of the ORDER BY in the second query, all 8 records are read, and while the index is “involved”, it is not used in the ORDER BY nor is it fully sorted by the index.

Adding ORDER BY t DESC makes no difference (the query strategy is the same). And this was my original objective–to be able to quickly fetch the last entry in the index.

Note that I’ve also tried querying the index itself, but I also get the same results.

I’ve also noticed that in general, compound indexes aren’t used as efficiently as expected. On a much larger table, which has other fields as well, SELECT FROM xTestTable WHERE x = 'a' AND y = 'b' returns 32 records (confirmed by the EXPLAIN), but SELECT FROM xTestTable WHERE x = 'a' AND y = 'b' AND 'q' = 15 reads 642 records. I realize that since “q” isn’t part of the compound index, that ODB will have to scan for this field, but I would expect it to only scan a maximum of 32 records, since that is the total number with matching “x” and “y” values (the first two fields in the compound index).

PS - I’m using version 2.2.29, but I’m going to try the same test described above on 3.x.

Hmm. Running EXPLAIN on 3.0.18 on either query (with or without the ORDER BY or the DESC) returns this:

+ FETCH FROM INDEX xIndex
  x = 'a' AND y = 'b'
+ EXTRACT VALUE FROM INDEX ENTRY
  filtering clusters [16,15]
+ FILTER ITEMS BY CLASS 
  xTestTable
+ LIMIT ( LIMIT 1)

I’m not sure how to interpret this output. I’ve tried both EXPLAIN and PROFILE, but I don’t see any information on records read or time elapsed.

NOTE: We will be moving to 3.x soon, but I’m still hoping to get 2.2.x to work for this query (indexed ORDER BY DESC) in the meantime.

@luigidellaquila - Any thoughts on this?

Hi @eric24

V 2.2 query executor is definitely tricky, so we will hardly have a fix for this in that version.
About v 3.0, it is acting as expected, ie. it is using the index for searching

+ FETCH FROM INDEX xIndex
  x = 'a' AND y = 'b'

and it is not adding further ORDER BY query execution steps, so it’s implicitly using the index also for sorting.

There is a bug in v 3.0.18 for query PROFILE, this is why you do not see the execution time statistics. It’s already fixed in v 3.0.22, so I’d suggest to upgrade

Thanks

Luigi