Record Pagination using multiple-column sort

#1

The docs show a simple example of using RID-LIMIT for pagination:
https://orientdb.com/docs/2.2.x/Pagination.html

That works because the default sort order is the RID. This concept is also easily extended to sort by any other field (by including an ORDER BY clause) instead of the RID (although that field must be unique for this to work properly). But what if the “unique” field is actually two (or more fields)? Just including these fields in the ORDER BY and in the WHERE clause gives unexpected results, because the “>” comparison is considered separately on each field.

So, is there a way to specify that the ORDER BY and the field comparisons in the WHERE clause are considered “together” (i.e. as a compound field index)?

#2

Sometimes by formulating a question, the answer is evident.

You answered your question implicitly: »But what if the “unique” field is actually two (or more fields)«

Yust apply an index to order_by

#3

Yes, I tried doing that, but it doesn’t work any better. The problem is that a paginated query necessarily must start with a query that finds the first desired record, then go subsequent queries with a slightly modified WHERE clause that attempts to start the next query at the record following the last record from the previous result “page”. Doing this using any single unique field works exactly as expected:

  • First query: SELECT FROM Table WHERE key >= 100 LIMIT 20
  • Second query: SELECT FROM Table WHERE key >= 120 LIMIT 20
  • Third query: SELECT FROM Table WHERE key >= 140 LIMIT 20

This example assumes we are sorting/paging using a unique (and preferably indexed) field called “key” that contains numeric values from 100 to 999. But if you have two (or more) field that make up the unique “key”, both (all) have to be included in the WHERE clause:

  • First query: SELECT FROM Table WHERE a >= 10 AND b >= 100 LIMIT 20
  • Second query: SELECT FROM Table WHERE a >= 12 AND b >= 107 LIMIT 20
  • Third query: SELECT FROM Table WHERE a >= 13 AND b >= 122 LIMIT 20

In this example, the unique key must included fields “a” and “b”. As each set of 20 records is returned, we have a new value for “a” and “b” (just like we did for “key” in the first example), but the subsequent queries don’t start and the “next” record. I tried this roughly as shown and by querying directly on the compound index that includes fields “a” and “b”. Both methods produce the same (wrong) results.

I believe the reason is that the WHERE clause is satisfied by “either” field comparison being true, rather than “both”. If I modify the two-field example so that “a” is a fixed value (i.e. we’re only looking for records where a=10 and only paging on “b”), then it works again, but of course we’re back to paging on a single field.

I feel like the WHERE clause that does “greater-than-or-equal” comparisons on the two fields isn’t behaving like it should.

Bottom line, I’m looking for an example of doing pagination with more than one unique field. Maybe there’s a different approach, but none of the methods that work for a single field have worked for multiple fields.

#4

Have you tried the »RID-Limit« effort noted in the documentation?

#5

Yes, but in order for RID-LIMIT to work, you have to be sorting by the RID (or not sorting, where the default record order is by incrementing RID value). If you sort by any other field (or set of fields), then including “@RID > x” doesn’t mean anything, since the returned records so far are not in RID order.

PS - The SKIP-LIMIT method does work regardless of the sort, but it’s completely unusable for large record sets due to its poor performance (exactly the place you’d generally want to use pagination).

#6

Hi Eric,
did some experiments.

Hope you understand some ruby

pagination excersise
22.04.(19:09:40) INFO->Connected to database temp
22.04.(19:09:40) INFO->CREATE CLASS E 
22.04.(19:09:40) INFO->CREATE CLASS V 
22.04.(19:09:40) INFO->CREATE CLASS pagination EXTENDS V
22.04.(19:14:29) INFO-> CREATE PROPERTY pagination.col1 STRING 
22.04.(19:14:29) INFO-> CREATE PROPERTY pagination.col2 INTEGER 
22.04.(19:14:29) INFO-> CREATE PROPERTY pagination.col3 STRING 
22.04.(19:14:29) INFO-> CREATE PROPERTY pagination.col4 INTEGER 
22.04.(19:14:29) INFO->CREATE INDEX composite ON pagination(col1, col2, col3) DICTIONARY
22.04.(19:14:29) INFO->Index on pagination based on composite created.

thats the database

to fill with data

def initialize_pagination	
  generate_word = -> { ('a'..'z').to_a.shuffle[0,5].join }
  generate_number = -> { rand 999 }

 (1 .. 200).each do |i|
	  Pagination.create col1: generate_word[], 
			col2: generate_number[],
			col3: generate_word[],
			col4: i
  end
end

To query

def paginate last_result =  []
   start = if last_result.empty?
	     "#-1:-1"
	   else 
		last_result.last.to_orient
	   end

Pagination.query_database " select from ( select from pagination  order by col2 ) where @rid > #{start} limit 10" 

end

This are the results

22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10

first list of returnd rid’s, then returned col2 (part of index and ordered column)

["#30:11", "#31:5", "#27:19", "#27:1", "#26:11", "#27:12", "#29:17", "#32:6", "#29:20", "#27:17"]
[1, 6, 8, 16, 20, 21, 32, 39, 48, 58]

second call --> paginate( paginate )

22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10
22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #27:17 limit 10
["#30:11", "#31:5", "#27:19", "#29:17", "#32:6", "#29:20", "#31:24", "#29:2", "#27:23", "#32:18"]
[1, 6, 8, 32, 39, 48, 76, 84, 95, 95]

Surprisingly, a completely different result - but its starting exactly as the first attempt

Then a third round: ( paginate ( paginate (paginate)))

22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10
22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #27:17 limit 10
22.04.(19:20:18) INFO-> select from ( select from pagination  order by col2 ) where @rid > #32:18 limit 10
[157, 258, 420, 497, 693, 928]
["#32:24", "#32:23", "#32:22", "#32:20", "#32:19", "#32:21"]

Now, only 6 records are returned.

Started the test several times. The results are similar

Another run:

22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10
["#29:18", "#31:18", "#30:9", "#30:22", "#32:1", "#30:13", "#30:2", "#26:18", "#29:16", "#28:15"]
[2, 2, 4, 18, 26, 31, 47, 51, 58, 65]
22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10
22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #28:15 limit 10
["#29:18", "#31:18", "#30:9", "#30:22", "#32:1", "#30:13", "#30:2", "#29:16", "#28:19", "#32:13"]
[2, 2, 4, 18, 26, 31, 47, 58, 65, 68]
22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #-1:-1 limit 10
22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #28:15 limit 10
22.04.(19:30:05) INFO-> select from ( select from pagination  order by col2 ) where @rid > #32:13 limit 10
[70, 81, 151, 260, 299, 375, 443, 500, 515, 843]
["#32:23", "#32:15", "#32:24", "#32:17", "#32:14", "#32:20", "#32:21", "#32:22", "#32:16", "#32:18"]

Its definitely no reliable pagination.

I wonder, what causes the missleading results.

(if needed, I can provide the rspec-testfile suitable for ActiveOrient)

#7

I’m believe the nested select may be part of the problem. Try without, as in:
select from pagination where @rid > #-1:-1 limit 10

This should return the first 10 rows, in RID order (which is also the default sort order if none is specified). Your second select can use the last RID returned. I would expect this to work. You can also do this:
select from pagination where col2 > 0 order by col2 limit 10

I would also expect this to work, because it uses col2 in the where and order clauses (same as using @rid), and is still only referring to a single property.

But if you try this using two properties (both included in the where and order clauses, with or without a compound index), it will not return useful data (in fact, very similar to what you’re seeing from your tests).

#8

The subquery aimes to respect the given index. You can order the columns the way you want and operate with the result-set. This is subject for pagination and should work as described in the docs. That was the plan.

However, changing the scope, ie dropping the subquery layer does not change anything.

#9

Interesting. So two questions:

  1. Does the subquery approach not run the entire subquery first and then select a subset of it? Or is the query optimizer smart enough to reduce the number of reads performed by the subquery? If not, then this would seem to be a very expensive paging method for large numbers of records.

  2. Based on what you’re seeing with either query syntax, do you agree that pagination is “broken”?

PS - I get identical results with ODB 2.2.x and 3.0.x.

#10

Hi,
theoretically, one could query the index.

Select from INDEX::composite where @rid > #-1:-1 (and so on)

Unfortunately, I was not able to create any valid query of this type

#11

Yes, I tried querying on the composite index too. It is my understanding that the only where clause options you have on a compound key is “where key between […] and […]” or “where key = […]”. For pagination (where you really need “>”), I tried using “between” with the lower bounds of the key set to a known starting point and the upper bounds set to a known set of “max values”; then with each subsequent call updating the lower bounds based on the “next” record’s values (i.e. if my desired page size is 10, I actually fetched 11 records, dropped the 11th record from the result but used it for my updated lower-bounds key values). Unfortunately, this produces the same kind of results you mentioned earlier. It is my belief that the where clause is evaluated incorrectly, in that it returns a “hit” on the first partial match of the composite key, rather than evaluating the entire key.

Note that the use of @rid her as the “pagination pointer” won’t work here either, because the RIDs returned will (or should) be in the order of the composite index, not sequential RID order.

1 Like
#12

May be due to this bug :slight_smile:

#13

I don’t think so. While I’m not familiar with the Java driver that is used in the bug report, the code example doesn’t appear o do anything with the RID (i.e. WHERE @rid > #n:m). So unless that’s being done internally to the driver, I wouldn’t expect the example given to work.

What I’ve seen, using OrientJS under node.js, is that pagination works fine, as long as you include the field you are paginating by (any unique field, not just @rid) in the WHERE and ORDER BY clauses. The problem I’m seeing is more related to composite indexes than pagination (although the desired end-goal is to use more than one field for pagination, which is where a composite index that contains those fields comes in).

#14

@luigidellaquila: Do you have any thoughts on this? I’m at a dead end without being able to paginate using more than one field in the WHERE and ORDER BY. I realize I could use SKIP/LIMIT, but the performance of that approach is unworkable for large numbers of records.

#15

@wolf4ood: Do you have any thoughts on this? Am I completely missing something? It seems like a pretty serious functional gap to only be able to efficiently paginate (i.e. not using SKIP/LIMIT) on a single field.

#16

Hi @eric24

I see you already considered the usage of composite indexes, that is the natural way of doing it.
You just have to take into consideration how keys are sorted in the indexes and what kind of “efficient” queries you can run on them.

The keys are sorted in a natural way, starting from the first property, eg.

[A, B]
[0, 0]
[0, 1]
[0, 5]
[0, 8]
[1, 0]
[1, 3]
[2, 4]
[3, 0]

For an efficient query, you have to use equality operator(s) for the first condition(s) and you can use a range operator only for the last condition, eg.

SELECT FROM V WHERE A = 0 AND B > 1 LIMIT 10

or

SELECT FROM V WHERE A > 0 LIMIT 10

but not

SELECT FROM V WHERE A > 0 AND B > 1 LIMIT 10

in this case, only the “A” condition will be indexed.

In V 2.2 you cannot do complex AND/OR combinations (or better, you can do it, but the query executor is not smart enough to use all the indexes), but in v 3.0 you can definitely write a query as follows:

suppose you just reached [0, 5] and you want to retrieve next five elements:

SELECT FROM V WHERE (A = 0 AND B > 5) OR A > 0  LIMIT 10

Thanks

Luigi

#17

@luigidellaquila: Thanks for jumping in. And yes, I do understand all of that. But index or not, it is still only possible to paginate by a single property, whether that property is the @rid or some other unique value. That’s my challenge. My original assumption around a composite index was that the “combination” of the properties in the index would be treated as a single unique value, so that it would be possible to do:

SELECT FROM V WHERE A > n AND B > m LIMIT 10

Without that ability, B by itself would have to be unique, but it’s often not.