Questions on Information of OrientDB


#1

Dear OrientDB developers

Hello, I am Jiaqi Zuo, a master's student at Carnegie Mellon University. I am working on a course project: writing an online encyclopedia of database management systems. Here is the link: https://dbdb.io/

I am really interested in OrientDB and would like to summarize some information about it. However, I have run into the following questions and cannot find the relevant information in the latest documentation, so I am asking here in the hope that you can help me understand OrientDB better. The questions are as follows:

  1. Does OrientDB support foreign key constraints?
  2. Does OrientDB support query compilation (e.g., code generation or JIT optimizations)? If so, how does it do this?
  3. Which query processing model does OrientDB use for query execution (e.g., iterator vs. vectorized)? What kind of intra-query parallelism does it support?
  4. What kind of storage models does OrientDB support (e.g., N-ary Storage Model, Decomposition Storage Model)?
  5. How is OrientDB’s underlying storage manager implemented?

I would greatly appreciate any help with these questions.

Best regards
Jiaqi


#2

Hi @Alvin

I’ll try to answer your questions:

  1. OrientDB does not support custom foreign key constraints, i.e., you cannot define a user-level attribute as a foreign key. Each record in OrientDB has a record ID (RID) that represents the physical position of the record. Link attributes (which contain RIDs or collections of RIDs) can be used as foreign keys to represent relationships.

  2. If you mean bytecode generation, the answer is no. The OrientDB query execution planner generates a query execution plan (which is the actual query executor) made of pre-defined query steps (plain Java components). Query execution plans are cached, so they do not have to be recalculated for multiple executions of the same query. For now we do not do anything specific for JIT optimization (we are evaluating GraalVM, but it’s still just an idea), so we rely on the normal JVM JIT mechanisms.

  3. The OrientDB query executor is designed as an iterator-based process, but some query execution plan components do some pre-elaboration and batch processing that can be considered a vectorized strategy.

  4. OrientDB storage is essentially N-ary storage. Data clusters are made of two files: the first is a cluster position map (i.e., a set of fixed-size entries, each made of a RID and a physical pointer to the actual data content); the second contains the actual records, where each record is a sequence of key/value pairs.

  5. This is a pretty complex and broad topic. What kind of info do you need? Tx coordination? WAL? Locking?
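As a rough illustration of points 1 and 4 (not OrientDB's actual code or on-disk format), the idea of a position map that resolves a RID to a physical location, and of a link attribute that simply stores another record's RID, could be sketched in Python as below. The names `RID` and `Cluster`, the JSON encoding of the key/value pairs, the in-memory byte array standing in for the data file, and the cluster numbers 12 and 13 are all assumptions made for this sketch.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RID:
    """Record id: cluster number plus position inside that cluster."""
    cluster_id: int
    position: int

    def __str__(self):
        return f"#{self.cluster_id}:{self.position}"

    @staticmethod
    def parse(text):
        cluster_id, position = text.lstrip("#").split(":")
        return RID(int(cluster_id), int(position))


@dataclass
class Cluster:
    """Toy cluster: a position map plus a data area (hypothetical layout)."""
    cluster_id: int
    # position -> (offset, length) into `data`; stands in for the position map file
    position_map: list = field(default_factory=list)
    # stands in for the file that holds the serialized records
    data: bytearray = field(default_factory=bytearray)

    def insert(self, record):
        payload = json.dumps(record).encode("utf-8")
        self.position_map.append((len(self.data), len(payload)))
        self.data.extend(payload)
        return RID(self.cluster_id, len(self.position_map) - 1)

    def read(self, rid):
        offset, length = self.position_map[rid.position]
        raw = bytes(self.data[offset:offset + length])
        return json.loads(raw.decode("utf-8"))


# One cluster per "class"; the ids are arbitrary.
clusters = {12: Cluster(12), 13: Cluster(13)}
rome = clusters[13].insert({"name": "Rome"})
# The "city" attribute is a link: it stores the RID of the target record.
luigi = clusters[12].insert({"name": "Luigi", "city": str(rome)})

# Following the link is a direct lookup by RID, not a search by key value.
person = clusters[luigi.cluster_id].read(luigi)
target = RID.parse(person["city"])
city = clusters[target.cluster_id].read(target)
```

In this sketch the link is resolved through the position map rather than through an index on a user-defined key, which is the sense in which a link behaves like a physical pointer.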

Thanks

Luigi


#3

Hi @luigidellaquila

Thank you for helping with my problems. That really helps me a lot.

  1. I have seen the LINK concept in the documentation. What is the difference between LINK and foreign keys?

  2 & 3. That helps me a lot with query compilation and query execution! Would you mind pointing me to more information about the query engine in OrientDB, e.g. some documentation, if possible?

  4. That information does solve my problem.

  5. Sorry for the confusion. Let me clarify. There are actually two small questions here. The first is about file storage: how does the storage engine organize pages (e.g., as an unordered collection of pages or with some ordering)? The second is about the page layout (how records are stored in pages). I think part of the answer is covered in 4; I just want to make sure I understand your answer correctly. Does the storage engine store log records in some pages?

Again, thank you for answering these questions. I really appreciate it.

Best regards
Jiaqi


#4

Hi @Alvin

  1. There are two main differences. First, a LINK refers to a RID, which is in some sense internal information and cannot be assigned or manipulated by the user. Second, a LINK is a physical pointer, while a foreign key is a logical relationship.

  2 & 3. All the docs about query execution are here: https://orientdb.com/docs/3.0.x/sql/ but I’m afraid execution planning is not covered in much detail.

  5. The storage is physically divided into multiple files; each file contains data only for a specific “class” (i.e., data type). The pages in the data files are not ordered.
    Yes, records are stored in pages of 64 KB; if a record does not fit in a page, it is split across multiple pages.
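To make the page-splitting remark concrete, here is a minimal Python sketch of cutting an oversized record into page-sized chunks. It assumes, purely for illustration, that the whole 64 KB page is available for record payload; a real page would also carry headers and pointers linking the chunks, which this ignores. The function names `split_into_pages` and `reassemble` are hypothetical.

```python
PAGE_SIZE = 64 * 1024  # 64 KB, the page size mentioned above


def split_into_pages(record, page_size=PAGE_SIZE):
    """Cut a record's bytes into chunks that each fit in one page."""
    if not record:
        return [b""]
    return [record[i:i + page_size] for i in range(0, len(record), page_size)]


def reassemble(chunks):
    """Rebuild the original record by concatenating its chunks in order."""
    return b"".join(chunks)


small = split_into_pages(b"x" * 1_000)     # fits in a single page
large = split_into_pages(b"y" * 150_000)   # needs three pages
chunk_sizes = [len(chunk) for chunk in large]
```

Here the 150,000-byte record occupies two full pages plus a partial third one, and `reassemble(large)` returns the original bytes.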

Thanks

Luigi


#5

Hi @luigidellaquila

Thank you for your answer. That helps me a lot, and I really appreciate it.

Best regards
Jiaqi


#6

Hi @luigidellaquila

After discussing with my instructors, I still have two small questions about OrientDB. I would really appreciate it if you could help me with them.

  1. Query compilation. Since OrientDB supports SQL syntax, how does it compile and execute a SQL query? Does it generate Java code on the fly and then run it in the JVM?

  2. As you said previously, some query execution plan components do some pre-elaboration and batch processing. Is this a pre-fetching method? Could you name some of those query execution plan components?

Best regards
Alvin


#7

Hi @Alvin

  1. No, it does not compile to Java bytecode. It just creates an execution plan, which is a pipeline made of Java objects (e.g., fetch from class, filter, calculate projections).

  2. Some components do batch calculation by design, e.g., the aggregate projection calculation and the ORDER BY. Some components are designed to support pre-fetching in the future (e.g., fetch from class/index), but for now they only do one-by-one fetching.
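The two points above can be sketched with Python generators: each step pulls records from the step before it, so fetch, filter and projection work one record at a time, while ORDER BY has to consume its whole input before it can emit anything. This is only an analogy for the pipeline described here, not OrientDB's Java implementation; the step function names and the sample data are made up for the example.

```python
def fetch_from_class(records):
    """Source step: hands out records one by one."""
    for record in records:
        yield record


def filter_step(upstream, predicate):
    """Streaming step: passes on only the records matching the predicate."""
    for record in upstream:
        if predicate(record):
            yield record


def projection_step(upstream, fields):
    """Streaming step: keeps only the requested attributes."""
    for record in upstream:
        yield {name: record[name] for name in fields}


def order_by_step(upstream, key):
    """Batch step: must read every upstream record before emitting the first."""
    yield from sorted(upstream, key=key)


people = [
    {"name": "Carla", "age": 41, "city": "Rome"},
    {"name": "Abe", "age": 17, "city": "Oslo"},
    {"name": "Bea", "age": 29, "city": "Lima"},
]

# Roughly: SELECT name, age FROM people WHERE age >= 18 ORDER BY name
plan = order_by_step(
    projection_step(
        filter_step(fetch_from_class(people), lambda r: r["age"] >= 18),
        ["name", "age"],
    ),
    key=lambda r: r["name"],
)
result = list(plan)
```

Nothing runs until `list(plan)` pulls from the last step, which mirrors the iterator-style model: the plan is a chain of pre-built objects rather than generated code.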

Thanks

Luigi


#8

Thank you!

Best regards
Alvin