Gremlin query engine implementation for Orientdb-tp3-3.0.11 not working as expected?


#1

Basically, I am trying to construct a graph using the Gremlin.Net C# library. I am combining multiple node-creation queries into one to reduce the number of queries sent to the server as well as the query execution time.
e.g.:
g.addV('Project').property('id', '…').property('Type', '…').property('Name', '…')
 .addV('Project').property('id', '…').property('Type', '…').property('Name', '…')
 .addV('Project').property('id', '…').property('Type', '…').property('Name', '…')
 .addV('Project').property('id', '…').property('Type', '…').property('Name', '…')…
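A minimal sketch of how combined queries of this shape could be built in fixed-size batches. The names `BatchQueryBuilder`, `BuildBatches`, and the tuple fields are hypothetical, and the quoting only escapes backslashes and single quotes, which is a simplification rather than a complete escaping scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BatchQueryBuilder
{
    // Simplified quoting for a single-quoted Gremlin string literal (assumption:
    // values contain no other characters that need escaping).
    private static string Quote(string value) =>
        "'" + value.Replace("\\", "\\\\").Replace("'", "\\'") + "'";

    // Yields one chained addV(...) query per batch of at most batchSize vertices.
    public static IEnumerable<string> BuildBatches(
        IEnumerable<(string Id, string Type, string Name)> projects, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var sb = new StringBuilder("g");
        int count = 0;
        foreach (var p in projects)
        {
            sb.Append(".addV('Project')")
              .Append(".property('id', ").Append(Quote(p.Id)).Append(')')
              .Append(".property('Type', ").Append(Quote(p.Type)).Append(')')
              .Append(".property('Name', ").Append(Quote(p.Name)).Append(')');

            if (++count == batchSize)
            {
                yield return sb.ToString();
                sb.Clear().Append('g');   // start the next batch
                count = 0;
            }
        }
        if (count > 0) yield return sb.ToString();   // final partial batch
    }
}
```

With a batch size of 100, 1000 input records would produce 10 query strings, matching the scenario described here.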

I have tried sending the same query with the Gremlin.Net library to the Gremlin Server from the Orientdb-tp3-3.0.11 package, as well as through OrientDB Studio; both work fine.

The issue I am facing is that nodes go missing when I send multiple queries. Say I send 10 queries, each creating 100 nodes using the sample query above, so the total should be 1000 nodes. Only 500+ or 700+ are created. It is inconsistent and happens randomly, and nothing is captured in the server log. When I add a pause (sleep) of a few seconds before sending the next query, it gets better: some nodes are still missing, but within 50 in my case. I then reduced the number of nodes per query to 50, and with the pause it works as expected. Without the pause it is still the same, with lots of nodes missing.

Can I check what the constraints are for queries sent to Gremlin Server:

  1. Maximum query length?
  2. Handling of multiple queries?
  3. Other factors?

Any other suggestions on best practices for creating a graph with over a million nodes and edges?

Thank you.


#2

Hi @minjian.chen

do you have some code to replicate this issue?

Thanks


#3

Hi @wolf4ood,

Basically, I am trying to create a graph with 20k nodes and 80k edges. I just read data from my database, construct the combined query in the format above, and then execute the query:
private static Task<ResultSet<dynamic>> SubmitRequest(string query, bool waitForResult)
{
    Task<ResultSet<dynamic>> resultSet;

    try
    {
        // Start the request, then pause briefly before returning to the caller.
        resultSet = gremlinClient.SubmitAsync<dynamic>(query);
        System.Threading.Thread.Sleep(100);

        // Busy-wait until the request has completed, if the caller asked for it.
        while (waitForResult && !resultSet.IsCompleted)
        {
        }
    }
    catch (ResponseException e)
    {
        Console.WriteLine("\tRequest Error with status code: " + e.StatusCode);
        PrintStatusAttributesWithRetry(e.StatusAttributes);
        return null;
    }
    return resultSet;
}

I tried to wait for the result to complete in order to reduce server overload too.
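For comparison, a minimal sketch of waiting for each request with `await` instead of a spin loop, so that only one query is in flight at a time and a failed request surfaces as an exception at the call site. `SequentialSubmitter` and `SubmitAllAsync` are hypothetical names, and the submit call is passed in as a delegate so the sketch does not depend on any particular client.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SequentialSubmitter
{
    // Sends the queries one at a time and returns how many completed.
    // An exception from submit(query) propagates to the caller, so a failed
    // batch is not silently skipped.
    public static async Task<int> SubmitAllAsync(
        IEnumerable<string> queries, Func<string, Task> submit)
    {
        int completed = 0;
        foreach (var query in queries)
        {
            await submit(query);   // the next query starts only after this one finishes
            completed++;
        }
        return completed;
    }
}
```

With the client used in this post, the delegate could be `q => gremlinClient.SubmitAsync<dynamic>(q)`, wrapped in a `try`/`catch (ResponseException e)` around the awaited call if per-batch error handling is wanted. Whether this removes the need for the sleep is untested here.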

After playing around with the gremlin-server.yaml config file, I managed to make it work. However, if I combine too many queries, even while still within maxContentLength, the server throws java.lang.StackOverflowError. I am not sure yet what causes it or how to solve it.

The best I can achieve is to combine 80 queries into one and let the system sleep for 0.1 s, which lets me construct the graph in about 3 minutes without missing any nodes or edges. As this is not my worst case yet, I am trying to improve the process further.

Hope you can give me some suggestions.

Thank you.