Low replication performance


#1

Hi OrientDB experts,

I set up a group of three Docker containers (one master and two replicas) and found that replication performance is low. I got the following information in orientdb.err:

1. After I stopped the app, I found the following message. There were not many insert statements running, but CPU usage soared to 90%:
2018-12-03 15:29:07:669 WARNI [node140155] Timeout (330002ms) on waiting for synchronous responses from nodes=[node140154] responsesSoFar=[] request=(id=0.157975 task=upd_db_status) [ODistributedDatabaseImpl]
2.
2018-12-03 15:20:21:918 WARNI Reached limit of retry for commit tx:0.152023 forcing database re-install [ODatabaseDocumentDistributed]
Exception 27CF469B in storage plocal:/databases/first_orientdb_database: 3.0.11 - Veloce (build 4a3b7acf5bdffc997f786197a6f896f8d3f16604, branch 3.0.x)
3.
2018-12-03 15:27:22:011 INFO [node140154]<-[node140155] Copying remote database ‘first_orientdb_database’ to: /tmp/orientdb/install_first_orientdb_database_server1.zip [OHazelcastPlugin]
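To see how often each of these warnings occurs across a whole log file, the lines can be grouped by level and logger class. This is a minimal sketch that assumes the line layout seen in the excerpts in this thread (timestamp with milliseconds, a truncated level such as WARNI, an optional [node] tag, the message, and the logger class in brackets at the end). Lines that do not follow this layout, such as stack-trace frames, are skipped. The function and type names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed layout, inferred from the excerpts in this thread:
# "<date> <time:ms> <LEVEL> [node]? <message> [LoggerClass]"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}) "
    r"(?P<level>\w+)\s+"
    r"(?:\[(?P<node>\w+)\])?\s*"
    r"(?P<msg>.*?)\s*"
    r"\[(?P<logger>[\w$]+)\]\s*$"
)

# Extracts the transaction id from the "forcing database re-install" warning.
RETRY_RE = re.compile(r"Reached limit of retry for commit tx:(\S+)")


class LogEntry(NamedTuple):
    ts: str
    level: str
    node: Optional[str]
    msg: str
    logger: str


def parse_lines(lines: Iterable[str]) -> List[LogEntry]:
    """Parse lines matching the assumed layout; silently skip everything else."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(LogEntry(**m.groupdict()))
    return entries


def summarize(entries: List[LogEntry]) -> Tuple[Counter, List[str]]:
    """Count entries per (level, logger) and collect tx ids that hit the retry limit."""
    by_kind = Counter((e.level, e.logger) for e in entries)
    retried_tx = []
    for e in entries:
        m = RETRY_RE.search(e.msg)
        if m:
            retried_tx.append(m.group(1))
    return by_kind, retried_tx
```

For example, `with open("orient-server.log.0", encoding="utf-8") as f: counts, txs = summarize(parse_lines(f))` gives a per-logger count of messages and the list of transactions that forced a re-install, which makes it easier to see whether those warnings cluster around particular moments.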

This is my conf file:

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "asynchronous",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "host-10-181-140-155": "master",
    "host-10-181-140-154": "replica",
    "host-10-187-13-94": "replica"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["node140155", "node1394", "node140154"]
    }
  }
}
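One detail worth checking in the configuration above: the keys under "servers" are host names (host-10-181-140-155, ...), while the "*" cluster lists node names (node140155, node1394, node140154). If these names do not refer to the same servers, the roles declared under "servers" may not be applied to the nodes that actually hold the clusters (an assumption; confirm against the OrientDB distributed configuration documentation). A minimal sketch that flags such mismatches, assuming the file uses the layout shown here; the function names are hypothetical:

```python
import json
from typing import Dict, List

# Placeholder that the OrientDB docs allow in cluster server lists;
# it is not a real node name, so it is never reported as missing.
NEW_NODE = "<NEW_NODE>"


def unmatched_cluster_servers(config: dict) -> Dict[str, List[str]]:
    """Return, per cluster, node names with no entry in the top-level "servers" map."""
    roles = config.get("servers", {})
    if "*" in roles:
        # A wildcard role applies to every node name, so nothing can be unmatched.
        return {}
    problems: Dict[str, List[str]] = {}
    for cluster, settings in config.get("clusters", {}).items():
        missing = [
            name
            for name in settings.get("servers", [])
            if name != NEW_NODE and name not in roles
        ]
        if missing:
            problems[cluster] = missing
    return problems


def check_config_file(path: str) -> Dict[str, List[str]]:
    """Load a distributed-database JSON config and run the name check on it."""
    with open(path, encoding="utf-8") as f:
        return unmatched_cluster_servers(json.load(f))
```

Run against the first configuration, this reports all three names in the "*" cluster as unmatched; against a configuration whose "servers" keys are node140155, node140154 and node1394 it reports nothing.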

Looking forward to your reply.

Thank you!


#2

Hi @zhang_shuchen,

What version are you using? What are the memory settings? Can you try setting writeQuorum to “majority”?

Thx
Regards,
Michela
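For reference, the OrientDB documentation describes the suggested write quorum as N/2+1 (integer division), which is what “majority” stands for. Which nodes count toward N here (all three servers, or only those with the master role) is an assumption in this sketch and should be checked against the distributed configuration docs for 3.0.x. The function name is hypothetical:

```python
def majority_quorum(n_nodes: int) -> int:
    """Smallest number of responses that forms a strict majority of n_nodes (N/2 + 1)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be positive")
    return n_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} node(s) -> majority quorum {majority_quorum(n)}")
```

With all three nodes counted, majority_quorum(3) is 2, so each write waits for one replica besides the node that received it. If only the single master were counted, the result would be 1, the same as "writeQuorum": 1.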


#3

I am using 3.0.11. The memory settings are “-Xms8G -Xmx8G -Dstorage.diskCache.bufferSize=10000 -XX:+PerfDisableSharedMem”, and “writeQuorum” is set to “majority”. Should I set some Hazelcast variables to improve replication performance?

Thank you
Regards
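The memory settings above imply a rough per-node footprint of the JVM heap plus OrientDB's disk cache, which lives outside the heap. A back-of-the-envelope sketch, assuming storage.diskCache.bufferSize is given in megabytes (as documented for OrientDB) and ignoring other JVM overhead such as metaspace and thread stacks; the function name is hypothetical:

```python
MIB_PER_GIB = 1024


def estimated_node_memory_mib(xmx_gib: int, disk_cache_mib: int) -> int:
    """Rough per-node estimate: max heap (-Xmx) plus the off-heap disk cache.

    Other JVM overhead is not included, so the real footprint will be higher.
    """
    return xmx_gib * MIB_PER_GIB + disk_cache_mib


if __name__ == "__main__":
    total = estimated_node_memory_mib(xmx_gib=8, disk_cache_mib=10000)
    print(f"{total} MiB (~{total / MIB_PER_GIB:.1f} GiB) per node")
```

With -Xmx8G and bufferSize=10000 this comes to 8192 + 10000 = 18192 MiB, roughly 17.8 GiB per container before overhead, which is worth comparing against the memory actually available to each Docker container.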


#4

Hi, I tried setting up the three Docker containers with the same settings:

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "asynchronous",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "node140155": "master",
    "node140154": "replica",
    "node1394": "replica"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["node140155", "node1394", "node140154"]
    }
  }
}

and I got the following messages in orient-server.log.0. It seems that the replica re-fetches the whole database from the master; is that true? Also, how can I reduce the “Time out acquire lock for resource” and “Reached limit of retry for commit” errors?

2018-12-04 10:43:17:725 INFO No reason to make fuzzy checkpoint [OLocalPaginatedStorage]
Error beginning timed out transaction: 2.64040
com.orientechnologies.common.concur.lock.OLockException: Time out acquire lock for resource: ‘#60:214
at com.orientechnologies.common.concur.lock.OSimpleLockManagerImpl.lock(OSimpleLockManagerImpl.java:37)
at com.orientechnologies.orient.server.distributed.impl.ONewDistributedTxContextImpl.lock(ONewDistributedTxContextImpl.java:51)
at com.orientechnologies.orient.server.distributed.impl.ODatabaseDocumentDistributed.acquireLocksForTx(ODatabaseDocumentDistributed.java:537)
at com.orientechnologies.orient.server.distributed.impl.ODatabaseDocumentDistributed.internalBegin2pc(ODatabaseDocumentDistributed.java:724)
at com.orientechnologies.orient.server.distributed.impl.ODatabaseDocumentDistributed.commit2pc(ODatabaseDocumentDistributed.java:650)
at com.orientechnologies.orient.server.distributed.impl.task.OTransactionPhase2Task.execute(OTransactionPhase2Task.java:92)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin$1.call(ODistributedAbstractPlugin.java:648)
at com.orientechnologies.orient.core.db.OScenarioThreadLocal.executeAsDistributed(OScenarioThreadLocal.java:69)
at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.executeOnLocalNode(ODistributedAbstractPlugin.java:644)
at com.orientechnologies.orient.server.distributed.impl.ODistributedWorker.onMessage(ODistributedWorker.java:347)
at com.orientechnologies.orient.server.distributed.impl.ODistributedWorker.run(ODistributedWorker.java:121)

2018-12-04 10:43:58:544 WARNI Reached limit of retry for commit tx:2.64040 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:43:58:550 INFO [node1394] Current node is a REPLICA for database ‘first_orientdb_database’ [ODistributedAbstractPlugin$3]
2018-12-04 10:43:58:551 WARNI [node1394]->[[node140155, node140154]] requesting delta database sync for ‘first_orientdb_database’ on local server… [OHazelcastPlugin]
2018-12-04 10:44:00:939 INFO Found 7 records changed in last 16 operations [OLocalPaginatedStorage]
2018-12-04 10:44:00:939 INFO [node1394] Executing the realignment of the last records modified before last close [#55:67, #57:673, #57:885, #59:2, #60:214, #61:387, #66:639]… [OHazelcastPlugin]
2018-12-04 10:44:01:028 INFO [node1394] Realignment completed. [OHazelcastPlugin]
2018-12-04 10:44:01:030 INFO [node1394]->[node140155] Requesting database delta sync for ‘first_orientdb_database’ LSN=OLogSequenceNumber{segment=175, position=33237540}… [OHazelcastPlugin]
2018-12-04 10:44:01:037 INFO [node1394]<-[node140155] Received updated status node1394.first_orientdb_database=SYNCHRONIZING [OHazelcastPlugin]
2018-12-04 10:44:01:040 INFO [node1394] Distributed servers status (*=current @=lockmgr[node1394]):

[OHazelcastPlugin]
2018-12-04 10:45:16:388 WARNI Reached limit of retry for commit tx:2.76671 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:18:389 WARNI Reached limit of retry for commit tx:2.76661 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:18:389 WARNI Reached limit of retry for commit tx:2.76831 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:20:391 WARNI Reached limit of retry for commit tx:2.76801 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:20:391 WARNI Reached limit of retry for commit tx:2.76821 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:21:391 WARNI Reached limit of retry for commit tx:2.76651 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:391 WARNI Reached limit of retry for commit tx:2.76841 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:391 WARNI Reached limit of retry for commit tx:2.76851 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:391 WARNI Reached limit of retry for commit tx:2.76701 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:391 WARNI Reached limit of retry for commit tx:2.76901 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:391 WARNI Reached limit of retry for commit tx:2.76881 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:22:392 WARNI Reached limit of retry for commit tx:2.76911 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:23:392 WARNI Reached limit of retry for commit tx:2.76761 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:23:392 WARNI Reached limit of retry for commit tx:2.76741 forcing database re-install [ODatabaseDocumentDistributed]
2018-12-04 10:45:23:392 WARNI Reached limit of retry for commit tx:2.76981 forcing database re-install [ODatabaseDocumentDistributed]
Error beginning timed out transaction: 2.77391
com.orientechnologies.common.concur.lock.OLockException: Time out acquire lock for resource: ‘#58:881
at com.orientechnologies.common.concur.lock.OSimpleLockManagerImpl.lock(OSimpleLockManagerImpl.java:37)