Delta sync failing due to WAL

Hello,

We are running a distributed setup, currently with one master and two replicas. When we restart all the servers (i.e. shut all of them down and then start them up one by one), we almost always hit the issue shown below. The saved write-ahead logs (WAL) seem to be cleared out quickly: most of the time I can only find a single WAL file of about 4 to 8 KB. I have attached all WAL-related settings below; I believe all or most of them are at their defaults. Which settings should we change so that the nodes can successfully perform delta syncs?

2020-03-03 11:43:49:056 INFO  [01901]->[01900] Requesting database delta sync for 'esebtor-allevents' LSN=OLogSequenceNumber{segment=44, position=4515}... [OHazelcastPlugin]
2020-03-03 11:43:49:068 WARNI [01901]<-[01900] Error on installing database delta for 'esebtor-allevents' (err=Requested delta sync with LSN=OLogSequenceNumber{segment=44, position=4515} but found the following error: Requested database delta sync with LSN=OLogSequenceNumber{segment=44, position=4515} but not found in database) [OHazelcastPlugin]
2020-03-03 11:43:49:068 INFO  [01901] Requesting full sync for database 'esebtor-allevents'... [OHazelcastPlugin]
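
In case it helps anyone reproduce this, here is a rough Python sketch for pulling the requested LSN out of log lines like the ones above. The regular expression is based only on the log format shown in this post; the `LSN` tuple and the `extract_lsn` helper are names I made up for illustration and are not part of OrientDB.

```python
import re
from typing import NamedTuple, Optional


class LSN(NamedTuple):
    """Hypothetical container for a log sequence number (not an OrientDB class)."""
    segment: int
    position: int


# Matches the textual form printed in the log lines quoted in this post.
LSN_PATTERN = re.compile(r"OLogSequenceNumber\{segment=(\d+), position=(\d+)\}")


def extract_lsn(log_line: str) -> Optional[LSN]:
    """Return the first LSN found in a log line, or None if there is none."""
    match = LSN_PATTERN.search(log_line)
    if match is None:
        return None
    return LSN(int(match.group(1)), int(match.group(2)))


line = (
    "2020-03-03 11:43:49:056 INFO  [01901]->[01900] Requesting database delta sync "
    "for 'esebtor-allevents' LSN=OLogSequenceNumber{segment=44, position=4515}... "
    "[OHazelcastPlugin]"
)
print(extract_lsn(line))  # LSN(segment=44, position=4515)
```

Because `LSN` is a tuple, two values compare by segment first and position second, which is how I would assume LSNs are ordered.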

|44 |storage.useWAL |true |
|45 |storage.useCHMCache |true |
|46 |storage.wal.syncOnPageFlush |true |
|47 |storage.wal.cacheSize |65536 |
|48 |storage.wal.bufferSize |64 |
|49 |storage.wal.segmentsInterval |30 |
|50 |storage.wal.fileAutoCloseInterval |10 |
|51 |storage.wal.segmentBufferSize |32 |
|52 |storage.wal.maxSegmentSize |-1 |
|53 |storage.wal.maxSegmentSizePercent |5 |
|54 |storage.wal.minSegSize |6144 |
|55 |storage.wal.maxSize |-1 |
|56 |storage.wal.allowDirectIO |true |
|57 |storage.wal.commitTimeout |1000 |
|58 |storage.wal.shutdownTimeout |10000 |
|59 |storage.wal.fuzzyCheckpointInterval |300 |
|60 |storage.wal.reportAfterOperationsDuringRestore |10000 |
|61 |storage.wal.restore.batchSize |1000 |
|62 |storage.wal.readCacheSize |1000 |
|63 |storage.wal.fuzzyCheckpointShutdownWait |600 |
|64 |storage.wal.fullCheckpointShutdownTimeout |600 |
|65 |storage.wal.path |
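
To rule out differences between the nodes, the dump above can be compared across servers. Below is a minimal sketch that parses rows in the pipe-separated layout shown here into a dictionary and reports settings that differ. `parse_settings` and `diff_settings` are hypothetical helper names, and the second dump in the example is made up purely to show the comparison.

```python
from typing import Dict, Tuple


def parse_settings(dump: str) -> Dict[str, str]:
    """Parse rows shaped like '|52 |storage.wal.maxSegmentSize |-1 |'."""
    settings: Dict[str, str] = {}
    for raw in dump.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        name = cells[1]
        # A row such as '|65 |storage.wal.path |' has no value cell.
        value = cells[2] if len(cells) > 2 else ""
        settings[name] = value
    return settings


def diff_settings(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple]:
    """Return {name: (value_in_a, value_in_b)} for settings that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


node_a = parse_settings("|52 |storage.wal.maxSegmentSize |-1 |\n|65 |storage.wal.path |")
# Made-up second dump, only to demonstrate the diff.
node_b = parse_settings("|52 |storage.wal.maxSegmentSize |512 |\n|65 |storage.wal.path |")
print(diff_settings(node_a, node_b))
```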

Thanks,
Sebastian

Hello again,

Could someone elaborate on when a WAL is cleared? I have found that the logs are usually cleared at least when all nodes are back in sync. The problem is that if I shut down one node, continue inserting data for about an hour, and then start the node again, the WAL has been emptied at some point and no longer contains the LSN from when the node was shut down. The same test works if I only shut the node down for 15 minutes.
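
This is roughly how I check whether the segment of the requested LSN is still on disk, written as a Python sketch. It assumes WAL files are named `<database>.<segment>.wal`, which matches what I see in my data directory but may not hold for every version. The function names are my own. A segment that is still present is only a necessary condition for a delta sync, not proof that the exact position can be served.

```python
import re
from typing import Iterable, List

# Assumed naming scheme: '<database>.<segment>.wal' (an assumption, see above).
WAL_FILE_PATTERN = re.compile(r"^(?P<db>.+)\.(?P<segment>\d+)\.wal$")


def retained_segments(file_names: Iterable[str], database: str) -> List[int]:
    """Return the sorted segment numbers of the WAL files kept for one database."""
    segments = []
    for name in file_names:
        match = WAL_FILE_PATTERN.match(name)
        if match and match.group("db") == database:
            segments.append(int(match.group("segment")))
    return sorted(segments)


def segment_retained(requested: int, file_names: Iterable[str], database: str) -> bool:
    """True if a WAL file for the requested segment is still present."""
    return requested in retained_segments(file_names, database)


# Illustrative directory listing; in practice this could come from os.listdir().
files = ["esebtor-allevents.52.wal", "esebtor-allevents.wmr", "other.44.wal"]
print(retained_segments(files, "esebtor-allevents"))
print(segment_retained(44, files, "esebtor-allevents"))
```

With the listing above, only segment 52 remains for the database, so a request for segment 44 could not be served from the WAL.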

Any pointers would be greatly appreciated!

Thanks,
Sebastian