How can I solve the busy time problem in my process function?
Solution 1:
Some slowdown is to be expected once RocksDB reaches the point where the working state no longer fits in memory. However, in this case you should be able to dramatically improve performance by switching from ValueState to MapState.
Currently you are deserializing and reserializing the entire HashSet for every record. As these sets grow over time, each update becomes more expensive, and performance degrades.
The RocksDB state backend has an optimized implementation of MapState. Each individual key/value entry in the map is stored as a separate RocksDB object, so you can look up, insert, and update entries without having to serialize or deserialize the rest of the map.
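To make the cost difference concrete, here is a toy model in plain Java. It has no Flink dependency, and the class and method names (SerdeCostDemo, WholeSetStore, PerEntryStore) are hypothetical; they do not correspond to the RocksDB backend's internals. WholeSetStore mimics a ValueState<HashSet<String>> by keeping the set as one serialized blob that is decoded and re-encoded on every add. PerEntryStore mimics a MapState<String, Boolean> by storing one small entry per element. Both count the bytes they process, so you can see how the per-record cost evolves as the state grows.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SerdeCostDemo {

    // Encodes a whole set as: element count, then each element as length-prefixed UTF.
    static byte[] encodeSet(Set<String> set) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(set.size());
            for (String s : set) {
                out.writeUTF(s);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not actually fail
        }
    }

    static HashSet<String> decodeSet(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int n = in.readInt();
            HashSet<String> set = new HashSet<>();
            for (int i = 0; i < n; i++) {
                set.add(in.readUTF());
            }
            return set;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Toy stand-in for ValueState<HashSet<String>>: one blob, fully decoded and re-encoded per add.
    static final class WholeSetStore {
        private byte[] blob = encodeSet(new HashSet<>());
        long bytesProcessed = 0;

        void add(String element) {
            HashSet<String> set = decodeSet(blob); // read the entire set
            bytesProcessed += blob.length;
            set.add(element);
            blob = encodeSet(set);                 // write the entire set back
            bytesProcessed += blob.length;
        }

        int size() {
            return decodeSet(blob).size();
        }
    }

    // Toy stand-in for MapState<String, Boolean>: one small entry per element.
    static final class PerEntryStore {
        private final Map<String, byte[]> entries = new HashMap<>();
        long bytesProcessed = 0;

        void add(String element) {
            // Only this element's key is encoded, no matter how many entries already exist.
            bytesProcessed += element.getBytes(StandardCharsets.UTF_8).length;
            if (!entries.containsKey(element)) {
                entries.put(element, new byte[] {1}); // one-byte "true" value
                bytesProcessed += 1;
            }
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        WholeSetStore whole = new WholeSetStore();
        PerEntryStore perEntry = new PerEntryStore();
        for (int i = 0; i < 1_000; i++) {
            String element = "id-" + i;
            whole.add(element);
            perEntry.add(element);
        }
        System.out.println("whole-set bytes processed: " + whole.bytesProcessed);
        System.out.println("per-entry bytes processed: " + perEntry.bytesProcessed);
    }
}
```

The whole-set total grows roughly quadratically with the number of elements, while the per-entry total grows linearly. In an actual Flink job, the corresponding change is to declare a MapStateDescriptor<String, Boolean>, obtain the state with getRuntimeContext().getMapState(descriptor), and replace value()/update(...) on the set with contains(key) and put(key, true).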
ListState is also optimized for RocksDB (it can be appended to without deserializing the list). In general, it's best to avoid storing collections in ValueState when using RocksDB; use ListState or MapState instead wherever possible.
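The same reasoning applies to lists. The toy model below is plain Java with hypothetical names (AppendCostDemo, WholeListStore, AppendOnlyStore) and is not Flink's implementation. It compares a list kept as a single blob, which must be read and rewritten on every append (like a ValueState holding a list), with an append-only store that writes only the new element's bytes. Both end up with identical contents, but the amount of work per append differs sharply.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class AppendCostDemo {

    // Length-prefixed encoding of one element (2-byte length + modified UTF-8).
    static byte[] encode(String element) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(element);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not actually fail
        }
    }

    // Toy stand-in for a list in ValueState: read the whole list, append, write it all back.
    static final class WholeListStore {
        private byte[] blob = new byte[0];
        long bytesProcessed = 0;

        void add(String element) {
            byte[] encoded = encode(element);
            bytesProcessed += blob.length; // read the existing list
            byte[] updated = Arrays.copyOf(blob, blob.length + encoded.length);
            System.arraycopy(encoded, 0, updated, blob.length, encoded.length);
            blob = updated;
            bytesProcessed += blob.length; // write the whole list back
        }

        byte[] contents() {
            return blob.clone();
        }
    }

    // Toy stand-in for appendable list storage: only the new element is written.
    static final class AppendOnlyStore {
        private final ByteArrayOutputStream log = new ByteArrayOutputStream();
        long bytesProcessed = 0;

        void add(String element) {
            byte[] encoded = encode(element);
            log.write(encoded, 0, encoded.length);
            bytesProcessed += encoded.length;
        }

        byte[] contents() {
            return log.toByteArray();
        }
    }

    public static void main(String[] args) {
        WholeListStore whole = new WholeListStore();
        AppendOnlyStore append = new AppendOnlyStore();
        for (int i = 0; i < 1_000; i++) {
            whole.add("event-" + i);
            append.add("event-" + i);
        }
        System.out.println("same contents: " + Arrays.equals(whole.contents(), append.contents()));
        System.out.println("whole-list bytes processed: " + whole.bytesProcessed);
        System.out.println("append-only bytes processed: " + append.bytesProcessed);
    }
}
```

In Flink, ListState.add(value) is the call that appends a single element. Reading the state with get() still has to deserialize the whole list, so the benefit shows up mainly when appends are much more frequent than full reads.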
Since the heap-based state backend keeps its working state as objects on the JVM heap, it doesn't pay this per-access serialization cost.