Best practice for a daily-updated cache in a multithreaded Java application
In my application, I'd like to have a cache that is updated daily by an expensive operation (for example, fetching from a remote source and computing locally). The idea is that every day it will fetch and compute the latest cache for that day. At any time during the day, any thread should be able to read from and write to the daily cache. It is fine to serve old data while the daily expensive operation is running; the operation should not block any requests at any time.
I have written some simple code to illustrate the idea, but I am not sure whether it is best practice, or even correct in terms of multithreading. For example:
- Is volatile required?
- What will happen if a cache reassignment finishes in the middle of a get or put?
Any suggestions would be much appreciated!
public class DailyCache {

    private volatile ConcurrentHashMap<String, String> cache;

    public DailyCache() {
        cache = expensiveCalculation();
        Executors.newScheduledThreadPool(1)
                 .scheduleAtFixedRate(() -> cache = expensiveCalculation(), 1, 1, TimeUnit.DAYS);
    }

    public String get(String key) {
        return cache.get(key);
    }

    public void put(String key, String value) {
        cache.put(key, value);
    }

    public ConcurrentHashMap<String, String> expensiveCalculation() {
        // an expensive operation to fetch the cache for today
    }
}
Executor service
First of all, you need to capture the reference returned from the call Executors.newScheduledThreadPool(1). You must store that reference somewhere in your app for eventual shutdown. If you neglect to shut down an executor service, its backing pool of threads may continue to run after your app exits, like a zombie 🧟‍♂️.
And, by the way, for a single-threaded scheduled executor service, call the convenience method Executors.newSingleThreadScheduledExecutor.
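To make the capture-and-shutdown idea concrete, here is a minimal sketch. The class name and the timing values are hypothetical; the point is only that the ScheduledExecutorService reference is stored in a field so it can be shut down later.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RefreshScheduler {

    // Capture the executor service reference so we can shut it down later.
    private final ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable refreshTask) {
        // First refresh after one day, then daily thereafter.
        ses.scheduleAtFixedRate(refreshTask, 1, 1, TimeUnit.DAYS);
    }

    public void shutdown() {
        ses.shutdown(); // stop accepting new tasks
        try {
            // Give any in-flight refresh a moment to finish.
            if (!ses.awaitTermination(10, TimeUnit.SECONDS)) {
                ses.shutdownNow();
            }
        } catch (InterruptedException e) {
            ses.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public boolean isShutdown() {
        return ses.isShutdown();
    }
}
```

Without the stored reference there is nothing to call shutdown on, which is how the zombie-thread situation arises.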
Replace entries
If you are using a thread-safe collection, then I would replace individual elements during your update. Do you have a need to replace all the map entries simultaneously? If so, say so in your Question.
If you do not have a reason to replace the entire map, change your code to mark your cache as final. If you establish the map's existence well before the first attempt to access it, and you never replace it, then there is no need to mark it as volatile.
Another thing: it's generally best to declare your fields as the most general type that fits. So in this case, use ConcurrentMap rather than locking yourself into ConcurrentHashMap.
public class DailyCache {

    private final ConcurrentMap<String, String> cache;
    private final ScheduledExecutorService ses;

    public DailyCache() {
        this.cache = new ConcurrentHashMap<>();
        this.expensiveCalculation();
        this.ses = Executors.newSingleThreadScheduledExecutor();
        this.ses.scheduleAtFixedRate(this::expensiveCalculation, 1, 1, TimeUnit.DAYS);
    }

    public String get(String key) {
        return this.cache.get(key);
    }

    public void put(String key, String value) {
        this.cache.put(key, value);
    }

    public void expensiveCalculation() {
        // A series of expensive operations to replace each element/entry of the `Map` cache.
    }

    public void shutdown() {
        // Gracefully shut down the scheduled executor service held in var `ses`.
        …
    }
}
One advantage of this entry-replacement approach is that fresh data reaches the calling code sooner: early replacements become available while later replacements are still being made.
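A minimal sketch of what entry replacement could look like, assuming the fetch produces a plain Map of the latest values (the class name and the fetchLatest contents are hypothetical stand-ins for the real expensive operation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EntryRefresh {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical fetch; in real code this would be the expensive remote call.
    Map<String, String> fetchLatest() {
        Map<String, String> latest = new HashMap<>();
        latest.put("a", "1-new");
        latest.put("b", "2-new");
        return latest;
    }

    public void refresh() {
        Map<String, String> latest = fetchLatest();
        cache.putAll(latest);                       // overwrite/insert fresh entries
        cache.keySet().retainAll(latest.keySet());  // drop entries no longer present
    }

    public String get(String key) {
        return cache.get(key);
    }

    public void put(String key, String value) {
        cache.put(key, value);
    }
}
```

Readers see each fresh entry as soon as it is put, and the map object itself is never replaced, so the field can stay final.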
Replace map
volatile
If you have a reason to replace the entire map all at once, rather than replacing individual entries, then you should make the cache declaration non-final, and volatile.
public class DailyCache {

    private volatile ConcurrentMap<String, String> cache;
    …
But I prefer a different approach. Due to the change in its definition specifically, and the complexity of concurrency in general, I expect the volatile keyword is not clearly understood by all programmers.
AtomicReference
So I would use an AtomicReference object instead, to hold a reference to your current map object. Seeing that AtomicReference declaration should make your intentions obvious to another programmer.
An AtomicReference is a thread-safe holder for a reference to an object.
- var ➤ reference ➤ object
  With a conventional reference, such as the cache variable seen above, we are one step away from the map object: the cache variable holds a reference which at runtime takes us to the ConcurrentMap object floating within the heap somewhere.
- var ➤ reference ➤ object ➤ reference ➤ object
  With an AtomicReference, we are two steps away from the desired object. The cache variable seen below is a reference to an object whose content is the reference to yet another object.
An AtomicReference is a little weird to think about at first, because in Java we think of a variable like cache above as being the object, even though we know it is one step away from an object. In contrast, an AtomicReference makes us quite conscious of the fact that we are manipulating a reference to an object containing a reference to an object. For example, notice the .get().get(…) and .get().put(…) calls below.
public class DailyCache {

    private final AtomicReference<ConcurrentMap<String, String>> cache;
    …
Here is the complete class, revised for an AtomicReference.
public class DailyCache {

    private final AtomicReference<ConcurrentMap<String, String>> cache;
    private final ScheduledExecutorService ses;

    public DailyCache() {
        this.cache = new AtomicReference<>(new ConcurrentHashMap<String, String>());
        this.expensiveCalculation();
        this.ses = Executors.newSingleThreadScheduledExecutor();
        this.ses.scheduleAtFixedRate(this::expensiveCalculation, 1, 1, TimeUnit.DAYS);
    }

    public String get(String key) {
        return this.cache.get().get(key);
    }

    public void put(String key, String value) {
        this.cache.get().put(key, value);
    }

    public void expensiveCalculation() {
        ConcurrentMap<String, String> concurrentMap = … // An expensive operation to produce a new `ConcurrentMap`.
        this.cache.set(concurrentMap);
    }

    public void shutdown() {
        // Gracefully shut down the scheduled executor service held in var `ses`.
        …
    }
}
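As a small runnable sketch of the two-step dereference and the atomic swap (the class name and map contents are hypothetical, and the scheduled refresh is omitted for brevity):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

public class AtomicRefDemo {

    // Holder object; its content is the reference to the current map.
    private final AtomicReference<ConcurrentMap<String, String>> cache =
            new AtomicReference<>(new ConcurrentHashMap<>());

    public String get(String key) {
        // First .get() unwraps the holder, second .get(…) reads the map.
        return cache.get().get(key);
    }

    public void put(String key, String value) {
        cache.get().put(key, value);
    }

    public void swap(ConcurrentMap<String, String> freshMap) {
        // Atomically replace the whole map in one step.
        cache.set(freshMap);
    }
}
```

A reader in mid-call keeps working with whichever map its first .get() returned; the swap only affects calls whose first .get() happens afterwards.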
To answer your questions:
Q: is it "the best practice"
There are No Best Practices. If you haven't done so already, take the time to read that.
This question is unanswerable ... and not even meaningful.
Hint: it is time to remove "best practice" from your vocabulary ... and start questioning the wisdom of people who tell you that something is "best practice".
Q: Is volatile required?
Maybe. It depends on whether it is possible for the reference in cache to change.
- If it is possible, then the volatile is required.
- If it is not possible, then the volatile may not be required. But in that case you should declare the variable as final. If you do that, then the JLS guarantees that all threads will see the correct value for the variable.
In your code, it looks like you are periodically assigning a new value to cache. If so, then it needs to be volatile, or you need some other way to ensure that all worker threads see the updated cache value. (There are other ways ... but this is starting to smell of premature optimization.)
Q: What will happen if a cache reassignment finishes in the middle of a get or put?
If the get or put call starts before the assignment, then it will definitely operate on the old cache. If not, it will depend on whether the read of the cache reference occurs before or after the assignment. That is unpredictable.
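The key detail is that each get or put reads the volatile field exactly once, and from then on works against whichever map it captured; the reassignment cannot pull the map out from under an in-flight call. A minimal sketch (class name hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotRead {

    private volatile ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        // The volatile field is read exactly once here; even if another
        // thread reassigns `cache` immediately afterwards, this call
        // keeps operating on the map it already captured.
        ConcurrentHashMap<String, String> snapshot = cache;
        return snapshot.get(key);
    }

    public void replace(ConcurrentHashMap<String, String> fresh) {
        cache = fresh; // the volatile write makes the new map visible to later reads
    }
}
```

So a single statement like cache.get(key) is safe either way; what is unpredictable is only whether that one field read lands before or after the swap.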
There is one other thing that you don't seem to have considered. You say:
And it's fine to serve the old data when the expensive operation is running daily and should not block any requests at any time.
But you have not said if it is OK (or not) for the old cache to be updated while the new cache is being built. For example, consider this sequence of events:
- Start rebuilding cache
- Cache rebuilder updates key1 -> value1 in the new cache
- The main application reads and updates key1 -> value2 in the old cache
- The cache is replaced.
Now we have the application running with the new cache, but the new cache contains a value1 for key1 that is older than the most recently used value (value2).
If that breaks your application, then you need a way to solve it. There will be ways ... but it will depend on aspects of your application logic that you haven't mentioned. For example, the scenarios in which regular application threads will do cache put operations.