Best practice for a daily-updated cache in a multithreaded Java application

In my application, I'd like a cache that is rebuilt daily by an expensive operation (for example, fetching from a remote source and computing locally). The idea is that every day it fetches and computes the latest data for today. At any time, any thread should be able to read from and write to the daily cache. It's fine to serve old data while the daily expensive operation is running; the operation should never block any request.

I have written some simple code to illustrate the idea, but I'm not sure whether it's the best practice, or even correct in terms of multithreading. For example:

  • Is volatile required?
  • What will happen if a cache reassignment finishes in the middle of a get or put?

Any suggestions would be much appreciated!

public class DailyCache {
    private volatile ConcurrentHashMap<String, String> cache;

    public DailyCache() {
        cache = expensiveCalculation();
        Executors.newScheduledThreadPool(1)
                 .scheduleAtFixedRate(() -> cache = expensiveCalculation(), 1, 1, TimeUnit.DAYS);
    }

    public String get(String key) {
        return cache.get(key);
    }

    public void put(String key, String value) {
        cache.put(key, value);
    }

    public ConcurrentHashMap<String, String> expensiveCalculation() {
        // an expensive operation to fetch the cache for today
    }
}

Executor service

First of all, you need to capture the reference returned by the call Executors.newScheduledThreadPool(1). You must store that reference somewhere in your app for eventual shutdown. If you neglect to shut down an executor service, its backing pool of threads may continue to run after the rest of your app ends, like a zombie 🧟‍♂️.

And, by the way, for a single-threaded scheduled executor service, call the convenience method Executors.newSingleThreadScheduledExecutor.
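A minimal lifecycle sketch of capturing and shutting down the service (the ten-second timeout is an arbitrary choice):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerLifecycle {
    // Capture the reference so the service can be shut down later.
    private final ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable refreshTask) {
        this.ses.scheduleAtFixedRate(refreshTask, 1, 1, TimeUnit.DAYS);
    }

    public void shutdown() throws InterruptedException {
        this.ses.shutdown();                  // Stop accepting new tasks; let a running one finish.
        if (!this.ses.awaitTermination(10, TimeUnit.SECONDS)) {
            this.ses.shutdownNow();           // Give up and interrupt anything still running.
        }
    }
}
```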

Replace entries

If you are using a thread-safe collection, then I would replace individual elements during your update. Do you have a need to replace all the map entries simultaneously? If so, say so in your Question.

If you do not have a reason to replace the entire map, change your code to mark your cache as final. If you establish the map's existence well before the first attempt to access it (here, in the constructor), and you never replace it, then there is no need to mark it volatile.

Another thing: it's generally best to declare your fields as the most general type that fits. So in this case, ConcurrentMap rather than locking yourself into ConcurrentHashMap.

public class DailyCache {
    private final ConcurrentMap< String, String > cache;
    private final ScheduledExecutorService ses;

    public DailyCache() {
        this.cache = new ConcurrentHashMap<>() ;
        this.expensiveCalculation();
        this.ses = Executors.newSingleThreadScheduledExecutor() ; 
        this.ses.scheduleAtFixedRate( this::expensiveCalculation , 1 , 1 , TimeUnit.DAYS );
    }

    public String get( String key ) {
        return this.cache.get( key );
    }

    public void put( String key, String value) {
        this.cache.put( key, value );
    }

    public void expensiveCalculation() {
        // A series of expensive operations to replace each element/entry of the `Map` cache.
    }

    public void shutdown() {
        // Gracefully shut down the scheduled executor service held in var `ses`. 
        …
    }
}

One advantage to this entry-replacement approach is that fresh data arrives to the user faster, with the early replacements being made available to the calling code while later replacements have yet to be made.
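The entry-by-entry refresh might look like this (a sketch; fetchLatest is a hypothetical stand-in for your remote fetch and computation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EntryRefresh {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical expensive fetch; replace with your remote call + computation.
    private Map<String, String> fetchLatest() {
        return Map.of("alpha", "1", "beta", "2");
    }

    public void expensiveCalculation() {
        Map<String, String> latest = fetchLatest();
        latest.forEach(this.cache::put);                 // Each fresh entry is visible immediately.
        this.cache.keySet().retainAll(latest.keySet());  // Drop entries absent from today's data.
    }

    public String get(String key) { return this.cache.get(key); }

    public static void main(String[] args) {
        EntryRefresh r = new EntryRefresh();
        r.cache.put("stale", "x");          // Left over from yesterday.
        r.expensiveCalculation();
        System.out.println(r.get("alpha")); // 1
        System.out.println(r.get("stale")); // null -- removed by retainAll
    }
}
```

Note the retainAll call: without it, keys that disappear from today's data would linger in the cache forever.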

Replace map

volatile

If you have a reason to replace the entire map all at once, rather than replacing individual entries, then you should make the cache declaration non-final, and volatile.

public class DailyCache {
    private volatile ConcurrentMap< String, String > cache;
…

But I prefer a different approach. Because the meaning of volatile changed over time (its semantics were strengthened in Java 5), and because of the complexity of concurrency in general, I expect the volatile keyword is not clearly understood by all programmers.

AtomicReference

So I would use an AtomicReference object instead, to hold a reference to your current map object. Seeing that AtomicReference declaration should make your intentions obvious to another programmer.

An AtomicReference is a thread-safe holder for a reference to an object.

  • var ➤ reference ➤ object
    With a conventional reference, such as the cache variable seen above, we are one step away from the map object: The cache variable holds a reference which at runtime takes us to the ConcurrentMap object floating within the heap somewhere.
  • var ➤ reference ➤ object ➤ reference ➤ object
    With an AtomicReference, we are two steps away from the desired object. The cache variable seen below is a reference to an object whose content is the reference to yet another object.

An AtomicReference is a little weird to think about at first, because in Java we think of a variable like cache above being the object, even though we know it is one step away from an object. In contrast, an AtomicReference makes us quite conscious of the fact that we are manipulating a reference to an object containing a reference to an object. For example, notice the .get().get(…) and .get().put(…) calls below.

public class DailyCache {
    private final AtomicReference < ConcurrentMap< String, String > > cache;
…

Here is the complete class, revised for an AtomicReference.

public class DailyCache {
    private final AtomicReference < ConcurrentMap< String, String > > cache;
    private final ScheduledExecutorService ses;

    public DailyCache() {
        this.cache = new AtomicReference<>( new ConcurrentHashMap< String, String > () ) ;
        this.expensiveCalculation() ;
        this.ses = Executors.newSingleThreadScheduledExecutor() ; 
        this.ses.scheduleAtFixedRate( this::expensiveCalculation , 1 , 1 , TimeUnit.DAYS );
    }

    public String get( String key ) {
        return this.cache.get().get( key );
    }

    public void put( String key, String value) {
        this.cache.get().put( key, value );
    }

    public void expensiveCalculation() {
        ConcurrentMap< String, String > concurrentMap = … // An expensive operation to produce a new `ConcurrentMap`.
        this.cache.set( concurrentMap ) ;
    }

    public void shutdown() {
        // Gracefully shut down the scheduled executor service held in var `ses`. 
        …
    }
}
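In isolation, the two-step access pattern behaves like this (a standalone sketch, separate from the class above):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

public class TwoSteps {
    public static void main(String[] args) {
        AtomicReference<ConcurrentMap<String, String>> cache =
                new AtomicReference<>(new ConcurrentHashMap<>());

        cache.get().put("today", "sunny");            // Step 1: fetch the map; step 2: put into it.
        System.out.println(cache.get().get("today")); // sunny

        // Swapping the whole map is a single atomic step on the outer reference.
        cache.set(new ConcurrentHashMap<>());
        System.out.println(cache.get().get("today")); // null
    }
}
```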

To answer your questions:

Q: is it "the best practice"

There are No Best Practices. If you haven't done so already, take the time to read that.

This question is unanswerable ... and not even meaningful.

Hint: it is time to remove "best practice" from your vocabulary ... and start questioning the wisdom of people who tell you that something is "best practice".

Q: is volatile required?

Maybe. It depends on whether it is possible for the reference in cache to change.

  • If it is possible, then the volatile is required.
  • If it is not possible, then the volatile may not be required. But in that case you should declare the variable as final. If you do that, then the JLS guarantees that all threads will see the correct value for the variable.

In your code, it looks like you are periodically assigning a new value to cache. If so, then it needs to be volatile, or you need some other way to ensure that all worker threads see the updated cache value. (There are other ways ... but this is starting to smell of premature optimization.)

Q: What will happen if a cache reassignment finishes in the middle of a get or put?

If the get or put call fetches the cache reference before the assignment, then it will definitely operate on the old cache. If not, it will depend on whether the fetch of the cache reference occurs before or after the assignment. That is unpredictable. Either way, the operation acts on one consistent map, old or new, never a mixture of the two.
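In other words, each get or put reads the cache field exactly once and then operates on whichever map that read produced. A single-threaded sketch of the "mid-swap" scenario (simulating the refresh happening between the field read and the map access):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MidSwap {
    private static volatile ConcurrentMap<String, String> cache =
            new ConcurrentHashMap<>(Map.of("k", "old"));

    public static void main(String[] args) {
        // A get() first reads the volatile field once...
        ConcurrentMap<String, String> fetched = cache;
        // ...and if the daily refresh swaps the field right now...
        cache = new ConcurrentHashMap<>(Map.of("k", "new"));
        // ...the in-flight operation still acts on the map it already fetched.
        System.out.println(fetched.get("k")); // old
        System.out.println(cache.get("k"));   // new
    }
}
```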


There is one other thing that you don't seem to have considered. You say:

And it's fine to serve the old data when the expensive operation is running daily and should not block any requests at any time.

But you have not said if it is OK (or not) for the old cache to be updated while the new cache is being built. For example, consider this sequence of events:

  1. Start rebuilding cache
  2. Cache rebuilder updates key1 -> value1 in the new cache
  3. The main application reads and updates key1 -> value2 in the old cache
  4. The cache is replaced.

Now the application is running with the new cache, but the new cache contains value1 for key1, which is older than the most recently written value (value2). The application's update has been silently lost.
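The sequence above can be reproduced deterministically in a single thread (a sketch of the race itself, not a fix):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

public class StaleWrite {
    public static void main(String[] args) {
        AtomicReference<ConcurrentMap<String, String>> cache =
                new AtomicReference<>(new ConcurrentHashMap<>());

        // 1. Rebuild starts: the new map is populated off to the side.
        ConcurrentMap<String, String> rebuilt = new ConcurrentHashMap<>();
        rebuilt.put("key1", "value1");               // 2. Rebuilder writes value1 to the new map.

        cache.get().put("key1", "value2");           // 3. The app updates the OLD map.

        cache.set(rebuilt);                          // 4. The cache is replaced.

        System.out.println(cache.get().get("key1")); // value1 -- the app's value2 is lost
    }
}
```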

If that breaks your application, then you need a way to solve it. There will be ways ... but the right one will depend on aspects of your application logic that you haven't mentioned; for example, the scenarios in which regular application threads perform cache put operations.