How did I get this NullReferenceException error here right after the constructor?
It is almost certainly a threading issue - see this question and its accepted answer.
Dictionary<>.Insert() can throw a NullReferenceException internally if the dictionary instance is modified from another thread during the insert operation.
As of .NET 4.0 you can use ConcurrentDictionary and avoid the threading issues associated with manipulating the same dictionary from multiple threads simultaneously.
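As a rough illustration of that suggestion, here is a minimal sketch of a shared cache built on ConcurrentDictionary. The class name StaffCache, the int key, and the placeholder value factory are all assumptions made for the example; they are not taken from the original code.

```csharp
using System.Collections.Concurrent;

public static class StaffCache
{
    // Hypothetical shared cache keyed by staff ID (names are illustrative only).
    private static readonly ConcurrentDictionary<int, string> Names =
        new ConcurrentDictionary<int, string>();

    public static string GetOrLoad(int staffId)
    {
        // GetOrAdd can be called from many threads at once. Under contention the
        // factory may run more than once, but only one value is stored per key.
        return Names.GetOrAdd(staffId, id => "staff-" + id);
    }

    // Number of entries currently cached.
    public static int Count
    {
        get { return Names.Count; }
    }
}
```

If the real lookup is expensive, bear in mind that the factory can run more than once for the same key when several threads race, so it should be free of side effects.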
It's only happened once and that method has been called thousands of times since the site went live.
After reading this, I can conclude that it's possible .NET exhausted its memory and could not create any more Dictionary keys, so it may not be your fault at all. We did get these kinds of errors when we tried to store too much information in session/application variables, which increased the memory footprint of the web application. But we only saw such errors when our numbers got really high, such as storing 10,000 items in a Dictionary or List.
The pattern is good, but keep in mind that we use a database to store information in relational form; if we start using memory to store the same things, we are ignoring a powerful database. The database can cache values for you as well.
It might sound silly, but we restart our Windows server every 24 hours, at midnight when there is no traffic. That helped us get rid of such errors. We restart our servers on a fixed schedule so that all caches and logs are cleared.
I can't see anything obvious. I'd run some SQL to check the database for any bad data. The problem may be a freak bug in a related input form. If the code has run thousands of times without incident until now, I'd wrap some additional exception handling/reporting around the code block in question so you can at least get a staffId if/when it next happens.
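To make the exception-handling idea concrete, here is a minimal sketch of wrapping the suspect block so that the staffId travels with the failure. StaffLookup, Describe, and the dictionary-based lookup are hypothetical stand-ins for the real code, which isn't shown here.

```csharp
using System;
using System.Collections.Generic;

public static class StaffLookup
{
    // Hypothetical stand-in for the code block that failed in production.
    public static string Describe(IDictionary<int, string> cache, int staffId)
    {
        try
        {
            // Throws NullReferenceException if the stored value is null,
            // and KeyNotFoundException if the key is missing.
            return cache[staffId].Trim();
        }
        catch (Exception ex)
        {
            // Re-throw with the staffId in the message; the original exception
            // (and its stack trace) is preserved as InnerException.
            throw new InvalidOperationException(
                "Staff lookup failed for staffId=" + staffId, ex);
        }
    }
}
```

Whatever error reporting the site already uses should then record the staffId along with the original stack trace the next time it happens.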
You could burn a lot of time on something like this. The most expedient approach may be just to let it fail again under the controlled conditions above, assuming the level of disruption it causes is acceptable, manageable, or minor.
I appreciate that won't satisfy the immediate need to know, but it may be the best way to manage the problem, especially with such a low failure rate.