When to use TaskCreationOptions.LongRunning?
I've wondered this for quite a while, but never really found the answer.
I understand that it's a hint for the task scheduler that the task will run for a long time, and that the scheduler can (or nowadays will?) decide to run that task on a dedicated, non-thread-pool thread.
What I don't know (and surprisingly can't find anywhere on the internet) is some "rule of thumb" for when to mark a task as long-running. Is one second long? 30 seconds? A minute? 5 minutes? Does it depend on the number of tasks the application uses? Should I, as a programmer, do some calculation involving the number of threads in the thread pool, how many tasks I create, and how many will be long-running at the same time, and based on that decide whether to use a long-running task?
Hope to learn something here.
Solution 1:
It can be quantified: the threadpool manager adds an extra thread beyond the optimum when the existing thread-pool threads don't complete soon enough. It does this twice a second, up to the maximum set by SetMaxThreads(), which has a very high default value. The optimum is the number of processor cores the machine has available; 4 is typical. Running more threads than available cores can be detrimental due to the context-switching overhead.
It does this based on the assumption that these existing threads don't make progress because they are not executing enough code. In other words, they block on I/O or a lock too much. Such threads therefore don't use the cores efficiently enough and allowing an extra thread to execute is entirely appropriate to drive up processor usage and get more work done.
So it is "long running" when the thread takes more than half a second. Keep in mind that this is a very long time, it equals roughly 4 billion processor instructions on a modern desktop class machine. Unless you are running computationally heavy code like calculating the value of pi to a gazillion digits, thus actually executing those 4 billion instructions, a practical thread can only take this long when it does block too often. Which is very common, something like a dbase query is often slow and executed on a worker thread and consumes little cpu.
It is otherwise up to you to verify that the assumption the threadpool manager makes is accurate: the task should take a long time because it isn't using the processor efficiently. Task Manager is a simple way to see what the processor cores are doing in your program, albeit it won't tell you exactly what code they are executing. You'll need a unit test to see the thread executing in isolation. The ultimate, and only completely accurate, way to tell whether using LongRunning was an appropriate choice is to verify that your app indeed gets more work done.
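A quick way to see what the hint does in practice is to check Thread.CurrentThread.IsThreadPoolThread inside the work item. The sketch below is illustrative only: the Thread.Sleep stands in for the kind of slow, mostly-waiting work (like a database query) described above, and whether the hint yields a dedicated thread is an implementation detail of the current default scheduler, so check rather than assume.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Work that mostly waits rather than computes, i.e. the kind of
        // blocking workload discussed above.
        Action blockingWork = () =>
        {
            Console.WriteLine($"Thread pool thread: {Thread.CurrentThread.IsThreadPoolThread}");
            Thread.Sleep(TimeSpan.FromSeconds(2)); // stand-in for a slow database call
        };

        // Regular thread pool work item.
        Task pooled = Task.Run(blockingWork);

        // Hinted as long running; on the default scheduler this currently
        // runs on a dedicated thread instead of a pool thread.
        Task dedicated = Task.Factory.StartNew(
            blockingWork,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);

        Task.WaitAll(pooled, dedicated);
    }
}
```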
Solution 2:
Amending Hans' answer which I mostly agree with.
The most important reason to specify LongRunning is to obtain practically guaranteed and almost immediate execution. There will not be any waiting for the thread pool to issue a thread to your work item. I'm saying "practically" because the OS is free to not schedule your thread. But you'll get some share of the CPU, and it's usually not going to take a very long time until it happens.
You're jumping in front of the queue by specifying LongRunning. No need to wait for the thread pool to issue 2 threads per second if it's under load.
So you would use LongRunning for things that must happen, not necessarily in the most efficient way, but in a timely and steady way. For example, some UI work, the game loop, progress reporting, ...
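For instance, a progress reporter that should tick steadily even when the pool is saturated could be given its own thread. A rough sketch, where ReportProgress and the one-second interval are placeholders and 'cancellation' is an assumed CancellationTokenSource:

```csharp
// A steady progress loop that shouldn't queue up behind pooled work items.
// ReportProgress and 'cancellation' (a CancellationTokenSource) are assumed.
Task progressLoop = Task.Factory.StartNew(() =>
{
    while (!cancellation.IsCancellationRequested)
    {
        ReportProgress();                       // hypothetical status update
        Thread.Sleep(TimeSpan.FromSeconds(1));  // blocking wait is fine on a dedicated thread
    }
},
cancellation.Token,
TaskCreationOptions.LongRunning,
TaskScheduler.Default);
```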
Starting and stopping a thread costs on the order of 1 ms of CPU time. This is far more than issuing thread pool work items; I just benchmarked this at about 3M items issued and completed per second. The benchmark was quite artificial, but the order of magnitude is right.
LongRunning is documented to be a hint, but it's absolutely effective in practice. There is no heuristic that takes your hint into account; it's assumed to be correct.
Solution 3:
when to specify a task as long-running
It depends on what the task is doing. If the task contains while(true) {...} and lives until application shutdown, then it makes sense to specify LongRunning. If you create a task to queue some operation and prevent blocking the current thread, then you don't really care (don't specify anything).
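For the first case, a minimal sketch might look like this; 'messages' (a BlockingCollection&lt;string&gt;) and Handle are assumptions for illustration:

```csharp
// A consumer loop that blocks on a queue and only exits at application shutdown.
// 'messages' (a BlockingCollection<string>) and Handle are assumed for this sketch.
Task consumer = Task.Factory.StartNew(() =>
{
    foreach (string message in messages.GetConsumingEnumerable())
    {
        Handle(message); // process one queued item at a time
    }
},
TaskCreationOptions.LongRunning);
```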
It also depends on what the other tasks are doing. It doesn't matter if you run a few tasks with or without LongRunning. But it might be a problem to create a thousand tasks, each demanding a new thread. Or the opposite: you may experience thread-pool starvation without specifying it.
One easy way to think about it is this: do you prefer the new task to run on a new thread, or don't you care? If the former, then use LongRunning. This doesn't guarantee that the task will run on another thread, but it's a good criterion for when you should specify it.
E.g. when using ContinueWith, LongRunning is the opposite of ExecuteSynchronously (there is a check that prevents both from being specified). If you have multiple continuations, then maybe you want to avoid the overhead of queueing and run a specific continuation on the same thread, or the opposite: you don't want one of the continuations to interfere with the others, and then you can specifically use LongRunning. Refer to this article (and this) to learn about ExecuteSynchronously.
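To make the contrast concrete, here is a rough sketch; CrunchForAWhile is a placeholder for a heavy computation:

```csharp
Task<int> antecedent = Task.Run(() => 42);

// Tiny follow-up: run it on the thread that completed the antecedent.
Task cheap = antecedent.ContinueWith(
    t => Console.WriteLine($"Result: {t.Result}"),
    TaskContinuationOptions.ExecuteSynchronously);

// Heavy follow-up that would hold up the other continuations: give it its own thread.
Task heavy = antecedent.ContinueWith(
    t => CrunchForAWhile(t.Result),   // hypothetical long computation
    TaskContinuationOptions.LongRunning);

// Combining both options on one continuation is rejected by the check
// mentioned above (the ContinueWith call throws).
```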
Solution 4:
A long running task is one that may enter a waiting state, blocking the thread it runs on, or one that takes too much CPU time (we'll come back to this one).
Some may say this definition is too broad and that many tasks would be long running, but think about it: even if the wait is bounded by a small timeout, the task is still not using the CPU effectively. If the number of these tasks rises, you'll observe they don't scale linearly past MinWorkerThreads (see ThreadPool.SetMinThreads); the degradation is really bad.
The approach is to switch all I/O (file, network, DB, etc.) to asynchronous calls.
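For example, a blocking file read can be made awaitable so the pool thread is free during the I/O. A minimal sketch (File.ReadAllTextAsync needs .NET Core 2.0 or later):

```csharp
using System.IO;
using System.Threading.Tasks;

static class FileIoSketch
{
    // Blocking: the calling pool thread sits idle while the OS reads the file.
    public static string ReadBlocking(string path) => File.ReadAllText(path);

    // Asynchronous: the thread goes back to the pool while the read is in flight.
    public static Task<string> ReadAsync(string path) => File.ReadAllTextAsync(path);
}
```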
There are also long running tasks due to long CPU-intensive computations.
The approach is to defer computation, for example by inserting await Task.Yield() at certain points, or preferably by splitting the computation explicitly: schedule one task after the other, each processing a previously split chunk of data or processing a buffer up to a bounded time limit.
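A sketch of the first idea, yielding periodically inside a CPU-bound loop; the chunk size and Crunch are arbitrary placeholders:

```csharp
// Process a large array in bounded chunks, yielding between chunks so other
// queued work items get a turn. The 10,000 chunk size is arbitrary.
static async Task ProcessAsync(int[] data)
{
    const int chunkSize = 10_000;
    for (int start = 0; start < data.Length; start += chunkSize)
    {
        int end = Math.Min(start + chunkSize, data.Length);
        for (int i = start; i < end; i++)
        {
            Crunch(data[i]); // hypothetical CPU-heavy step
        }

        await Task.Yield(); // requeue the remainder instead of hogging the pool thread
    }
}
```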
The "too much time" is up to you to decide.
When you're in an environment where you're sharing the thread pool, any time is too much time; you must choose a sensible value.
E.g. in ASP.NET under IIS, see what the average time taken per request is for the most common requests. Similarly, in a service where the thread pool is used e.g. for processing a message queue, take the average per message.
More generally, "too much time" is when work items are queued faster than they're processed. There may be bursts of work, so you should average this over the time unit that matters to you, be it a second, a minute, 10 minutes, etc. When you have an SLA, you should have this interval defined somewhere.
After settling on a sensible value, you must see in practice whether it's OK to increase it or whether you must decrease it. More often than not, even if you can increase it, you're better off not doing so unless you see a significant performance difference. "Significant" means the number of processed items increases more than linearly; if it's linear (or below linear, which can happen), don't do it.
In my experience, if you have a long running task by any of these definitions, you're usually better off managing your own thread or set of threads, as in the sketch below.
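If you do go that route, a minimal sketch of a self-managed worker thread; the shutdown flag and work method are placeholders:

```csharp
// A dedicated, self-managed worker thread instead of a pool thread.
// 'shutdownRequested' (a volatile bool) and DoOneUnitOfWork are assumed.
var worker = new Thread(() =>
{
    while (!shutdownRequested)
    {
        DoOneUnitOfWork(); // long-running or blocking work
    }
})
{
    IsBackground = true,   // don't keep the process alive at exit
    Name = "LongRunningWorker"
};
worker.Start();
```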