Forking vs Threading
I have used threading before in my applications and know its concepts well, but recently in my operating systems lecture I came across fork(), which seems similar to threading.
I searched Google for the difference between them and learned that:
- fork() creates a new process that looks exactly like the old (parent) process, but it is still a different process with a different process ID and its own memory.
- Threads are light-weight processes which have less overhead.
But, there are still some questions in my mind.
- When should you prefer fork() over threading and vice versa?
- If I want to call an external application as a child, should I use fork() or threads to do it?
- While searching Google I found people saying it is a bad thing to call fork() inside a thread. Why would people want to call fork() inside a thread when the two do similar things?
- Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
Solution 1:
The main difference between the forking and threading approaches is one of operating system architecture. Back in the days when Unix was designed, forking was an easy, simple system that best answered mainframe- and server-type requirements, and as such it was popularized on Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As a result, there is still a notable difference today, with Unix systems being efficient with forking and Windows being more efficient with threads. You can most notably see this in Apache, which uses the prefork strategy on Unix and thread pooling on Windows.
Specifically to your questions:
When should you prefer fork() over threading and vice versa?
Prefer fork() on a Unix system when you're doing a far more complex task than just instantiating a worker, or when you want the implicit security sandboxing of separate processes.
If I want to call an external application as a child, should I use fork() or threads to do it?
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks, use threads. For separate external processes use neither; just launch them with the proper API calls.
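As a rough illustration of "the proper API calls", here is a minimal sketch in Python using the standard subprocess module. The helper name run_external is made up for this example, and sys.executable is used as the external program only so the sketch does not depend on any particular tool being installed. On Unix-like systems a wrapper like this typically performs the process creation and exec steps internally, so you do not write them yourself.

```python
import subprocess
import sys


def run_external(argv):
    """Run an external program and return (exit_code, captured_stdout).

    run_external is a hypothetical helper name used only for this sketch.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Launch a separate Python interpreter as the "external application".
    code, out = run_external([sys.executable, "-c", "print('hello from child')"])
    print(code, out.strip())
```

The parent never shares memory with the launched program; it only sees the exit code and whatever the program wrote to its output.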
While searching Google I found people saying it is a bad thing to call fork() inside a thread. Why would people want to call fork() inside a thread when the two do similar things?
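Not entirely sure, but one known problem is that on POSIX systems fork() duplicates only the calling thread, not the other threads of the process. Any lock that another thread held at the moment of the fork stays locked in the child with nobody left to release it, so the child can deadlock unless it calls exec() right away.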
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
This is false. fork() creates a new process, which then takes advantage of all features available to processes in the OS task scheduler, including running on another CPU at the same time as the parent.
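To make that last point concrete, here is a small sketch in Python (Unix only) that forks one child per work item and collects each result through a pipe. The function name sum_squares_in_children and the work it does are invented for this example. Each child is an ordinary process, so the scheduler is free to run the children on different cores while the parent keeps going; the sketch itself does not measure that.

```python
import os


def sum_squares_in_children(ranges):
    """Fork one child per (start, stop) range; each child sums the squares.

    Hypothetical helper for illustration only. Returns results in input order.
    """
    children = []
    for start, stop in ranges:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, report through the pipe, then exit
            # without returning into the parent's code path.
            try:
                os.close(read_fd)
                total = sum(n * n for n in range(start, stop))
                os.write(write_fd, str(total).encode())
            finally:
                os._exit(0)
        # Parent process: keep only the read end and move on to the next fork.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe_in:
            results.append(int(pipe_in.read()))
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return results


if __name__ == "__main__":
    print(sum_squares_in_children([(0, 4), (4, 6)]))
```

All children are started before the parent waits on any of them, which is what lets them overlap in time.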
Solution 2:
A forked process is called a heavy-weight process, whereas a thread is called a light-weight process.
The following are the differences between them:
- A forked process is a child of the process that created it, whereas the threads within a process are siblings of one another.
- A forked process gets its own copy of the parent's data, heap and stack (usually made lazily, copy-on-write), so later changes in one are not visible in the other. Threads share the code, data and heap of their process, and each thread has only its own stack.
- Process switching always requires the help of the OS, whereas switching between user-level threads does not (kernel-level threads are still scheduled by the OS).
- Creating multiple processes is a resource-intensive task, whereas creating multiple threads is less resource-intensive.
- Each process can run independently, whereas one thread can read and write another thread's data.
Thread and process lecture
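The difference in memory sharing listed above can be seen in a short Python sketch (Unix only). The names counter, bump, bump_in_thread and bump_in_child are made up for this example. A thread changes the same object the rest of the process sees, while a forked child changes only its own copy.

```python
import os
import threading

# Shared state for the demonstration; the name is arbitrary.
counter = {"value": 0}


def bump():
    counter["value"] += 1


def bump_in_thread():
    """Run bump() in a thread; the change is visible to the caller."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]


def bump_in_child():
    """Run bump() in a forked child; the parent's copy is left untouched."""
    pid = os.fork()
    if pid == 0:
        # Child process: this modifies the child's private copy of counter.
        try:
            bump()
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return counter["value"]


if __name__ == "__main__":
    print("after thread:", bump_in_thread())
    print("after fork:  ", bump_in_child())
```

This is also why processes need pipes, sockets or shared memory to exchange data, while threads need locks to stop them from trampling on each other's data.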
Solution 3:
fork() spawns a new copy of the process, as you've noted. What isn't mentioned above is the exec() call which often follows. This replaces the existing process with a new process (a new executable) and as such, fork()/exec() is the standard means of spawning a new process from an old one.
e.g. that's how your shell will invoke a process from the command line. You specify your process (ls, say) and the shell forks and then execs ls.
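The fork-then-exec sequence a shell goes through can be sketched in a few lines of Python (Unix only). The function name spawn_and_wait is invented for this example, and the exit code 127 for a failed exec follows the usual shell convention rather than anything required by the calls themselves.

```python
import os
import sys


def spawn_and_wait(argv):
    """Fork, exec argv in the child, and return the child's exit code.

    Hypothetical helper for illustration; returns -1 if the child was
    terminated by a signal instead of exiting normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: replace this process image with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed (for example, command not found).
            os._exit(127)
    # Parent process: wait for the child, as a shell does for a foreground job.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1


if __name__ == "__main__":
    # The child here is another Python interpreter; a shell would exec ls.
    print(spawn_and_wait([sys.executable, "-c", "print('hello from exec')"]))
```

Note that after a successful exec the child no longer runs any of the parent's code, which is the key difference from using fork() on its own.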
Note that this operates at a very different level from threading. Threading runs multiple lines of execution intra-process. Forking is a means of creating new processes.