How to make main thread wait for all child threads finish?

I intend to fire 2 threads in the main thread, and the main thread should wait till all the 2 child threads finish, this is how I do it.

void *routine(void *arg)
{
    sleep(3);
}

int main()
{
    for (int i = 0; i < 2; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, routine, NULL);
        pthread_join(&tid, NULL);  //This function will block main thread, right?
    }
}

In the above code, pthread_join indeed makes main thread wait for the child threads, but the problem is, the second thread won't be created untill the first one finishes. This is not what I want.

What I want is, the 2 threads get created immediatly in the main thread, and then main thread waits for them to finish. Seems like pthread_join cannot do the trick, can it?

I thought, maybe via a semaphore I can do the job, but any other way?


int main()
{
    pthread_t tid[2];
    for (int i = 0; i < 2; i++) {
        pthread_create(&tid[i], NULL, routine, NULL);
    }
    for (int i = 0; i < 2; i++)
       pthread_join(tid[i], NULL);
    return 0;
}

First create all the threads, then join all of them:

pthread_t tid[2];

/// create all threads
for (int i = 0; i < 2; i++) {
    pthread_create(&tid[i], NULL, routine, NULL);
}

/// wait all threads by joining them
for (int i = 0; i < 2; i++) {
    pthread_join(tid[i], NULL);  
}

Alternatively, have some pthread_attr_t variable, use pthread_attr_init(3) then pthread_attr_setdetachedstate(3) on it, then pass its address to pthread_create(3) second argument. Thos would create the threads in detached state. Or use pthread_detach as explained in Jxh's answer.

Remember to read some good Pthread tutorial. You may want to use mutexes and condition variables.

You could use frameworks wrapping them, e.g. Qt or POCO (in C++), or read a good C++ book and use C++ threads.

Conceptually, threads have each their call stack and are related to continuations. They are "heavy".

Consider some agent-oriented programming approach: as a rule of thumb, you don't want to have a lot of threads (e.g. 20 threads on a 10 core processor is reasonable, 200 threads won't be unless a lot of them are sleeping or waiting) and and do want threads to synchronize using mutex and condition variables and communicate and/or synchronize with other threads quite often (several times per second). See also poll(2), fifo(7), unix(7), sem_overview(7) with shm_overview(7) as another way of communicating between threads. In general, avoid using signal(7) with threads (read signal-safety(7)...), and use dlopen(3) with caution (probably only in the main thread).

A pragmatical approach would be to have most of your threads running some event loop (using poll(2), pselect(2), perhaps eventfd(2), signalfd(2), ....), perhaps communicating using pipe(7) or unix(7) sockets. See also socket(7).

Don't forget to document (on paper) the communication protocols between threads. For a theoretical approach, read books about π-calculus and be aware of Rice's theorem : debugging concurrent programs is difficult.