Why can templates only be implemented in the header file?

Solution 1:

Caveat: It is not necessary to put the implementation in the header file, see the alternative solution at the end of this answer.

Anyway, the reason your code is failing is that, when instantiating a template, the compiler creates a new class with the given template argument. For example:

template<typename T>
struct Foo
{
    T bar;
    void doSomething(T param) {/* do stuff using T */}
};

// somewhere in a .cpp
Foo<int> f; 

When reading this line, the compiler will create a new class (let's call it FooInt), which is equivalent to the following:

struct FooInt
{
    int bar;
    void doSomething(int param) {/* do stuff using int */}
}

Consequently, the compiler needs to have access to the implementation of the methods, to instantiate them with the template argument (in this case int). If these implementations were not in the header, they wouldn't be accessible, and therefore the compiler wouldn't be able to instantiate the template.

A common solution to this is to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header.

Foo.h

template <typename T>
struct Foo
{
    void doSomething(T param);
};

#include "Foo.tpp"

Foo.tpp

template <typename T>
void Foo<T>::doSomething(T param)
{
    //implementation
}

This way, implementation is still separated from declaration, but is accessible to the compiler.

Alternative solution

Another solution is to keep the implementation separated, and explicitly instantiate all the template instances you'll need:

Foo.h

// no implementation
template <typename T> struct Foo { ... };

Foo.cpp

// implementation of Foo's methods

// explicit instantiations
template class Foo<int>;
template class Foo<float>;
// You will only be able to use Foo with int or float

If my explanation isn't clear enough, you can have a look at the C++ Super-FAQ on this subject.

Solution 2:

It's because of the requirement for separate compilation and because templates are instantiation-style polymorphism.

Lets get a little closer to concrete for an explanation. Say I've got the following files:

  • foo.h
    • declares the interface of class MyClass<T>
  • foo.cpp
    • defines the implementation of class MyClass<T>
  • bar.cpp
    • uses MyClass<int>

Separate compilation means I should be able to compile foo.cpp independently from bar.cpp. The compiler does all the hard work of analysis, optimization, and code generation on each compilation unit completely independently; we don't need to do whole-program analysis. It's only the linker that needs to handle the entire program at once, and the linker's job is substantially easier.

bar.cpp doesn't even need to exist when I compile foo.cpp, but I should still be able to link the foo.o I already had together with the bar.o I've only just produced, without needing to recompile foo.cpp. foo.cpp could even be compiled into a dynamic library, distributed somewhere else without foo.cpp, and linked with code they write years after I wrote foo.cpp.

"Instantiation-style polymorphism" means that the template MyClass<T> isn't really a generic class that can be compiled to code that can work for any value of T. That would add overhead such as boxing, needing to pass function pointers to allocators and constructors, etc. The intention of C++ templates is to avoid having to write nearly identical class MyClass_int, class MyClass_float, etc, but to still be able to end up with compiled code that is mostly as if we had written each version separately. So a template is literally a template; a class template is not a class, it's a recipe for creating a new class for each T we encounter. A template cannot be compiled into code, only the result of instantiating the template can be compiled.

So when foo.cpp is compiled, the compiler can't see bar.cpp to know that MyClass<int> is needed. It can see the template MyClass<T>, but it can't emit code for that (it's a template, not a class). And when bar.cpp is compiled, the compiler can see that it needs to create a MyClass<int>, but it can't see the template MyClass<T> (only its interface in foo.h) so it can't create it.

If foo.cpp itself uses MyClass<int>, then code for that will be generated while compiling foo.cpp, so when bar.o is linked to foo.o they can be hooked up and will work. We can use that fact to allow a finite set of template instantiations to be implemented in a .cpp file by writing a single template. But there's no way for bar.cpp to use the template as a template and instantiate it on whatever types it likes; it can only use pre-existing versions of the templated class that the author of foo.cpp thought to provide.

You might think that when compiling a template the compiler should "generate all versions", with the ones that are never used being filtered out during linking. Aside from the huge overhead and the extreme difficulties such an approach would face because "type modifier" features like pointers and arrays allow even just the built-in types to give rise to an infinite number of types, what happens when I now extend my program by adding:

  • baz.cpp
    • declares and implements class BazPrivate, and uses MyClass<BazPrivate>

There is no possible way that this could work unless we either

  1. Have to recompile foo.cpp every time we change any other file in the program, in case it added a new novel instantiation of MyClass<T>
  2. Require that baz.cpp contains (possibly via header includes) the full template of MyClass<T>, so that the compiler can generate MyClass<BazPrivate> during compilation of baz.cpp.

Nobody likes (1), because whole-program-analysis compilation systems take forever to compile , and because it makes it impossible to distribute compiled libraries without the source code. So we have (2) instead.

Solution 3:

Plenty correct answers here, but I wanted to add this (for completeness):

If you, at the bottom of the implementation cpp file, do explicit instantiation of all the types the template will be used with, the linker will be able to find them as usual.

Edit: Adding example of explicit template instantiation. Used after the template has been defined, and all member functions has been defined.

template class vector<int>;

This will instantiate (and thus make available to the linker) the class and all its member functions (only). Similar syntax works for function templates, so if you have non-member operator overloads you may need to do the same for those.

The above example is fairly useless since vector is fully defined in headers, except when a common include file (precompiled header?) uses extern template class vector<int> so as to keep it from instantiating it in all the other (1000?) files that use vector.

Solution 4:

Templates need to be instantiated by the compiler before actually compiling them into object code. This instantiation can only be achieved if the template arguments are known. Now imagine a scenario where a template function is declared in a.h, defined in a.cpp and used in b.cpp. When a.cpp is compiled, it is not necessarily known that the upcoming compilation b.cpp will require an instance of the template, let alone which specific instance would that be. For more header and source files, the situation can quickly get more complicated.

One can argue that compilers can be made smarter to "look ahead" for all uses of the template, but I'm sure that it wouldn't be difficult to create recursive or otherwise complicated scenarios. AFAIK, compilers don't do such look aheads. As Anton pointed out, some compilers support explicit export declarations of template instantiations, but not all compilers support it (yet?).