How expensive is RTTI?
I understand that there is a resource hit from using RTTI, but how big is it? Everywhere I've looked just says that "RTTI is expensive," but none of them actually give any benchmarks or quantitative data regarding memory, processor time, or speed.
So, just how expensive is RTTI? I might use it on an embedded system where I have only 4MB of RAM, so every bit counts.
Edit: As per S. Lott's answer, it would be better if I include what I'm actually doing. I am using a class to pass in data of different lengths that can perform different actions, so it would be difficult to do this using only virtual functions. It seems that using a few dynamic_casts could remedy this problem by allowing the different derived classes to be passed through the different levels yet still allowing them to act completely differently.
From my understanding, dynamic_cast uses RTTI, so I was wondering how feasible it would be to use on a limited system.
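For concreteness, a minimal sketch of the kind of design described (the Packet/ShortPacket/LongPacket names are hypothetical):

#include <typeinfo>

struct Packet { virtual ~Packet() {} };           // common base passed between layers
struct ShortPacket : Packet { char data[4]; };    // payloads of different lengths
struct LongPacket  : Packet { char data[64]; };

void handle(Packet &p)
{
    // Only the layers that care about the concrete type pay for a
    // dynamic_cast; every other layer just forwards Packet&.
    if (LongPacket *lp = dynamic_cast<LongPacket *>(&p)) {
        lp->data[0] = 0;  // act on the long variant
    } else if (ShortPacket *sp = dynamic_cast<ShortPacket *>(&p)) {
        sp->data[0] = 0;  // act on the short variant
    }
}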
Regardless of compiler, you can always save on runtime if you can afford to do
if (typeid(a) == typeid(b)) {
    B* ba = static_cast<B*>(&a);
    etc;
}
instead of
B* ba = dynamic_cast<B*>(&a);
if (ba) {
    etc;
}
The former involves only one comparison of std::type_info; the latter necessarily involves traversing an inheritance tree plus comparisons.
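Put together, a self-contained sketch of the two patterns (the A and B types are hypothetical); note that the exact-type test also rejects classes derived from B, which dynamic_cast would accept:

#include <typeinfo>
#include <iostream>

struct A { virtual ~A() {} };
struct B : A { void hello() { std::cout << "B\n"; } };

void exact_type(A &a)
{
    // One std::type_info comparison; matches B exactly, not subclasses of B.
    if (typeid(a) == typeid(B)) {
        B *ba = static_cast<B *>(&a);
        ba->hello();
    }
}

void cross_cast(A &a)
{
    // Walks the inheritance graph; also matches anything derived from B.
    if (B *ba = dynamic_cast<B *>(&a)) {
        ba->hello();
    }
}

int main()
{
    B b;
    exact_type(b);  // prints "B"
    cross_cast(b);  // prints "B"
}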
Past that ... like everyone says, the resource usage is implementation specific.
I agree with everyone else's comments that the submitter should avoid RTTI for design reasons. However, there are good reasons to use RTTI (mainly because of boost::any). That in mind, it's useful to know its actual resource usage in common implementations.
I recently did a bunch of research into RTTI in GCC.
tl;dr: RTTI in GCC uses negligible space, and typeid(a) == typeid(b) is very fast on many platforms (Linux, BSD, and maybe embedded platforms, but not mingw32). If you know you'll always be on a blessed platform, RTTI is very close to free.
Gritty details:
GCC prefers to use a particular "vendor-neutral" C++ ABI[1], and always uses this ABI for Linux and BSD targets[2]. For platforms that support this ABI and also weak linkage, typeid() returns a consistent and unique object for each type, even across dynamic linking boundaries. You can test &typeid(a) == &typeid(b), or just rely on the fact that the portable test typeid(a) == typeid(b) does actually just compare a pointer internally.
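As a sketch, on such a platform the following two functions are equivalent (the Shape types are hypothetical):

#include <typeinfo>

struct Shape  { virtual ~Shape() {} };
struct Circle : Shape {};

bool same_type_fast(const Shape &a, const Shape &b)
{
    // Valid only where type_info objects are merged (Itanium ABI plus
    // weak linkage); there, each type has exactly one type_info object.
    return &typeid(a) == &typeid(b);
}

bool same_type_portable(const Shape &a, const Shape &b)
{
    // Portable; on merged-type_info platforms this boils down to the same
    // pointer comparison, elsewhere it may strcmp the mangled names.
    return typeid(a) == typeid(b);
}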
In GCC's preferred ABI, a class vtable always holds a pointer to a per-type RTTI structure, though it might not be used. So a typeid() call itself should only cost as much as any other vtable lookup (the same as calling a virtual member function), and RTTI support shouldn't use any extra space for each object.
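In other words, something like the following should cost about one indirection through the vptr (Widget is a hypothetical type, and this is a sketch of the expected cost, not GCC's actual codegen):

#include <typeinfo>

struct Widget { virtual ~Widget() {} };

const std::type_info &dynamic_type(const Widget &w)
{
    // For a polymorphic operand, typeid reads the type_info pointer
    // stored in w's vtable -- comparable in cost to a virtual call.
    return typeid(w);
}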
From what I can make out, the RTTI structures used by GCC (these are all the subclasses of std::type_info) only hold a few bytes for each type, aside from the name. It isn't clear to me whether the names are present in the output code even with -fno-rtti. Either way, the change in size of the compiled binary should reflect the change in runtime memory usage.
A quick experiment (using GCC 4.4.3 on Ubuntu 10.04 64-bit) shows that -fno-rtti actually increases the binary size of a simple test program by a few hundred bytes. This happens consistently across combinations of -g and -O3. I'm not sure why the size would increase; one possibility is that GCC's STL code behaves differently without RTTI (since exceptions won't work).
[1] Known as the Itanium C++ ABI, documented at http://www.codesourcery.com/public/cxx-abi/abi.html. The names are horribly confusing: the name refers to the original development architecture, though the ABI specification works on lots of architectures, including i686/x86_64. Comments in GCC's internal source and STL code refer to Itanium as the "new" ABI in contrast to the "old" one they used before. Worse, the "new"/Itanium ABI refers to all versions available through -fabi-version; the "old" ABI predated this versioning. GCC adopted the Itanium/versioned/"new" ABI in version 3.0; the "old" ABI was used in 2.95 and earlier, if I am reading their changelogs correctly.
[2] I couldn't find any resource listing std::type_info object stability by platform. For compilers I had access to, I used the following:
echo "#include <typeinfo>" | gcc -E -dM -x c++ -c - | grep GXX_MERGED_TYPEINFO_NAMES
This macro controls the behavior of operator== for std::type_info in GCC's STL, as of GCC 3.0. I did find that mingw32-gcc obeys the Windows C++ ABI, where std::type_info objects aren't unique for a type across DLLs; typeid(a) == typeid(b) calls strcmp under the covers. I speculate that on single-program embedded targets like AVR, where there is no code to link against, std::type_info objects are always stable.
Perhaps these figures would help.
I did a quick test with the following setup:
- GCC Clock() + XCode's Profiler.
- 100,000,000 loop iterations.
- 2 x 2.66 GHz Dual-Core Intel Xeon.
- The class in question is derived from a single base class.
- typeid().name() returns "N12fastdelegate13FastDelegate1IivEE"
Five cases were tested:
1) dynamic_cast< FireType* >( mDelegate )
2) typeid( *iDelegate ) == typeid( *mDelegate )
3) typeid( *iDelegate ).name() == typeid( *mDelegate ).name()
4) &typeid( *iDelegate ) == &typeid( *mDelegate )
5) {
       fastdelegate::FastDelegateBase *iDelegate;
       iDelegate = new fastdelegate::FastDelegate1< t1 >;
       typeid( *iDelegate ) == typeid( *mDelegate )
   }
Case 5 is just my actual code, as I needed to create an object of that type before checking whether it is similar to one I already have.
Without Optimisation
The results (averaged over a few runs) were:
1) 1,840,000 Ticks (~2 Seconds) - dynamic_cast
2) 870,000 Ticks (~1 Second) - typeid()
3) 890,000 Ticks (~1 Second) - typeid().name()
4) 615,000 Ticks (~1 Second) - &typeid()
5) 14,261,000 Ticks (~23 Seconds) - typeid() with extra variable allocations.
So the conclusion would be:
- For simple cast cases without optimisation, typeid() is more than twice as fast as dynamic_cast.
- On a modern machine the difference between the two is about 1 nanosecond (a millionth of a millisecond).
With Optimisation (-Os)
1) 1,356,000 Ticks - dynamic_cast
2) 76,000 Ticks - typeid()
3) 76,000 Ticks - typeid().name()
4) 75,000 Ticks - &typeid()
5) 75,000 Ticks - typeid() with extra variable allocations.
So the conclusion would be:
- For simple cast cases with optimisation, typeid() is nearly 20x faster than dynamic_cast.
Chart (bar chart of the timings above, not reproduced here)
The Code
As requested in the comments, the code is below (a bit messy, but works). 'FastDelegate.h' is available from here.
#include <iostream>
#include "FastDelegate.h"
#include "cycle.h"
#include "time.h"
// Undefine for typeid checks
#define CAST
class ZoomManager
{
public:
template < class Observer, class t1 >
void Subscribe( void *aObj, void (Observer::*func )( t1 a1 ) )
{
mDelegate = new fastdelegate::FastDelegate1< t1 >;
std::cout << "Subscribe\n";
Fire( true );
}
template< class t1 >
void Fire( t1 a1 )
{
fastdelegate::FastDelegateBase *iDelegate;
iDelegate = new fastdelegate::FastDelegate1< t1 >;
int t = 0;
ticks start = getticks();
clock_t iStart, iEnd;
iStart = clock();
typedef fastdelegate::FastDelegate1< t1 > FireType;
for ( int i = 0; i < 100000000; i++ ) {
#ifdef CAST
if ( dynamic_cast< FireType* >( mDelegate ) )
#else
// Change this line for comparisons .name() and & comparisons
if ( typeid( *iDelegate ) == typeid( *mDelegate ) )
#endif
{
t++;
} else {
t--;
}
}
iEnd = clock();
printf( "Clock ticks: %ld,\n", (long)( iEnd - iStart ) );  // clock_t may be wider than int
std::cout << typeid( *mDelegate ).name() << "\n";
ticks end = getticks();
double e = elapsed( start, end );
std::cout << "Elapsed: " << e << "\n";
delete iDelegate;  // don't leak the probe delegate
}
template< class t1, class t2 >
void Fire( t1 a1, t2 a2 )
{
std::cout << "Fire\n";
}
fastdelegate::FastDelegateBase *mDelegate;
};
class Scaler
{
public:
Scaler( ZoomManager *aZoomManager ) :
mZoomManager( aZoomManager ) { }
void Sub()
{
mZoomManager->Subscribe( this, &Scaler::OnSizeChanged );
}
void OnSizeChanged( int X )
{
std::cout << "Yey!\n";
}
private:
ZoomManager *mZoomManager;
};
int main(int argc, const char * argv[])
{
ZoomManager *iZoomManager = new ZoomManager();
Scaler iScaler( iZoomManager );
iScaler.Sub();
delete iZoomManager;
return 0;
}
It depends on the scale of things. For the most part it's just a couple of checks and a few pointer dereferences. In most implementations, at the top of every object that has virtual functions there is a pointer to a vtable, which holds a list of pointers to all the implementations of the virtual functions on that class. I would guess that most implementations would also use the vtable to store a pointer to the type_info structure for the class.
For example, in pseudo-C++:
struct Base
{
    virtual ~Base() {}
};

struct Derived : public Base
{
    virtual ~Derived() {}
};

int main()
{
    Base *d = new Derived();
    const char *name = typeid(*d).name(); // C++ way

    // Faked-up way (this won't actually compile, but gives an idea of what
    // might be happening in some implementations): the object starts with a
    // pointer to its class's vtable, and the vtable holds a pointer to the
    // type_info structure.
    const vtable *vt = *reinterpret_cast<const vtable *const *>(d);
    const type_info *ti = vt->typeinfo;
    const char *name2 = ProcessRawName(ti->name);
}
In general, the real argument against RTTI is the unmaintainability of having to modify code everywhere every time you add a new derived class. Instead of switch statements everywhere, factor those into virtual functions. This moves all the code that differs between classes into the classes themselves, so that a new derivation just needs to override the virtual functions to become a fully functioning class. If you've ever had to hunt through a large code base for every place someone checks the type of a class and does something different, you'll quickly learn to stay away from that style of programming.
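For illustration, a minimal sketch of that refactoring (the Shape/Circle/Square names are hypothetical): the type-switch version must be edited for every new class, while the virtual-function version localizes the change to the new class itself.

#include <iostream>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}
    virtual void draw() const = 0;  // the behavior that used to vary by type-switch
};
struct Circle : Shape { void draw() const { std::cout << "circle\n"; } };
struct Square : Shape { void draw() const { std::cout << "square\n"; } };

// Type-switch style: every new shape forces an edit here (and at every
// other call site written this way).
void draw_by_switch(const Shape &s)
{
    if (typeid(s) == typeid(Circle))      std::cout << "circle\n";
    else if (typeid(s) == typeid(Square)) std::cout << "square\n";
    // ...one more branch per derived class...
}

// Virtual-function style: the per-type code lives in the classes, so a
// new derivation needs no edits elsewhere.
void draw_by_dispatch(const Shape &s)
{
    s.draw();
}

int main()
{
    Circle c;
    draw_by_switch(c);    // prints "circle"
    draw_by_dispatch(c);  // prints "circle"
}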
If your compiler lets you turn RTTI off entirely, though, the resulting code-size savings can be significant with such a small RAM budget. The compiler needs to generate a type_info structure for every class with a virtual function; if you turn RTTI off, none of these structures needs to be included in the executable image.