What's the performance cost of "include" in PHP?

Just wondering if anyone has information on what "costs" are associated with including a LARGE (600K or more) PHP file containing hundreds of classes. Does it really make much difference compared to autoloading individual files, where the autoloader might, for instance, search across several directories before finding a match?

Would having APC caching on make this cost negligible?


Solution 1:

Basically, the cost of including one big file depends on your use case. Let's say you have a large file with 200 classes.

If you only use 1 class, including the large file will be more expensive than including a small class file for that individual class.

If you use all 200 classes, including the large file will be significantly less expensive than including 200 small files.

Where the cutoff lies is really system dependent. I would imagine that it would be somewhere around the 50% mark (so if you're using fewer than 100 classes in any one request, autoload).

And using APC will likely shift the break-even point toward fewer classes (so without it, 100 classes used might be the break-even point, but with it, the point might be at 50 classes used), since it makes the large single include much cheaper but only slightly lowers the overhead of each individual smaller include.

The exact break-even points will be 100% system dependent (how fast is your disk I/O, how fast are your processors, how much memory, etc). So the only way to know for sure on your platform is to test.
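Since the break-even point can only be found by measuring, here is a minimal sketch of such a test. The class names (`Bundled0`, `Single0`, and so on), the counts, and the temp directory are all hypothetical, and the generated classes are far smaller than real ones, so treat the numbers as a starting point only.

```php
<?php
// Rough micro-benchmark: one bundled file vs. one file per class.
// All names, counts, and paths here are made up for illustration.

$dir = sys_get_temp_dir() . '/include_bench_' . uniqid();
mkdir($dir);

$classCount  = 200; // classes available in total
$usedClasses = 20;  // classes a single request actually needs

// Generate one bundle with every class, plus one small file per class.
// Different prefixes keep the two variants from redeclaring each other.
$bundle = "<?php\n";
for ($i = 0; $i < $classCount; $i++) {
    $body = " { public function id() { return $i; } }\n";
    $bundle .= "class Bundled$i$body";
    file_put_contents("$dir/Single$i.php", "<?php\nclass Single$i$body");
}
file_put_contents("$dir/bundle.php", $bundle);

// Run a callback and return the elapsed wall-clock time in milliseconds.
function timeMs(callable $fn): float
{
    $start = hrtime(true); // nanoseconds, available since PHP 7.3
    $fn();
    return (hrtime(true) - $start) / 1e6;
}

$bundleMs = timeMs(function () use ($dir) {
    require "$dir/bundle.php";
});

$singleMs = timeMs(function () use ($dir, $usedClasses) {
    for ($i = 0; $i < $usedClasses; $i++) {
        require "$dir/Single$i.php";
    }
});

printf("bundle of %d classes: %.3f ms\n", $classCount, $bundleMs);
printf("%d single files:      %.3f ms\n", $usedClasses, $singleMs);

// Clean up the generated files.
array_map('unlink', glob("$dir/*.php"));
rmdir($dir);
```

Run from the command line, this typically measures the uncached case, because OPcache is disabled for the CLI by default. To see the cached behaviour you would need to run the same comparison through your web server, and vary `$usedClasses` to find where the two timings cross.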

However, more is at stake than raw performance. Maintainability will suffer with one large file since it's harder to work on multiple classes at the same time (tabs in an IDE become useless). I personally would keep all the classes in separate files and make my life as the developer easier rather than making one giant monstrosity of a file.

Now, if you have Facebook traffic levels, it may be worth investigating further. But if you don't, I personally wouldn't worry about it...

Solution 2:

I have conducted some tests on the various cost(s) of php include() which I'd like to share, as I see many programmers or CMS platforms overlooking these pre-runtime php costs.

The cost of the function itself is quite negligible. 100 file includes (with empty files) cost about 5ms, and no more than one microsecond when using an opcode cache.

So the cost savings of including a larger php file containing 100 classes, as opposed to 100 separate file includes, is only about 5ms. And using an OpCode cache makes that cost irrelevant.

The real cost comes with the size of your files, and what PHP has to parse and/or compile. For a better idea of what those costs are, here are results from tests I ran on a 2010 Mac Mini Server with a 10,000 RPM drive, running PHP 5.3 with the eAccelerator opcode cache and its optimizer enabled.

1µs  for 100 EMPTY File includes, w/opcache
5ms  for 100 EMPTY File includes, no opcache

7ms   for 100 32KB File includes, w/opcache
30ms  for 100 32KB File includes, no opcache

14ms  for 100 64KB File includes, w/opcache
60ms  for 100 64KB File includes, no opcache

22ms  for 100 128KB File includes, w/opcache
100ms for 100 128KB File includes, no opcache

38ms  for 100 200KB File includes, w/opcache
170ms for 100 200KB File includes, no opcache

Therefore, a 600KB PHP file costs roughly 6ms, or about 1ms when using an opcode cache. What you really want to watch instead is the total size of all code included per request.
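The 600KB figure can be reproduced from the table with a little arithmetic. The sketch below assumes the cost scales linearly with file size, which the measurements suggest but do not prove. Depending on which row you extrapolate from, the estimate lands between roughly 4.7 and 5.6 ms without a cache, and between roughly 1.0 and 1.3 ms with one.

```php
<?php
// Timings copied from the table above: total milliseconds for 100 includes
// of a file of the given size.
$results = [
    // size in KB => [ms with opcache, ms without opcache]
    32  => [7, 30],
    64  => [14, 60],
    128 => [22, 100],
    200 => [38, 170],
];

$targetKB  = 600; // the file size asked about in the question
$estimates = [];

foreach ($results as $sizeKB => [$cachedMs, $uncachedMs]) {
    // Cost of one include, divided by its size, scaled up to the target size.
    // Linear scaling is an assumption, not something the tests established.
    $estimates[$sizeKB] = [
        'cached'   => $cachedMs / 100 / $sizeKB * $targetKB,
        'uncached' => $uncachedMs / 100 / $sizeKB * $targetKB,
    ];
    printf(
        "from the %3d KB row: %.1f ms without opcache, %.1f ms with opcache\n",
        $sizeKB,
        $estimates[$sizeKB]['uncached'],
        $estimates[$sizeKB]['cached']
    );
}
```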

Merging files into bundles to try and save resources is definitely not a good idea, and would be a mistake when using an opcode cache. My test barely accounts for disk speed, if at all, as I included the same file 100 times. That said, I don't feel the need to cover disk I/O, because having an opcode cache installed is really a prerequisite in terms of basic performance.

To gain as much performance as possible and reduce RAM usage, one must do the opposite: split files contextually as much as possible, using an autoloader or a class factory pattern, so that each request includes as little unused code as possible.

To that effect, misusing include_once() can also have negative performance consequences...

Regarding your base classes: I have similar circumstances, but I only include a tiny portion of the table schema, mainly the field types and primary key details. For performance reasons, I purposely do not include the quite heavy full schema of the tables all the time, because they are rarely used, and when they are, I use only a couple of them at most per request.

The full column details of a table average roughly 20-50KB per schema array. Including 10-15 of them on any given request costs just about 1-3 ms for the arrays, which in itself is not much. But it becomes worthwhile when combined with a 500KB RAM saving per request.
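One way to load the heavy schema arrays only when needed could look like the sketch below. The `tableSchema()` helper, the one-file-per-table layout, and the `users` example are all hypothetical; the only assumption about the schema files is that each one ends with a `return [...];` statement.

```php
<?php
// Hypothetical helper: load a table's full schema array on first use only,
// then serve it from an in-memory cache for the rest of the request.
function tableSchema(string $table, string $schemaDir): array
{
    static $loaded = [];

    // basename() keeps a stray "../" in the table name from escaping the directory.
    $path = $schemaDir . '/' . basename($table) . '.php';

    if (!isset($loaded[$path])) {
        // The file is assumed to end with `return [...];`
        $loaded[$path] = require $path;
    }
    return $loaded[$path];
}

// Demo setup: write a made-up schema file into a temp directory.
$schemaDir = sys_get_temp_dir() . '/schema_demo_' . uniqid();
mkdir($schemaDir);
file_put_contents(
    "$schemaDir/users.php",
    "<?php\nreturn [\n"
    . "    'id'    => ['type' => 'int', 'primary' => true],\n"
    . "    'email' => ['type' => 'varchar', 'length' => 255],\n"
    . "];\n"
);

// Nothing is parsed until a request actually asks for this table.
$schema = tableSchema('users', $schemaDir);
echo $schema['email']['type'], "\n";

// Clean up the demo file; later calls are served from the static cache.
unlink("$schemaDir/users.php");
rmdir($schemaDir);
```

Requests that never call `tableSchema()` pay neither the parse time nor the memory for these arrays.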

Solution 3:

APC will save you a lot, but I don't know if it will be negligible if your source is 600k. That is about 15000 lines of code? Not that much for a website, but quite large for a single file.

You'd do better to use a more dynamic approach and isolate specific functionality in specific classes. Then, for each page, you can choose which code is needed.

Especially when you use APC, this approach will be better, because you don't have the overhead of file I/O that you would have when loading many small files from disk. I would choose to implement small, specific classes and put each of them in a separate file. You can use the PHP class loading mechanism (__autoload, or spl_autoload_register in current PHP versions) to automatically load the right units.
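A minimal sketch of such a loader follows, using `spl_autoload_register` because `__autoload` was deprecated in PHP 7.2 and removed in PHP 8.0. The directory names, the `DemoInvoice` class, and the "one class per file, file named after the class" convention are assumptions made for the example, not something the question specifies.

```php
<?php
// Demo setup: a temp tree with two search directories and one class file.
$base = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir("$base/lib", 0777, true);
mkdir("$base/models", 0777, true);
file_put_contents(
    "$base/models/DemoInvoice.php",
    "<?php\nclass DemoInvoice { public function total() { return 42; } }\n"
);

// Directories are searched in order; put the most frequently hit one first.
$searchDirs = ["$base/lib", "$base/models"];

spl_autoload_register(function (string $class) use ($searchDirs): void {
    // Assumed convention: namespace separators map to directory separators.
    $relative = str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';

    foreach ($searchDirs as $dir) {
        $file = $dir . DIRECTORY_SEPARATOR . $relative;
        if (is_file($file)) {
            require $file;
            return;
        }
    }
    // Not found: return quietly so any other registered loader can try.
});

// The first use of the class triggers the loader; no explicit include needed.
$invoice = new DemoInvoice();
echo $invoice->total(), "\n";

// Clean up the demo tree.
unlink("$base/models/DemoInvoice.php");
rmdir("$base/models");
rmdir("$base/lib");
rmdir($base);
```

Each miss costs one `is_file()` check per directory, which is the "searching across several directories" overhead the question asks about. A naming convention that maps a class name to exactly one path avoids most of it.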

When you figure out a good naming convention for your classes and units, this will make your development a lot easier.