How to scale php5+MySQL above 200 requests/second?

I am tweaking my homepage for performance, currently it handles about 200 requests/second on 3.14.by which eats 6 SQL queries, and 20 req/second on 3.14.by/forum which is phpBB forum.

Strangely enough, numbers are about the same on some VPS and dedicated Atom 330 server.

Server software is the following: Apache2+mod_php prefork 4 childs (tried different numbers here), php5, APC, nginx, memcached for PHP sessions storage.

MySQL is configured to eat about 30% of available RAM (~150Mb on VPS, 700Mb on dedicated server)

This looks like there is a bottleneck somewhere not allowing me to go higher, any suggestions? (i.e. I know that doing less than 6 SQL's would make it faster, but this does not look like a limiting factor, as sqld eats no more than a few % in top due to cached queries)

Has anyone tested that kicking preforked apache2 and leaving just nginx+php is much faster?

Some more benchmarks

Small 40-byte static file: 1484 r/s via nginx+apache2, 2452 if we talk to apache2 directly. 
Small "Hello world" php script: 458 r/s via ngin+apache2.

Update: It appears bottleneck is MySQL performance on cached data. Page with single SQL shows 354req/sec, with 6 SQL's - 180 req/sec. What do you think I can tweak here? (I can fork out 100-200Mb for MySQL)

[client]
port        = 3306
socket      = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
default-character-set=cp1251
collation-server=cp1251_general_cs

skip-character-set-client-handshake

user        = mysql
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/lib/mysql
tmpdir      = /tmp
skip-external-locking

bind-address        = 127.0.0.1

key_buffer      = 16M
max_allowed_packet  = 8M
thread_stack        = 64K
thread_cache_size   = 16
sort_buffer_size    = 8M
read_buffer_size    = 1M

myisam-recover      = BACKUP
max_connections        = 650
table_cache            = 256
thread_concurrency     = 10

query_cache_limit       = 1M
query_cache_size        = 16M

expire_logs_days    = 10
max_binlog_size         = 100M

[mysqldump]
quick
quote-names
max_allowed_packet  = 8M

[mysql]
[isamchk]
key_buffer      = 8M

!includedir /etc/mysql/conf.d/

Obviously, there is a lot you can try. Your best bet is chasing your logs for queries that don't use indexes (enable logs for those) and other non-optimized queries. I have compiled a huge list of performance related options over the years, so I've included a small subset here for your information - hopefully it helps. Here are some general notes for things you can try (if you haven't already):

MySQL

  • query_cache_type=1 - cache SQL queries is on. If set to 2, queries are only cached if the SQL_CACHE hint is passed to them. Similarly with type 1, you can disable cache for a particular query with the SQL_NO_CACHE hint
  • key_buffer_size=128M (default: 8M) - memory buffer for MyISAM table indexes. On dedicated servers, aim to set the key_buffer_size to at least a quarter, but no more than half, of the total amount of memory on the server
  • query_cache_size=64M (default: 0) - size of the query cache
  • back_log=100 (default: 50, max: 65535) - The queue of outstanding connection requests. Only matters when there are lots of connections in short time
  • join_buffer_size=1M (default: 131072) - a buffer that's used when having full table scans (no indexes)
  • table_cache=2048 (default: 256) - should be max_user_connections multiplied by the maximum number of JOINs your heaviest SQL query contains. Use the "open_tables" variable at peak times as a guide. Also look at the "opened_tables" variable - it should be close to "open_tables"
  • query_prealloc_size=32K (default: 8K) - persistant memory for statements parsing and execution. Increase if having complex queries
  • sort_buffer_size=16M (default: 2M) - helps with sorting (ORDER BY and GROUP BY operations)
  • read_buffer_size=2M (default: 128K) - Helps with sequential scans. Increase if there are many sequential scans.
  • read_rnd_buffer_size=4M - helps MyISAM table speed up read after sort
  • max_length_for_sort_data - row size to store instead of row pointer in sort file. Can avoid random table reads
  • key_cache_age_threshold=3000 (default: 300) - time to keep key cache in the hot-zone (before it's demoted to warm)
  • key_cache_division_limit=50 (default: 100) - enables a more sophisticated cache eviction mechanism (two levels). Denotes the percentage to keep for the bottom level. delay_key_write=ALL - the key buffer is not flushed for the table on every index update, but only when the table is closed. This speeds up writes on keys a lot, but if you use this feature, you should add automatic checking of all MyISAM tables by starting the server with the --myisam-recover=BACKUP,FORCE option
  • memlock=1 - lock process in memory (to reduce swapping in/out)

Apache

  • change the spawning method (to mpm for example)
  • disable logs if possible
  • AllowOverride None - whenever possible disable .htaccess. It stops apache for looking for .htaccess files if they are not used so it saves a file lookup request
  • SendBufferSize - Set to OS default. On congested networks, you should set this parameter close to the size of the largest file normally downloaded
  • KeepAlive Off (default On) - and install lingerd to properly close network connections and is faster
  • DirectoryIndex index.php - Keep file list as short and absolute as possible.
  • Options FollowSymLinks - to simplify file access process in Apache
  • Avoid using mod_rewrite or at least complex regexs
  • ServerToken=prod

PHP

  • variables_order="GPCS" (If you don't need environment variables)
  • register_globals=Off - apart from being a security risk, it also has a performance impact
  • Keep include_path as minimal as possible (avoids extra filesystem lookups)
  • display_errors=Off - Disable showing errors. Strongly recommended for all production servers (doesn't display ugly error messages in case of a problem).
  • magic_quotes_gpc=Off
  • magic_quotes_*=Off
  • output_buffering=On
  • Disable logging if possible
  • expose_php=Off
  • register_argc_argv=Off
  • always_populate_raw_post_data=Off
  • place php.ini file where php would look for it first.
  • session.gc_divisor=1000 or 10000
  • session.save_path = "N;/path" - For large sites consider using it. Splits session files into subdirectories

OS Tweaks

  • Mount used hard disks with the -o noatime option (no access time). Also add this option to /etc/fstab file.
  • Tweak the /proc/sys/vm/swappiness (from 0 to 100) to see what has best results
  • Use RAM Disks - mount --bind -ttmpfs /tmp /tmp

If the bottleneck is not CPU, then its IO - either network or disc. So.. you need to see how much IO is going on. I wouldn't have thought its the network (unless you're on a 10mbps half-duplex link, but its worth checking the switch in case auto-detect isn't doing its job right).

That leaves disk IO, which can be a big factor especially on VPSs. Use sar or iostat to have a look at the disks, then google how to find more details if your disk is being used heavily.