How do you make good use of multicore CPUs in your PHP/MySQL applications? [closed]
I maintain a custom built CMS-like application.
Whenever a document is submitted, several tasks are performed that can be roughly grouped into the following categories:
- MySQL queries.
- HTML content parsing.
- Search index updating.
Category 1 includes updates to various MySQL tables relating to a document's content.
Category 2 includes parsing of HTML content stored in MySQL LONGTEXT fields to perform some automatic anchor tag transformations. I suspect that a great deal of computation time is spent in this task.
Category 3 includes updates to a simple MySQL-based search index using just a handful of fields corresponding to the document.
All of these tasks need to complete for the document submission to be considered complete.
The machine that hosts this application has dual quad-core Xeon processors (a total of 8 cores). However, whenever a document is submitted, all PHP code that executes is constrained to a single process running on one of the cores.
My question:
What schemes, if any, have you used to split up your PHP/MySQL web application processing load among multiple CPU cores? My ideal solution would basically spawn a few processes, let them execute in parallel on several cores, and then block until all of the processes are done.
Related question:
What is your favorite PHP performance profiling tool?
Solution 1:
Introduction
PHP has multi-threading support, via the pthreads extension, which you can take advantage of in many ways. I have been able to demonstrate this multi-threading ability in different examples:
- How can one use multi threading in PHP applications
- pcntl runs the same code several times, assistance required
- Improving HTML scraper efficiency with pcntl_fork()
A quick search will turn up additional resources.
Categories
1: MySQL queries
MySQL is fully multi-threaded and will make use of multiple CPUs, provided that the operating system supports them. It will also maximize system resources if properly configured for performance.
A typical setting in my.ini that affects thread performance is:
thread_cache_size = 8
thread_cache_size can be increased to improve performance if you have a lot of new connections. Normally this does not provide a notable performance improvement if you have a good thread implementation. However, if your server sees hundreds of connections per second, you should set thread_cache_size high enough that most new connections use cached threads.
If you are using Solaris then you can use
thread_concurrency = 8
thread_concurrency enables applications to give the threads system a hint about the desired number of threads that should be run at the same time.
This variable is deprecated as of MySQL 5.6.1 and is removed in MySQL 5.7. You should remove this from MySQL configuration files whenever you see it unless they are for Solaris 8 or earlier.
InnoDB:
You don't have such limitations if you are using InnoDB as the storage engine, because it fully supports thread concurrency:
innodb_thread_concurrency // Recommended 2 * CPUs + number of disks
You can also look at innodb_read_io_threads and innodb_write_io_threads, where the default is 4; depending on the hardware, they can be increased to as high as 64.
Others:
Other configurations to also look at include key_buffer_size, table_open_cache, sort_buffer_size, etc., all of which can result in better performance.
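Putting those together, here is an illustrative my.cnf fragment. The values are assumptions for an 8-core box with two disks; benchmark them against your own workload rather than copying them blindly:
[mysqld]
thread_cache_size         = 8
innodb_thread_concurrency = 18     # rule of thumb above: 2 * 8 CPUs + 2 disks
innodb_read_io_threads    = 8      # default 4, can go as high as 64
innodb_write_io_threads   = 8
key_buffer_size           = 256M   # MyISAM index buffer
table_open_cache          = 1024
sort_buffer_size          = 2M     # allocated per connection, keep modest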
PHP:
In pure PHP you can create a MySQL worker where each query is executed in a separate PHP thread:
$sql = new SQLWorker($host, $user, $pass, $db);
$sql->start();
$sql->stack($q1 = new SQLQuery("One long Query"));
$sql->stack($q2 = new SQLQuery("Another long Query"));
$q1->wait();
$q2->wait();
// Do Something Useful
Here is a Full Working Example of SQLWorker
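Since that link may not survive, here is a hypothetical sketch of what such a pair of classes could look like with the pthreads Worker/Stackable API. The names mirror the usage above, but the implementation is illustrative, not the original library code:
<?php
// Illustrative only: assumes the pthreads extension is loaded
class SQLWorker extends Worker {
    public $link;
    protected $host, $user, $pass, $db;

    public function __construct($host, $user, $pass, $db) {
        $this->host = $host; $this->user = $user;
        $this->pass = $pass; $this->db   = $db;
    }

    public function run() {
        // One connection per worker thread, reused by every stacked query
        $this->link = new mysqli($this->host, $this->user, $this->pass, $this->db);
    }
}

class SQLQuery extends Stackable {
    public $sql, $result;

    public function __construct($sql) {
        $this->sql = $sql;
    }

    public function run() {
        // Executed inside the worker thread via the worker's connection;
        // pthreads notifies the Stackable when run() returns, which is
        // what allows $query->wait() in the caller to unblock
        $this->result = $this->worker->link->query($this->sql);
    }
}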
2: HTML content parsing
I suspect that a great deal of computation time is spent in this task.
If you already know the problem, that makes it easier to solve via event loops, a job queue, or threads.
Working on one document at a time can be a very, very slow, painful process. @ka once hacked his way out using Ajax to issue multiple requests. Some creative minds would just fork the process using pcntl_fork, but if you are using Windows then you cannot take advantage of pcntl.
With pthreads supporting both Windows and Unix systems, you don't have such a limitation. It is as easy as: need to parse 100 documents? Spawn 100 threads. Simple.
HTML Scanning
// Scan my system for HTML files
$dir = new RecursiveDirectoryIterator($dir, RecursiveDirectoryIterator::SKIP_DOTS);
$dir = new RecursiveIteratorIterator($dir);

// Allowed extensions
$ext = array(
    "html",
    "htm"
);

// Threads array
$ts = array();

// Simple shared storage
$s = new Sink();

// Start timer
$time = microtime(true);
$count = 0;

// Parse all HTML: one thread per file, each starts itself on construction
foreach($dir as $html) {
    if ($html->isFile() && in_array($html->getExtension(), $ext)) {
        $count++;
        $ts[] = new LinkParser("$html", $s);
    }
}

// Wait for all threads to finish
foreach($ts as $t) {
    $t->join();
}

// Print the output
$links = count($s);
$tm = microtime(true) - $time;
printf("Total Files:\t\t%s \n", number_format($count, 0));
printf("Total Links:\t\t%s \n", number_format($links, 0));
printf("Finished:\t\t%0.4f sec \n", $tm);
printf("AvgSpeed:\t\t%0.4f sec per link\n", $tm / $links);
printf("File P/S:\t\t%d files per sec\n", $count / $tm);
printf("Link P/S:\t\t%d links per sec\n", $links / $tm);
Output
Total Files: 8,714
Total Links: 105,109
Finished: 108.3460 sec
AvgSpeed: 0.0010 sec per link
File P/S: 80 files per sec
Link P/S: 907 links per sec
Classes Used
Sink
class Sink extends Stackable {
    public function run() {
        // No work of its own: the Sink is just thread-safe shared
        // storage that the LinkParser threads append found links to
    }
}
LinkParser
class LinkParser extends Thread {
    public function __construct($file, $sink) {
        $this->file = $file;
        $this->sink = $sink;
        $this->start(); // begin parsing immediately on construction
    }

    public function run() {
        // Suppress warnings from malformed real-world HTML
        $dom = new DOMDocument();
        @$dom->loadHTML(file_get_contents($this->file));
        foreach($dom->getElementsByTagName('a') as $link) {
            $this->sink[] = $link->getAttribute('href');
        }
    }
}
Experiment
Try parsing the same 8,714 files containing 105,109 links without threads, and see how long it takes.
Better Architecture
Spawning too many threads is not a clever thing to do in production. A better approach would be pooling: have a pool of a defined number of Workers, then stack Tasks onto them, as in the sketch below.
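A minimal pooling sketch, assuming a pthreads version that ships the Pool class (2.0+, where Threaded supersedes Stackable):
<?php
// Illustrative: 8 workers for 8 cores, instead of one thread per document
class ParseTask extends Threaded {
    public $file;

    public function __construct($file) {
        $this->file = $file;
    }

    public function run() {
        // Same per-file work as LinkParser above
        $dom = new DOMDocument();
        @$dom->loadHTML(file_get_contents($this->file));
        // ... collect links into shared storage ...
    }
}

$pool = new Pool(8, 'Worker');

foreach ($files as $file) {      // $files: a list built by the scanner
    $pool->submit(new ParseTask($file));
}

$pool->shutdown();               // waits for workers to finish their stacked tasks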
Performance Improvement
Fine, the example above can still be improved. Instead of waiting for a single thread to scan all the files, you could use multiple threads to scan the filesystem, then stack the results onto Workers for processing.
3: Search index updating
This has been pretty much answered by the first answer, but there are many ways to improve performance further. Have you ever considered an event-based approach?
Introducing Event
@rdlowrey Quote 1:
Well think of it like this. Imagine you need to serve 10,000 simultaneously connected clients in your web application. Traditional thread-per-request or process-per-request servers aren't an option because no matter how lightweight your threads are you still can't hold 10,000 of them open at a time.
@rdlowrey Quote 2:
On the other hand, if you keep all the sockets in a single process and listen for those sockets to become readable or writable you can put your entire server inside a single event loop and operate on each socket only when there's something to read/write.
Why not experiment with an event-driven, non-blocking I/O approach to your problem? PHP has libevent to supercharge your application.
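To make the event-loop idea concrete without committing to the libevent extension's API, here is a minimal sketch using only core PHP's stream_select(); libevent or a reactor library does the same job far more scalably:
<?php
// Toy single-process event loop: act on a socket only when it is readable
$server = stream_socket_server("tcp://127.0.0.1:8000", $errno, $errstr);
stream_set_blocking($server, false);
$clients = array();

while (true) {
    $read   = array_merge(array($server), $clients);
    $write  = null;
    $except = null;

    // Sleep until at least one watched socket has activity
    if (stream_select($read, $write, $except, 5) > 0) {
        foreach ($read as $sock) {
            if ($sock === $server) {
                // New connection: start watching it too
                $clients[] = stream_socket_accept($server);
            } else {
                $data = fread($sock, 8192);
                // ... handle $data; drop $sock from $clients when closed ...
            }
        }
    }
}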
I know this question is all about multi-threading, but if you have some time you can look at this Nuclear Reactor written in PHP by @igorw.
Final Consideration
I think you should consider using a cache and a job queue for some of your tasks. You can easily have a message saying:
Document uploaded for processing ..... 5% - Done
Then do all the time-wasting tasks in the background. Please look at Making a large processing job smaller for a similar case study.
Profiling
A profiling tool? There is no single profiling tool for a web application; everything from Xdebug to YSlow is useful in its own way. For example, Xdebug is not useful when it comes to threads because threading is not supported.
I don't have a favorite.
Solution 2:
PHP is not really oriented towards multi-threading: as you already noticed, each page is served by one PHP process, which does one thing at a time, including just "waiting" while an SQL query is executed on the database server.
There is not much you can do about that, unfortunately: it's the way PHP works.
Still, here are a couple of thoughts:
- First of all, you'll probably have more than one user at a time on your server, which means you'll serve several pages at the same time, which, in turn, means you'll have several PHP processes and SQL queries running at the same time... which means several cores of your server will be used.
- Each PHP process will run on one core, in response to the request of one user, but several sub-processes of Apache run in parallel (one for each request, up to a few dozen or hundred, depending on your configuration).
- The MySQL server is multi-threaded, which means it can use several distinct cores to answer several concurrent requests -- even if each request cannot be served by more than one core.
So, in fact, your server's 8 cores will end up being used ;-)
And, if you think your pages are taking too long to generate, a possible solution is to separate your calculations into two groups:
- On one hand, the things that have to be done to generate the page: for those, there is not much you can do.
- On the other hand, the things that have to run sometimes, but not necessarily immediately:
- For instance, I am thinking of statistics calculations: you want them to be quite up to date, but if they lag a couple of minutes behind, that's generally OK.
- Same for e-mail sending: several minutes will pass before your users receive/read their mail anyway, so there is no need to send it immediately.
For the kind of situations in my second point, as you don't need those things done immediately... Well, just don't do them immediately ;-)
A solution that I often use is some queuing mechanism:
- The web application stores things in a "todo-list".
- That "todo-list" is then de-queued by batch scripts run frequently via a cronjob, as in the sketch below.
And for some other manipulations, you just want them run every X minutes -- and, here too, a cronjob is the perfect tool.
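A sketch of that mechanism in plain PHP/MySQL; the todo_list table and its columns are made-up names for illustration:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=cms', $dbUser, $dbPass);

// Web request side: enqueue the slow work instead of doing it inline
$stmt = $pdo->prepare("INSERT INTO todo_list (task, payload) VALUES (?, ?)");
$stmt->execute(array('reindex_document', json_encode(array('doc_id' => $docId))));

// worker.php, run from cron (e.g. * * * * * php /path/to/worker.php)
$rows = $pdo->query("SELECT id, task, payload FROM todo_list
                     WHERE done = 0 ORDER BY id LIMIT 50");
foreach ($rows as $row) {
    // ... perform $row['task'] using $row['payload'] ...
    $pdo->prepare("UPDATE todo_list SET done = 1 WHERE id = ?")
        ->execute(array($row['id']));
}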
Solution 3:
Scaling out web servers is not going to make MySQL budge one inch when it comes to accessing multicore CPUs. Why? First, consider the two main storage engines of MySQL:
MyISAM
This storage engine does not access multiple cores. It never has and never will. It does full table locking for each INSERT, UPDATE, and DELETE. Sending queries from multiple Web Servers to do anything with a MyISAM just gets bottlenecked.
InnoDB
Prior to MySQL 5.1.38, this storage engine accessed only one CPU. You had to do strange things like run multiple MySQL instances on one machine to coerce the cores to handle different instances of MySQL, and then load-balance the web servers' DB connections among the multiple instances. That's old school (especially if you are using versions of MySQL before 5.1.38).
Starting with MySQL 5.1.38, you install the new InnoDB Plugin. It has features that you have to tune to get InnoDB to access multiple CPUs. I have written about this on DBA StackExchange:
- Sep 20, 2011: Multi cores and MySQL Performance
- Sep 12, 2011: Possible to make MySQL use more than one core?
- May 26, 2011: About single threaded versus multithreaded databases performance
Those new features are fully available in MySQL 5.5/5.6 and Percona Server as well.
CAVEAT
If your custom CMS uses FULLTEXT indexing/searching, you should upgrade to MySQL 5.6 because InnoDB now supports FULLTEXT indexing/searching.
Upgrading to MySQL 5.6 is not going to automatically get the CPUs going. You will have to tune it because, LEFT UNCONFIGURED, it is possible for older versions of MySQL to outrun and outgun newer versions:
- Nov 24, 2011: Why mysql 5.5 slower than 5.1 (linux, using mysqlslap)
- Oct 05, 2011: Query runs a long time in some newer MySQL versions
- Jun 19, 2011: How do I properly perform a MySQL bake-off?
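As a starting point for that tuning, here is an illustrative my.cnf fragment covering the multicore-related InnoDB settings in MySQL 5.5+; the numbers are assumptions to benchmark against, not recommendations:
[mysqld]
innodb_buffer_pool_size      = 4G   # size to your working set
innodb_buffer_pool_instances = 8    # one per core reduces mutex contention
innodb_read_io_threads       = 8
innodb_write_io_threads      = 8
innodb_thread_concurrency    = 0    # 0 lets InnoDB schedule threads itself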
Solution 4:
This might not be the answer you are looking for, but the solution you seek deals with threading. Threading is necessary for multicore programming, and threading is not implemented in core PHP.
But, in a sense, you could fake threading in PHP by relying on the operating system's multitasking abilities. I suggest taking a quick look at an overview of multi-threading strategies in PHP to develop a strategy to achieve what you need.
Dead link: Multi-threading strategies in PHP
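For what it's worth, here is a sketch of that OS-level approach using pcntl (Unix only): fork one child per task, then block until every child exits, which is roughly the "spawn, run in parallel, wait for all" flow the question describes. do_task() is a hypothetical dispatcher, not a real API:
<?php
$tasks = array('mysql_updates', 'html_parsing', 'search_index_update');
$pids  = array();

foreach ($tasks as $task) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child process: the OS schedules it on any available core
        do_task($task);   // hypothetical dispatcher for the task categories
        exit(0);
    }
    $pids[] = $pid;       // parent records the child's pid
}

// Parent blocks here until all children have finished
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}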