Weak performance of CGEventPost under GPU load

We've stumbled upon a performance problem with Quartz Events, more specifically CGEventPost: during heavy GPU load CGEventPost can block. We've created a small benchmark application to demonstrate the issue. This application is just a loop that creates, post and releases events.

Below you can see the results of running the application. The first run is on an idle system. The second run is with FurMark (GPU stress test) with the dials cranked up as much as possible.

  • Inner is how long the inner loop takes, basically just creating, posting and releasing an event with Quartz Events.
  • Outer is how long our program is waiting to be woken up (a sleep). Should be close to the time we sleep, but if the system is under pressure it could be delayed.
  • Post is how long the event post takes.

 

18:58:01.683 EventPerformance[4946:707] Measurements: (outer should be close to 10)
18:58:01.684 EventPerformance[4946:707] inner (ms): 0.04, outer (ms): 11.02, CGEventPost (ms): 0.03
18:58:01.684 EventPerformance[4946:707] inner (ms): 0.04, outer (ms): 11.02, CGEventPost (ms): 0.03
18:58:01.685 EventPerformance[4946:707] inner (ms): 0.07, outer (ms): 10.26, CGEventPost (ms): 0.03
18:58:01.685 EventPerformance[4946:707] inner (ms): 0.06, outer (ms): 10.85, CGEventPost (ms): 0.05
18:58:01.686 EventPerformance[4946:707] inner (ms): 0.07, outer (ms): 10.41, CGEventPost (ms): 0.04
18:58:01.686 EventPerformance[4946:707] inner (ms): 0.04, outer (ms): 10.39, CGEventPost (ms): 0.03
18:58:01.686 EventPerformance[4946:707] inner (ms): 0.05, outer (ms): 11.02, CGEventPost (ms): 0.03
18:58:01.687 EventPerformance[4946:707] inner (ms): 0.03, outer (ms): 10.67, CGEventPost (ms): 0.03
18:58:01.687 EventPerformance[4946:707] inner (ms): 0.08, outer (ms): 10.09, CGEventPost (ms): 0.05
18:58:01.688 EventPerformance[4946:707] Averages: (outer should be close to 10)
18:58:01.688 EventPerformance[4946:707] avg inner (ms): 0.05, avg outer (ms): 10.64, avg post (ms): 0.03

Here we can see that posting the event takes about 0.03 ms on average. Also the thread seems to be woken up around 0.5ms too late. No spikes in CGEventPost.

19:02:02.150 EventPerformance[5241:707] Measurements: (outer should be close to 10)
19:02:02.151 EventPerformance[5241:707] inner (ms): 0.03, outer (ms): 10.23, CGEventPost (ms): 0.02
19:02:02.151 EventPerformance[5241:707] inner (ms): 0.02, outer (ms): 10.54, CGEventPost (ms): 0.02
19:02:02.151 EventPerformance[5241:707] inner (ms): 0.02, outer (ms): 11.01, CGEventPost (ms): 0.01
19:02:02.152 EventPerformance[5241:707] inner (ms): 0.02, outer (ms): 10.74, CGEventPost (ms): 0.01
19:02:02.152 EventPerformance[5241:707] inner (ms): 0.02, outer (ms): 10.20, CGEventPost (ms): 0.01
19:02:02.152 EventPerformance[5241:707] inner (ms): 10.35, outer (ms): 11.01, CGEventPost (ms): 10.35
19:02:02.152 EventPerformance[5241:707] inner (ms): 0.03, outer (ms): 10.02, CGEventPost (ms): 0.02
19:02:02.153 EventPerformance[5241:707] inner (ms): 58.90, outer (ms): 10.11, CGEventPost (ms): 58.90
19:02:02.153 EventPerformance[5241:707] inner (ms): 0.03, outer (ms): 10.12, CGEventPost (ms): 0.02
19:02:02.153 EventPerformance[5241:707] Averages: (outer should be close to 10)
19:02:02.371 EventPerformance[5241:707] avg inner (ms): 7.71, avg outer (ms): 10.44, avg post (ms): 7.71

When the system is under heavy GPU load, posting an event can take (spikes) milliseconds instead of microseconds. Under extreme GPU stress (< 1 FPS), this value can take seconds. CGEventPost sometimes seems to be waiting for the GPU to finish some work before returning. Our thread is still scheduled normally with no noticeable delay/spikes (outer).

Any ideas are appreciated.


Solution 1:

I guess you're filling up the queue (underlying mach port)...

You can confirm this using the "scheduling" or "system call" instrument in Instruments. (Create a new blank document, add the instrument, then under File > Record Options... make sure "deferred mode" is checked.) This will show all thread activity in your app (when threads block, when they sleep, when they are activated, and why).

I would first try raising the thread priority (cf. man 3 PTHREAD_SCHEDPARAM) of the thread calling CGEventPost. If your thread is blocked on a lower priority thread, the kernel should temporarily elevate the priority of the blocking thread to avoid priority inversion and help your task complete earlier.

Overall I think you will have to implement a 2-thread solution, as below:

Create a queue for events you want to post. Post events to this queue from your main thread (or event posting thread), then signal a second thread (an event consumer thread you create) to walk the queue and post any outstanding events with CGEventPost.

When CGEventPost blocks, your second event-posting thread will block, but this won't block any other threads. When CGEventPost eventually unblocks, it will consume any outstanding events posted by your event consumer thread and the event consumer thread can resume posting events.

Another possibility: can you coallesce events? There are certain types of events (mouse moves?) that you could coallesce into fewer events. You will probably still run into the queue limit of CGEventPost at times, I think the 2-thread approach is probably your best bet.