MySQL Error: Can't create thread to handle new connection(errno= 11) triggered by many Gunicorn API requests

Solution 1:

max_connections        = 20000

is much too high. 200 is more realistic. If you are trying to have 20K connection open at the same time, there are architectural problems in your system.

API requests should come and go in milliseconds, thereby not piling up 20K live connections.

If you client (Apache, Tomcat, whatever) is allowing 20K threads to run, then that is a problem.

Analysis of STATUS/VARIABLES

Observations:

  • Version: 5.7.23-0ubuntu0.16.04.1
  • 16 GB of RAM
  • Uptime = 05:08:49; some GLOBAL STATUS values may not be meaningful yet.
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Lots of SHOW commands -- What is going on?

A lot of queries use internal temp tables or do full table scans. Lower long_query_time and turn on the slowlog to see what the worst are.

Details and other observations:

( innodb_buffer_pool_size / _ram ) = 2048M / 16384M = 12.5% -- % of RAM used for InnoDB buffer_pool

( (key_buffer_size / 0.20 + innodb_buffer_pool_size / 0.70) / _ram ) = (16M / 0.20 + 2048M / 0.70) / 16384M = 18.3% -- Most of available ram should be made available for caching. -- http://mysql.rjweb.org/doc.php/memory

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 67,332 / 131056 = 51.4% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 18,529 / 60 * 256M / 122842112 = 674 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( innodb_flush_method ) = innodb_flush_method = -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( Com_rollback ) = 65,020 / 18529 = 3.5 /sec -- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.

( Handler_rollback ) = 35,725 / 18529 = 1.9 /sec -- Why so many rollbacks?

( Innodb_rows_deleted / Innodb_rows_inserted ) = 250,597 / 306605 = 0.817 -- Churn -- "Don't queue it, just do it." (If MySQL is being used as a queue.)

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( (Com_show_create_table + Com_show_fields) / Questions ) = (1 + 19522) / 140291 = 13.9% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( (Queries-Questions)/Queries ) = (24488180-140291)/24488180 = 99.4% -- Fraction of queries that are inside Stored Routines. -- (Not bad if high; but it impacts the validity of some other conclusions.)

( Created_tmp_disk_tables ) = 19,628 / 18529 = 1.1 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size and max_heap_table_size. Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Questions ) = 19,628 / 140291 = 14.0% -- Pct of queries that needed on-disk tmp table. -- Better indexes / No blobs / etc.

( Created_tmp_disk_tables / Created_tmp_tables ) = 19,628 / 22476 = 87.3% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size and max_heap_table_size; improve indexes; avoid blobs, etc.

( Com_rollback / Com_commit ) = 65,020 / 765 = 8499.3% -- Rollback : Commit ratio -- Rollbacks are costly; change app logic

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (669 + 24 + 164 + 1) / 765 = 1.12 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan ) = 25,262 / 18529 = 1.4 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 25,262 / 38182 = 66.2% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( innodb_autoinc_lock_mode ) = 1 -- Galera: desires 2 -- 2 = "interleaved"; 1 = "consecutive" is typical; 0 = "traditional".

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Aborted_clients / Connections ) = 1,010 / 1457 = 69.3% -- Threads bumped due to timeout -- Increase wait_timeout; be nice, use disconnect

( thread_cache_size ) = 500 -- How many extra processes to keep around (Not relevant when using thread pooling) (Autosized as of 5.6.8; based on max_connections)

( thread_cache_size / max_connections ) = 500 / 500 = 100.0%

( thread_cache_size / Max_used_connections ) = 500 / 136 = 367.6% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

Abnormally large:

Com_kill = 0.39 /HR
Com_show_charsets = 0.39 /HR
Com_show_fields = 1.1 /sec
Com_show_slave_hosts = 0.39 /HR
Com_show_storage_engines = 0.78 /HR
Com_show_warnings = 38 /HR
Handler_read_next / Handler_read_key = 5,206
Innodb_dblwr_pages_written / Innodb_dblwr_writes = 62.7
Performance_schema_file_instances_lost = 1
gtid_executed_compression_period = 0.054 /sec
wait_timeout = 1.0e+6

Abnormal strings:

ft_boolean_syntax = + -><()~*:&
innodb_fast_shutdown = 1
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
session_track_system_variables = time_zone, autocommit, character_set_client, character_set_results, character_set_connection
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN