Server crash caused by MySQL CPU usage

Premise:

I'm not a DBA and I'm not into servers, but I'm the only person in the company who knows a little about this stuff.

We have a Windows VPS with the following specs:

CPU: Intel Xeon E5-2630 v4 2.20GHz
RAM: 60GB
OS: Windows Server 2016 Datacenter
HDD: 2TB SSD

My web applications are hosted here and access the database hosted on the same server. The web applications are used by roughly 1000 users, who request data from the DB through the web applications' API. The MySQL version is 8.0.20 (MySQL Community Server - GPL).

And here is my.ini:

# Other default tuning values
# MySQL Server Instance Configuration File
# ----------------------------------------------------------------------
# Generated by the MySQL Server Instance Configuration Wizard
# 
# Installation Instructions
# ----------------------------------------------------------------------
# 
# On Linux you can copy this file to /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options
# (@localstatedir@ for this installation) or to
# ~/.my.cnf to set user-specific options.
# 
# On Windows you should keep this file in the installation directory 
# of your server (e.g. C:\Program Files\MySQL\MySQL Server X.Y). To
# make sure the server reads the config file use the startup option 
# "--defaults-file". 
# 
# To run the server from the command line, execute this in a 
# command line shell, e.g.
# mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"
# 
# To install the server as a Windows service manually, execute this in a 
# command line shell, e.g.
# mysqld --install MySQLXY --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"
# 
# And then execute this in a command line shell to start the server, e.g.
# net start MySQLXY
# 
# Guidelines for editing this file
# ----------------------------------------------------------------------
# 
# In this file, you can use all long options that the program supports.
# If you want to know the options a program supports, start the program
# with the "--help" option.
# 
# More detailed information about the individual options can also be
# found in the manual.
# 
# For advice on how to change settings please see
# https://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html
# 
# CLIENT SECTION
# ----------------------------------------------------------------------
# 
# The following options will be read by MySQL client applications.
# Note that only client applications shipped by MySQL are guaranteed
# to read this section. If you want your own MySQL client program to
# honor these values, you need to specify it as an option during the
# MySQL client library initialization.
# 
[client]

# pipe=

# socket=MYSQL

port=3306

[mysql]
no-beep=

# default-character-set=

# SERVER SECTION
# ----------------------------------------------------------------------
# 
# The following options will be read by the MySQL Server. Make sure that
# you have installed the server correctly (see above) so it reads this 
# file.=
# 
# server_type=2
[mysqld]

# The next three options are mutually exclusive to SERVER_PORT below.
# skip-networking=
# enable-named-pipe=
# shared-memory=

# shared-memory-base-name=MYSQL

# The Pipe the MySQL Server will use
# socket=MYSQL

# The TCP/IP Port the MySQL Server will listen on
port=3306

# Path to installation directory. All paths are usually resolved relative to this.
# basedir="C:/Program Files/MySQL/MySQL Server 8.0/"

# Path to the database root
datadir=C:/ProgramData/MySQL/MySQL Server 8.0/Data

# The default character set that will be used when a new schema or table is
# created and no character set is defined
# character-set-server=

# The default authentication plugin to be used when connecting to the server
default_authentication_plugin=mysql_native_password

# The default storage engine that will be used when create new tables when
default-storage-engine=INNODB

# Set the SQL mode to strict
sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"

# General and Slow logging.
log-output=FILE

general-log=0

general_log_file="VMI384596.log"

slow-query-log=1

slow_query_log_file="VMI384596-slow.log"

long_query_time=10

# Error Logging.
log-error="VMI384596.err"

# ***** Group Replication Related *****
# Specifies the base name to use for binary log files. With binary logging
# enabled, the server logs all statements that change data to the binary
# log, which is used for backup and replication.
log-bin="VMI384596-bin"

# ***** Group Replication Related *****
# Sets the binary logging format, and can be any one of STATEMENT, ROW,
# or MIXED. ROW is suggested for Group Replication.
# binlog_format=

# ***** Group Replication Related *****
# Causes the master to write a checksum for each event in the binary log.
# binlog_checksum supports the values NONE (disabled) and CRC32.
# The default is CRC32. When disabled (value NONE), the server verifies
# that it is writing only complete events to the binary log by writing
# and checking the event length (rather than a checksum) for each event.
# NONE must be used with Group Replication.
# binlog_checksum=

# ***** Group Replication Related *****
# The base name for the relay log. The server creates relay log files in
# sequence by adding a numeric suffix to the base name. If you specify this
# option, the value specified is also used as the base name for the relay log
# index file. Relay logs increase speed by using load-balancing between disks.
# relay_log=

# ***** Group Replication Related *****
# Specifies the server ID. For servers that are used in a replication topology,
# you must specify a unique server ID for each replication server, in the
# range from 1 to 2^32 - 1. "Unique" means that each ID must be different
# from every other ID in use by any other replication master or slave.
server-id=1

# ***** Group Replication Related *****
# The host name or IP address of the slave to be reported to the master
# during slave registration. This value appears in the output of SHOW SLAVE HOSTS
# on the master server. Leave the value unset if you do not want the slave to
# register itself with the master.
# report_host=0.0

# ***** Group Replication Related *****
# The TCP/IP port number for connecting to the slave, to be reported to the master during
# slave registration. Set this only if the slave is listening on a nondefault port or if
# you have a special tunnel from the master or other clients to the slave.
report_port=3306

# ***** Group Replication Related *****
# This option specifies whether global transaction identifiers (GTIDs) are
# used to identify transactions. ON must be used with Group Replication.
# gtid_mode=

# ***** Group Replication Related *****
# When enabled, the server enforces GTID consistency by allowing execution of
# only statements that can be safely logged using a GTID. You must set this
# option to ON before enabling GTID based replication.
# enforce_gtid_consistency=

# ***** Group Replication Related *****
# Whether updates received by a slave server from a master server should be
# logged to the slave's own binary log. Binary logging must be enabled on
# the slave for this variable to have any effect. ON must be used with
# Group Replication.
# log_slave_updates=

# ***** Group Replication Related *****
# Determines whether the slave server logs master status and connection information
# to an InnoDB table in the mysql database, or to a file in the data directory.
# The TABLE setting is required when multiple replication channels are configured.
# master_info_repository=

# ***** Group Replication Related *****
# Determines whether the slave server logs its position in the relay logs to an InnoDB
# table in the mysql database, or to a file in the data directory. The TABLE setting is
# required when multiple replication channels are configured.
# relay_log_info_repository=

# ***** Group Replication Related *****
# Defines the algorithm used to hash the writes extracted during a transaction. If you
# are using Group Replication, this variable must be set to XXHASH64 because the process
# of extracting the writes from a transaction is required for conflict detection on all
# group members.
# transaction_write_set_extraction=

# NOTE: Modify this value after Server initialization won't take effect.
lower_case_table_names=1

# Secure File Priv.
secure-file-priv="C:/ProgramData/MySQL/MySQL Server 8.0/Uploads"

# The maximum amount of concurrent sessions the MySQL server will
# allow. One of these connections will be reserved for a user with
# SUPER privileges to allow the administrator to login even if the
# connection limit has been reached.
max_connections = 2000

# The number of open tables for all threads. Increasing this value
# increases the number of file descriptors that mysqld requires.
# Therefore you have to make sure to set the amount of open files
# allowed to at least 4096 in the variable "open-files-limit" in
# section [mysqld_safe]
table_open_cache=2000

# Maximum size for internal (in-memory) temporary tables. If a table
# grows larger than this value, it is automatically converted to disk
# based table This limitation is for a single table. There can be many
# of them.
tmp_table_size = 4G

# How many threads we should keep in a cache for reuse. When a client
# disconnects, the client's threads are put in the cache if there aren't
# more than thread_cache_size threads from before.  This greatly reduces
# the amount of thread creations needed if you have a lot of new
# connections. (Normally this doesn't give a notable performance
# improvement if you have a good thread implementation.)
thread_cache_size=10

# *** MyISAM Specific options
# The maximum size of the temporary file MySQL is allowed to use while
# recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE.
# If the file-size would be bigger than this, the index will be created
# through the key cache (which is slower).
myisam_max_sort_file_size=10G

# The size of the buffer that is allocated when sorting MyISAM indexes
# during a REPAIR TABLE or when creating indexes with CREATE INDEX
# or ALTER TABLE.
myisam_sort_buffer_size=256K

# Size of the Key Buffer, used to cache index blocks for MyISAM tables.
# Do not set it larger than 30% of your available memory, as some memory
# is also required by the OS to cache rows. Even if you're not using
# MyISAM tables, you should still set it to 8-64M as it will also be
# used for internal temporary disk tables.
key_buffer_size = 64M

# Size of the buffer used for doing full table scans of MyISAM tables.
# Allocated per thread, if a full scan is needed.
read_buffer_size=64K

read_rnd_buffer_size=256K

# *** INNODB Specific options ***
# innodb_data_home_dir=

# Use this option if you have a MySQL server with InnoDB support enabled
# but you do not plan to use it. This will save memory and disk space
# and speed up some things.
# skip-innodb=

# If set to 1, InnoDB will flush (fsync) the transaction logs to the
# disk at each commit, which offers full ACID behavior. If you are
# willing to compromise this safety, and you are running small
# transactions, you may set this to 0 or 2 to reduce disk I/O to the
# logs. Value 0 means that the log is only written to the log file and
# the log file flushed to disk approximately once per second. Value 2
# means the log is written to the log file at each commit, but the log
# file is only flushed to disk approximately once per second.
innodb_flush_log_at_trx_commit = 1

# The size of the buffer InnoDB uses for buffering log data. As soon as
# it is full, InnoDB will have to flush it to disk. As it is flushed
# once per second anyway, it does not make sense to have it very large
# (even with long transactions).


# InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and
# row data. The bigger you set this the less disk I/O is needed to
# access data in tables. On a dedicated database server you may set this
# parameter up to 80% of the machine physical memory size. Do not set it
# too large, though, because competition of the physical memory may
# cause paging in the operating system.  Note that on 32bit systems you
# might be limited to 2-3.5G of user level memory per process, so do not
# set it too high.
innodb_buffer_pool_size=38G

# Size of each log file in a log group. You should set the combined size
# of log files to about 25%-100% of your buffer pool size to avoid
# unneeded buffer pool flush activity on log file overwrite. However,
# note that a larger logfile size will increase the time needed for the
# recovery process.
innodb_log_file_size=48M

# Number of threads allowed inside the InnoDB kernel. The optimal value
# depends highly on the application, hardware as well as the OS
# scheduler properties. A too high value may lead to thread thrashing.
innodb_thread_concurrency=21

# The increment size (in MB) for extending the size of an auto-extend InnoDB system tablespace file when it becomes full.
innodb_autoextend_increment=64

# The number of regions that the InnoDB buffer pool is divided into.
# For systems with buffer pools in the multi-gigabyte range, dividing the buffer pool into separate instances can improve concurrency,
# by reducing contention as different threads read and write to cached pages.
innodb_buffer_pool_instances=8

# Determines the number of threads that can enter InnoDB concurrently.
innodb_concurrency_tickets=5000

# Specifies how long in milliseconds (ms) a block inserted into the old sublist must stay there after its first access before
# it can be moved to the new sublist.
innodb_old_blocks_time=1000

# It specifies the maximum number of .ibd files that MySQL can keep open at one time. The minimum value is 10.
innodb_open_files=300

# When this variable is enabled, InnoDB updates statistics during metadata statements.
innodb_stats_on_metadata=0

# When innodb_file_per_table is enabled (the default in 5.6.6 and higher), InnoDB stores the data and indexes for each newly created table
# in a separate .ibd file, rather than in the system tablespace.
innodb_file_per_table=1

# Use the following list of values: 0 for crc32, 1 for strict_crc32, 2 for innodb, 3 for strict_innodb, 4 for none, 5 for strict_none.
innodb_checksum_algorithm = none

skip-innodb-doublewrite=

# The number of outstanding connection requests MySQL can have.
# This option is useful when the main MySQL thread gets many connection requests in a very short time.
# It then takes some time (although very little) for the main thread to check the connection and start a new thread.
# The back_log value indicates how many requests can be stacked during this short time before MySQL momentarily
# stops answering new requests.
# You need to increase this only if you expect a large number of connections in a short period of time.
back_log=80

# If this is set to a nonzero value, all tables are closed every flush_time seconds to free up resources and
# synchronize unflushed data to disk.
# This option is best used only on systems with minimal resources.
flush_time=0

# The minimum size of the buffer that is used for plain index scans, range index scans, and joins that do not use
# indexes and thus perform full table scans.
join_buffer_size=256K

# The maximum size of one packet or any generated or intermediate string, or any parameter sent by the
# mysql_stmt_send_long_data() C API function.
max_allowed_packet=4M

# If more than this many successive connection requests from a host are interrupted without a successful connection,
# the server blocks that host from performing further connections.
max_connect_errors=100

# Changes the number of file descriptors available to mysqld.
# You should try increasing the value of this option if mysqld gives you the error "Too many open files".
open_files_limit=4161

# If you see many sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the
# sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization
# or improved indexing.
sort_buffer_size = 256K

# The number of table definitions (from .frm files) that can be stored in the definition cache.
# If you use a large number of tables, you can create a large table definition cache to speed up opening of tables.
# The table definition cache takes less space and does not use file descriptors, unlike the normal table cache.
# The minimum and default values are both 400.
table_definition_cache=1400

# Specify the maximum size of a row-based binary log event, in bytes.
# Rows are grouped into events smaller than this size if possible. The value should be a multiple of 256.
binlog_row_event_max_size=8K

# If the value of this variable is greater than 0, a replication slave synchronizes its master.info file to disk.
# (using fdatasync()) after every sync_master_info events.
sync_master_info=10000

# If the value of this variable is greater than 0, the MySQL server synchronizes its relay log to disk.
# (using fdatasync()) after every sync_relay_log writes to the relay log.
sync_relay_log=10000

# If the value of this variable is greater than 0, a replication slave synchronizes its relay-log.info file to disk.
# (using fdatasync()) after every sync_relay_log_info transactions.
sync_relay_log_info=10000

# Load mysql plugins at start."plugin_x ; plugin_y".
# plugin_load=

# The TCP/IP Port the MySQL Server X Protocol will listen on.
# loose_mysqlx_port=33060

# Size of the Key Buffer, used to cache index blocks for MyISAM tables.
# Do not set it larger than 30% of your available memory, as some memory
# is also required by the OS to cache rows. Even if you're not using
# MyISAM tables, you should still set it to 8-64M as it will also be
# used for internal temporary disk tables.

How should I improve MySQL performance and prevent the server from crashing by limiting CPU usage?


Solution 1:

No obvious explanation in the VARIABLES and STATUS.

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

  • Version: 8.0.20
  • 60 GB of RAM
  • Uptime = 04:49:11; some GLOBAL STATUS values may not be meaningful yet.
  • You are running on Windows.
  • 4.89 Queries/sec : 3.43 Questions/sec

The More Important Issues:

Almost nothing is going on. I am having trouble imagining that MySQL caused the crash.

Lower max_connections to 500. (There have not been more than 23 concurrent connections since startup.)

tmp_table_size = 500M -- it is currently dangerously high for the amount of RAM you have.

innodb_doublewrite = ON
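
To see why a 4G tmp_table_size is flagged as dangerous here, the arithmetic can be sketched as a rough worst case. This is a deliberately pessimistic simplification, not a model of how the server really allocates memory: it assumes every connection fills one in-memory temporary table of the maximum size at the same moment. The function and constant names are made up for illustration; the buffer sizes come from the my.ini in the question and the connection count from the observed peak of 23.

```python
# Rough worst-case memory estimate (a simplification, not MySQL's real allocator).
KIB, MIB, GIB = 2**10, 2**20, 2**30

RAM_GIB = 60                      # physical RAM of the VPS
BUFFER_POOL = 38 * GIB            # innodb_buffer_pool_size
KEY_BUFFER = 64 * MIB             # key_buffer_size
# Per-connection buffers from my.ini: read, read_rnd, sort, join.
PER_CONN_BUFFERS = (64 + 256 + 256 + 256) * KIB
PEAK_CONNECTIONS = 23             # Max_used_connections since startup


def worst_case_gib(tmp_table_size, connections=PEAK_CONNECTIONS):
    """Assume every connection fills one in-memory temp table at once."""
    per_conn = PER_CONN_BUFFERS + tmp_table_size
    total = BUFFER_POOL + KEY_BUFFER + connections * per_conn
    return total / GIB


for label, size in (("4G", 4 * GIB), ("500M", 500 * MIB)):
    print(f"tmp_table_size={label}: ~{worst_case_gib(size):.1f} GiB of {RAM_GIB} GiB")
```

Under these assumptions the 4G setting can exceed physical RAM with only the 23 connections seen so far, while 500M stays below it.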

Details and other observations:

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_free * 16384 / innodb_buffer_pool_size ) = 2,478,311 * 16384 / 38912M = 99.5% -- buffer pool free -- buffer_pool_size is bigger than working set; could decrease it

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be sluggish or spiky.

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 2,478,311 / 2490368 = 99.5% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size (now 40802189312) is bigger than necessary?

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Innodb_buffer_pool_bytes_data / innodb_buffer_pool_size ) = 196,214,784 / 38912M = 0.48% -- Percent of buffer pool taken up by data -- A small percent may indicate that the buffer_pool is unnecessarily big.
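
The buffer pool percentages above can be reproduced from the raw counters. The sketch below only redoes that arithmetic with the values quoted in this answer; the variable names are chosen for illustration, and the 16384-byte page size is the default innodb_page_size that the formula above already uses.

```python
# Recompute the buffer pool ratios from the counters quoted above.
PAGE_SIZE = 16384                       # innodb_page_size in bytes (default)
buffer_pool_size = 38912 * 2**20        # 38912M = 40802189312 bytes
pages_free = 2_478_311                  # Innodb_buffer_pool_pages_free
pages_total = 2_490_368                 # Innodb_buffer_pool_pages_total
bytes_data = 196_214_784                # Innodb_buffer_pool_bytes_data

free_by_bytes = pages_free * PAGE_SIZE / buffer_pool_size
free_by_pages = pages_free / pages_total
data_share = bytes_data / buffer_pool_size

print(f"free (by bytes): {free_by_bytes:.1%}")
print(f"free (by pages): {free_by_pages:.1%}")
print(f"holding data:    {data_share:.2%}")
```

Both ways of computing the free share agree because the total page count times the page size equals the configured pool size.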

( innodb_doublewrite ) = innodb_doublewrite = OFF -- Extra I/O, but extra safety in crash. -- OFF is OK for FusionIO, Galera, Replicas, ZFS.

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 3,944,448 / (17351 / 3600) / 2 / 48M = 0.00813 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 17,351 / 60 * 48M / 3944448 = 3,690 -- Minutes between InnoDB log rotations. Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 50331648). (Cannot change in AWS.)
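
The two redo log figures above follow from the same four numbers. As a minimal sketch (variable names are illustrative, values are the ones quoted in this answer):

```python
# Redo the two redo-log calculations with the values quoted above.
uptime_s = 17_351                        # Uptime in seconds (04:49:11)
os_log_written = 3_944_448               # Innodb_os_log_written, bytes
log_file_size = 48 * 2**20               # innodb_log_file_size = 50331648
files_in_group = 2                       # innodb_log_files_in_group

# Fraction of the total redo log space written per hour.
hourly_ratio = os_log_written / (uptime_s / 3600) / files_in_group / log_file_size
# Minutes needed to write one log file's worth of data at the observed rate.
minutes_per_rotation = uptime_s / 60 * log_file_size / os_log_written

print(f"ratio per hour: {hourly_ratio:.5f}")
print(f"minutes between rotations: {minutes_per_rotation:,.0f}")
```

With so little written to the log, a rotation would take days rather than the suggested 60 minutes, which is consistent with the observation that the server is nearly idle.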

( innodb_flush_method ) = innodb_flush_method = unbuffered -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( innodb_io_capacity ) = 200 -- I/O ops per second the disk is capable of. 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_adaptive_hash_index ) = innodb_adaptive_hash_index = ON -- Usually should be ON. -- There are cases where OFF is better. See also innodb_adaptive_hash_index_parts (now 8) (after 5.7.9) and innodb_adaptive_hash_index_partitions (MariaDB and Percona). ON has been implicated in rare crashes (bug 73890). MariaDB 10.5.0 changed the default to OFF.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 2,000 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 2000) is too high and various memory settings are high, you could run out of RAM.

( bulk_insert_buffer_size ) = 8M / 61440M = 0.01% -- Buffer for multi-row INSERTs and LOAD DATA -- Too big could threaten RAM size. Too small could hinder such operations.

( tmp_table_size ) = 4096M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 4294967296) to avoid running out of RAM. Perhaps no more than 64M.

( Select_full_join / Com_select ) = 15,198 / 31082 = 48.9% -- % of selects that are indexless joins -- Add suitable index(es) to tables used in JOINs.

( Com_admin_commands / Queries ) = 25,348 / 84808 = 29.9% -- Percent of queries that are "admin" commands. -- What's going on?

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.

( back_log ) = 80 -- (Autosized as of 5.6.6; based on max_connections) -- Raising to min(150, max_connections (now 2000)) may help when doing lots of connections.

( Max_used_connections / max_connections ) = 23 / 2000 = 1.1% -- Peak % of connections -- Since several memory factors can expand based on max_connections (now 2000), it is good not to have that setting too high.

( Com_change_db / Connections ) = 25,413 / 311 = 81.7 -- Database switches per connection -- (minor) Consider using "db.table" syntax

( Aborted_connects / Connections ) = 227 / 311 = 73.0% -- Perhaps a hacker is trying to break in? (Attempts to connect)
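
The query and connection ratios in this section can be rechecked the same way. The sketch below hard-codes the counters quoted above in a plain dictionary; on a live server the same names would come from SHOW GLOBAL STATUS. The dictionary and key names for the derived ratios are illustrative only.

```python
# Recompute the query and connection ratios from the counters quoted above.
status = {
    "Uptime": 17_351,
    "Queries": 84_808,
    "Com_select": 31_082,
    "Select_full_join": 15_198,
    "Com_admin_commands": 25_348,
    "Com_change_db": 25_413,
    "Connections": 311,
    "Aborted_connects": 227,
    "Max_used_connections": 23,
}
max_connections = 2000  # from my.ini

ratios = {
    "queries_per_sec": status["Queries"] / status["Uptime"],
    "indexless_join_share": status["Select_full_join"] / status["Com_select"],
    "admin_command_share": status["Com_admin_commands"] / status["Queries"],
    "db_switches_per_connection": status["Com_change_db"] / status["Connections"],
    "aborted_connect_share": status["Aborted_connects"] / status["Connections"],
    "peak_connection_share": status["Max_used_connections"] / max_connections,
}

for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

The two figures that stand out are the share of SELECTs doing joins without an index and the share of connection attempts that were aborted; both are worth investigating before touching any CPU limit.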

Abnormally small:

10 * read_buffer_size = 0.6MB
Com_insert = 4.4 /HR
Handler_read_next = 16 /sec
Innodb_buffer_pool_reads * innodb_page_size / innodb_buffer_pool_size = 0.47%
Innodb_dblwr_pages_written = 0
Innodb_rows_updated = 0.62 /HR
back_log / max_connections = 4.0%
innodb_doublewrite_files = 0
innodb_doublewrite_pages = 0

Abnormally large:

Com_create_db = 0.21 /HR
Com_create_table = 92 /HR
Com_show_charsets = 1.7 /HR
Com_show_plugins = 0.41 /HR
Com_show_storage_engines = 0.41 /HR
Innodb_buffer_pool_pages_free = 2.48e+6
Innodb_system_rows_deleted = 0.1 /sec
Innodb_system_rows_inserted = 0.1 /sec
Innodb_system_rows_updated = 0.32 /sec
Ssl_accepts = 304
Ssl_default_timeout = 7,200
Ssl_finished_accepts = 304
Ssl_session_cache_hits = 290
Ssl_session_cache_timeouts = 5
Ssl_verify_depth = 4.29e+9
Ssl_verify_mode = 5
gtid_executed_compression_period = 0.058 /sec
innodb_thread_concurrency = 21
max_error_count = 1,024
max_length_for_sort_data = 4,096
optimizer_trace_offset = -1
performance_schema_max_cond_classes = 100
performance_schema_max_mutex_classes = 300
performance_schema_max_rwlock_classes = 60
performance_schema_max_stage_classes = 175
performance_schema_max_statement_classes = 218
performance_schema_max_thread_classes = 100

Abnormal strings:

event_scheduler = ON
ft_boolean_syntax = + -><()~*:\"\"&
have_query_cache = NO
innodb_fast_shutdown = 1
innodb_temp_tablespaces_dir = .\\#innodb_temp\\
lower_case_file_system = ON
lower_case_table_names = 1
mysqlx_compression_algorithms = DEFLATE_STREAM,LZ4_MESSAGE,ZSTD_STREAM
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
protocol_compression_algorithms = zlib,zstd,uncompressed
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN