MariaDB memory spikes and crash
We are running a MariaDB 10.5.8 server on GKE with 16Gb of RAM. The server have multiple times a day unexpected memory usage spikes that crash the server
1day memory usage graph (orange line is k8s requested ram)
A few extra details
- Server have 13.4 GB of available ram (excl. mysql)
- Occur even on quiet days (like today)
- QPS: ~150 (5% updates, 3% inserts)
- Avg Connections 50-150
- No unusual network traffic
-
slow_query_log
does not show any useful things
What am i missing here ? How so does the server ran out of memory ?
Next step will be enabling general_log
and try to see if i can catch what's occurring before the crash.
Configs
[mysqld]
skip-name-resolve
explicit_defaults_for_timestamp
character-set-server=UTF8
collation-server=utf8_general_ci
sql_mode=TRADITIONAL
innodb_buffer_pool_size=4G
tmp_table_size=32M
max_heap_table_size=32M
net_read_timeout=1800
net_write_timeout=1800
max_connections=300
open_files_limit=8192
Expected maximum memory usage
SELECT @@innodb_buffer_pool_size/1024/1024 as cur_buf, ROUND(
( @@GLOBAL.key_buffer_size
+ @@GLOBAL.query_cache_size
+ @@GLOBAL.tmp_table_size
+ @@GLOBAL.innodb_buffer_pool_size
+ @@GLOBAL.innodb_log_buffer_size
+ @@GLOBAL.max_connections * (
@@GLOBAL.sort_buffer_size
+ @@GLOBAL.read_buffer_size
+ @@GLOBAL.read_rnd_buffer_size
+ @@GLOBAL.join_buffer_size
+ @@GLOBAL.thread_stack
+ @@GLOBAL.binlog_cache_size)
) / 1024 / 1024, 1) `total MB`;
#cur_buf: 4096.00000000
# total MB: 5155.4
Current total index size
SELECT sum( ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2)) size_in_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size' AND index_name != 'PRIMARY' ORDER BY `size_in_mb` DESC
# size_in_mb 6471.11
EDIT
updated status 2021-02-08
... Oh Yikes! There is some issues that were not there before ! ...
don't see a command prompt, try pressing enter.
[--] Status: +ARCHIVE +Aria +BLACKHOLE +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in InnoDB tables: 20.6G (Tables: 1680)
[OK] Total fragmented tables: 0
-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[--] There are 620 basic passwords in the list.
-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 10h 14m 26s (4M q [128.720 qps], 295K conn, TX: 97G, RX: 1G)
[--] Reads / Writes: 89% / 11%
[--] Binary logging is disabled
[--] Physical Memory : 13.7G
[--] Max MySQL memory : 8.8G
[--] Other process memory: 0B
[--] Total buffers: 3.3G global + 18.9M per thread (300 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 5.6G (41.01% of installed RAM)
[OK] Maximum possible memory usage: 8.8G (64.64% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (19/4M)
[OK] Highest usage of available connections: 41% (125/300)
[OK] Aborted connections: 0.00% (3/295567)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (6 temp sorts / 359K sorts)
[!!] Joins performed without indexes: 1244
[!!] Temporary tables created on disk: 54% (76K on disk / 140K total)
[OK] Thread cache hit rate: 99% (125 created / 295K connections)
[OK] Table cache hit rate: 27% (1K open / 6K opened)
[!!] table_definition_cache(400) is lower than number of tables(1882)
[OK] Open file limit used: 0% (16/32K)
[OK] Table locks acquired immediately: 100% (7K immediate / 7K locks)
-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 4 thread(s).
[--] Using default value is good enough for your version (10.5.8-MariaDB)
-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.2% (24M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/4.0K
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 3.0G/20.6G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (3.125 %): 96.0M * 1/3.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 24 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.99% (11893153063 hits/ 11894346836 total)
[!!] InnoDB Write Log efficiency: 21.89% (50454 hits/ 230456 total)
[OK] InnoDB log waits: 0.00% (0 waits / 280910 writes)
-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/2.4M
[!!] Aria pagecache hit rate: 93.2% (1M cached / 77K reads)
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: OFF
[--] Semi synchronous replication Slave: OFF
[--] This is a standalone server
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
MySQL was started within the last 24 hours - recommendations may be inaccurate
We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
See https://dev.mysql.com/doc/internals/en/join-buffer-size.html
(specially the conclusions at the bottom of the page).
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries which have no LIMIT clause
Performance schema should be activated for better diagnostics
Consider installing Sys schema from https://github.com/mysql/mysql-sys for MySQL
Consider installing Sys schema from https://github.com/FromDual/mariadb-sys for MariaDB
Before changing innodb_log_file_size and/or innodb_log_files_in_group read
Variables to adjust:
join_buffer_size (> 256.0K, or always use indexes with JOINs)
tmp_table_size (> 32M)
max_heap_table_size (> 32M)
table_definition_cache(400) > 1882 or -1 (autosizing if supported)
performance_schema = ON enable PFS
innodb_buffer_pool_size (>= 20.6G) if possible.
innodb_log_file_size should be (=768M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
=====================================
2021-02-08 13:58:40 0x7ff6b3d11700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 11 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 20876 srv_active, 0 srv_shutdown, 16149 srv_idle
srv_master_thread log flush and writes: 37025
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10455
OS WAIT ARRAY INFO: signal count 11626
RW-shared spins 2112, rounds 12450, OS waits 102
RW-excl spins 2416, rounds 5720, OS waits 122
RW-sx spins 146, rounds 1352, OS waits 17
Spin rounds per wait: 5.89 RW-shared, 2.37 RW-excl, 9.26 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 1707308132
Purge done for trx's n:o < 1707308131 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422180334491048, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334538304, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334516824, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334534008, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334512528, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334503936, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334525416, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334508232, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334478160, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334529712, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334521120, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334499640, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334495344, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334486752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334482456, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334473864, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334469568, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334465272, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: (null) ((null))
I/O thread 1 state: (null) ((null))
I/O thread 2 state: (null) ((null))
I/O thread 3 state: (null) ((null))
I/O thread 4 state: (null) ((null))
I/O thread 5 state: (null) ((null))
I/O thread 6 state: (null) ((null))
I/O thread 7 state: (null) ((null))
I/O thread 8 state: (null) ((null))
I/O thread 9 state: (null) ((null))
Pending normal aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
1196036 OS file reads, 362093 OS file writes, 291004 OS fsyncs
0.27 reads/s, 16384 avg bytes/read, 6.73 writes/s, 6.73 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 817, free list len 41154, seg size 41972, 2293 merges
merged operations:
insert 9206, delete mark 105324, delete 196
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 111476.96 non-hash searches/s
---
LOG
---
Log sequence number 966315111123
Log flushed up to 966315111015
Pages flushed up to 966313325596
Last checkpoint at 966304251703
0 pending log flushes, 0 pending chkp writes
283193 log i/o's done, 6.73 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 3254779904
Dictionary memory allocated 30676992
Buffer pool size 193560
Free buffers 88
Database pages 193472
Old database pages 71422
Modified db pages 688
Percent of dirty pages(LRU & free pages): 0.355
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 2028651, not young 55580837
0.00 youngs/s, 0.55 non-youngs/s
Pages read 1194330, created 25139, written 78328
0.27 reads/s, 3.00 creates/s, 0.00 writes/s
Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 193472, unzip_LRU len: 0
I/O sum[1443]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 118319, updated 165195, deleted 169952, read 9369336877
3.64 inserts/s, 2.55 updates/s, 0.18 deletes/s, 172245.89 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 28810
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
Ok, I think i stumbled on the problem.... thread_cache_size
[OK] Thread cache hit rate: 99% (125 created / 295K connections)
By default thread_cache_size
is set at 256
, in our case, MariaDB handle two types of loads, the regular SQL from apps and clients, and deep and complex analytics queries occasionally.
It seems that since when the connection were completed, the threads would be preserved in cache
for future use, and this was causing threads not to free their memory.
I dont know if there is another way to indicate to MariaDB that it should recycle its thread more often, but in the meanwhile, lowering thread_cache_size
fixed the issue