MySQL问题分析-是谁偷走了我的IO

前言

对于刚接触MySQL数据库不久的小伙伴来说,或多或少会遇到一些棘手的小问题;比如在数据库磁盘无损坏的情况下,我们经常会遇到数据库服务器的磁盘io压力很大,但由于服务器上只有MySQL程序,我们第一想法肯定是某个连接用户在连到数据库后,背着我们在瞎搞着什么;然而通过show processlist命令,却并没有看到不良的操作,可恶。下面模拟一个简单场景,让我们揪出这狗人的邪恶操作。

场景模拟,制造IO压力

[root@tidb-master ~]# /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=10000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 prepare sysbench 1.0.17 (using system LuaJIT 2.0.4)  Creating table 'sbtest1'... Inserting 10000000 records into 'sbtest1' Creating a secondary index on 'sbtest1'... [root@tidb-master ~]# /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=10000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 run sysbench 1.0.17 (using system LuaJIT 2.0.4)  Running the test with following options: Number of threads: 1 Report intermediate results every 2 second(s) Initializing random number generator from current time   Initializing worker threads...  Threads started!  [ 2s ] thds: 1 tps: 19.48 qps: 399.05 (r/w/o: 279.69/79.91/39.46) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 1 tps: 30.01 qps: 599.76 (r/w/o: 420.18/119.55/60.03) lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 1 tps: 38.00 qps: 751.49 (r/w/o: 524.99/150.50/76.00) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 1 tps: 41.00 qps: 827.05 (r/w/o: 581.03/164.01/82.00) lat (ms,95%): 61.08 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 1 tps: 35.50 qps: 709.43 (r/w/o: 496.45/141.99/70.99) lat (ms,95%): 64.47 err/s: 0.00 reconn/s: 0.00

问题分析,捉出狗人

1、通过top命令发现io wait比较高

top - 16:46:49 up 176 days,  6:46,  1 user,  load average: 3.72, 2.39, 3.04 Tasks: 182 total,   1 running, 181 sleeping,   0 stopped,   0 zombie %Cpu0  :  4.8 us,  2.0 sy,  0.0 ni, 91.1 id,  1.7 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu1  : 53.2 us, 10.8 sy,  0.0 ni, 14.6 id, 13.9 wa,  0.0 hi,  7.1 si,  0.3 st %Cpu2  : 11.7 us,  4.8 sy,  0.0 ni, 73.8 id,  9.3 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu3  : 11.4 us,  3.1 sy,  0.0 ni, 78.9 id,  6.6 wa,  0.0 hi,  0.0 si,  0.0 st %Cpu4  :  3.8 us,  1.4 sy,  0.0 ni, 93.4 id,  1.0 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu5  :  5.2 us,  2.8 sy,  0.0 ni, 89.6 id,  2.4 wa,  0.0 hi,  0.0 si,  0.0 st %Cpu6  :  6.2 us,  2.4 sy,  0.0 ni, 83.3 id,  7.6 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu7  :  6.8 us,  2.1 sy,  0.0 ni, 86.0 id,  4.8 wa,  0.0 hi,  0.3 si,  0.0 st %Cpu8  :  1.7 us,  1.4 sy,  0.0 ni, 94.3 id,  2.4 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu9  : 16.1 us,  5.1 sy,  0.0 ni, 70.9 id,  7.9 wa,  0.0 hi,  0.0 si,  0.0 st %Cpu10 : 10.3 us,  2.7 sy,  0.0 ni, 82.5 id,  4.1 wa,  0.0 hi,  0.0 si,  0.3 st %Cpu11 : 10.1 us,  3.1 sy,  0.0 ni, 75.0 id, 11.8 wa,  0.0 hi,  0.0 si,  0.0 st KiB Mem : 12137024 total,  5308908 free,  3601408 used,  3226708 buff/cache KiB Swap:        0 total,        0 free,        0 used.  7555800 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                               8858 actiont+  20   0 4867792 697536  15840 S 172.4  5.7   6:12.06 mysqld                                                                                                               26447 tidb      20   0 4000364   1.0g   5384 S   6.0  9.0   9785:41 TiFlashMain                                                                                                          22761 actiont+  20   0 2526348  37548  10980 S   4.3  0.3   1031:05 ustats                                                                                                               25894 tidb      20   0   11.2g 159668  13560 S   2.3  1.3   3682:45 pd-server                                                                                                            25975 tidb      20   0 2637164   1.0g   7392 S   2.0  8.8   4121:29 tikv-server                                                                                                          25967 actiont+  20   0 1879304  16292   6112 S   1.3  0.1 166:08.91 uguard-agent

2、通过iostat命令进一步分析,我们看到磁盘比较繁忙,忙于读写中(%uitl值比较大,一般来说,值大于75是表明磁盘比较繁忙了)

[root@tidb-tikv01 ~]# iostat -x 1 Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01)  04/19/2022      _x86_64_        (12 CPU)  avg-cpu:  %user   %nice %system %iowait  %steal   %idle            1.20    0.00    0.44    0.17    0.10   98.09  Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util vda               0.01     0.18    3.15   21.14    80.79   232.37    25.79     0.19    7.69    6.15    7.92   0.75   1.82 scd0              0.00     0.00    0.00    0.00     0.00     0.00     6.00     0.00    0.50    0.50    0.00   0.50   0.00  avg-cpu:  %user   %nice %system %iowait  %steal   %idle            9.60    0.00    3.43    9.43    0.09   77.46  Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util vda               0.00     0.00 1328.00 3237.00 21248.00 39091.50    26.44     2.71    0.59    1.09    0.39   0.17  79.50 scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00  avg-cpu:  %user   %nice %system %iowait  %steal   %idle           11.27    0.00    3.76    9.31    0.00   75.66  Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util vda               0.00     0.00 1304.00 3561.00 20864.00 45302.00    27.20     2.85    0.58    1.16    0.37   0.16  79.60 scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00  avg-cpu:  %user   %nice %system %iowait  %steal   %idle            8.06    0.00    2.66   19.04    0.17   70.07  Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util vda               0.00     0.00  917.00 3091.00 14672.00 43048.50    28.80     8.35    2.09    3.19    1.76   0.21  82.50 scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

3、通过iop命令可以定位是哪个线程消耗IO比较厉害,根据以下显示,发现线程值为22056的MySQL线程ID占用IO高达50.62%

Total DISK READ :       5.78 M/s | Total DISK WRITE :      16.91 M/s Actual DISK READ:       6.01 M/s | Actual DISK WRITE:      23.69 M/s   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                    22056 be/4 actionte    4.83 M/s  233.93 K/s  0.00 % 50.62 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306  8887 be/4 actionte    0.00 B/s    0.00 B/s  0.00 %  6.31 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306  8911 be/4 actionte  211.29 K/s    0.00 B/s  0.00 %  2.54 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --daemonize ~ch-mysql --socket=/opt/mysql/data/3306/mysqld.sock --port=3306 22771 be/4 actionte    0.00 B/s    3.77 K/s  0.00 %  1.72 % ustats

4、将上面获取的线程ID带入到MySQL中,便定位到是数据库中哪个连接搞的鬼了,抽死你丫的(performance_schama.threads中THREAD_OS_ID为系统线程ID)

[root@tidb-tikv01 ~]# /opt/mysql/base/8.0.21/bin/mysql -S /opt/mysql/data/3306/mysqld.sock -pxx mysql: [Warning] Using a password on the command line interface can be insecure. Welcome to the MySQL monitor.  Commands end with ; or \g. Your MySQL connection id is 13832 Server version: 8.0.21 MySQL Community Server - GPL  Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.  Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.  mysql> show processlist; +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ | Id    | User            | Host               | db     | Command          | Time   | State                                                         | Info             | +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ |     5 | event_scheduler | localhost          | NULL   | Daemon           | 232299 | Waiting on empty queue                                        | NULL             | |    38 | universe_op     | 127.0.0.1:58456    | NULL   | Sleep            |      1 |                                                               | NULL             | |    39 | universe_op     | 127.0.0.1:58454    | NULL   | Sleep            |      1 |                                                               | NULL             | |    40 | universe_op     | 127.0.0.1:58458    | NULL   | Sleep            |      1 |                                                               | NULL             | |    41 | universe_op     | 127.0.0.1:58460    | NULL   | Sleep            |      1 |                                                               | NULL             | |    42 | universe_op     | 127.0.0.1:58466    | NULL   | Sleep            |      1 |                                                               | NULL             | |    43 | universe_op     | 127.0.0.1:58462    | NULL   | Sleep            |      1 |                                                               | NULL             | |    44 | universe_op     | 127.0.0.1:58464    | NULL   | Sleep            |      1 |                                                               | NULL             | |    47 | universe_op     | 127.0.0.1:58656    | NULL   | Sleep            |      7 |                                                               | NULL             | |    48 | universe_op     | 127.0.0.1:58658    | NULL   | Sleep            |      7 |                                                               | NULL             | |    49 | universe_op     | 127.0.0.1:58660    | NULL   | Sleep            |      7 |                                                               | NULL             | |    50 | universe_op     | 127.0.0.1:58662    | NULL   | Sleep            |      7 |                                                               | NULL             | |    51 | universe_op     | 127.0.0.1:58666    | NULL   | Sleep            |      7 |                                                               | NULL             | |    52 | universe_op     | 127.0.0.1:58664    | NULL   | Sleep            |      7 |                                                               | NULL             | |    53 | universe_op     | 127.0.0.1:58668    | NULL   | Sleep            |      7 |                                                               | NULL             | |    54 | universe_op     | 127.0.0.1:58670    | NULL   | Sleep            |      7 |                                                               | NULL             | |    56 | universe_op     | 10.186.61.45:55982 | NULL   | Binlog Dump GTID | 232282 | Master has sent all binlog to slave; waiting for more updates | NULL             | |   231 | universe_op     | 127.0.0.1:41766    | NULL   | Sleep            |      1 |                                                               | NULL             | | 11155 | sun             | 10.186.61.16:53874 | testdb | Sleep            |      0 |                                                               | NULL             | | 13832 | root            | localhost          | NULL   | Query            |      0 | starting                                                      | show processlist | +-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+ 20 rows in set (0.00 sec)  mysql> select * from performance_schema.threads where THREAD_OS_ID=22056\G *************************** 1. row ***************************           THREAD_ID: 11200                NAME: thread/sql/one_connection                TYPE: FOREGROUND      PROCESSLIST_ID: 11155    PROCESSLIST_USER: sun    PROCESSLIST_HOST: 10.186.61.16      PROCESSLIST_DB: testdb PROCESSLIST_COMMAND: Query    PROCESSLIST_TIME: 0   PROCESSLIST_STATE: statistics    PROCESSLIST_INFO: SELECT c FROM sbtest1 WHERE id=6433600    PARENT_THREAD_ID: NULL                ROLE: NULL        INSTRUMENTED: YES             HISTORY: YES     CONNECTION_TYPE: TCP/IP        THREAD_OS_ID: 22056      RESOURCE_GROUP: USR_default 1 row in set (0.01 sec)  mysql>

总结

 对于一些新入门的小可爱来说,以上方法可以循序渐进的帮助你定位到是哪些连接线程把你的服务器IO折腾高的,可以让你更加了解Linux系统与MySQL之间的联动关系。老鸟嘛,emmm...  咦,飞走了。