数据库内核月报

数据库内核月报 - 2020 / 03

AliSQL · 最佳实践 · Performance Agent

Author: 翊云

背景

性能数据是云上数据库服务的一个重要组成部分,对于性能数据,当前云厂商的一般做法是:由单独的采集系统进行性能数据的收集和处理,然后通过用户控制台进行性能数据的展示。借助控制台的将性能数据以图表的形式进行展示,比较直观,但是用户很难与自己的监控平台进行集成。特别是对于企业级用户,这些用户在上云之前往往有比较成熟的自建性能监控平台,虽然部分云厂商开始提供OpenAPI等方式对外开放性能数据,但是与自建平台的整合依然有诸多限制。

AliSQL解决方案

基于上述背景,AliSQL提出了一种全内聚的性能数据解决方案,直接通过系统表的方式对外提供性能数据。用户可以像查询普通数据一样,直接查询INFORMATION_SCHEMA库下的PERF_STATISTICS表得到性能数据。

设计实现

对于MySQL,用户关心的性能数据主要可以分为以下三种:

  1. Host层性能数据,包括主机的CPU占用、内存使用情况、IO调用等;
  2. Server层性能数据,包括各类连接信息、QPS信息、网络流量等;
  3. Engine层性能数据,包括数据读写情况、事务提交情况等;

MySQL内核内部,除了没有统计Host层的性能数据外,Server层和Engine层的性能数据都有统计并且提供了获取方法,例如:

MySQL [information_schema]> show status like "Com_select";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_select    | 3     |
+---------------+-------+
1 row in set (0.00 sec)

MySQL [information_schema]> show status like "Innodb_data_read";
+------------------+-----------+
| Variable_name    | Value     |
+------------------+-----------+
| Innodb_data_read | 470798848 |
+------------------+-----------+
1 row in set (0.00 sec)

AliSQL Performance Agent需要解决的问题就是:1)整合MySQL内核统计的Server层和Engine层性能指标;2)增加Host层的性能统计;3)提供便捷的外部访问方式。具体实现上:

  1. 新增Performance Agent Plugin,在Plugin内部启动一个性能采集线程,按照指定的采样周期,采集Host层、Server层和Engine层的性能数据;
  2. Host层性能数据的获取方式:根据PID信息,直接读取/proc以及/proc/PID目录下的系统文件;
  3. Server层性能数据获取方式:Plugin内调用Server层统计接口,获取Server层性能数据;
  4. Engine层性能数据获取方式:以InnoDB为例,Plugin内调用InnoDB对外接口,获取InnoDB内部性能数据;
  5. 性能数据的汇总计算:不同的性能指标,计算单个采样周期内的差值或者实时值;
  6. 数据保存方式:以CSV文件格式本地保存,同时在INFORMATION_SCHEMA库下新增一张PERF_STATISTICS表,保存最近1小时的性能数据;
/** 获取Server层性能数据 **/

typedef struct system_status_var STATUS_VAR;

struct system_status_var {
  ...
  ulonglong created_tmp_disk_tables;
  ulonglong created_tmp_tables;
  ...
  ulong com_stat[(uint) SQLCOM_END];
  ...
}

void calc_sum_of_all_status(STATUS_VAR *to)
{
  DBUG_ENTER("calc_sum_of_all_status");
  mysql_mutex_assert_owner(&LOCK_status);
  /* Get global values as base. */
  *to= global_status_var;
  Add_status add_status(to);
  Global_THD_manager::get_instance()->do_for_all_thd_copy(&add_status);
  DBUG_VOID_RETURN;
}


/** 获取InnoDB层性能数据 **/

/** Status variables to be passed to MySQL */
extern struct export_var_t export_vars;

struct export_var_t {
  ...
  ulint innodb_data_read;           /*!< Data bytes read */
  ulint innodb_data_writes;         /*!< I/O write requests */
  ulint innodb_data_written;        /*!< Data bytes written */
  ...
}

/* Function to pass InnoDB status variables to MySQL */
void srv_export_innodb_status(void)
{
  ...
  mutex_enter(&srv_innodb_monitor_mutex);
  ...
  export_vars.innodb_data_read = srv_stats.data_read;
  export_vars.innodb_data_writes = os_n_file_writes;
  export_vars.innodb_data_written = srv_stats.data_written;
  ...
}

性能测试

AliSQL Performance Agent启动一个独立的线程用于性能数据的采集和处理,不干扰用户线程的处理。Sysbench下oltp_read_write场景的性能测试结果显示,开启Performance Agent带来的性能损失在1%以内,对性能的影响可以忽略。

并发数 关闭Performance Agent 开启Performance Agent Overhead
1 4121 4101 -0.49%
8 30834 30740 -0.30%
16 58027 57774 -0.44%
32 64972 64321 -1.00%
64 57035 56945 -0.16%
128 50343 49990 -0.70%
256 48360 48307 -0.11%
512 45347 45400 0.12%
1024 43649 43272 -0.86%

使用说明

相比通过外部系统获取MySQL的性能数据,直接读取INFORMATION_SCHEMA库下的PERF_STATISTICS表不仅更加方便,而且数据的实时性也更好。

参数说明

MySQL [information_schema]> show variables like "%performance_agent%";
+----------------------------------------+-----------------+
| Variable_name                          | Value           |
+----------------------------------------+-----------------+
| performance_agent_enabled              | ON              |
| performance_agent_file_size            | 100             |
| performance_agent_interval             | 1               |
| performance_agent_perfstat_volume_size | 3600            |
+----------------------------------------+-----------------+
4 rows in set (0.00 sec)

其中:

  1. performance_agent_enabled: plugin启动开关,支持动态开启/关闭;
  2. performance_agent_file_size: 本地CSV文件大小,单位MB;
  3. performance_agent_interval: 采样周期,单位Second;
  4. performance_agent_perfstat_volume_size: PERF_STATISTICS表大小;

表结构说明

INFORMATION_SCHEMA库下的PERF_STATISTICS表结构如下:

CREATE TEMPORARY TABLE `PERF_STATISTICS` (
  `TIME` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `PROCS_MEM_USAGE` double NOT NULL DEFAULT '0',
  `PROCS_CPU_RATIO` double NOT NULL DEFAULT '0',
  `PROCS_IOPS` double NOT NULL DEFAULT '0',
  `PROCS_IO_READ_BYTES` bigint(21) NOT NULL DEFAULT '0',
  `PROCS_IO_WRITE_BYTES` bigint(21) NOT NULL DEFAULT '0',
  `MYSQL_CONN_ABORT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_CONN_CREATED` int(11) NOT NULL DEFAULT '0',
  `MYSQL_USER_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_CONN_COUNT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_CONN_RUNNING` int(11) NOT NULL DEFAULT '0',
  `MYSQL_LOCK_IMMEDIATE` int(11) NOT NULL DEFAULT '0',
  `MYSQL_LOCK_WAITED` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_INSERT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_UPDATE` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_DELETE` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_SELECT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_COMMIT` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_ROLLBACK` int(11) NOT NULL DEFAULT '0',
  `MYSQL_COM_PREPARE` int(11) NOT NULL DEFAULT '0',
  `MYSQL_LONG_QUERY` int(11) NOT NULL DEFAULT '0',
  `MYSQL_TCACHE_GET` bigint(21) NOT NULL DEFAULT '0',
  `MYSQL_TCACHE_MISS` bigint(21) NOT NULL DEFAULT '0',
  `MYSQL_TMPFILE_CREATED` int(11) NOT NULL DEFAULT '0',
  `MYSQL_TMP_TABLES` int(11) NOT NULL DEFAULT '0',
  `MYSQL_TMP_DISKTABLES` int(11) NOT NULL DEFAULT '0',
  `MYSQL_SORT_MERGE` int(11) NOT NULL DEFAULT '0',
  `MYSQL_SORT_ROWS` int(11) NOT NULL DEFAULT '0',
  `MYSQL_BYTES_RECEIVED` bigint(21) NOT NULL DEFAULT '0',
  `MYSQL_BYTES_SENT` bigint(21) NOT NULL DEFAULT '0',
  `MYSQL_BINLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  `MYSQL_IOLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  `MYSQL_RELAYLOG_OFFSET` int(11) NOT NULL DEFAULT '0',
  `EXTRA` json NOT NULL DEFAULT 'null'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

注:EXTRA字段为json类型,记录Engine层统计信息:

{
  "INNODB_LOG_LSN":0,
  "INNODB_TRX_CNT":0,
  "INNODB_DATA_READ":0,
  "INNODB_IBUF_SIZE":0,
  "INNODB_LOG_WAITS":0,
  "INNODB_MAX_PURGE":0,
  "INNODB_N_WAITING":0,
  "INNODB_ROWS_READ":0,
  "INNODB_LOG_WRITES":0,
  "INNODB_IBUF_MERGES":0,
  "INNODB_DATA_WRITTEN":0,
  "INNODB_DBLWR_WRITES":0,
  "INNODB_IBUF_MERGEOP":0,
  "INNODB_IBUF_SEGSIZE":0,
  "INNODB_ROWS_DELETED":0,
  "INNODB_ROWS_UPDATED":0,
  "INNODB_COMMIT_TRXCNT":0,
  "INNODB_IBUF_FREELIST":0,
  "INNODB_MYSQL_TRX_CNT":0,
  "INNODB_ROWS_INSERTED":0,
  "INNODB_ACTIVE_TRX_CNT":0,
  "INNODB_COMMIT_TRXTIME":0,
  "INNODB_IBUF_DISCARDOP":0,
  "INNODB_OS_LOG_WRITTEN":0,
  "INNODB_ACTIVE_VIEW_CNT":0,
  "INNODB_LOG_FLUSHED_LSN":0,
  "INNODB_RSEG_HISTORY_LEN":0,
  "INNODB_AVG_COMMIT_TRXTIME":0,
  "INNODB_LOG_CHECKPOINT_LSN":0,
  "INNODB_MAX_COMMIT_TRXTIME":0,
  "INNODB_DBLWR_PAGES_WRITTEN":0
}

系统集成

AliSQL Performance Agent通过对外提供INFORMATION_SCHEMA库下的PERF_STATISTICS表的方式,让用户可以像查询普通数据的一样直接查询性能数据。

-- 查询最近30S内的内存和CPU使用情况 --
MySQL [information_schema]> SELECT TIME, PROCS_MEM_USAGE, PROCS_CPU_RATIO
    -> FROM PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
+---------------------+-----------------+-----------------+
| TIME                | PROCS_MEM_USAGE | PROCS_CPU_RATIO |
+---------------------+-----------------+-----------------+
| 2020-03-19 15:09:50 |      6070943744 |          101.11 |
| 2020-03-19 15:09:49 |      6070837248 |          100.99 |
| 2020-03-19 15:09:48 |      6070546432 |          101.11 |
| 2020-03-19 15:09:47 |      6071123968 |          101.17 |
| 2020-03-19 15:09:46 |      6070509568 |          101.23 |
| 2020-03-19 15:09:45 |      6070030336 |          101.63 |
| 2020-03-19 15:09:44 |      6069497856 |          100.72 |
| 2020-03-19 15:09:43 |      6069764096 |          100.85 |
| 2020-03-19 15:09:42 |      6069522432 |          101.23 |
| 2020-03-19 15:09:41 |      6068592640 |          101.25 |
| 2020-03-19 15:09:40 |      6069272576 |          100.87 |
| 2020-03-19 15:09:39 |      6069297152 |          101.31 |
| 2020-03-19 15:09:38 |      6069706752 |          101.04 |
| 2020-03-19 15:09:37 |      6069907456 |           100.8 |
| 2020-03-19 15:09:36 |      6069907456 |          103.72 |
| 2020-03-19 15:09:35 |      6069235712 |           99.05 |
| 2020-03-19 15:09:34 |      6068707328 |          101.32 |
| 2020-03-19 15:09:33 |      6068723712 |          100.66 |
| 2020-03-19 15:09:32 |      6069379072 |          101.25 |
| 2020-03-19 15:09:31 |      6069243904 |          103.62 |
| 2020-03-19 15:09:30 |      6069567488 |          101.17 |
| 2020-03-19 15:09:29 |      6069641216 |           98.15 |
| 2020-03-19 15:09:28 |      6069968896 |          101.12 |
| 2020-03-19 15:09:27 |      6070087680 |          104.15 |
| 2020-03-19 15:09:26 |      6069633024 |           101.3 |
| 2020-03-19 15:09:25 |      6069846016 |          100.94 |
| 2020-03-19 15:09:24 |      6068805632 |          101.26 |
| 2020-03-19 15:09:23 |      6068228096 |           98.45 |
| 2020-03-19 15:09:22 |      6067957760 |          103.89 |
| 2020-03-19 15:09:21 |      6067544064 |           98.66 |
+---------------------+-----------------+-----------------+
30 rows in set (0.26 sec)

-- 查询最近30S内InnoDB层的读取和插入行数 --
MySQL [information_schema]> SELECT TIME, EXTRA->'$.INNODB_ROWS_READ' AS INNODB_ROWS_READ,
    -> EXTRA->'$.INNODB_ROWS_INSERTED' AS INNODB_ROWS_INSERTED
    -> FROM information_schema.PERF_STATISTICS ORDER BY TIME DESC LIMIT 30;
+---------------------+------------------+----------------------+
| TIME                | INNODB_ROWS_READ | INNODB_ROWS_INSERTED |
+---------------------+------------------+----------------------+
| 2020-03-19 15:09:50 | 1588696          | 6309                 |
| 2020-03-19 15:09:49 | 1534831          | 22712                |
| 2020-03-19 15:09:48 | 1445766          | 25011                |
| 2020-03-19 15:09:47 | 1455092          | 25038                |
| 2020-03-19 15:09:46 | 1427958          | 24966                |
| 2020-03-19 15:09:45 | 1460370          | 25054                |
| 2020-03-19 15:09:44 | 1441310          | 24989                |
| 2020-03-19 15:09:43 | 1430437          | 25963                |
| 2020-03-19 15:09:42 | 1512929          | 24179                |
| 2020-03-19 15:09:41 | 1432366          | 24979                |
| 2020-03-19 15:09:40 | 1471565          | 25075                |
| 2020-03-19 15:09:39 | 1440499          | 24995                |
| 2020-03-19 15:09:38 | 1442158          | 24996                |
| 2020-03-19 15:09:37 | 1457681          | 25035                |
| 2020-03-19 15:09:36 | 1401060          | 24865                |
| 2020-03-19 15:09:35 | 1538809          | 25281                |
| 2020-03-19 15:09:34 | 1465982          | 25073                |
| 2020-03-19 15:09:33 | 1441252          | 24997                |
| 2020-03-19 15:09:32 | 1478242          | 24235                |
| 2020-03-19 15:09:31 | 1449499          | 22237                |
| 2020-03-19 15:09:30 | 1460754          | 25021                |
| 2020-03-19 15:09:29 | 1461106          | 25029                |
| 2020-03-19 15:09:28 | 1471250          | 22653                |
| 2020-03-19 15:09:27 | 1453101          | 21005                |
| 2020-03-19 15:09:26 | 1468384          | 21649                |
| 2020-03-19 15:09:25 | 1413783          | 28213                |
| 2020-03-19 15:09:24 | 1510981          | 16213                |
| 2020-03-19 15:09:23 | 1432580          | 27732                |
| 2020-03-19 15:09:22 | 1486866          | 20387                |
| 2020-03-19 15:09:21 | 1430200          | 26969                |
+---------------------+------------------+----------------------+
30 rows in set (0.20 sec)

BI集成

由于INFORMATION_SCHEMA库下的PERF_STATISTICS表中保存了标准的时间信息,所以用户可以直接与BI系统进行集成,例如:Grafana。以下是利用Grafana实现的实时监控平台。

PERF_STATISTICS

参考SQL如下:

-- 实时监控CPU和内存使用情况 --
SELECT
  $__timeGroupAlias(TIME,1s),
  sum(PROCS_CPU_RATIO) AS "PROCS_CPU_RATIO",
  sum(PROCS_MEM_USAGE) AS "PROCS_MEM_USAGE"
FROM PERF_STATISTICS
GROUP BY 1
ORDER BY $__timeGroup(TIME,1s);