数据库内核月报

数据库内核月报 - 2017 / 12

MySQL · myrocks · 相关tools介绍

Author: 张远

概述

MyRocks提供了丰富的tools,如sst_dump, mysql_ldb等,这些工具对我们的运维和分析问题非常有用。

sst_dump 可以导出sst中的数据和属性信息。

sst_dump --help

sst_dump --file=<data_dir_OR_sst_file> [--command=check|scan|raw]
    --file=<data_dir_OR_sst_file>
      Path to SST file or directory containing SST files

    --command=check|scan|raw|verify
        check: Iterate over entries in files but dont print anything except if an error is encounterd (default command)
        scan: Iterate over entries in files and print them to screen
        raw: Dump all the table contents to <file_name>_dump.txt
        verify: Iterate all the blocks in files verifying checksum to detect possible coruption but dont print anything except if a corruption is encountered
        recompress: reports the SST file size if recompressed with different
                    compression types

    --output_hex
      Can be combined with scan command to print the keys and values in Hex

    --from=<user_key>
      Key to start reading from when executing check|scan

    --to=<user_key>
      Key to stop reading at when executing check|scan

    --prefix=<user_key>
      Returns all keys with this prefix when executing check|scan
      Cannot be used in conjunction with --from

    --read_num=<num>
      Maximum number of entries to read when executing check|scan

    --verify_checksum
      Verify file checksum when executing check|scan

    --input_key_hex
      Can be combined with --from and --to to indicate that these values are encoded in Hex

    --show_properties
      Print table properties after iterating over the file when executing
      check|scan|raw

    --set_block_size=<block_size>
      Can be combined with --command=recompress to set the block size that will
      be used when trying different compression algorithms

    --compression_types=<comma-separated list of CompressionType members, e.g.,
      kSnappyCompression>
      Can be combined with --command=recompress to run recompression for this
      list of compression types

    --parse_internal_key=<0xKEY>
      Convenience option to parse an internal key on the command line. Dumps the
      internal key in hex format {'key' @ SN: type}

mysql_ldb 工具功能非常强大,可以解析rocksdb各类文件,sst, manifest, wal等; 可以写入和查询数据;还提供了一些维护功能,详见如下help信息

bin/mysql_ldb --help
ldb - RocksDB Tool

commands MUST specify --db=<full_path_to_db_directory> when necessary

The following optional parameters control if keys/values are input/output as hex or as plain strings:
  --key_hex : Keys are input/output as hex
  --value_hex : Values are input/output as hex
  --hex : Both keys and values are input/output as hex

The following optional parameters control the database internals:
  --column_family=<string> : name of the column family to operate on. default: default column family
  --ttl with 'put','get','scan','dump','query','batchput' : DB supports ttl and value is internally timestamp-suffixed
  --try_load_options : Try to load option file from DB.
  --ignore_unknown_options : Ignore unknown options when loading option file.
  --bloom_bits=<int,e.g.:14>
  --fix_prefix_len=<int,e.g.:14>
  --compression_type=<no|snappy|zlib|bzip2|lz4|lz4hc|xpress|zstd>
  --compression_max_dict_bytes=<int,e.g.:16384>
  --block_size=<block_size_in_bytes>
  --auto_compaction=<true|false>
  --db_write_buffer_size=<int,e.g.:16777216>
  --write_buffer_size=<int,e.g.:4194304>
  --file_size=<int,e.g.:2097152>


Data Access Commands:
  put <key> <value>  [--ttl]
  get <key> [--ttl]
  batchput <key> <value> [<key> <value>] [..] [--ttl]
  scan [--from] [--to]  [--ttl] [--timestamp] [--max_keys=<N>q]  [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--no_value]
  delete <key>
  deleterange <begin key> <end key>
  query [--ttl]
    Starts a REPL shell.  Type help for list of available commands.
  approxsize [--from] [--to]
  checkconsistency


Admin Commands:
  dump_wal --walfile=<write_ahead_log_file_path> [--header]  [--print_value]
  compact [--from] [--to]
  reduce_levels --new_levels=<New number of levels> [--print_old_levels]
  change_compaction_style --old_compaction_style=<Old compaction style: 0 for level compaction, 1 for universal compaction> --new_compaction_style=<New compaction style: 0 for level compaction, 1 for universal compaction>
  dump [--from] [--to]  [--ttl] [--max_keys=<N>] [--timestamp] [--count_only] [--count_delim=<char>] [--stats] [--bucket=<N>] [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--path=<path_to_a_file>]
  load [--create_if_missing] [--disable_wal] [--bulk_load] [--compact]
  manifest_dump [--verbose] [--json] [--path=<path_to_manifest_file>]
  list_column_families full_path_to_db_directory
  dump_live_files
  idump [--from] [--to]  [--input_key_hex] [--max_keys=<N>] [--count_only] [--count_delim=<char>] [--stats]
  repair
  backup [--backup_env_uri]  [--backup_dir]  [--num_threads]  [--stderr_log_level=<int (InfoLogLevel)>]
  restore [--backup_env_uri]  [--backup_dir]  [--num_threads]  [--stderr_log_level=<int (InfoLogLevel)>]
  checkpoint [--checkpoint_dir]

Note: ldb是rocksdb自带的工具,其功能和mysql_ldb一致。但mysql_ldb使用了MyRocks的comparactor, 因此推荐使用mysql_ldb

示例初始化

create table t1( c1 int , c2 int, c3 varchar(10), primary key (c1), key
idx1(c2)) engine=rocksdb;
insert t1 values (1,101,'A');
insert t1 values (2,102,'B');
insert t1 values (3,103,'C');
insert t1 values (4,104,'D');
insert t1 values (5,105,'E');


select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |  101 | A    |
|  2 |  102 | B    |
|  3 |  103 | C    |
|  4 |  104 | D    |
|  5 |  105 | E    |
+----+------+------+

SET GLOBAL rocksdb_force_flush_memtable_now=1;

select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name   | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test         | t1         | PRIMARY    |          256 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | idx1       |          257 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+

使用 sst_dump

查看test.t1数据分布情况

select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name   | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test         | t1         | PRIMARY    |          256 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | idx1       |          257 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst --output_hex
from [] to []
Process data/.rocksdb/000030.sst
Sst file format: block-based
'0000010080000001' seq:13, type:1 => 00650000000141
'0000010080000002' seq:16, type:1 => 00660000000142
'0000010080000003' seq:19, type:1 => 00670000000143
'0000010080000004' seq:22, type:1 => 00680000000144
'0000010080000005' seq:25, type:1 => 00690000000145
'00000101018000006580000001' seq:14, type:1 =>
'00000101018000006680000002' seq:17, type:1 =>
'00000101018000006780000003' seq:20, type:1 =>
'00000101018000006880000004' seq:23, type:1 =>
'00000101018000006980000005' seq:26, type:1 =>

前五行为主键primary数据

后五行为索引idx1的数据

MyRocks记录格式参考这里

以此行(1,101,’A’)数据为例子

主键为

'0000010080000001' seq:13, type:1 => 00650000000141

key由indexid+c1+seq+type组成

Primary index id: 256 ==>   100
c1:               1   ==>   80000001 (符号反转)
type:1           PUT

value 由c2+c3组成

c2:              101  ==>   65
c3:              A    ==>   141

二级索引idx1数据为

'00000101018000006580000001' seq:14, type:1 =>

key由indexid+c2(主键)+c1(二级索引键)组成

second index id: 257 ==>   101
c2 : 101             ==>   80000065              主键   
c1: 1                ==>  80000001 (符号反转)    二级索引键

value为null

Note type值参考如下定义

enum ValueType : unsigned char {
  kTypeDeletion = 0x0,
  kTypeValue = 0x1,
  kTypeMerge = 0x2,
  kTypeLogData = 0x3,               // WAL only.
  kTypeColumnFamilyDeletion = 0x4,  // WAL only.
  kTypeColumnFamilyValue = 0x5,     // WAL only.
  kTypeColumnFamilyMerge = 0x6,     // WAL only.
  kTypeSingleDeletion = 0x7,
  kTypeColumnFamilySingleDeletion = 0x8,  // WAL only.
  kTypeBeginPrepareXID = 0x9,             // WAL only.
  kTypeEndPrepareXID = 0xA,               // WAL only.
  kTypeCommitXID = 0xB,                   // WAL only.
  kTypeRollbackXID = 0xC,                 // WAL only.
  kTypeNoop = 0xD,                        // WAL only.
  kTypeColumnFamilyRangeDeletion = 0xE,   // WAL only.
  kTypeRangeDeletion = 0xF,               // meta block
  kTypeColumnFamilyBlobIndex = 0x10,      // Blob DB only
  kTypeBlobIndex = 0x11,                  // Blob DB only
  // When the prepared record is also persisted in db, we use a different
  // record. This is to ensure that the WAL that is generated by a WritePolicy
  // is not mistakenly read by another, which would result into data
  // inconsistency.
  kTypeBeginPersistedPrepareXID = 0x12,  // WAL only.
  kMaxValue = 0x7F                       // Not used for storing records.
};
bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst  --from='0x0000010080000002' --to='0x0000010080000005' --input_key_hex --output_hex
from [0000010080000002] to [0000010080000005]
Process data/.rocksdb/000030.sst
Sst file format: block-based
'0000010080000002' seq:16, type:1 => 00660000000142
'0000010080000003' seq:19, type:1 => 00670000000143
'0000010080000004' seq:22, type:1 => 00680000000144

command=raw可以将数据和属性信息都写到*_dump.txt文件中

bin/sst_dump --command=raw --file=data/.rocksdb/000030.sst  --output_hex
from [] to []
Process data/.rocksdb/000030.sst
Sst file format: block-based
raw dump written to file data/.rocksdb/000030_dump.txt

另外选项–show_properties也可以展示属性信息

mysql_ldb 使用

bin/mysql_ldb --db=data/.rocksdb scan --hex
0x0000010080000001 : 0x00650000000141
0x0000010080000002 : 0x00660000000142
0x0000010080000003 : 0x00670000000143
0x0000010080000004 : 0x00680000000144
0x0000010080000005 : 0x00690000000145
0x00000101018000006580000001 : 0x
0x00000101018000006680000002 : 0x
0x00000101018000006780000003 : 0x
0x00000101018000006880000004 : 0x
0x00000101018000006980000005 : 0x

MyRocks在运行过程中,mysql_ldb是不能写入数据的

bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
Failed: IO error: While lock file: data/.rocksdb/LOCK: Resource temporarily unavailable

shutdown myrocks实例后,再写入成功

bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
ok

可以看到数据写到了新的sst(000041.sst)中

select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name   | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test         | t1         | PRIMARY    |          256 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | idx1       |          257 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | PRIMARY    |          256 | 000041.sst |        1 |             0 |                   0 |            0 |            0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+

同时可以看到新的数据(6,106,’F’);

select * from t1;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  1 |  101 | A    |
|  2 |  102 | B    |
|  3 |  103 | C    |
|  4 |  104 | D    |
|  5 |  105 | E    |
|  6 |  106 | F    |
+----+------+------+

二级索引没有写数据,所以查不到

select * from t1 where c2=106;
Empty set (0.00 sec)

explain select * from t1 where c2=106;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | t1    | ref  | idx1          | idx1 | 5       | const |    4 | NULL  |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+

接着二级索引插入数据

bin/mysql_ldb --db=data/.rocksdb put --hex 0x00000101018000006A80000006 0x
OK

此时可以从二级索引查出数据了

select * from t1 where c2=106;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  6 |  106 | F    |
+----+------+------+

select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name   | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test         | t1         | PRIMARY    |          256 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | idx1       |          257 | 000030.sst |        5 |             0 |                   0 |            0 |            0 |
| test         | t1         | PRIMARY    |          256 | 000041.sst |        1 |             0 |                   0 |            0 |            0 |
| test         | t1         | idx1       |          257 | 000058.sst |        1 |             0 |                   0 |            0 |            0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+

MANIFEST文件格式参考这里

bin/mysql_ldb manifest_dump  --path=data/.rocksdb/MANIFEST-000059
--------------- Column family "default"  (ID 0) --------------
log number: 58
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 2 ---
 58:740['
--------------- Column family "__system__"  (ID 1) --------------
log number: 63
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 3 ---
 64:773['
next_file_number 66 last_sequence 36  prev_log_number 0 max_column_family 1

人为将MANIFEST-000059删除,rocksdb数据将无法查看

bin/mysql_ldb --db=data/.rocksdb scan --hex
Failed: IO error: While opening a file for sequentially reading: data/.rocksdb/MANIFEST-000059

mysql_ldb 通过 repair 选项可以恢复出MANIFEST

bin/mysql_ldb repair --db=data/.rocksdb scan
RocksDB version: 5.9.0

Git sha rocksdb_build_git_sha:7c8c83458365f8b359cae13785d15b0bdc9df380
Compile date Dec 16 2017
DB SUMMARY

CURRENT file:  CURRENT

IDENTITY file:  IDENTITY

SST files in data/.rocksdb dir, Total Num: 7, files: 000030.sst 000039.sst 000041.sst 000047.sst 000054.sst 000058.sst 000064.sst

Write Ahead Log file in data/.rocksdb: 000063.log size: 19 ;
.......... //省略部分配置信息
                  
[WARN] [/home/zhangyuan.zy/git/rds6/rocksdb/db/repair.cc:209] **** Repaired rocksdb data/.rocksdb; recovered 7 files; 5847bytes. Some data may have been lost. ****
OK

repair 完成后数据可以查看

bin/mysql_ldb --db=data/.rocksdb scan --hex
0x0000010080000001 : 0x00650000000141
0x0000010080000002 : 0x00660000000142
0x0000010080000003 : 0x00670000000143
0x0000010080000004 : 0x00680000000144
0x0000010080000005 : 0x00690000000145
0x0000010080000006 : 0x006A0000000146
0x00000101018000006580000001 : 0x
0x00000101018000006680000002 : 0x
0x00000101018000006780000003 : 0x
0x00000101018000006880000004 : 0x
0x00000101018000006980000005 : 0x
0x00000101018000006A80000006 : 0x

同时生成了新的MANIFEST文件

bin/mysql_ldb manifest_dump  --path=data/.rocksdb/MANIFEST-000003
--------------- Column family "default"  (ID 0) --------------
log number: 0
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 2 ---
 58:740['
--------------- Column family "__system__"  (ID 1) --------------
log number: 0
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 3 ---
 64:773['
next_file_number 66 last_sequence 36  prev_log_number 0 max_column_family 1

最后

本文简单介绍了MyRocks相关工具sst_dump和mysql_ldb的基本用法,希望对大家有所帮助。