Author: 张远
MyRocks提供了丰富的tools,如sst_dump, mysql_ldb等,这些工具对我们的运维和分析问题非常有用。
sst_dump 可以导出sst中的数据和属性信息。
sst_dump --help
sst_dump --file=<data_dir_OR_sst_file> [--command=check|scan|raw]
--file=<data_dir_OR_sst_file>
Path to SST file or directory containing SST files
--command=check|scan|raw|verify
check: Iterate over entries in files but dont print anything except if an error is encounterd (default command)
scan: Iterate over entries in files and print them to screen
raw: Dump all the table contents to <file_name>_dump.txt
verify: Iterate all the blocks in files verifying checksum to detect possible coruption but dont print anything except if a corruption is encountered
recompress: reports the SST file size if recompressed with different
compression types
--output_hex
Can be combined with scan command to print the keys and values in Hex
--from=<user_key>
Key to start reading from when executing check|scan
--to=<user_key>
Key to stop reading at when executing check|scan
--prefix=<user_key>
Returns all keys with this prefix when executing check|scan
Cannot be used in conjunction with --from
--read_num=<num>
Maximum number of entries to read when executing check|scan
--verify_checksum
Verify file checksum when executing check|scan
--input_key_hex
Can be combined with --from and --to to indicate that these values are encoded in Hex
--show_properties
Print table properties after iterating over the file when executing
check|scan|raw
--set_block_size=<block_size>
Can be combined with --command=recompress to set the block size that will
be used when trying different compression algorithms
--compression_types=<comma-separated list of CompressionType members, e.g.,
kSnappyCompression>
Can be combined with --command=recompress to run recompression for this
list of compression types
--parse_internal_key=<0xKEY>
Convenience option to parse an internal key on the command line. Dumps the
internal key in hex format {'key' @ SN: type}
mysql_ldb 工具功能非常强大,可以解析rocksdb各类文件,sst, manifest, wal等; 可以写入和查询数据;还提供了一些维护功能,详见如下help信息
bin/mysql_ldb --help
ldb - RocksDB Tool
commands MUST specify --db=<full_path_to_db_directory> when necessary
The following optional parameters control if keys/values are input/output as hex or as plain strings:
--key_hex : Keys are input/output as hex
--value_hex : Values are input/output as hex
--hex : Both keys and values are input/output as hex
The following optional parameters control the database internals:
--column_family=<string> : name of the column family to operate on. default: default column family
--ttl with 'put','get','scan','dump','query','batchput' : DB supports ttl and value is internally timestamp-suffixed
--try_load_options : Try to load option file from DB.
--ignore_unknown_options : Ignore unknown options when loading option file.
--bloom_bits=<int,e.g.:14>
--fix_prefix_len=<int,e.g.:14>
--compression_type=<no|snappy|zlib|bzip2|lz4|lz4hc|xpress|zstd>
--compression_max_dict_bytes=<int,e.g.:16384>
--block_size=<block_size_in_bytes>
--auto_compaction=<true|false>
--db_write_buffer_size=<int,e.g.:16777216>
--write_buffer_size=<int,e.g.:4194304>
--file_size=<int,e.g.:2097152>
Data Access Commands:
put <key> <value> [--ttl]
get <key> [--ttl]
batchput <key> <value> [<key> <value>] [..] [--ttl]
scan [--from] [--to] [--ttl] [--timestamp] [--max_keys=<N>q] [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--no_value]
delete <key>
deleterange <begin key> <end key>
query [--ttl]
Starts a REPL shell. Type help for list of available commands.
approxsize [--from] [--to]
checkconsistency
Admin Commands:
dump_wal --walfile=<write_ahead_log_file_path> [--header] [--print_value]
compact [--from] [--to]
reduce_levels --new_levels=<New number of levels> [--print_old_levels]
change_compaction_style --old_compaction_style=<Old compaction style: 0 for level compaction, 1 for universal compaction> --new_compaction_style=<New compaction style: 0 for level compaction, 1 for universal compaction>
dump [--from] [--to] [--ttl] [--max_keys=<N>] [--timestamp] [--count_only] [--count_delim=<char>] [--stats] [--bucket=<N>] [--start_time=<N>:- is inclusive] [--end_time=<N>:- is exclusive] [--path=<path_to_a_file>]
load [--create_if_missing] [--disable_wal] [--bulk_load] [--compact]
manifest_dump [--verbose] [--json] [--path=<path_to_manifest_file>]
list_column_families full_path_to_db_directory
dump_live_files
idump [--from] [--to] [--input_key_hex] [--max_keys=<N>] [--count_only] [--count_delim=<char>] [--stats]
repair
backup [--backup_env_uri] [--backup_dir] [--num_threads] [--stderr_log_level=<int (InfoLogLevel)>]
restore [--backup_env_uri] [--backup_dir] [--num_threads] [--stderr_log_level=<int (InfoLogLevel)>]
checkpoint [--checkpoint_dir]
Note: ldb是rocksdb自带的工具,其功能和mysql_ldb一致。但mysql_ldb使用了MyRocks的comparactor, 因此推荐使用mysql_ldb
create table t1( c1 int , c2 int, c3 varchar(10), primary key (c1), key
idx1(c2)) engine=rocksdb;
insert t1 values (1,101,'A');
insert t1 values (2,102,'B');
insert t1 values (3,103,'C');
insert t1 values (4,104,'D');
insert t1 values (5,105,'E');
select * from t1;
+----+------+------+
| c1 | c2 | c3 |
+----+------+------+
| 1 | 101 | A |
| 2 | 102 | B |
| 3 | 103 | C |
| 4 | 104 | D |
| 5 | 105 | E |
+----+------+------+
SET GLOBAL rocksdb_force_flush_memtable_now=1;
select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
查看test.t1数据分布情况
select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst --output_hex
from [] to []
Process data/.rocksdb/000030.sst
Sst file format: block-based
'0000010080000001' seq:13, type:1 => 00650000000141
'0000010080000002' seq:16, type:1 => 00660000000142
'0000010080000003' seq:19, type:1 => 00670000000143
'0000010080000004' seq:22, type:1 => 00680000000144
'0000010080000005' seq:25, type:1 => 00690000000145
'00000101018000006580000001' seq:14, type:1 =>
'00000101018000006680000002' seq:17, type:1 =>
'00000101018000006780000003' seq:20, type:1 =>
'00000101018000006880000004' seq:23, type:1 =>
'00000101018000006980000005' seq:26, type:1 =>
前五行为主键primary数据
后五行为索引idx1的数据
MyRocks记录格式参考这里
以此行(1,101,’A’)数据为例子
主键为
'0000010080000001' seq:13, type:1 => 00650000000141
key由indexid+c1+seq+type组成
Primary index id: 256 ==> 100
c1: 1 ==> 80000001 (符号反转)
type:1 PUT
value 由c2+c3组成
c2: 101 ==> 65
c3: A ==> 141
二级索引idx1数据为
'00000101018000006580000001' seq:14, type:1 =>
key由indexid+c2(主键)+c1(二级索引键)组成
second index id: 257 ==> 101
c2 : 101 ==> 80000065 主键
c1: 1 ==> 80000001 (符号反转) 二级索引键
value为null
Note type值参考如下定义
enum ValueType : unsigned char {
kTypeDeletion = 0x0,
kTypeValue = 0x1,
kTypeMerge = 0x2,
kTypeLogData = 0x3, // WAL only.
kTypeColumnFamilyDeletion = 0x4, // WAL only.
kTypeColumnFamilyValue = 0x5, // WAL only.
kTypeColumnFamilyMerge = 0x6, // WAL only.
kTypeSingleDeletion = 0x7,
kTypeColumnFamilySingleDeletion = 0x8, // WAL only.
kTypeBeginPrepareXID = 0x9, // WAL only.
kTypeEndPrepareXID = 0xA, // WAL only.
kTypeCommitXID = 0xB, // WAL only.
kTypeRollbackXID = 0xC, // WAL only.
kTypeNoop = 0xD, // WAL only.
kTypeColumnFamilyRangeDeletion = 0xE, // WAL only.
kTypeRangeDeletion = 0xF, // meta block
kTypeColumnFamilyBlobIndex = 0x10, // Blob DB only
kTypeBlobIndex = 0x11, // Blob DB only
// When the prepared record is also persisted in db, we use a different
// record. This is to ensure that the WAL that is generated by a WritePolicy
// is not mistakenly read by another, which would result into data
// inconsistency.
kTypeBeginPersistedPrepareXID = 0x12, // WAL only.
kMaxValue = 0x7F // Not used for storing records.
};
bin/sst_dump --command=scan --file=data/.rocksdb/000030.sst --from='0x0000010080000002' --to='0x0000010080000005' --input_key_hex --output_hex
from [0000010080000002] to [0000010080000005]
Process data/.rocksdb/000030.sst
Sst file format: block-based
'0000010080000002' seq:16, type:1 => 00660000000142
'0000010080000003' seq:19, type:1 => 00670000000143
'0000010080000004' seq:22, type:1 => 00680000000144
command=raw可以将数据和属性信息都写到*_dump.txt文件中
bin/sst_dump --command=raw --file=data/.rocksdb/000030.sst --output_hex
from [] to []
Process data/.rocksdb/000030.sst
Sst file format: block-based
raw dump written to file data/.rocksdb/000030_dump.txt
另外选项–show_properties也可以展示属性信息
bin/mysql_ldb --db=data/.rocksdb scan --hex
0x0000010080000001 : 0x00650000000141
0x0000010080000002 : 0x00660000000142
0x0000010080000003 : 0x00670000000143
0x0000010080000004 : 0x00680000000144
0x0000010080000005 : 0x00690000000145
0x00000101018000006580000001 : 0x
0x00000101018000006680000002 : 0x
0x00000101018000006780000003 : 0x
0x00000101018000006880000004 : 0x
0x00000101018000006980000005 : 0x
MyRocks在运行过程中,mysql_ldb是不能写入数据的
bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
Failed: IO error: While lock file: data/.rocksdb/LOCK: Resource temporarily unavailable
shutdown myrocks实例后,再写入成功
bin/mysql_ldb --db=data/.rocksdb put --hex 0x0000010080000006 0x006A0000000146
ok
可以看到数据写到了新的sst(000041.sst)中
select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | PRIMARY | 256 | 000041.sst | 1 | 0 | 0 | 0 | 0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
同时可以看到新的数据(6,106,’F’);
select * from t1;
+----+------+------+
| c1 | c2 | c3 |
+----+------+------+
| 1 | 101 | A |
| 2 | 102 | B |
| 3 | 103 | C |
| 4 | 104 | D |
| 5 | 105 | E |
| 6 | 106 | F |
+----+------+------+
二级索引没有写数据,所以查不到
select * from t1 where c2=106;
Empty set (0.00 sec)
explain select * from t1 where c2=106;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| 1 | SIMPLE | t1 | ref | idx1 | idx1 | 5 | const | 4 | NULL |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
接着二级索引插入数据
bin/mysql_ldb --db=data/.rocksdb put --hex 0x00000101018000006A80000006 0x
OK
此时可以从二级索引查出数据了
select * from t1 where c2=106;
+----+------+------+
| c1 | c2 | c3 |
+----+------+------+
| 6 | 106 | F |
+----+------+------+
select d.table_schema, d.table_name, d.index_name, d.index_number,f.sst_name, f.NUM_ROWS, f.ENTRY_DELETES, f.ENTRY_SINGLEDELETES,f.ENTRY_MERGES, f.ENTRY_OTHERS from information_schema.ROCKSDB_DDL d, information_schema.rocksdb_index_file_map f where d.index_number=f.index_number and d.table_name='t1';
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| table_schema | table_name | index_name | index_number | sst_name | NUM_ROWS | ENTRY_DELETES | ENTRY_SINGLEDELETES | ENTRY_MERGES | ENTRY_OTHERS |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
| test | t1 | PRIMARY | 256 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | idx1 | 257 | 000030.sst | 5 | 0 | 0 | 0 | 0 |
| test | t1 | PRIMARY | 256 | 000041.sst | 1 | 0 | 0 | 0 | 0 |
| test | t1 | idx1 | 257 | 000058.sst | 1 | 0 | 0 | 0 | 0 |
+--------------+------------+------------+--------------+------------+----------+---------------+---------------------+--------------+--------------+
MANIFEST文件格式参考这里
bin/mysql_ldb manifest_dump --path=data/.rocksdb/MANIFEST-000059
--------------- Column family "default" (ID 0) --------------
log number: 58
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 2 ---
58:740['
--------------- Column family "__system__" (ID 1) --------------
log number: 63
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 3 ---
64:773['
next_file_number 66 last_sequence 36 prev_log_number 0 max_column_family 1
人为将MANIFEST-000059删除,rocksdb数据将无法查看
bin/mysql_ldb --db=data/.rocksdb scan --hex
Failed: IO error: While opening a file for sequentially reading: data/.rocksdb/MANIFEST-000059
mysql_ldb 通过 repair 选项可以恢复出MANIFEST
bin/mysql_ldb repair --db=data/.rocksdb scan
RocksDB version: 5.9.0
Git sha rocksdb_build_git_sha:7c8c83458365f8b359cae13785d15b0bdc9df380
Compile date Dec 16 2017
DB SUMMARY
CURRENT file: CURRENT
IDENTITY file: IDENTITY
SST files in data/.rocksdb dir, Total Num: 7, files: 000030.sst 000039.sst 000041.sst 000047.sst 000054.sst 000058.sst 000064.sst
Write Ahead Log file in data/.rocksdb: 000063.log size: 19 ;
.......... //省略部分配置信息
[WARN] [/home/zhangyuan.zy/git/rds6/rocksdb/db/repair.cc:209] **** Repaired rocksdb data/.rocksdb; recovered 7 files; 5847bytes. Some data may have been lost. ****
OK
repair 完成后数据可以查看
bin/mysql_ldb --db=data/.rocksdb scan --hex
0x0000010080000001 : 0x00650000000141
0x0000010080000002 : 0x00660000000142
0x0000010080000003 : 0x00670000000143
0x0000010080000004 : 0x00680000000144
0x0000010080000005 : 0x00690000000145
0x0000010080000006 : 0x006A0000000146
0x00000101018000006580000001 : 0x
0x00000101018000006680000002 : 0x
0x00000101018000006780000003 : 0x
0x00000101018000006880000004 : 0x
0x00000101018000006980000005 : 0x
0x00000101018000006A80000006 : 0x
同时生成了新的MANIFEST文件
bin/mysql_ldb manifest_dump --path=data/.rocksdb/MANIFEST-000003
--------------- Column family "default" (ID 0) --------------
log number: 0
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 2 ---
58:740['
--------------- Column family "__system__" (ID 1) --------------
log number: 0
comparator: RocksDB_SE_v3.10
--- level 0 --- version# 3 ---
64:773['
next_file_number 66 last_sequence 36 prev_log_number 0 max_column_family 1
本文简单介绍了MyRocks相关工具sst_dump和mysql_ldb的基本用法,希望对大家有所帮助。