数据库内核月报

以下参数是db级别的，全局有效

参数	说明	备注
rocksdb_block_cache	缓存uncompressed blocks，此cache有分区优化，分区数由table_cache_numshardbits控制，默认为6即64个分区。每个分区至少大于512k（rocksdb::LRUCache::LRUCache）	默认为512M
rocksdb_max_total_wal_size	如果WAL超过rocksdb_max_total_wal_size，会swich memtable并flush memtable	默认为0, 表示大小不能超过所有 columnfamily write_buffer的4倍
rocksdb_wal_size_limit_mb	purge wal时最多可以保留wal的最大大小（对应DBOptions::WAL_size_limit_MB）	默认为0，表示不控制保留wal数量, 只要memtable flush了wal都可以 purge
rocksdb_wal_ttl_seconds	控制purge wal的频率，每隔rocksdb_wal_ttl_seconds/2 purge一次。如果rocksdb_wal_size_limit_mb > 0, 那么每600s purge一次（kDefaultIntervalToDeleteObsoleteWAL）	默认为0
rocksdb_manual_wal_flush	If true WAL is not flushed automatically after each write. Instead it relies on manual invocation of FlushWAL to write the WAL buffer to its file.	默认为true
rocksdb_deadlock_detect	是否开启死锁检测	默认是关闭的
rocksdb_wal_bytes_per_sync	每rocksdb_wal_bytes_per_sync字节sync一次WAL(WritableFileWriter::Flush)	默认为0, 每次都刷
rocksdb_wal_recovery_mode	重启时recovery模式	1: Fail to start, do not recover 0: If corrupted last entry: truncate and start 2: Truncate everything after corrupted entry • Even not corrupted entries • Acceptable on slaves 3: Truncate only corrupted entry • Most dangerous option
rocksdb_strict_collation_exceptions	可以取非memcompare类型collation的表	取值为正则表达式，如"t1,t2*"
rpl_skip_tx_api	Use write batches for replication thread instead of tx api	作用于备库
rocksdb_master_skip_tx_api	Disables Transaction API Enables WriteBatch API, There is no row lock，UPDATE and DELETEs are faster	You must ensure no concurrent operation running
rocksdb_read_free_rpl_tables	用正则表达式指定使用read free replication的库表，如.或t.	默认为空
rocksdb_info_log_level	日志级别，数值越小越详细	0:debug_level 1:info_level 2:warn_level 3:error_level 4:fatal_level 5:header_level
rocksdb_perf_context_level	指定 perf context的级别 0，1: disable 2: enable only count stats 3: Other than count stats, also enable time stats except for mutexes 4: enable count and time stats	默认0
rocksdb_max_background_jobs	后台工作线程数	老版本还分为rocksdb_max_background_jobs和max_background_compactions，新版合为一个，会自动分配两者数量。 https://github.com/facebook/rocksdb/wiki/Thread-Pool
rocksdb_commit_in_the_middle	Commit rows implicitly every rocksdb_bulk_load_size, 设置rocksdb_bulk_load为on时自动commit in middle	默认OFF, 不建议全局设置，应回话级别设置
rocksdb_blind_delete_primary_key	通过主键delete 有且仅有主键索引的表时，不需要读取数据，直接通过指定的主键来删除	默认OFF, DELETES by Primary Key Works: DELETE FROM t WHERE id IN (1, 2, 3, 4, 5, 6, ...., 10000) Does not work: DELETE .. WHERE id < 10
rocksdb_use_direct_reads	use O_DIRECT for reading data	默认OFF
rocksdb_use_direct_io_for_flush_and_compaction	use O_DIRECT for flush and compact	默认OFF
rocksdb_skip_fill_cache	Skip filling block cache on read requests	默认OFF， DDL load 时使用
gap_lock_raise_error	Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. 违法以上情况使用gap lock会记入错误日志	默认false
gap_lock_write_log	Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. 违法以上情况使用gap lock会记入gap_lock_log_file指定的文件中	默认false
gap_lock_log_file	指定记录gap lock的文件
rocksdb_stats_dump_period_sec	控制Statistic信息记录到LOG中的频率（DBImpl::PrintStatistics）	默认600， Note that currently it is only dumped after a compaction. So if the database doesn't serve any write for a long time, statistics may not be dumped, despite of options.stats_dump_period_sec.
rocksdb_compaction_readahead_size	If non-zero, we perform bigger reads when doing compaction. If you're running RocksDB on spinning disks, you should set this to at least 2MB. That way RocksDB's compaction is doing sequential instead of random reads.	默认为0
rocksdb_advise_random_on_open	If set true, will hint the underlying file system that the file access pattern is random, when a sst file is opened.	默认ON
rocksdb_max_row_locks	事务最多可以持有锁的个数	默认1M
rocksdb_bytes_per_sync	每rocksdb_wal_bytes_per_sync字节sync一次sst文件(WritableFileWriter::Flush)	默认为0, 每次都刷 You may consider using rate_limiter to regulate write rate to device. When rate limiter is enabled, it automatically enables bytes_per_sync to 1MB.
rocksdb_enable_ttl	Enable expired TTL records to be dropped during compaction	默认ON
rocksdb_enable_ttl_read_filtering	For tables with TTL, expired records are skipped/filtered out during processing and in query results. Disabling this will allow these records to be seen, but as a result rows may disappear in the middle of transactions as they are dropped during compaction. Use with caution.	默认ON
rocksdb_bulk_load	bulk_load开关	默认OFF, https://github.com/facebook/mysql-5.6/wiki/data-loading
rocksdb_bulk_load_allow_unsorted	支持非主键排序数据的bulk_load	默认OFF
rocksdb_bulk_load_size	每rocksdb_bulk_load_size次write进行一次bulk_load	默认1000次
rocksdb_enable_bulk_load_api	Enables using SstFileWriter for bulk loading	默认ON
rocksdb_enable_2pc	是否开启2pc	默认ON
rocksdb_rate_limiter_bytes_per_sec	控制读写sst的速度 DBOptions::rate_limiter bytes_per_sec for RocksDB	默认0
rocksdb_sst_mgr_rate_bytes_per_sec	控制删除sst的速度 DBOptions::sst_file_manager rate_bytes_per_sec for RocksDB	默认0
rocksdb_delayed_write_rate	WriteStall时delay的时间，单位微秒（DBOptions::delayed_write_rate）	默认0
rocksdb_write_disable_wal	是否关闭WAL	默认为OFF
rocksdb_flush_log_at_trx_commit	Sync wal on transaction commit Similar to innodb_flush_log_at_trx_commit. 1: sync on commit, 0,2: not sync on commit	默认1
rocksdb_cache_index_and_filter_blocks	index和filter blocks是否缓存到block cache	默认ON
rocksdb_pin_l0_filter_and_index_blocks_in_cache	if cache_index_and_filter_blocks is true and the below is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed.	默认ON

以上参数可以通过show variables查看

更详细可以参考代码 db_options_type_info

include/rocksdb/options.h

以下参数是column family级别的，可以分别对每个column family设置

参数	说明	备注
write_buffer_size	memtable内存大小	默认
max_write_buffer_number	memtable的最大个数	默认2
min_write_buffer_number_to_merge	it is the minimum number of memtables to be merged before flushing to storage. For example, if this option is set to 2, immutable memtables are only flushed when there are two of them	默认1
target_file_size_base	level1 sst大小	默认64M
target_file_size_multiplier	level L(L>1) sst大小 target_file_size_base * (target_file_size_multiplier ^ (L-1))	默认1， For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will be 2MB, and each file on level-2 will be 20MB, and each file on level-3 will be 200MB
max_bytes_for_level_base	level1的sst总大小	默认256M
max_bytes_for_level_multiplier	level L的sst总大小为 max_bytes_for_level_base(max_bytes_for_level_multiplier)^(L-1))max_bytes_for_level_multiplier_additional(L-1) (VersionStorageInfo::CalculateBaseBytes)	默认10
max_bytes_for_level_multiplier_additional	Different max-size multipliers for different levels. (VersionStorageInfo::CalculateBaseBytes)	默认：1:1:1:1:1:1:1
num_levels	level数量	默认7
level0_file_num_compaction_trigger	当level0文件数量超过此值时触发level0 compact	默认4
level0_slowdown_writes_trigger	当level0文件数量超过此值时触发x写delay	默认20
level0_stop_writes_trigger	当level0文件数量超过此值时触发停写	默认36
pin_l0_filter_and_index_blocks_in_cache	if cache_index_and_filter_blocks is true and the below is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed.	默认1， column family单独设置会覆盖rocksdb_pin_l0_filter_and_index_blocks_in_cache
cache_index_and_filter_blocks	index和filter blocks是否缓存到block cache	默认1， column family单独设置会覆盖rocksdb_cache_index_and_filter_blocks
optimize_filters_for_hits	设置为True，最后一层不保存filter信息，最后一层bloomfilter实际没有用处	默认OFF
filter_policy	指定filter策略	filter_policy=bloomfilter:10:false 表示使用bloomfilter, bits_per_key_=10, hash函数个数为10*ln2， false：use_block_based_builder_=false，表示使用full filter
prefix_extractor	指定filter使用前缀	prefix_extractor=capped:24表示最多取前缀24个字节，另外还有fixed:n方式表示只取前缀n个字节，忽略小于n个字节的key. 具体可参考CappedPrefixTransform，FixedPrefixTransform
partition_filters	表示时否使用partitioned filter	默认false filter 参数优先级如下 block base > partitioned > full. 比如说同时指定use_block_based_builder_=true和partition_filters=true实际使用的block based filter
whole_key_filtering	If true, place whole keys in the filter (not just prefixes)	默认1
level_compaction_dynamic_level_bytes	In this mode, size target of levels are changed dynamically based on size of the last level. 减少写放大	http://rocksdb.org/blog/2015/07/23/dynamic-level.html
memtable	指定memtable类型(skiplist/vector/hash_linkedlist/prefix_hash/cuckoo)	默认skiplist
compaction_pri	compact选择文件策略 kByCompensatedSize： Slightly prioritize larger files by size compensated by #deletes kOldestLargestSeqFirst： First compact files whose data's latest update time is oldest kOldestSmallestSeqFirst： First compact files whose range hasn't been compacted to the next level for the longest kMinOverlappingRatio： First compact files whose ratio between overlapping size in next level and its size is the smallest	默认kByCompensatedSize
compression_per_level	指定每个level的压缩策略	It usually makes sense to avoid compressing levels 0 and 1 and to compress data only in higher levels. You can even set slower compression in highest level and faster compression in lower levels (by highest we mean Lmax).
bottommost_compression	指定最底level的压缩策略
arena_block_size	rocksdb内存分配单位KBlockSize由参数arena_block_size指定	arena_block_size不指定时默认为write_buffer_size的1/8.
soft_pending_compaction_bytes_limit	All writes will be slowed down to at least delayed_write_rate if estimated bytes needed to be compaction exceed this threshold	默认64G
hard_pending_compaction_bytes_limit	All writes are stopped if estimated bytes needed to be compaction exceed this threshold.	默认256G

以上参数可以通过select * from information_schema.rocksdb_cf_options查看

更详细可以参考代码ParseColumnFamilyOption, cf_options_type_info

include/rocksdb/table.h
rocksdb/util/options_helper.h
rocksdb/options/options_helper.cc
include/rocksdb/advanced_options.h

参数配置示例

rocksdb_default_cf_options=memtable=vector;
arena_block_size=10M;
disable_auto_compactions=1;
min_write_buffer_number_to_merge=1;
write_buffer_size=100000m;
target_file_size_base=32m;
max_bytes_for_level_base=512m;
level0_file_num_compaction_trigger=20;
level0_slowdown_writes_trigger=30;
level0_stop_writes_trigger=30;
max_write_buffer_number=5;
compression_per_level=kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression;
bottommost_compression=kNoCompression;
block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};
level_compaction_dynamic_level_bytes=false;
optimize_filters_for_hits=true

参数修改示例

SET @@global.rocksdb_update_cf_options='cf1={write_buffer_size=8m;target_file_size_base=2m};cf2={write_buffer_size =16m;max_bytes_for_level_multiplier=8};cf3={target_file_size_base=4m};';

注意：此方式可以动态修改，但没有持久化到OPTIONS文件中, 需手动修改OPTIONS文件

数据库内核月报－ 2018 / 01

MySQL · MyRocks · MyRocks参数介绍

以下参数是db级别的，全局有效

以下参数是column family级别的，可以分别对每个column family设置