Author: 张远
参数 |
说明 |
备注 |
rocksdb_block_cache |
缓存uncompressed blocks,此cache有分区优化,分区数由table_cache_numshardbits控制,默认为6即64个分区。 每个分区至少大于512k(rocksdb::LRUCache::LRUCache) |
默认为512M |
rocksdb_max_total_wal_size |
如果WAL超过rocksdb_max_total_wal_size,会swich memtable并flush memtable |
默认为0, 表示大小不能超过所有 columnfamily write_buffer的4倍 |
rocksdb_wal_size_limit_mb |
purge wal时最多可以保留wal的最大大小 (对应DBOptions::WAL_size_limit_MB) |
默认为0,表示不控制保留wal数量, 只要memtable flush了wal都可以 purge |
rocksdb_wal_ttl_seconds |
控制purge wal的频率,每隔rocksdb_wal_ttl_seconds/2 purge一次 。如果rocksdb_wal_size_limit_mb > 0, 那么每600s purge一次(kDefaultIntervalToDeleteObsoleteWAL) |
默认为0 |
rocksdb_manual_wal_flush |
If true WAL is not flushed automatically after each write. Instead it relies on manual invocation of FlushWAL to write the WAL buffer to its file. |
默认为true |
rocksdb_deadlock_detect |
是否开启死锁检测 |
默认是关闭的 |
rocksdb_wal_bytes_per_sync |
每rocksdb_wal_bytes_per_sync字节sync一次WAL(WritableFileWriter::Flush) |
默认为0, 每次都刷 |
rocksdb_wal_recovery_mode |
重启时recovery模式 |
1: Fail to start, do not recover 0: If corrupted last entry: truncate and start 2: Truncate everything after corrupted entry • Even not corrupted entries • Acceptable on slaves 3: Truncate only corrupted entry • Most dangerous option |
rocksdb_strict_collation_exceptions |
可以取非memcompare类型collation的表 |
取值为正则表达式,如"t1,t2*" |
rpl_skip_tx_api |
Use write batches for replication thread instead of tx api |
作用于备库 |
rocksdb_master_skip_tx_api |
Disables Transaction API Enables WriteBatch API, There is no row lock,UPDATE and DELETEs are faster |
You must ensure no concurrent operation running |
rocksdb_read_free_rpl_tables |
用正则表达式指定使用read free replication的库表,如.*或t.* |
默认为空 |
rocksdb_info_log_level |
日志级别,数值越小越详细 |
0:debug_level 1:info_level 2:warn_level 3:error_level 4:fatal_level 5:header_level |
rocksdb_perf_context_level |
指定 perf context的级别 0,1: disable 2: enable only count stats 3: Other than count stats, also enable time stats except for mutexes 4: enable count and time stats |
默认0 |
rocksdb_max_background_jobs |
后台工作线程数 |
老版本还分为rocksdb_max_background_jobs和max_background_compactions,新版合为一个,会自动分配两者数量。 https://github.com/facebook/rocksdb/wiki/Thread-Pool |
rocksdb_commit_in_the_middle |
Commit rows implicitly every rocksdb_bulk_load_size, 设置rocksdb_bulk_load为on时自动commit in middle |
默认OFF, 不建议全局设置,应回话级别设置 |
rocksdb_blind_delete_primary_key |
通过主键delete 有且仅有主键索引的表时,不需要读取数据,直接通过指定的主键来删除 |
默认OFF, DELETES by Primary Key Works: DELETE FROM t WHERE id IN (1, 2, 3, 4, 5, 6, ...., 10000) Does not work: DELETE .. WHERE id < 10 |
rocksdb_use_direct_reads |
use O_DIRECT for reading data |
默认OFF |
rocksdb_use_direct_io_for_flush_and_compaction |
use O_DIRECT for flush and compact |
默认OFF |
rocksdb_skip_fill_cache |
Skip filling block cache on read requests |
默认OFF, DDL load 时使用 |
gap_lock_raise_error |
Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. 违法以上情况使用gap lock会记入错误日志 |
默认false |
gap_lock_write_log |
Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. 违法以上情况使用gap lock会记入gap_lock_log_file指定的文件中 |
默认false |
gap_lock_log_file |
指定记录gap lock的文件 |
|
rocksdb_stats_dump_period_sec |
控制Statistic信息记录到LOG中的频率(DBImpl::PrintStatistics) |
默认600, Note that currently it is only dumped after a compaction. So if the database doesn't serve any write for a long time, statistics may not be dumped, despite of options.stats_dump_period_sec. |
rocksdb_compaction_readahead_size |
If non-zero, we perform bigger reads when doing compaction. If you're running RocksDB on spinning disks, you should set this to at least 2MB. That way RocksDB's compaction is doing sequential instead of random reads. |
默认为0 |
rocksdb_advise_random_on_open |
If set true, will hint the underlying file system that the file access pattern is random, when a sst file is opened. |
默认ON |
rocksdb_max_row_locks |
事务最多可以持有锁的个数 |
默认1M |
rocksdb_bytes_per_sync |
每rocksdb_wal_bytes_per_sync字节sync一次sst文件(WritableFileWriter::Flush) |
默认为0, 每次都刷 You may consider using rate_limiter to regulate write rate to device. When rate limiter is enabled, it automatically enables bytes_per_sync to 1MB. |
rocksdb_enable_ttl |
Enable expired TTL records to be dropped during compaction |
默认ON |
rocksdb_enable_ttl_read_filtering |
For tables with TTL, expired records are skipped/filtered out during processing and in query results. Disabling this will allow these records to be seen, but as a result rows may disappear in the middle of transactions as they are dropped during compaction. Use with caution. |
默认ON |
rocksdb_bulk_load |
bulk_load开关 |
默认OFF, |
rocksdb_bulk_load_allow_unsorted |
支持非主键排序数据的bulk_load |
默认OFF |
rocksdb_bulk_load_size |
每rocksdb_bulk_load_size次write进行一次bulk_load |
默认1000次 |
rocksdb_enable_bulk_load_api |
Enables using SstFileWriter for bulk loading |
默认ON |
rocksdb_enable_2pc |
是否开启2pc |
默认ON |
rocksdb_rate_limiter_bytes_per_sec |
控制读写sst的速度 DBOptions::rate_limiter bytes_per_sec for RocksDB |
默认0 |
rocksdb_sst_mgr_rate_bytes_per_sec |
控制删除sst的速度 DBOptions::sst_file_manager rate_bytes_per_sec for RocksDB |
默认0 |
rocksdb_delayed_write_rate |
WriteStall时delay的时间,单位微秒(DBOptions::delayed_write_rate) |
默认0 |
rocksdb_write_disable_wal |
是否关闭WAL |
默认为OFF |
rocksdb_flush_log_at_trx_commit |
Sync wal on transaction commit Similar to innodb_flush_log_at_trx_commit. 1: sync on commit, 0,2: not sync on commit |
默认1 |
rocksdb_cache_index_and_filter_blocks |
index和filter blocks是否缓存到block cache |
默认ON |
rocksdb_pin_l0_filter_and_index_blocks_in_cache |
if cache_index_and_filter_blocks is true and the below is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed. |
默认ON |
以上参数可以通过show variables查看
更详细可以参考代码 db_options_type_info
include/rocksdb/options.h
参数 |
说明 |
备注 |
write_buffer_size |
memtable内存大小 |
默认 |
max_write_buffer_number |
memtable的最大个数 |
默认2 |
min_write_buffer_number_to_merge |
it is the minimum number of memtables to be merged before flushing to storage. For example, if this option is set to 2, immutable memtables are only flushed when there are two of them |
默认1 |
target_file_size_base |
level1 sst大小 |
默认64M |
target_file_size_multiplier |
level L(L>1) sst大小 target_file_size_base * (target_file_size_multiplier ^ (L-1)) |
默认1, For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will be 2MB, and each file on level-2 will be 20MB, and each file on level-3 will be 200MB |
max_bytes_for_level_base |
level1的sst总大小 |
默认256M |
max_bytes_for_level_multiplier |
level L的sst总大小为 max_bytes_for_level_base*(max_bytes_for_level_multiplier)^(L-1))*max_bytes_for_level_multiplier_additional(L-1) (VersionStorageInfo::CalculateBaseBytes) |
默认10 |
max_bytes_for_level_multiplier_additional |
Different max-size multipliers for different levels. (VersionStorageInfo::CalculateBaseBytes) |
默认:1:1:1:1:1:1:1 |
num_levels |
level数量 |
默认7 |
level0_file_num_compaction_trigger |
当level0文件数量超过此值时触发level0 compact |
默认4 |
level0_slowdown_writes_trigger |
当level0文件数量超过此值时触发x写delay |
默认20 |
level0_stop_writes_trigger |
当level0文件数量超过此值时触发停写 |
默认36 |
pin_l0_filter_and_index_blocks_in_cache |
if cache_index_and_filter_blocks is true and the below is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed. |
默认1, column family单独设置会覆盖rocksdb_pin_l0_filter_and_index_blocks_in_cache |
cache_index_and_filter_blocks |
index和filter blocks是否缓存到block cache |
默认1, column family单独设置会覆盖rocksdb_cache_index_and_filter_blocks |
optimize_filters_for_hits |
设置为True, 最后一层不保存filter信息,最后一层bloomfilter实际没有用处 |
默认OFF |
filter_policy |
指定filter策略 |
filter_policy=bloomfilter:10:false 表示使用bloomfilter, bits_per_key_=10, hash函数个数为10*ln2, false:use_block_based_builder_=false,表示使用full filter |
prefix_extractor |
指定filter使用前缀 |
prefix_extractor=capped:24表示最多取前缀24个字节,另外还有fixed:n方式表示只取前缀n个字节,忽略小于n个字节的key. 具体可参考CappedPrefixTransform,FixedPrefixTransform |
partition_filters |
表示时否使用partitioned filter |
默认false filter 参数优先级如下 block base > partitioned > full. 比如说同时指定use_block_based_builder_=true和partition_filters=true实际使用的block based filter |
whole_key_filtering |
If true, place whole keys in the filter (not just prefixes) |
默认1 |
level_compaction_dynamic_level_bytes |
In this mode, size target of levels are changed dynamically based on size of the last level. 减少写放大 |
|
memtable |
指定memtable类型(skiplist/vector/hash_linkedlist/prefix_hash/cuckoo) |
默认skiplist |
compaction_pri |
compact选择文件策略 kByCompensatedSize: Slightly prioritize larger files by size compensated by #deletes kOldestLargestSeqFirst: First compact files whose data's latest update time is oldest kOldestSmallestSeqFirst: First compact files whose range hasn't been compacted to the next level for the longest kMinOverlappingRatio: First compact files whose ratio between overlapping size in next level and its size is the smallest |
默认kByCompensatedSize |
compression_per_level |
指定每个level的压缩策略 |
It usually makes sense to avoid compressing levels 0 and 1 and to compress data only in higher levels. You can even set slower compression in highest level and faster compression in lower levels (by highest we mean Lmax). |
bottommost_compression |
指定最底level的压缩策略 |
|
arena_block_size |
rocksdb内存分配单位KBlockSize由参数arena_block_size指定 |
arena_block_size不指定时默认为write_buffer_size的1/8. |
soft_pending_compaction_bytes_limit |
All writes will be slowed down to at least delayed_write_rate if estimated bytes needed to be compaction exceed this threshold |
默认64G |
hard_pending_compaction_bytes_limit |
All writes are stopped if estimated bytes needed to be compaction exceed this threshold. |
默认256G |
以上参数可以通过select * from information_schema.rocksdb_cf_options查看
更详细可以参考代码ParseColumnFamilyOption, cf_options_type_info
include/rocksdb/table.h
rocksdb/util/options_helper.h
rocksdb/options/options_helper.cc
include/rocksdb/advanced_options.h
参数配置示例
rocksdb_default_cf_options=memtable=vector;
arena_block_size=10M;
disable_auto_compactions=1;
min_write_buffer_number_to_merge=1;
write_buffer_size=100000m;
target_file_size_base=32m;
max_bytes_for_level_base=512m;
level0_file_num_compaction_trigger=20;
level0_slowdown_writes_trigger=30;
level0_stop_writes_trigger=30;
max_write_buffer_number=5;
compression_per_level=kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression;
bottommost_compression=kNoCompression;
block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};
level_compaction_dynamic_level_bytes=false;
optimize_filters_for_hits=true
参数修改示例
SET @@global.rocksdb_update_cf_options='cf1={write_buffer_size=8m;target_file_size_base=2m};cf2={write_buffer_size =16m;max_bytes_for_level_multiplier=8};cf3={target_file_size_base=4m};';
注意:此方式可以动态修改,但没有持久化到OPTIONS文件中, 需手动修改OPTIONS文件