clickhouse-merge
2021-10-14 16:23:57
ClickHouse architecture diagram
false
true
Keep the minimum score
merger_mutator.renameMergedTemporaryPart
getBest
ReplicatedMergeTreeQueue::processEntry
break
write_part_log
end - begin < settings.min_parts_to_merge_at_once
Build the part_info object
StorageMergeTree-merge
Estimator
age_normalized
sum_size/max_size >= lowered_base
The left and right parts (block size) are at the same version
selectWithinPartition
SimpleMergeSelector
begin>1000
end - begin > settings.max_parts_to_merge_at_once
PartsRange: vector<Part>
MergeTreeDataPartWriterWide::write
max_total_size_to_merge, min_size_to_lower_base_log, max_size_to_lower_base_log
for->begin
part block size
merger_mutator.mergePartsToTemporaryPart
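FutureMergedMutatedPart future_part
size_t busy_threads_in_pool: number of background tasks currently running
Merge execution flow
MergedBlockOutputStream::write
Based on the parameters, select suitable PartsRange candidates for subsequent consideration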
MTDMM::getMaxSourcePartsSizeForMerge
vector<PartsRange>
MergeTreeDataMergerMutator::mergePartsToTemporaryPart
sum_size > max_total_size_to_merge
num_parts_normalized
mergeSelectedParts
end
lowered_base
selectPartsToMerge
OPTIMIZE with FINAL Merge
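ReplicatedMergeTreeLogEntry: synchronization record in ZooKeeper for fetch and merge operations
Checks: 1. the partition_id is the same; 2. whether prev_part and the current part can be merged; 3. the can_merge function returns false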
SimpleMergeSelector.selectWithinPartition(max_total_size_to_merge)
SimpleMergeSelector.allow
min_age_to_lower_base, max_age_to_lower_base (the larger size_normalized is, the larger max_age is)
MergeTreeIndexAggregatorSailfish::update
sailfish::IndexWriter::UpdateDocument
Remove small part at right: if the rightmost part is smaller than sum_size * 0.01, drop it from the range
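The trimming rule above can be sketched as follows. This is a minimal illustration, not ClickHouse's own code: `trimSmallPartsAtRight` and its `max_ratio` parameter are hypothetical names, the 0.01 ratio is the value shown in the diagram, and both repeating the check in a loop and keeping at least two parts in the range are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the new exclusive end index of the range [begin, end) after
// dropping trailing parts smaller than max_ratio * sum_size.
// Hypothetical helper; sizes[i] is the size of part i in bytes.
std::size_t trimSmallPartsAtRight(
    const std::vector<std::uint64_t> & sizes,
    std::size_t begin,
    std::size_t end,
    double max_ratio)
{
    std::uint64_t sum_size = 0;
    for (std::size_t i = begin; i < end; ++i)
        sum_size += sizes[i];

    // Assumption: never shrink the range below two parts,
    // since a merge needs at least two sources.
    while (end - begin > 2 && sizes[end - 1] < max_ratio * sum_size)
    {
        sum_size -= sizes[end - 1];
        --end;
    }
    return end;
}
```

Calling it with `max_ratio = 0.01` reproduces the threshold named in the diagram.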
estimator.getBest()
estimator.consider
size_normalized
SimpleMergeSelector.Settings
(sum_size + sum_size_fixed_cost * count) / (count - 1.9)
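A minimal sketch of the score formula above, assuming a lower score means a better candidate (consistent with the diagram keeping the minimum score). The function name `mergeScore` is hypothetical, and `sum_size_fixed_cost` is passed in as a parameter because the diagram does not state its value.

```cpp
#include <cstddef>

// Score of a candidate range of `count` parts whose sizes add up to sum_size.
// The formula is only meaningful for count >= 2 (the denominator is count - 1.9).
double mergeScore(double sum_size, double sum_size_fixed_cost, std::size_t count)
{
    return (sum_size + sum_size_fixed_cost * count) / (count - 1.9);
}
```

Because the denominator grows with `count`, merging more parts of the same size yields a lower score per call: three parts of 100 bytes score 300 / 1.1, while two parts of 100 bytes score 200 / 0.1.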
MergeMutateSelectedEntry
future_part
current_score < min_score
min_score = current_score;best_begin = begin;best_end = end
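The comparison and update above amount to tracking a running minimum. The sketch below is a simplified illustration: the names `Estimator`, `consider`, and `getBest` come from the diagram, but taking a precomputed score and index positions (rather than iterators over parts) is an assumption made to keep the example self-contained.

```cpp
#include <cstddef>
#include <limits>
#include <utility>

// Simplified estimator: remembers the range with the lowest score seen so far.
struct Estimator
{
    double min_score = std::numeric_limits<double>::infinity();
    std::size_t best_begin = 0;
    std::size_t best_end = 0;

    // Offer a candidate range [begin, end) with its already computed score.
    void consider(std::size_t begin, std::size_t end, double current_score)
    {
        if (current_score < min_score)
        {
            min_score = current_score;
            best_begin = begin;
            best_end = end;
        }
    }

    // Best range found; {0, 0} if nothing was considered.
    std::pair<std::size_t, std::size_t> getBest() const
    {
        return {best_begin, best_end};
    }
};
```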
MergeList
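Compute the score: consider
A small sum_size/max_size means the part sizes are unbalanced and one part is much larger than the rest; a large sum_size/max_size means the data is evenly distributed
lock(currently_processing_in_background_mutex): thread lock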
CurrentlyMergingPartsTagger
partition_id.empty(): no partition specified, so run an automatic merge
max_bytes_to_merge_at_max_space_in_pool
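sum_size, max_size, min_age
merge_with_ttl_allowed: the number of running TTL merges is below the maximum number of TTL-merge threads
left and right are not in the queue of parts currently being merged, and left and right have the same version (block size level), so they can be merged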
for->PartsRange
right
If prev_part is large relative to the current PartsRange, the score computed for the current PartsRange is lowered
getMaxSourcePartsSizeForMerge
merge
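part->info.getDataVersion(); merge case: part size; mutation case: the mutation value is greater than the part size
Determine the total size of source parts allowed for a merge: returns a size between max_bytes_to_merge_at_min_space_in_pool and max_bytes_to_merge_at_max_space_in_pool, then takes the remaining disk space into account before merging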
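The size limit described above, together with the pool and disk checks the diagram lists under its resource logic, can be sketched like this. It is an illustration under stated assumptions, not the ClickHouse implementation: `MergeSizeSettings` and `maxSourcePartsSizeForMerge` are hypothetical names, the disk coefficient is passed in because the diagram does not give its value, and linear interpolation is assumed because the diagram only says the result lies between the two settings.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical holder for the three settings named in the diagram.
struct MergeSizeSettings
{
    std::uint64_t max_bytes_to_merge_at_min_space_in_pool;
    std::uint64_t max_bytes_to_merge_at_max_space_in_pool;
    std::uint64_t number_of_free_entries_in_pool_to_lower_max_size_of_merge;
};

std::uint64_t maxSourcePartsSizeForMerge(
    const MergeSizeSettings & s,
    std::uint64_t pool_size,
    std::uint64_t pool_used,
    std::uint64_t max_free_disk_bytes,   // free space on the emptiest disk
    double disk_usage_coefficient)       // stands in for DISK_USAGE_COEFFICIENT_TO_SELECT
{
    const std::uint64_t free_entries = pool_size > pool_used ? pool_size - pool_used : 0;
    const std::uint64_t threshold = s.number_of_free_entries_in_pool_to_lower_max_size_of_merge;
    const double lo = static_cast<double>(s.max_bytes_to_merge_at_min_space_in_pool);
    const double hi = static_cast<double>(s.max_bytes_to_merge_at_max_space_in_pool);

    // Enough free threads: allow the largest merge size.
    double max_size = hi;
    if (!(pool_used <= 1 || free_entries > threshold))
    {
        // Assumption: interpolate linearly between the two limits.
        const double ratio = threshold ? static_cast<double>(free_entries) / threshold : 0.0;
        max_size = lo + (hi - lo) * ratio;
    }

    // Never plan a merge larger than the disk can comfortably hold.
    const double disk_limit = static_cast<double>(max_free_disk_bytes) / disk_usage_coefficient;
    return static_cast<std::uint64_t>(std::min(max_size, disk_limit));
}
```

With a busy pool the limit shrinks toward `max_bytes_to_merge_at_min_space_in_pool`, which leaves threads available for small, cheap merges.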
FutureMergedMutatedPart: metadata of the part about to be merged/mutated
Resource check logic
difference < difference_in_powers_of_two
size_prev_at_left > sum_size*sum_size_to_prev_part
sum_size
not in
PartsRanges: vector<vector<Part>>
SimpleMergeSelector.Estimator
Actual merge execution flow
merge_settings: build the merge selector configuration
min_size_to_lower_base = 1 MB, max_size_to_lower_base = 100 GB
Context
max_source_parts_size > 0
getDataPartsVector({DataPartState::Committed}): returns the list of committed parts, sorted in ascending order by state and part info
current_score *= interpolateLinear
continue
MutableDataPartPtr: the new part planned by the merge
log2(static_cast<double>(sum_size) / size_prev_at_left)
Take the minimum of (free space on the disk with the most free space / DISK_USAGE_COEFFICIENT_TO_SELECT) and max_size
storage_settings.get()
combined_ratio
StorageReplicatedMergeTree: processQueueEntry, executeLogEntry, tryExecuteMerge
parts_ranges.emplace_back(): start a new segment
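The segmentation step can be sketched as below: a new range is opened whenever the partition changes or the merge predicate rejects the pair of neighbouring parts. This is a simplified illustration; the `Part` struct with only `partition_id` and `name`, and the function name `splitIntoRanges`, are hypothetical, and `can_merge` is passed in as a caller-supplied predicate.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal part description for the example.
struct Part
{
    std::string partition_id;
    std::string name;
};

using PartsRange = std::vector<Part>;

// Splits an ordered list of parts into ranges of mergeable neighbours.
template <typename CanMerge>
std::vector<PartsRange> splitIntoRanges(const std::vector<Part> & parts, CanMerge can_merge)
{
    std::vector<PartsRange> parts_ranges;
    const Part * prev_part = nullptr;

    for (const Part & part : parts)
    {
        const bool start_new_range = prev_part == nullptr
            || prev_part->partition_id != part.partition_id
            || !can_merge(*prev_part, part);

        if (start_new_range)
            parts_ranges.emplace_back();  // open a new segment

        parts_ranges.back().push_back(part);
        prev_part = &part;
    }
    return parts_ranges;
}
```

Each resulting range can then be handed to the selector independently, so parts from different partitions are never proposed for the same merge.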
select
for data part
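Responsible for selecting parts and for executing merges, mutations, and moves
Decision logic: (1) at most one thread in the pool is in use; (2) the number of free threads is greater than number_of_free_entries_in_pool_to_lower_max_size_of_merge. If either condition holds (there are enough free threads), the merge is planned with the maximum part size limit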
MergeTreeDataMergerMutator (abbreviated MTDMM)
Parts are merged by a background heuristic algorithm
MergeTreeDataPartWriterOnDisk::calculateAndSerializeSkipIndices
can_merge
metadata_snapshot
for->end
table_lock_holder=lockForShare
allow
parts_to_merge: std::vector<Part>
left