Starrocks-SparkDpp
2023-01-10 13:02:58
The ETL process in Spark Load
Outline / Content
SparkEtlJob
New version
processData()
RollupTree
LOAD
Hard-coded: int aggregateConcurrency = 200;
initSparkConfigs(fileGroup.hiveTableProperties)
Concurrent
Original version
ETL
Serial, level-by-level traversal of the RollupTree
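A minimal sketch of what a serial, level-by-level traversal of a rollup tree can look like, assuming a simple parent/children node structure. The `RollupTreeSketch` class, its `Node` type, and the index names are hypothetical and used only for illustration; they are not the actual StarRocks classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RollupTreeSketch {
    // Hypothetical node type: one node per index, children are built from the parent.
    static final class Node {
        final String indexName;
        final List<Node> children = new ArrayList<>();

        Node(String indexName) {
            this.indexName = indexName;
        }

        // Attaches a child and returns this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Breadth-first walk that groups index names by depth, root level first.
    static List<List<String>> levels(Node root) {
        List<List<String>> result = new ArrayList<>();
        if (root == null) {
            return result;
        }
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            int size = queue.size(); // number of nodes on the current level
            List<String> level = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                Node node = queue.poll();
                level.add(node.indexName);
                queue.addAll(node.children); // enqueue the next level
            }
            result.add(level);
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("base")
                .add(new Node("rollup_a").add(new Node("rollup_a1")))
                .add(new Node("rollup_b"));
        // Serial processing: handle one level at a time, one node at a time.
        for (List<String> level : levels(root)) {
            System.out.println(level);
        }
    }
}
```

Grouping nodes by level makes the dependency order explicit: every node on a level depends only on nodes from earlier levels, so nodes within the same level are natural candidates for concurrent processing.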
isHive?
doDpp()
hdfs://127.0.0.1:10000/jobconfig.json
SparkDpp
Data quality is not checked for ORC/Parquet loads
EtlJobConfig.SourceType.FILE
SparkLoadPendingTask
submit job
init()
processDpp()
EtlJobConfig.SourceType.HIVE
USE starrocks_demo; LOAD LABEL starrocks_demo.label1 ( DATA INFILE("hdfs://127.0.0.1:1000/starrocks-demo/data/demo3_data1*") INTO TABLE demo3_spark_tb1 COLUMNS TERMINATED BY "\t