spark-submit source code analysis
spark-submit source code analysis, covering resource allocation and the creation of the driver, the application, and the executors
Outline / Contents
rpcEnv
CoarseGrainedSchedulerBackend
spark-submit
3
name = driverClient; masterEndpoints: the master endpoint refs resolved from the master URL passed in as a parameter
If coresPerExecutor is Nil (not specified), only one Executor is assigned per Worker
DAGScheduler
Resource allocation and launching on the Workers
CoarseGrainedScheduler
send
childMainClass: ClientApp
manager.start()
5C9G
usableWorkers
memoryPerExecutor
6C12G
driverDescription (mainClass: org.apache.spark.deploy.worker.DriverWrapper)
Cores to assign: the minimum of the cores still needed and the cores available
ClientApp
Memory assigned to each Executor
DriverRunner
Starts the code we wrote: through driverArgs.mainClass our main method is invoked, and it is that method that calls new SparkContext(conf)
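A minimal sketch of this reflective hand-off, assuming the class name arrives as the first argument (the real org.apache.spark.deploy.worker.DriverWrapper also starts an RpcEnv with a WorkerWatcher and builds a user class loader first); the name DriverWrapperSketch is made up for illustration:

object DriverWrapperSketch {
  def main(args: Array[String]): Unit = {
    val mainClassName = args(0)      // the user's --class, forwarded through the DriverDescription
    val userArgs      = args.drop(1)
    val clazz         = Class.forName(mainClassName)
    val mainMethod    = clazz.getMethod("main", classOf[Array[String]])
    // Invoking the user's main(); it is this user code that executes new SparkContext(conf).
    mainMethod.invoke(null, userArgs)
  }
}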
1
workers
SchedulerBackend
assignedCores
driver.start()
Iterate over the drivers in waitingDrivers and send a LaunchDriver message to the chosen Worker
Send a RegisterApplication message to all Masters
spreadOutApps
org.apache.spark.deploy.worker.DriverWrapper
schedule()
SparkEnv
Iterate over masterEndpoints and ask each master with a RequestSubmitDriver message to register the Driver
After LaunchExecutor, send an ExecutorAdded message to the Application
Parameter list:
--master MASTER_URL
--deploy-mode DEPLOY_MODE
--class CLASS_NAME
--name NAME
--jars JARS
--driver-memory MEM
--executor-memory MEM
--executor-cores (standalone, yarn)
--total-executor-cores (standalone, mesos)
--num-executors (yarn)
...
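For example, a submission matching the standalone cluster scenario analyzed here might look like the line below; the master host, application class, and jar path are placeholders:

spark-submit --master spark://master-host:7077 --deploy-mode cluster --class com.example.MyApp --executor-memory 4g --executor-cores 3 --total-executor-cores 9 /path/to/my-app.jar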
onStart
StandaloneSchedulerBackend
Worker: Driver
coresToAssign
ExecutorRunner
Application
start
runMain
_taskScheduler
assignedExecutors
AppClient
1. registerRpcEndpoint
2. EndpointData
3. Inbox: inbox.synchronized { messages.add(OnStart) }
4. receivers.take() -> OnStart
5. endpoint.onStart()
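The following is a minimal sketch, not Spark's actual Dispatcher/Inbox code, of how registering an endpoint ends with onStart() being called; the names Inbox, EndpointData, receivers, and OnStart mirror the steps above, everything else is simplified:

import java.util.concurrent.LinkedBlockingQueue

// What an endpoint looks like for the purpose of this sketch.
trait RpcEndpoint {
  def onStart(): Unit
  def receive(msg: Any): Unit
}

sealed trait InboxMessage
case object OnStart extends InboxMessage
final case class RpcMessage(content: Any) extends InboxMessage

// Each endpoint gets an Inbox; the Inbox enqueues OnStart at construction time,
// so onStart() is the first thing processed for that endpoint.
class Inbox(endpoint: RpcEndpoint) {
  private val messages = new java.util.LinkedList[InboxMessage]()
  synchronized { messages.add(OnStart) }

  def post(msg: InboxMessage): Unit = synchronized { messages.add(msg) }

  def process(): Unit = {
    var msg = synchronized { messages.poll() }
    while (msg != null) {
      msg match {
        case OnStart          => endpoint.onStart()
        case RpcMessage(body) => endpoint.receive(body)
      }
      msg = synchronized { messages.poll() }
    }
  }
}

class Dispatcher {
  private final case class EndpointData(name: String, inbox: Inbox)
  private val receivers = new LinkedBlockingQueue[EndpointData]()

  // 1. registerRpcEndpoint -> 2. EndpointData -> 3. Inbox(OnStart) -> queued for the message loop.
  def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): Unit =
    receivers.put(EndpointData(name, new Inbox(endpoint)))

  // 4./5. A message-loop thread takes the endpoint data and processes its inbox,
  // which triggers endpoint.onStart(); Spark runs several such threads in a pool.
  def runOnce(): Unit = receivers.take().inbox.process()
}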
DriverInfo
coresPerExecutor
9C8G
Defaults to true and is usually left unchanged. true: horizontal (spread-out) allocation, iterating across all the workers; false: vertical allocation, filling one worker up before moving on to the next
TaskSchedulerImpl
org.apache.spark.executor.CoarseGrainedExecutorBackend
tryRegisterAllMasters
Requested resources: --executor-memory 4g, --executor-cores 3, --total-executor-cores 9
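Reading these requirements against the worker sizes that appear elsewhere in the diagram (9C8G, 6C12G, 5C9G, i.e. cores/memory), one plausible outcome, assuming all three workers are idle and spreadOutApps is true: all three pass the usableWorkers filter (free memory >= 4 GB, free cores >= 3); coresToAssign = min(9, 9 + 6 + 5) = 9; with minCoresPerExecutor = 3, a single round-robin pass gives each worker 3 cores, so the application ends up with three executors of 3 cores and 4 GB each, one per worker.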
Different run modes given in the master parameter create different implementations. local: TaskSchedulerImpl + LocalSchedulerBackend; spark://...: TaskSchedulerImpl + StandaloneSchedulerBackend
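A simplified sketch of that dispatch on the master URL, modeled on SparkContext.createTaskScheduler; the classes used are Spark-internal (private[spark]), so the snippet is placed under the org.apache.spark package and is meant as an illustration rather than application code:

package org.apache.spark.sketch // so the private[spark] classes below are visible; illustration only

import org.apache.spark.{SparkContext, SparkException}
import org.apache.spark.scheduler.{SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
import org.apache.spark.scheduler.local.LocalSchedulerBackend

object SchedulerCreationSketch {
  def createTaskScheduler(sc: SparkContext, master: String): (SchedulerBackend, TaskScheduler) =
    master match {
      case "local" =>
        val scheduler = new TaskSchedulerImpl(sc)
        val backend   = new LocalSchedulerBackend(sc.getConf, scheduler, 1) // one core for plain "local"
        scheduler.initialize(backend)
        (backend, scheduler)
      case url if url.startsWith("spark://") =>
        val scheduler = new TaskSchedulerImpl(sc)
        val backend   = new StandaloneSchedulerBackend(scheduler, sc, Array(url))
        scheduler.initialize(backend)
        (backend, scheduler)
      case other =>
        throw new SparkException(s"Master URL not covered by this sketch: $other")
    }
}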
ask
Worker: Executor
Master
Nil
driverArgs.mainClass
minCoresPerExecutor
extends (inheritance)
Number of cores per executor, taken from the --executor-cores parameter
oneExecutorPerWorker
Array of usable workers, filtered by:
1. worker.memoryFree >= app.desc.memoryPerExecutorMB
2. worker.coresFree >= coresPerExecutor
and returned sorted by free cores from largest to smallest
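Putting the allocation nodes together, here is a condensed, self-contained sketch of the core-assignment loop (the real Master.scheduleExecutorsOnWorkers additionally tracks assignedExecutors, the oneExecutorPerWorker case, executor limits, and a memory check per newly launched executor); WorkerInfo here is a stand-in type, not Spark's class:

object AllocationSketch {
  final case class WorkerInfo(coresFree: Int, memoryFreeMB: Int)

  def scheduleExecutorsOnWorkers(
      coresWanted: Int,
      coresPerExecutor: Option[Int],
      memoryPerExecutorMB: Int,
      workers: Seq[WorkerInfo],
      spreadOut: Boolean): Array[Int] = {

    val minCoresPerExecutor = coresPerExecutor.getOrElse(1) // defaults to 1 when --executor-cores is unset

    // usableWorkers: enough free memory and cores, sorted by free cores in descending order.
    val usable = workers
      .filter(w => w.memoryFreeMB >= memoryPerExecutorMB && w.coresFree >= minCoresPerExecutor)
      .sortBy(_.coresFree)(Ordering[Int].reverse)
      .toArray

    val assignedCores = new Array[Int](usable.length)
    // Cores to assign: the minimum of what the app still needs and what is free overall.
    var coresToAssign = math.min(coresWanted, usable.map(_.coresFree).sum)

    def canAssign(pos: Int): Boolean =
      coresToAssign >= minCoresPerExecutor &&
        usable(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

    var candidates = usable.indices.filter(canAssign)
    while (candidates.nonEmpty) {
      candidates.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canAssign(pos)) {
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor
          // spreadOut = true: hand out one slice and move to the next worker (horizontal);
          // spreadOut = false: keep filling this worker before moving on (vertical).
          if (spreadOut) keepScheduling = false
        }
      }
      candidates = candidates.filter(canAssign)
    }
    assignedCores // the assigned-cores array, aligned with the sorted usable workers
  }
}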
SparkApplication
StandaloneAppClient
Iterate over the apps in waitingApps and send a LaunchExecutor message to the Worker
prepareSubmitEnvironment: returns a Tuple4 (childArgs, childClasspath, sparkConf, childMainClass)
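In recent Spark versions the four elements are destructured roughly as follows (names as in the SparkSubmit source, shown only to spell out what the Tuple4 holds):

val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)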
Client
This analysis covers Spark running in standalone cluster mode
SparkContext
The difference between client and cluster mode is whether the Driver process is started on the client machine or on one of the machines in the cluster
Minimum defaults to 1
The array of assigned cores