Spark 2.4.3 spark-submit执行流程
2020-07-03 14:20:49
spark-submit submission flow
The end result is a JVM command line: ApplicationMaster or ExecutorLauncher, plus its arguments, submitted to YARN to run
Connect to the YARN cluster via YarnClient
new ApplicationMasterArguments(args)
Submit the Application
yarnClient.submitApplication(appContext)
prepareSubmitEnvironment
org.apache.spark.deploy.yarn.ApplicationMaster
// Endpoint life-cycle: constructor -> onStart -> receive* -> onStop
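A minimal sketch of that life-cycle, using toy stand-ins (ToyEndpoint and EchoEndpoint are illustrative; Spark's real ThreadSafeRpcEndpoint is internal to the RPC framework):

```scala
// Toy stand-in for an RPC endpoint: constructor -> onStart -> receive* -> onStop.
trait ToyEndpoint {
  def onStart(): Unit = {}
  def receive: PartialFunction[Any, Unit]
  def onStop(): Unit = {}
}

class EchoEndpoint extends ToyEndpoint {
  val events = scala.collection.mutable.Buffer[String]()
  override def onStart(): Unit = events += "onStart"
  override def receive: PartialFunction[Any, Unit] = {
    case msg => events += s"receive:$msg"
  }
  override def onStop(): Unit = events += "onStop"
}

// Drive the life-cycle in order, as the RPC framework would:
val ep = new EchoEndpoint   // constructor
ep.onStart()                // onStart, exactly once
ep.receive("LaunchTask")    // receive, possibly many times
ep.onStop()                 // onStop, exactly once
```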
runDriver()
Ultimately invokes SparkPi's main method directly
org.apache.spark.executor.CoarseGrainedExecutorBackend
new Client()
org.apache.spark.deploy.yarn.YarnAllocator#handleAllocatedContainers // uses a thread pool to launch multiple Executors: runAllocatedContainers(containersToUse)
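The launch step can be sketched as follows; Container and the Runnable body are toy stand-ins for YARN's Container and Spark's ExecutorRunnable, but the thread-pool shape matches the description:

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors, TimeUnit}

// Toy stand-in for a YARN container handed to us by the ResourceManager.
case class Container(id: Int)

val containersToUse = Seq(Container(1), Container(2), Container(3))
val launched = new ConcurrentLinkedQueue[Int]()
val latch = new CountDownLatch(containersToUse.size)
// Spark uses a cached thread pool ("ContainerLauncher") for the same purpose.
val launcherPool = Executors.newCachedThreadPool()

def runAllocatedContainers(containers: Seq[Container]): Unit =
  for (container <- containers) {
    launcherPool.execute(new Runnable {
      def run(): Unit = {
        // Stands in for new ExecutorRunnable(...).run(), which asks the
        // NodeManager to start a CoarseGrainedExecutorBackend process.
        launched.add(container.id)
        latch.countDown()
      }
    })
  }

runAllocatedContainers(containersToUse)
latch.await(5, TimeUnit.SECONDS)
launcherPool.shutdown()
```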
run()
startUserApplication()
org.apache.spark.deploy.yarn.ApplicationMaster#runDriver
org.apache.spark.deploy.SparkSubmit
createAllocator()
Determine from the launch arguments whether this is client mode or cluster mode
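A hedged sketch of that branch; the helper name childMainClass is illustrative, and the real decision lives in SparkSubmit#prepareSubmitEnvironment:

```scala
// Sketch: which main class SparkSubmit hands to the child JVM, by deploy mode.
def childMainClass(master: String, deployMode: String, userClass: String): String =
  (master, deployMode) match {
    case (m, "cluster") if m.startsWith("yarn") =>
      // Cluster mode: submit a YARN application whose AM runs the driver.
      "org.apache.spark.deploy.yarn.YarnClusterApplication"
    case (_, "client") =>
      // Client mode: the driver is the user's main class, run in this JVM.
      userClass
    case other =>
      throw new IllegalArgumentException(s"Unsupported combination: $other")
  }
```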
Invoke the user program's main method via reflection
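The reflection step can be sketched like this; SparkPiStub is a hypothetical stand-in for the user class, and the real startUserApplication loads the class through the user classloader before invoking main on a thread named "Driver":

```scala
// Hypothetical stand-in for org.apache.spark.examples.SparkPi.
object SparkPiStub {
  @volatile var ran = false
  def main(args: Array[String]): Unit = { ran = true }
}

// Look up main(Array[String]) reflectively, as the AM does with the user class.
val mainMethod = SparkPiStub.getClass.getMethod("main", classOf[Array[String]])

val userThread = new Thread {
  override def run(): Unit =
    // One vararg element: the String[] argument array for main.
    mainMethod.invoke(SparkPiStub, Array.empty[String]: AnyRef)
}
// In cluster mode the "driver" is literally a thread named Driver inside the AM.
userThread.setName("Driver")
userThread.start()
userThread.join()
```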
master.run()
Start the AM
client
org.apache.spark.deploy.SparkSubmit#doSubmit
main()
org.apache.spark.deploy.yarn.ApplicationMaster#runImpl
start() method
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client ./examples/jars/spark-examples_2.11-2.1.1.jar 100
org.apache.spark.rpc.ThreadSafeRpcEndpoint
org.apache.spark.deploy.yarn.Client#createContainerLaunchContext
spark-submit
org.apache.spark.deploy.yarn.Client#submitApplication: launcherBackend.connect() -> yarnClient.init(hadoopConf) -> yarnClient.start() -> createContainerLaunchContext
submitApplication()
main
org.apache.spark.deploy.yarn.ApplicationMaster#startUserApplication: userThread.setName("Driver")
org.apache.spark.deploy.yarn.ExecutorRunnable#run: nmClient.start() -> startContainer() -> prepareCommand() -> bin/java org.apache.spark.executor.CoarseGrainedExecutorBackend
org.apache.spark.deploy.yarn.Client
allocateResources()
YarnClient.createYarnClient
At this point, look at two methods of CoarseGrainedExecutorBackend: onStart() and receive()
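A sketch of the message flow those two methods implement; the Msg case classes here are simplified stand-ins for Spark's CoarseGrainedClusterMessages:

```scala
// Simplified stand-ins for the driver -> executor RPC messages.
sealed trait Msg
case object RegisteredExecutor extends Msg
case class LaunchTask(taskId: Long) extends Msg

class ToyExecutorBackend {
  var executorStarted = false
  val runningTasks = scala.collection.mutable.Buffer[Long]()

  def onStart(): Unit = {
    // Real code resolves the driver endpoint by URL and sends RegisterExecutor.
  }

  def receive(msg: Msg): Unit = msg match {
    case RegisteredExecutor =>
      executorStarted = true     // real code: executor = new Executor(...)
    case LaunchTask(taskId) =>
      runningTasks += taskId     // real code: executor.launchTask(...)
  }
}

val backend = new ToyExecutorBackend
backend.onStart()
backend.receive(RegisteredExecutor)  // driver confirms registration
backend.receive(LaunchTask(0L))      // driver sends a task to run
```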
Inside this method a series of parameters is parsed and packaged into a bin/java command. In short, what actually gets submitted to YARN is that command: bin/java org.apache.spark.deploy.yarn.ApplicationMaster (cluster mode) or bin/java org.apache.spark.deploy.yarn.ExecutorLauncher (client mode)
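That command assembly can be sketched as below; the class names match the text, while the JVM flags and log-redirection tokens are only illustrative of what createContainerLaunchContext emits:

```scala
// Sketch: the YARN client runs no Spark code itself; it only builds a java
// command line for the NodeManager to exec in the AM container.
def amCommand(isClusterMode: Boolean, amMemoryMb: Int): Seq[String] = {
  val amClass =
    if (isClusterMode) "org.apache.spark.deploy.yarn.ApplicationMaster"
    else "org.apache.spark.deploy.yarn.ExecutorLauncher"
  Seq(
    "{{JAVA_HOME}}/bin/java",     // expanded by YARN on the node
    "-server",
    s"-Xmx${amMemoryMb}m",        // AM heap size (illustrative flag set)
    amClass,
    "1>", "<LOG_DIR>/stdout",
    "2>", "<LOG_DIR>/stderr"
  )
}
```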
Register the AM
org.apache.spark.deploy.yarn.Client#run: this.appId = submitApplication()
SparkSubmit
new ApplicationMaster(amArgs)
override def onStart() { logInfo("Connecting to driver: " + driverUrl) ... }
Create the AM
handleAllocatedContainers
As the code shows, whether it is ApplicationMaster or ExecutorLauncher, both end up in ApplicationMaster.main(args)
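Sketched with stubs (the *Stub names are hypothetical), this is the shape of that delegation: ExecutorLauncher exists only so client-mode AM processes are distinguishable in ps/jps output, and its main simply forwards to ApplicationMaster.main:

```scala
// Hypothetical stand-in for org.apache.spark.deploy.yarn.ApplicationMaster.
object ApplicationMasterStub {
  var entered = 0
  def main(args: Array[String]): Unit = { entered += 1 }
}

// In Spark's source, ExecutorLauncher.main is literally a one-line forward;
// the separate object only gives client mode a distinct process name.
object ExecutorLauncherStub {
  def main(args: Array[String]): Unit = ApplicationMasterStub.main(args)
}

ExecutorLauncherStub.main(Array.empty)
```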
registerAM()