1. ExecutionGraph Overview
During the conversion from JobGraph to ExecutionGraph, the main work is to split each JobVertex according to its operator parallelism: every JobVertex is expanded into as many ExecutionVertex instances as its parallelism, and an IntermediateResultPartition object receives the output of each ExecutionVertex. The IntermediateResultPartition objects produced by the parallel subtasks for the same result together form one IntermediateResult object. Since 1.12 there is no ExecutionEdge any more; it was replaced by ConsumedPartitionGroup and ConsumerVertexGroup.
Flink's ExecutionGraph supports two distribution patterns, pointwise (one-to-one) and all-to-all. When an upstream node is connected all-to-all, every one of its ExecutionVertices links to every downstream ExecutionVertex, so traversing all ExecutionEdges costs O(n²) time, and for large-scale jobs those edge objects consume a large amount of memory.
Because all ExecutionVertices in one ExecutionJobVertex are carved from the same JobVertex by parallelism, the IntermediateResultPartitions that receive their output have identical structure; likewise, the ExecutionVertices of the downstream ExecutionJobVertex that an IntermediateResultPartition connects to are structurally identical as well. Flink therefore groups ExecutionVertices and IntermediateResultPartitions: all ExecutionVertices belonging to one ExecutionJobVertex form a ConsumerVertexGroup, and all IntermediateResultPartitions feeding that ExecutionJobVertex form a ConsumedPartitionGroup.
With ExecutionEdge replaced by ConsumedPartitionGroup and ConsumerVertexGroup, all result partitions with the same structure connect to the same downstream ConsumerVertexGroup.
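The memory saving can be illustrated with a toy model. The class below is a hypothetical sketch, not Flink's actual implementation; it only counts the objects each representation would materialize:

```java
// Toy model (not Flink's real classes) of why the grouping helps:
// an all-to-all connection between n producers and m consumers used to
// materialize n * m ExecutionEdge objects, while ConsumedPartitionGroup /
// ConsumerVertexGroup let both sides share a single group object each.
public class GroupingSketch {
    static long executionEdgeObjects(int producers, int consumers) {
        return (long) producers * consumers; // O(n^2) for all-to-all
    }

    static long groupObjects(int producers, int consumers) {
        // one ConsumedPartitionGroup shared by all consumers plus
        // one ConsumerVertexGroup shared by all partitions
        return 2; // O(1) regardless of parallelism
    }

    public static void main(String[] args) {
        System.out.println(executionEdgeObjects(8000, 8000)); // 64000000
        System.out.println(groupObjects(8000, 8000));         // 2
    }
}
```

At a parallelism of 8000 on both sides, the edge-based model holds 64 million connection objects while the grouped model holds two, which is exactly the scalability problem the 1.12+ refactoring addressed.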
2. JobMaster Initialization
Take session mode as an example:
public JobClient executeAsync(StreamGraph streamGraph) throws Exception {
checkNotNull(streamGraph, "StreamGraph cannot be null.");
checkNotNull(
configuration.get(DeploymentOptions.TARGET),
"No execution.target specified in your configuration file.");
final PipelineExecutorFactory executorFactory =
executorServiceLoader.getExecutorFactory(configuration);
checkNotNull(
executorFactory,
"Cannot find compatible factory for specified execution.target (=%s)",
configuration.get(DeploymentOptions.TARGET));
CompletableFuture<JobClient> jobClientFuture =
executorFactory
.getExecutor(configuration)
// step into execute()
.execute(streamGraph, configuration, userClassloader);
try {
JobClient jobClient = jobClientFuture.get();
jobListeners.forEach(jobListener -> jobListener.onJobSubmitted(jobClient, null));
return jobClient;
} catch (ExecutionException executionException) {
final Throwable strippedException =
ExceptionUtils.stripExecutionException(executionException);
jobListeners.forEach(
jobListener -> jobListener.onJobSubmitted(null, strippedException));
throw new FlinkException(
String.format("Failed to execute job '%s'.", streamGraph.getJobName()),
strippedException);
}
}
For session mode the concrete implementation is AbstractSessionClusterExecutor; for per-job mode it is AbstractJobClusterExecutor.
@Override
public CompletableFuture<JobClient> execute(
@Nonnull final Pipeline pipeline,
@Nonnull final Configuration configuration,
@Nonnull final ClassLoader userCodeClassloader)
throws Exception {
// build the JobGraph from the StreamGraph
final JobGraph jobGraph = PipelineExecutorUtils.getJobGraph(pipeline, configuration);
try (final ClusterDescriptor<ClusterID> clusterDescriptor =
clusterClientFactory.createClusterDescriptor(configuration)) {
final ClusterID clusterID = clusterClientFactory.getClusterId(configuration);
checkState(clusterID != null);
// create the provider for the RestClusterClient (ClusterClientProvider)
// initializing the RestClient also creates a netty client
// the netty client that submits the job is the RestClusterClient
// the server side that receives the job is the netty server started inside the WebMonitorEndpoint of the JobManager
final ClusterClientProvider<ClusterID> clusterClientProvider =
clusterDescriptor.retrieve(clusterID);
ClusterClient<ClusterID> clusterClient = clusterClientProvider.getClusterClient();
// submit here via the RestClusterClient
return clusterClient
// submit through the netty client inside RestClient; step into submitJob()
.submitJob(jobGraph)
.thenApplyAsync(
FunctionUtils.uncheckedFunction(
jobId -> {
ClientUtils.waitUntilJobInitializationFinished(
() -> clusterClient.getJobStatus(jobId).get(),
() -> clusterClient.requestJobResult(jobId).get(),
userCodeClassloader);
return jobId;
}))
.thenApplyAsync(
jobID ->
(JobClient)
new ClusterClientJobClientAdapter<>(
clusterClientProvider,
jobID,
userCodeClassloader))
.whenCompleteAsync((ignored1, ignored2) -> clusterClient.close());
}
}
public CompletableFuture<JobID> submitJob(@Nonnull JobGraph jobGraph) {
CompletableFuture<java.nio.file.Path> jobGraphFileFuture =
// persist the jobGraph into a jobGraphFile
CompletableFuture.supplyAsync(
() -> {
try {
final java.nio.file.Path jobGraphFile =
// the persisted file is named flink-jobgraph*.bin; this file is what is actually submitted to the Flink cluster. It is received by the WebMonitor (JobSubmitHandler), which first deserializes the file before processing the request
Files.createTempFile("flink-jobgraph", ".bin");
try (ObjectOutputStream objectOut =
new ObjectOutputStream(
Files.newOutputStream(jobGraphFile))) {
objectOut.writeObject(jobGraph);
}
return jobGraphFile;
} catch (IOException e) {
throw new CompletionException(
new FlinkException("Failed to serialize JobGraph.", e));
}
},
executorService);
// todo once the JobGraph has been persisted, add the jobGraphFile to the list of files to upload
CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture =
jobGraphFileFuture.thenApply(
jobGraphFile -> {
List<String> jarFileNames = new ArrayList<>(8);
List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames =
new ArrayList<>(8);
Collection<FileUpload> filesToUpload = new ArrayList<>(8);
// add the jobGraphFile to the list of files awaiting upload
filesToUpload.add(
new FileUpload(
jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));
for (Path jar : jobGraph.getUserJars()) {
jarFileNames.add(jar.getName());
// upload the jars the job needs
filesToUpload.add(
new FileUpload(
Paths.get(jar.toUri()),
RestConstants.CONTENT_TYPE_JAR));
}
for (Map.Entry<String, DistributedCache.DistributedCacheEntry>
artifacts : jobGraph.getUserArtifacts().entrySet()) {
final Path artifactFilePath =
new Path(artifacts.getValue().filePath);
try {
// Only local artifacts need to be uploaded.
if (!artifactFilePath.getFileSystem().isDistributedFS()) {
artifactFileNames.add(
new JobSubmitRequestBody.DistributedCacheFile(
artifacts.getKey(),
artifactFilePath.getName()));
filesToUpload.add(
new FileUpload(
Paths.get(artifacts.getValue().filePath),
RestConstants.CONTENT_TYPE_BINARY));
}
} catch (IOException e) {
throw new CompletionException(
new FlinkException(
"Failed to get the FileSystem of artifact "
+ artifactFilePath
+ ".",
e));
}
}
// build the submission request body, referencing the jobGraphFile and the jar dependencies
final JobSubmitRequestBody requestBody =
new JobSubmitRequestBody(
jobGraphFile.getFileName().toString(),
jarFileNames,
artifactFileNames);
// return a Tuple2 of the requestBody and filesToUpload
return Tuple2.of(
requestBody, Collections.unmodifiableCollection(filesToUpload));
});
final CompletableFuture<JobSubmitResponseBody> submissionFuture =
requestFuture.thenCompose(
requestAndFileUploads ->
sendRetriableRequest(
JobSubmitHeaders.getInstance(),
EmptyMessageParameters.getInstance(),
requestAndFileUploads.f0,
requestAndFileUploads.f1,
isConnectionProblemOrServiceUnavailable()));
submissionFuture
.thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
.thenAccept(
jobGraphFile -> {
try {
Files.delete(jobGraphFile);
} catch (IOException e) {
LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
}
});
return submissionFuture
.thenApply(ignore -> jobGraph.getJobID())
.exceptionally(
(Throwable throwable) -> {
throw new CompletionException(
new JobSubmissionException(
jobGraph.getJobID(),
"Failed to submit JobGraph.",
ExceptionUtils.stripCompletionException(throwable)));
});
}
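The persistence step inside submitJob() is ordinary Java serialization to a temp file, later reversed by the JobSubmitHandler. A minimal, self-contained sketch of that round trip (the String payload is just a stand-in for the JobGraph):

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the submitJob() persistence mechanism: serialize an object to
// a temp file, read it back, and clean the file up afterwards.
public class RoundTrip {
    static Object roundTrip(Object payload) throws Exception {
        Path file = Files.createTempFile("flink-jobgraph", ".bin");
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(payload);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } finally {
            Files.delete(file); // the real client also deletes the temp file
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("pretend-this-is-a-JobGraph"));
    }
}
```

Anything placed in the payload must be Serializable, which is why the JobGraph itself implements java.io.Serializable.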
// todo deserialize the JobGraph and hand it to the Dispatcher
protected CompletableFuture<JobSubmitResponseBody> handleRequest(
@Nonnull HandlerRequest<JobSubmitRequestBody, EmptyMessageParameters> request,
@Nonnull DispatcherGateway gateway)
throws RestHandlerException {
// get the serialized jobGraphFile from the request body
final Collection<File> uploadedFiles = request.getUploadedFiles();
final Map<String, Path> nameToFile =
uploadedFiles.stream()
.collect(Collectors.toMap(File::getName, Path::fromLocalFile));
if (uploadedFiles.size() != nameToFile.size()) {
throw new RestHandlerException(
String.format(
"The number of uploaded files was %s than the expected count. Expected: %s Actual %s",
uploadedFiles.size() < nameToFile.size() ? "lower" : "higher",
nameToFile.size(),
uploadedFiles.size()),
HttpResponseStatus.BAD_REQUEST);
}
// get the request body
final JobSubmitRequestBody requestBody = request.getRequestBody();
if (requestBody.jobGraphFileName == null) {
throw new RestHandlerException(
String.format(
"The %s field must not be omitted or be null.",
JobSubmitRequestBody.FIELD_NAME_JOB_GRAPH),
HttpResponseStatus.BAD_REQUEST);
}
// deserialize the jobGraphFile back into a JobGraph; this is the server side receiving the JobGraph submitted by the client
CompletableFuture<JobGraph> jobGraphFuture = loadJobGraph(requestBody, nameToFile);
// collect the job's jars
Collection<Path> jarFiles = getJarFilesToUpload(requestBody.jarFileNames, nameToFile);
// collect the job's dependency jars (artifacts)
Collection<Tuple2<String, Path>> artifacts =
getArtifactFilesToUpload(requestBody.artifactFileNames, nameToFile);
// upload the JobGraph, job jars and dependency jars to the BlobServer
CompletableFuture<JobGraph> finalizedJobGraphFuture =
uploadJobGraphFiles(gateway, jobGraphFuture, jarFiles, artifacts, configuration);
// hand the finalized JobGraph over to the Dispatcher
CompletableFuture<Acknowledge> jobSubmissionFuture =
finalizedJobGraphFuture.thenCompose(
// the JobSubmitHandler forwards the JobGraph to the Dispatcher; gateway is the Dispatcher's RPC proxy
// step into gateway.submitJob()
jobGraph -> gateway.submitJob(jobGraph, timeout));
return jobSubmissionFuture.thenCombine(
jobGraphFuture,
(ack, jobGraph) -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
}
private CompletableFuture<JobGraph> loadJobGraph(
JobSubmitRequestBody requestBody, Map<String, Path> nameToFile)
throws MissingFileException {
final Path jobGraphFile =
getPathAndAssertUpload(
requestBody.jobGraphFileName, FILE_TYPE_JOB_GRAPH, nameToFile);
return CompletableFuture.supplyAsync(
() -> {
JobGraph jobGraph;
try (ObjectInputStream objectIn =
new ObjectInputStream(
jobGraphFile.getFileSystem().open(jobGraphFile))) {
jobGraph = (JobGraph) objectIn.readObject();
} catch (Exception e) {
throw new CompletionException(
new RestHandlerException(
"Failed to deserialize JobGraph.",
HttpResponseStatus.BAD_REQUEST,
e));
}
return jobGraph;
},
executor);
}
The Dispatcher receives the JobGraph, then initializes and starts the JobMaster.
Step into gateway.submitJob(); the concrete implementation is submitJob() in the Dispatcher class:
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {
log.info("Received JobGraph submission {} ({}).", jobGraph.getJobID(), jobGraph.getName());
try {
// check whether the jobId is a duplicate
if (isDuplicateJob(jobGraph.getJobID())) {
return FutureUtils.completedExceptionally(
new DuplicateJobSubmissionException(jobGraph.getJobID()));
} else if (isPartialResourceConfigured(jobGraph)) {
return FutureUtils.completedExceptionally(
new JobSubmissionException(
jobGraph.getJobID(),
"Currently jobs is not supported if parts of the vertices have "
+ "resources configured. The limitation will be removed in future versions."));
} else {
// submit the job; by now all jars the JobGraph needs have been uploaded. The JobGraph will be used to build the ExecutionGraph when the JobMaster starts. Step into internalSubmitJob()
return internalSubmitJob(jobGraph);
}
} catch (FlinkException e) {
return FutureUtils.completedExceptionally(e);
}
}
Step into internalSubmitJob(jobGraph):
private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());
final CompletableFuture<Acknowledge> persistAndRunFuture =
// todo persist first, then run (the JobMaster): this::persistAndRunJob
waitForTerminatingJob(jobGraph.getJobID(), jobGraph,
// step into persistAndRunJob()
this::persistAndRunJob)
.thenApply(ignored -> Acknowledge.get());
return persistAndRunFuture.handleAsync(
(acknowledge, throwable) -> {
if (throwable != null) {
cleanUpJobData(jobGraph.getJobID(), true);
ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(throwable);
final Throwable strippedThrowable =
ExceptionUtils.stripCompletionException(throwable);
log.error(
"Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
throw new CompletionException(
new JobSubmissionException(
jobGraph.getJobID(),
"Failed to submit job.",
strippedThrowable));
} else {
return acknowledge;
}
},
ioExecutor);
}
private void persistAndRunJob(JobGraph jobGraph) throws Exception {
// todo the server side persists the JobGraph to a file system such as HDFS and gets back a handle, whose state is stored in ZooKeeper
// meanwhile, when the leader starts the Dispatcher it also starts a JobGraphStore service; any JobGraphs that have not finished executing are recovered first
// jobGraphWriter = DefaultJobGraphStore
jobGraphWriter.putJobGraph(jobGraph);
runJob(jobGraph, ExecutionType.SUBMISSION);
}
private void runJob(JobGraph jobGraph, ExecutionType executionType) throws Exception {
Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
long initializationTimestamp = System.currentTimeMillis();
// create the JobManagerRunner launcher, which internally builds a DefaultJobMasterServiceProcessFactory
// it creates the JobMaster instance
// while the JobMaster is created, the JobGraph is also turned into an ExecutionGraph
JobManagerRunner jobManagerRunner =
createJobManagerRunner(jobGraph, initializationTimestamp);
// a Flink cluster contains two master-worker structures:
// 1. resource management: ResourceManager + TaskExecutor
// 2. job execution: JobMaster + StreamTask
runningJobs.put(jobGraph.getJobID(), jobManagerRunner);
final JobID jobId = jobGraph.getJobID();
final CompletableFuture<CleanupJobState> cleanupJobStateFuture =
jobManagerRunner
.getResultFuture()
.handleAsync(
(jobManagerRunnerResult, throwable) -> {
Preconditions.checkState(
runningJobs.get(jobId) == jobManagerRunner,
"The job entry in runningJobs must be bound to the lifetime of the JobManagerRunner.");
if (jobManagerRunnerResult != null) {
return handleJobManagerRunnerResult(
jobManagerRunnerResult, executionType);
} else {
return jobManagerRunnerFailed(jobId, throwable);
}
},
getMainThreadExecutor());
final CompletableFuture<Void> jobTerminationFuture =
cleanupJobStateFuture
.thenApply(cleanupJobState -> removeJob(jobId, cleanupJobState))
.thenCompose(Function.identity());
FutureUtils.assertNoException(jobTerminationFuture);
registerJobManagerRunnerTerminationFuture(jobId, jobTerminationFuture);
}
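The handleAsync(...).thenCompose(Function.identity()) chain in runJob() is a standard flattening pattern: the handler maps the job result (or failure) to a cleanup future, producing a nested future that identity-compose unwraps. A minimal sketch, with a String result standing in for the JobManagerRunner result:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch of the future chaining in runJob(): handleAsync() turns the job
// result (or its failure) into a cleanup future, and
// thenCompose(Function.identity()) flattens the nested future into the
// final termination future.
public class ChainSketch {
    static CompletableFuture<String> terminationFuture(
            CompletableFuture<String> resultFuture) {
        CompletableFuture<CompletableFuture<String>> cleanupStateFuture =
                resultFuture.handleAsync(
                        (result, throwable) ->
                                CompletableFuture.completedFuture(
                                        throwable == null
                                                ? "cleanup-after-" + result
                                                : "cleanup-after-failure"));
        return cleanupStateFuture.thenCompose(Function.identity());
    }

    public static void main(String[] args) {
        System.out.println(
                terminationFuture(CompletableFuture.completedFuture("done")).join());
        // prints cleanup-after-done
    }
}
```

handleAsync (unlike thenApply) runs for both the success and failure outcome, which is why the real code can route a JobManagerRunner failure into the same cleanup path.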
JobManagerRunner createJobManagerRunner(JobGraph jobGraph, long initializationTimestamp)
throws Exception {
final RpcService rpcService = getRpcService();
// todo build the JobManagerRunner, which wraps a DefaultJobMasterServiceProcessFactory
// after the leader election completes, this factory builds and starts the JobMaster
JobManagerRunner runner =
jobManagerRunnerFactory.createJobManagerRunner(
jobGraph,
configuration,
rpcService,
highAvailabilityServices,
heartbeatServices,
jobManagerSharedServices,
new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
fatalErrorHandler,
initializationTimestamp);
// start the JobMaster leader election; once it succeeds, the JobMaster is created from isLeader() in ZooKeeperLeaderElectionDriver
runner.start();
return runner;
}
public JobManagerRunner createJobManagerRunner(
JobGraph jobGraph,
Configuration configuration,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
HeartbeatServices heartbeatServices,
JobManagerSharedServices jobManagerServices,
JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
FatalErrorHandler fatalErrorHandler,
long initializationTimestamp)
throws Exception {
checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");
final JobMasterConfiguration jobMasterConfiguration =
JobMasterConfiguration.fromConfiguration(configuration);
final RunningJobsRegistry runningJobsRegistry =
highAvailabilityServices.getRunningJobsRegistry();
// todo obtain the leader election service in preparation for the JobMaster leader election
final LeaderElectionService jobManagerLeaderElectionService =
highAvailabilityServices.getJobManagerLeaderElectionService(jobGraph.getJobID());
final SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory =
DefaultSlotPoolServiceSchedulerFactory.fromConfiguration(
configuration, jobGraph.getJobType());
if (jobMasterConfiguration.getConfiguration().get(JobManagerOptions.SCHEDULER_MODE)
== SchedulerExecutionMode.REACTIVE) {
Preconditions.checkState(
slotPoolServiceSchedulerFactory.getSchedulerType()
== JobManagerOptions.SchedulerType.Adaptive,
"Adaptive Scheduler is required for reactive mode");
}
final ShuffleMaster<?> shuffleMaster =
ShuffleServiceLoader.loadShuffleServiceFactory(configuration)
.createShuffleMaster(configuration);
final LibraryCacheManager.ClassLoaderLease classLoaderLease =
jobManagerServices
.getLibraryCacheManager()
.registerClassLoaderLease(jobGraph.getJobID());
final ClassLoader userCodeClassLoader =
classLoaderLease
.getOrResolveClassLoader(
jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths())
.asClassLoader();
// build the DefaultJobMasterServiceFactory, which bundles the basic services the JobMaster needs to start
final DefaultJobMasterServiceFactory jobMasterServiceFactory =
new DefaultJobMasterServiceFactory(
jobManagerServices.getScheduledExecutorService(),
rpcService,
jobMasterConfiguration,
jobGraph,
highAvailabilityServices,
slotPoolServiceSchedulerFactory,
jobManagerServices,
heartbeatServices,
jobManagerJobMetricGroupFactory,
fatalErrorHandler,
userCodeClassLoader,
shuffleMaster,
initializationTimestamp);
final DefaultJobMasterServiceProcessFactory jobMasterServiceProcessFactory =
new DefaultJobMasterServiceProcessFactory(
jobGraph.getJobID(),
jobGraph.getName(),
jobGraph.getCheckpointingSettings(),
initializationTimestamp,
jobMasterServiceFactory);
return new JobMasterServiceLeadershipRunner(
jobMasterServiceProcessFactory,
jobManagerLeaderElectionService,
runningJobsRegistry,
classLoaderLease,
fatalErrorHandler);
}
JobMaster Leader Election
After the election succeeds, the isLeader() method of the current class is called back; look directly at onGrantLeadership():
@Override
public void isLeader() {
leaderElectionEventHandler.onGrantLeadership();
}
@Override
@GuardedBy("lock")
public void onGrantLeadership() {
synchronized (lock) {
if (running) {
issuedLeaderSessionID = UUID.randomUUID();
clearConfirmedLeaderInformation();
if (LOG.isDebugEnabled()) {
LOG.debug(
"Grant leadership to contender {} with session ID {}.",
leaderContender.getDescription(),
issuedLeaderSessionID);
}
/**
leaderContender has four possible implementations:
1. Dispatcher = DefaultDispatcherRunner
2. JobMaster = JobMasterServiceLeadershipRunner
3. ResourceManager = ResourceManager
4. WebMonitorEndpoint = WebMonitorEndpoint
*/
leaderContender.grantLeadership(issuedLeaderSessionID);
} else {
if (LOG.isDebugEnabled()) {
LOG.debug(
"Ignoring the grant leadership notification since the {} has "
+ "already been closed.",
leaderElectionDriver);
}
}
}
}
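The guard-under-lock structure of onGrantLeadership() reduces to a small pattern: issue a fresh session ID and notify the contender only while the service is still running. A toy sketch with hypothetical names (not Flink's classes):

```java
import java.util.UUID;

// Toy contender callback mirroring onGrantLeadership(): under the lock,
// a fresh session ID is issued and handed to the contender only while
// the election service is still running.
public class ElectionSketch {
    interface LeaderContender {
        void grantLeadership(UUID leaderSessionId);
    }

    private final Object lock = new Object();
    private boolean running = true;

    UUID onGrantLeadership(LeaderContender contender) {
        synchronized (lock) {
            if (!running) {
                return null; // already closed: ignore the notification
            }
            UUID issuedLeaderSessionId = UUID.randomUUID();
            contender.grantLeadership(issuedLeaderSessionId);
            return issuedLeaderSessionId;
        }
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch();
        UUID granted = election.onGrantLeadership(
                id -> System.out.println("became leader with session " + id));
        System.out.println(granted != null); // true
    }
}
```

The session ID is how a late callback from a revoked leadership term can be detected and dropped, which is what the runIfValidLeader checks below do in the real code.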
Choose the JobMasterServiceLeadershipRunner implementation:
@Override
public void grantLeadership(UUID leaderSessionID) {
runIfStateRunning(
() -> startJobMasterServiceProcessAsync(leaderSessionID),
"starting a new JobMasterServiceProcess");
}
Step into startJobMasterServiceProcessAsync():
@GuardedBy("lock")
private void startJobMasterServiceProcessAsync(UUID leaderSessionId) {
sequentialOperation =
sequentialOperation.thenRun(
// verify the leader status
() ->
runIfValidLeader(
leaderSessionId,
ThrowingRunnable.unchecked(
// create and start the JobMaster
() ->
verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(
leaderSessionId)),
"verify job scheduling status and create JobMasterServiceProcess"));
handleAsyncOperationError(sequentialOperation, "Could not start the job manager.");
}
@GuardedBy("lock")
private void verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(UUID leaderSessionId)
throws FlinkException {
final RunningJobsRegistry.JobSchedulingStatus jobSchedulingStatus =
getJobSchedulingStatus();
if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE) {
jobAlreadyDone();
} else {
// create and start the JobMaster
createNewJobMasterServiceProcess(leaderSessionId);
}
}
@GuardedBy("lock")
private void createNewJobMasterServiceProcess(UUID leaderSessionId) throws FlinkException {
Preconditions.checkState(jobMasterServiceProcess.closeAsync().isDone());
LOG.debug(
"Create new JobMasterServiceProcess because we were granted leadership under {}.",
leaderSessionId);
try {
// register state: mark the current job as RUNNING
runningJobsRegistry.setJobRunning(getJobID());
} catch (IOException e) {
throw new FlinkException(
String.format(
"Failed to set the job %s to running in the running jobs registry.",
getJobID()),
e);
}
// create and start the JobMaster; step into create()
jobMasterServiceProcess = jobMasterServiceProcessFactory.create(leaderSessionId);
forwardIfValidLeader(
leaderSessionId,
jobMasterServiceProcess.getJobMasterGatewayFuture(),
jobMasterGatewayFuture,
"JobMasterGatewayFuture from JobMasterServiceProcess");
forwardResultFuture(leaderSessionId, jobMasterServiceProcess.getResultFuture());
confirmLeadership(leaderSessionId, jobMasterServiceProcess.getLeaderAddressFuture());
}
@Override
public JobMasterServiceProcess create(UUID leaderSessionId) {
// see the constructor of DefaultJobMasterServiceProcess
return new DefaultJobMasterServiceProcess(
jobId,
leaderSessionId,
jobMasterServiceFactory,
cause -> createArchivedExecutionGraph(JobStatus.FAILED, cause));
}
public DefaultJobMasterServiceProcess(
JobID jobId,
UUID leaderSessionId,
JobMasterServiceFactory jobMasterServiceFactory,
Function<Throwable, ArchivedExecutionGraph> failedArchivedExecutionGraphFactory) {
this.jobId = jobId;
this.leaderSessionId = leaderSessionId;
// create the JobMaster
this.jobMasterServiceFuture =
jobMasterServiceFactory.createJobMasterService(leaderSessionId, this);
jobMasterServiceFuture.whenComplete(
(jobMasterService, throwable) -> {
if (throwable != null) {
final JobInitializationException jobInitializationException =
new JobInitializationException(
jobId, "Could not start the JobMaster.", throwable);
LOG.debug(
"Initialization of the JobMasterService for job {} under leader id {} failed.",
jobId,
leaderSessionId,
jobInitializationException);
resultFuture.complete(
JobManagerRunnerResult.forInitializationFailure(
new ExecutionGraphInfo(
failedArchivedExecutionGraphFactory.apply(
jobInitializationException)),
jobInitializationException));
} else {
registerJobMasterServiceFutures(jobMasterService);
}
});
}
@Override
public CompletableFuture<JobMasterService> createJobMasterService(
UUID leaderSessionId, OnCompletionActions onCompletionActions) {
return CompletableFuture.supplyAsync(
FunctionUtils.uncheckedSupplier(
// builds the JobMaster internally; step into internalCreateJobMasterService()
() -> internalCreateJobMasterService(leaderSessionId, onCompletionActions)),
executor);
}
Initialization and Start of the JobMaster
private JobMasterService internalCreateJobMasterService(
UUID leaderSessionId, OnCompletionActions onCompletionActions) throws Exception {
final JobMaster jobMaster =
new JobMaster(
rpcService,
JobMasterId.fromUuidOrNull(leaderSessionId),
jobMasterConfiguration,
ResourceID.generate(),
jobGraph,
haServices,
slotPoolServiceSchedulerFactory,
jobManagerSharedServices,
heartbeatServices,
jobManagerJobMetricGroupFactory,
onCompletionActions,
fatalErrorHandler,
userCodeClassloader,
shuffleMaster,
lookup ->
new JobMasterPartitionTrackerImpl(
jobGraph.getJobID(), shuffleMaster, lookup),
new DefaultExecutionDeploymentTracker(),
DefaultExecutionDeploymentReconciler::new,
initializationTimestamp);
// JobMaster extends RpcEndpoint, so after initialization completes its onStart() method is called back
jobMaster.start();
return jobMaster;
}
Step into the JobMaster constructor:
/**
1. Registers with the ResourceManager and keeps a heartbeat connection to it.
2. Parses the JobGraph into an ExecutionGraph; as the earlier figures show, the ExecutionGraph is the parallelized version of the JobGraph.
3. Requests slots from the ResourceManager (one slot runs one StreamTask).
4. Dispatches tasks and monitors their status.
5. Maintains heartbeats between the JobMaster and the StreamTasks.
6. Also performs the related ZooKeeper operations.
*/
public JobMaster(
RpcService rpcService,
JobMasterId jobMasterId,
JobMasterConfiguration jobMasterConfiguration,
ResourceID resourceId,
JobGraph jobGraph,
HighAvailabilityServices highAvailabilityService,
SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory,
JobManagerSharedServices jobManagerSharedServices,
HeartbeatServices heartbeatServices,
JobManagerJobMetricGroupFactory jobMetricGroupFactory,
OnCompletionActions jobCompletionActions,
FatalErrorHandler fatalErrorHandler,
ClassLoader userCodeLoader,
ShuffleMaster<?> shuffleMaster,
PartitionTrackerFactory partitionTrackerFactory,
ExecutionDeploymentTracker executionDeploymentTracker,
ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory,
long initializationTimestamp)
throws Exception {
// start the RPC service
super(rpcService, AkkaRpcServiceUtils.createRandomName(JOB_MANAGER_NAME), jobMasterId);
final ExecutionDeploymentReconciliationHandler executionStateReconciliationHandler =
new ExecutionDeploymentReconciliationHandler() {
@Override
public void onMissingDeploymentsOf(
Collection<ExecutionAttemptID> executionAttemptIds, ResourceID host) {
log.debug(
"Failing deployments {} due to no longer being deployed.",
executionAttemptIds);
for (ExecutionAttemptID executionAttemptId : executionAttemptIds) {
schedulerNG.updateTaskExecutionState(
new TaskExecutionState(
executionAttemptId,
ExecutionState.FAILED,
new FlinkException(
String.format(
"Execution %s is unexpectedly no longer running on task executor %s.",
executionAttemptId, host))));
}
}
@Override
public void onUnknownDeploymentsOf(
Collection<ExecutionAttemptID> executionAttemptIds, ResourceID host) {
log.debug(
"Canceling left-over deployments {} on task executor {}.",
executionAttemptIds,
host);
for (ExecutionAttemptID executionAttemptId : executionAttemptIds) {
Tuple2<TaskManagerLocation, TaskExecutorGateway> taskManagerInfo =
registeredTaskManagers.get(host);
if (taskManagerInfo != null) {
taskManagerInfo.f1.cancelTask(executionAttemptId, rpcTimeout);
}
}
}
};
this.executionDeploymentTracker = executionDeploymentTracker;
this.executionDeploymentReconciler =
executionDeploymentReconcilerFactory.create(executionStateReconciliationHandler);
this.jobMasterConfiguration = checkNotNull(jobMasterConfiguration);
this.resourceId = checkNotNull(resourceId);
// store the JobGraph in the JobMaster
this.jobGraph = checkNotNull(jobGraph);
this.rpcTimeout = jobMasterConfiguration.getRpcTimeout();
this.highAvailabilityServices = checkNotNull(highAvailabilityService);
this.blobWriter = jobManagerSharedServices.getBlobWriter();
this.scheduledExecutorService = jobManagerSharedServices.getScheduledExecutorService();
this.jobCompletionActions = checkNotNull(jobCompletionActions);
this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
this.userCodeLoader = checkNotNull(userCodeLoader);
this.initializationTimestamp = initializationTimestamp;
this.retrieveTaskManagerHostName =
jobMasterConfiguration
.getConfiguration()
.getBoolean(JobManagerOptions.RETRIEVE_TASK_MANAGER_HOSTNAME);
final String jobName = jobGraph.getName();
final JobID jid = jobGraph.getJobID();
log.info("Initializing job {} ({}).", jobName, jid);
// retriever for the ResourceManager leader address
resourceManagerLeaderRetriever =
highAvailabilityServices.getResourceManagerLeaderRetriever();
// create the slotPoolService, which manages slots for this job: requesting, releasing, and so on
this.slotPoolService =
checkNotNull(slotPoolServiceSchedulerFactory).createSlotPoolService(jid);
this.registeredTaskManagers = new HashMap<>(4);
this.partitionTracker =
checkNotNull(partitionTrackerFactory)
.create(
resourceID -> {
Tuple2<TaskManagerLocation, TaskExecutorGateway>
taskManagerInfo =
registeredTaskManagers.get(resourceID);
if (taskManagerInfo == null) {
return Optional.empty();
}
return Optional.of(taskManagerInfo.f1);
});
this.shuffleMaster = checkNotNull(shuffleMaster);
this.jobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
this.jobStatusListener = new JobManagerJobStatusListener();
// todo internally converts the JobGraph into an ExecutionGraph
this.schedulerNG =
createScheduler(
slotPoolServiceSchedulerFactory,
executionDeploymentTracker,
jobManagerJobMetricGroup,
jobStatusListener);
this.heartbeatServices = checkNotNull(heartbeatServices);
this.taskManagerHeartbeatManager = NoOpHeartbeatManager.getInstance();
this.resourceManagerHeartbeatManager = NoOpHeartbeatManager.getInstance();
this.resourceManagerConnection = null;
this.establishedResourceManagerConnection = null;
this.accumulators = new HashMap<>();
}
Step into createScheduler():
private SchedulerNG createScheduler(
SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory,
ExecutionDeploymentTracker executionDeploymentTracker,
JobManagerJobMetricGroup jobManagerJobMetricGroup,
JobStatusListener jobStatusListener)
throws Exception {
final SchedulerNG scheduler =
// step into createScheduler()
slotPoolServiceSchedulerFactory.createScheduler(
log,
jobGraph,
scheduledExecutorService,
jobMasterConfiguration.getConfiguration(),
slotPoolService,
scheduledExecutorService,
userCodeLoader,
highAvailabilityServices.getCheckpointRecoveryFactory(),
rpcTimeout,
blobWriter,
jobManagerJobMetricGroup,
jobMasterConfiguration.getSlotRequestTimeout(),
shuffleMaster,
partitionTracker,
executionDeploymentTracker,
initializationTimestamp,
getMainThreadExecutor(),
fatalErrorHandler,
jobStatusListener);
return scheduler;
}
In the SlotPoolServiceSchedulerFactory interface:
SchedulerNG createScheduler(
Logger log,
JobGraph jobGraph,
ScheduledExecutorService scheduledExecutorService,
Configuration configuration,
SlotPoolService slotPoolService,
ScheduledExecutorService executorService,
ClassLoader userCodeLoader,
CheckpointRecoveryFactory checkpointRecoveryFactory,
Time rpcTimeout,
BlobWriter blobWriter,
JobManagerJobMetricGroup jobManagerJobMetricGroup,
Time slotRequestTimeout,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp,
ComponentMainThreadExecutor mainThreadExecutor,
FatalErrorHandler fatalErrorHandler,
JobStatusListener jobStatusListener)
throws Exception;
}
public SchedulerNG createScheduler(
Logger log,
JobGraph jobGraph,
ScheduledExecutorService scheduledExecutorService,
Configuration configuration,
SlotPoolService slotPoolService,
ScheduledExecutorService executorService,
ClassLoader userCodeLoader,
CheckpointRecoveryFactory checkpointRecoveryFactory,
Time rpcTimeout,
BlobWriter blobWriter,
JobManagerJobMetricGroup jobManagerJobMetricGroup,
Time slotRequestTimeout,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp,
ComponentMainThreadExecutor mainThreadExecutor,
FatalErrorHandler fatalErrorHandler,
JobStatusListener jobStatusListener)
throws Exception {
// step into createInstance()
return schedulerNGFactory.createInstance(
log,
jobGraph,
scheduledExecutorService,
configuration,
slotPoolService,
executorService,
userCodeLoader,
checkpointRecoveryFactory,
rpcTimeout,
blobWriter,
jobManagerJobMetricGroup,
slotRequestTimeout,
shuffleMaster,
partitionTracker,
executionDeploymentTracker,
initializationTimestamp,
mainThreadExecutor,
fatalErrorHandler,
jobStatusListener);
}
Step into schedulerNGFactory.createInstance() and choose the DefaultSchedulerFactory implementation:
public SchedulerNG createInstance(
final Logger log,
final JobGraph jobGraph,
final Executor ioExecutor,
final Configuration jobMasterConfiguration,
final SlotPoolService slotPoolService,
final ScheduledExecutorService futureExecutor,
final ClassLoader userCodeLoader,
final CheckpointRecoveryFactory checkpointRecoveryFactory,
final Time rpcTimeout,
final BlobWriter blobWriter,
final JobManagerJobMetricGroup jobManagerJobMetricGroup,
final Time slotRequestTimeout,
final ShuffleMaster<?> shuffleMaster,
final JobMasterPartitionTracker partitionTracker,
final ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp,
final ComponentMainThreadExecutor mainThreadExecutor,
final FatalErrorHandler fatalErrorHandler,
final JobStatusListener jobStatusListener)
throws Exception {
final SlotPool slotPool =
slotPoolService
.castInto(SlotPool.class)
.orElseThrow(
() ->
new IllegalStateException(
"The DefaultScheduler requires a SlotPool."));
final DefaultSchedulerComponents schedulerComponents =
createSchedulerComponents(
jobGraph.getJobType(),
jobGraph.isApproximateLocalRecoveryEnabled(),
jobMasterConfiguration,
slotPool,
slotRequestTimeout);
final RestartBackoffTimeStrategy restartBackoffTimeStrategy =
RestartBackoffTimeStrategyFactoryLoader.createRestartBackoffTimeStrategyFactory(
jobGraph.getSerializedExecutionConfig()
.deserializeValue(userCodeLoader)
.getRestartStrategy(),
jobMasterConfiguration,
jobGraph.isCheckpointingEnabled())
.create();
log.info(
"Using restart back off time strategy {} for {} ({}).",
restartBackoffTimeStrategy,
jobGraph.getName(),
jobGraph.getJobID());
final ExecutionGraphFactory executionGraphFactory =
new DefaultExecutionGraphFactory(
jobMasterConfiguration,
userCodeLoader,
executionDeploymentTracker,
futureExecutor,
ioExecutor,
rpcTimeout,
jobManagerJobMetricGroup,
blobWriter,
shuffleMaster,
partitionTracker);
// the DefaultScheduler constructor, which continues into its superclass constructor
return new DefaultScheduler(
log,
jobGraph,
ioExecutor,
jobMasterConfiguration,
schedulerComponents.getStartUpAction(),
new ScheduledExecutorServiceAdapter(futureExecutor),
userCodeLoader,
checkpointRecoveryFactory,
jobManagerJobMetricGroup,
schedulerComponents.getSchedulingStrategyFactory(),
FailoverStrategyFactoryLoader.loadFailoverStrategyFactory(jobMasterConfiguration),
restartBackoffTimeStrategy,
new DefaultExecutionVertexOperations(),
new ExecutionVertexVersioner(),
schedulerComponents.getAllocatorFactory(),
initializationTimestamp,
mainThreadExecutor,
jobStatusListener,
executionGraphFactory);
}
The DefaultScheduler constructor delegates to its parent class, SchedulerBase:
public SchedulerBase(
final Logger log,
final JobGraph jobGraph,
final Executor ioExecutor,
final Configuration jobMasterConfiguration,
final ClassLoader userCodeLoader,
final CheckpointRecoveryFactory checkpointRecoveryFactory,
final JobManagerJobMetricGroup jobManagerJobMetricGroup,
final ExecutionVertexVersioner executionVertexVersioner,
long initializationTimestamp,
final ComponentMainThreadExecutor mainThreadExecutor,
final JobStatusListener jobStatusListener,
final ExecutionGraphFactory executionGraphFactory)
throws Exception {
this.log = checkNotNull(log);
this.jobGraph = checkNotNull(jobGraph);
this.executionGraphFactory = executionGraphFactory;
this.jobManagerJobMetricGroup = checkNotNull(jobManagerJobMetricGroup);
this.executionVertexVersioner = checkNotNull(executionVertexVersioner);
this.mainThreadExecutor = mainThreadExecutor;
this.checkpointsCleaner = new CheckpointsCleaner();
this.completedCheckpointStore =
SchedulerUtils.createCompletedCheckpointStoreIfCheckpointingIsEnabled(
jobGraph,
jobMasterConfiguration,
userCodeLoader,
checkNotNull(checkpointRecoveryFactory),
log);
this.checkpointIdCounter =
SchedulerUtils.createCheckpointIDCounterIfCheckpointingIsEnabled(
jobGraph, checkNotNull(checkpointRecoveryFactory));
// Convert the JobGraph into an ExecutionGraph
// No JobGraph argument is needed here because the JobGraph is already a member field of this instance
this.executionGraph =
createAndRestoreExecutionGraph(
completedCheckpointStore,
checkpointsCleaner,
checkpointIdCounter,
initializationTimestamp,
mainThreadExecutor,
jobStatusListener);
registerShutDownCheckpointServicesOnExecutionGraphTermination(executionGraph);
this.schedulingTopology = executionGraph.getSchedulingTopology();
stateLocationRetriever =
executionVertexId ->
getExecutionVertex(executionVertexId).getPreferredLocationBasedOnState();
inputsLocationsRetriever =
new ExecutionGraphToInputsLocationsRetrieverAdapter(executionGraph);
this.kvStateHandler = new KvStateHandler(executionGraph);
this.executionGraphHandler =
new ExecutionGraphHandler(executionGraph, log, ioExecutor, this.mainThreadExecutor);
this.operatorCoordinatorHandler =
new DefaultOperatorCoordinatorHandler(executionGraph, this::handleGlobalFailure);
operatorCoordinatorHandler.initializeOperatorCoordinators(this.mainThreadExecutor);
this.exceptionHistory =
new BoundedFIFOQueue<>(
jobMasterConfiguration.getInteger(WebOptions.MAX_EXCEPTION_HISTORY_SIZE));
}
The assignment above is where the JobGraph-to-ExecutionGraph conversion happens; step into createAndRestoreExecutionGraph:
private ExecutionGraph createAndRestoreExecutionGraph(
CompletedCheckpointStore completedCheckpointStore,
CheckpointsCleaner checkpointsCleaner,
CheckpointIDCounter checkpointIdCounter,
long initializationTimestamp,
ComponentMainThreadExecutor mainThreadExecutor,
JobStatusListener jobStatusListener)
throws Exception {
// Create or restore the ExecutionGraph; step into the factory's createAndRestoreExecutionGraph method
final ExecutionGraph newExecutionGraph =
executionGraphFactory.createAndRestoreExecutionGraph(
jobGraph,
completedCheckpointStore,
checkpointsCleaner,
checkpointIdCounter,
TaskDeploymentDescriptorFactory.PartitionLocationConstraint.fromJobType(
jobGraph.getJobType()),
initializationTimestamp,
new DefaultVertexAttemptNumberStore(),
computeVertexParallelismStore(jobGraph),
log);
newExecutionGraph.setInternalTaskFailuresListener(
new UpdateSchedulerNgOnInternalFailuresListener(this));
newExecutionGraph.registerJobStatusListener(jobStatusListener);
newExecutionGraph.start(mainThreadExecutor);
return newExecutionGraph;
}
The createAndRestoreExecutionGraph method is declared on the ExecutionGraphFactory interface:
ExecutionGraph createAndRestoreExecutionGraph(
JobGraph jobGraph,
CompletedCheckpointStore completedCheckpointStore,
CheckpointsCleaner checkpointsCleaner,
CheckpointIDCounter checkpointIdCounter,
TaskDeploymentDescriptorFactory.PartitionLocationConstraint partitionLocationConstraint,
long initializationTimestamp,
VertexAttemptNumberStore vertexAttemptNumberStore,
VertexParallelismStore vertexParallelismStore,
Logger log)
throws Exception;
}
The concrete implementation, in DefaultExecutionGraphFactory:
@Override
public ExecutionGraph createAndRestoreExecutionGraph(
JobGraph jobGraph,
CompletedCheckpointStore completedCheckpointStore,
CheckpointsCleaner checkpointsCleaner,
CheckpointIDCounter checkpointIdCounter,
TaskDeploymentDescriptorFactory.PartitionLocationConstraint partitionLocationConstraint,
long initializationTimestamp,
VertexAttemptNumberStore vertexAttemptNumberStore,
VertexParallelismStore vertexParallelismStore,
Logger log)
throws Exception {
ExecutionDeploymentListener executionDeploymentListener =
new ExecutionDeploymentTrackerDeploymentListenerAdapter(executionDeploymentTracker);
ExecutionStateUpdateListener executionStateUpdateListener =
(execution, newState) -> {
if (newState.isTerminal()) {
executionDeploymentTracker.stopTrackingDeploymentOf(execution);
}
};
// Convert the JobGraph into an ExecutionGraph; step into buildGraph()
final ExecutionGraph newExecutionGraph =
DefaultExecutionGraphBuilder.buildGraph(
jobGraph,
configuration,
futureExecutor,
ioExecutor,
userCodeClassLoader,
completedCheckpointStore,
checkpointsCleaner,
checkpointIdCounter,
rpcTimeout,
jobManagerJobMetricGroup,
blobWriter,
log,
shuffleMaster,
jobMasterPartitionTracker,
partitionLocationConstraint,
executionDeploymentListener,
executionStateUpdateListener,
initializationTimestamp,
vertexAttemptNumberStore,
vertexParallelismStore);
// Restore the ExecutionGraph from a checkpoint or savepoint, if one exists
final CheckpointCoordinator checkpointCoordinator =
newExecutionGraph.getCheckpointCoordinator();
if (checkpointCoordinator != null) {
// check whether we find a valid checkpoint
if (!checkpointCoordinator.restoreInitialCheckpointIfPresent(
new HashSet<>(newExecutionGraph.getAllVertices().values()))) {
// check whether we can restore from a savepoint
tryRestoreExecutionGraphFromSavepoint(
newExecutionGraph, jobGraph.getSavepointRestoreSettings());
}
}
return newExecutionGraph;
}
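The restore logic at the end of this method has a clear precedence: a retained checkpoint wins over a configured savepoint, which wins over a fresh start. A minimal sketch of that decision; the class and method names here are illustrative, not Flink's:

```java
public class RestorePrecedence {
    // Hypothetical helper: which source would the new ExecutionGraph restore from?
    static String restoreSource(boolean hasRetainedCheckpoint, boolean hasSavepointSetting) {
        if (hasRetainedCheckpoint) {
            return "checkpoint"; // restoreInitialCheckpointIfPresent(...) returned true
        }
        if (hasSavepointSetting) {
            return "savepoint";  // falls through to tryRestoreExecutionGraphFromSavepoint(...)
        }
        return "none";           // fresh start
    }

    public static void main(String[] args) {
        System.out.println(restoreSource(true, true));   // checkpoint
        System.out.println(restoreSource(false, true));  // savepoint
        System.out.println(restoreSource(false, false)); // none
    }
}
```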
Step into DefaultExecutionGraphBuilder.buildGraph(), which starts by initializing an empty ExecutionGraph shell:
public static DefaultExecutionGraph buildGraph(
JobGraph jobGraph,
Configuration jobManagerConfig,
ScheduledExecutorService futureExecutor,
Executor ioExecutor,
ClassLoader classLoader,
CompletedCheckpointStore completedCheckpointStore,
CheckpointsCleaner checkpointsCleaner,
CheckpointIDCounter checkpointIdCounter,
Time rpcTimeout,
MetricGroup metrics,
BlobWriter blobWriter,
Logger log,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
TaskDeploymentDescriptorFactory.PartitionLocationConstraint partitionLocationConstraint,
ExecutionDeploymentListener executionDeploymentListener,
ExecutionStateUpdateListener executionStateUpdateListener,
long initializationTimestamp,
VertexAttemptNumberStore vertexAttemptNumberStore,
VertexParallelismStore vertexParallelismStore)
throws JobExecutionException, JobException {
checkNotNull(jobGraph, "job graph cannot be null");
final String jobName = jobGraph.getName();
final JobID jobId = jobGraph.getJobID();
final JobInformation jobInformation =
new JobInformation(
jobId,
jobName,
jobGraph.getSerializedExecutionConfig(),
jobGraph.getJobConfiguration(),
jobGraph.getUserJarBlobKeys(),
jobGraph.getClasspaths());
final int maxPriorAttemptsHistoryLength =
jobManagerConfig.getInteger(JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE);
final PartitionReleaseStrategy.Factory partitionReleaseStrategyFactory =
PartitionReleaseStrategyFactoryLoader.loadPartitionReleaseStrategyFactory(
jobManagerConfig);
// create a new execution graph, if none exists so far
final DefaultExecutionGraph executionGraph;
try {
// Start initializing the empty ExecutionGraph shell
executionGraph =
new DefaultExecutionGraph(
jobInformation,
futureExecutor,
ioExecutor,
rpcTimeout,
maxPriorAttemptsHistoryLength,
classLoader,
blobWriter,
partitionReleaseStrategyFactory,
shuffleMaster,
partitionTracker,
partitionLocationConstraint,
executionDeploymentListener,
executionStateUpdateListener,
initializationTimestamp,
vertexAttemptNumberStore,
vertexParallelismStore);
} catch (IOException e) {
throw new JobException("Could not create the ExecutionGraph.", e);
}
// set the basic properties
try {
// Render the JobGraph as a JSON plan
executionGraph.setJsonPlan(JsonPlanGenerator.generatePlan(jobGraph));
} catch (Throwable t) {
log.warn("Cannot create JSON plan for job", t);
// give the graph an empty plan
executionGraph.setJsonPlan("{}");
}
// initialize the vertices that have a master initialization hook
// file output formats create directories here, input formats create splits
final long initMasterStart = System.nanoTime();
log.info("Running initialization on master for job {} ({}).", jobName, jobId);
// Iterate over all vertices in the JobGraph and verify each one has an invokable class
for (JobVertex vertex : jobGraph.getVertices()) {
String executableClass = vertex.getInvokableClassName();
if (executableClass == null || executableClass.isEmpty()) {
throw new JobSubmissionException(
jobId,
"The vertex "
+ vertex.getID()
+ " ("
+ vertex.getName()
+ ") has no invokable class.");
}
try {
vertex.initializeOnMaster(classLoader);
} catch (Throwable t) {
throw new JobExecutionException(
jobId,
"Cannot initialize task '" + vertex.getName() + "': " + t.getMessage(),
t);
}
}
log.info(
"Successfully ran initialization on master in {} ms.",
(System.nanoTime() - initMasterStart) / 1_000_000);
// topologically sort the job vertices and attach the graph to the existing one
// Collect the JobGraph vertices in topological order
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
if (log.isDebugEnabled()) {
log.debug(
"Adding {} vertices from job graph {} ({}).",
sortedTopology.size(),
jobName,
jobId);
}
// The core work: create an ExecutionJobVertex for each JobVertex and, according to its parallelism, fan it out into multiple ExecutionVertices
executionGraph.attachJobGraph(sortedTopology);
if (log.isDebugEnabled()) {
log.debug(
"Successfully created execution graph from job graph {} ({}).", jobName, jobId);
}
// configure the state checkpointing
// Parse the checkpoint settings and build the checkpoint-related components
if (isCheckpointingEnabled(jobGraph)) {
JobCheckpointingSettings snapshotSettings = jobGraph.getCheckpointingSettings();
// Maximum number of remembered checkpoints
int historySize = jobManagerConfig.getInteger(WebOptions.CHECKPOINTS_HISTORY_SIZE);
CheckpointStatsTracker checkpointStatsTracker =
new CheckpointStatsTracker(
historySize,
snapshotSettings.getCheckpointCoordinatorConfiguration(),
metrics);
// load the state backend from the application settings
final StateBackend applicationConfiguredBackend;
final SerializedValue<StateBackend> serializedAppConfigured =
snapshotSettings.getDefaultStateBackend();
if (serializedAppConfigured == null) {
applicationConfiguredBackend = null;
} else {
try {
applicationConfiguredBackend =
serializedAppConfigured.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId, "Could not deserialize application-defined state backend.", e);
}
}
final StateBackend rootBackend;
try {
rootBackend =
StateBackendLoader.fromApplicationOrConfigOrDefault(
applicationConfiguredBackend, jobManagerConfig, classLoader, log);
} catch (IllegalConfigurationException | IOException | DynamicCodeLoadingException e) {
throw new JobExecutionException(
jobId, "Could not instantiate configured state backend", e);
}
// load the checkpoint storage from the application settings
final CheckpointStorage applicationConfiguredStorage;
final SerializedValue<CheckpointStorage> serializedAppConfiguredStorage =
snapshotSettings.getDefaultCheckpointStorage();
if (serializedAppConfiguredStorage == null) {
applicationConfiguredStorage = null;
} else {
try {
applicationConfiguredStorage =
serializedAppConfiguredStorage.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId,
"Could not deserialize application-defined checkpoint storage.",
e);
}
}
final CheckpointStorage rootStorage;
try {
rootStorage =
CheckpointStorageLoader.load(
applicationConfiguredStorage,
null,
rootBackend,
jobManagerConfig,
classLoader,
log);
} catch (IllegalConfigurationException | DynamicCodeLoadingException e) {
throw new JobExecutionException(
jobId, "Could not instantiate configured checkpoint storage", e);
}
// instantiate the user-defined checkpoint hooks
final SerializedValue<MasterTriggerRestoreHook.Factory[]> serializedHooks =
snapshotSettings.getMasterHooks();
final List<MasterTriggerRestoreHook<?>> hooks;
if (serializedHooks == null) {
hooks = Collections.emptyList();
} else {
final MasterTriggerRestoreHook.Factory[] hookFactories;
try {
hookFactories = serializedHooks.deserializeValue(classLoader);
} catch (IOException | ClassNotFoundException e) {
throw new JobExecutionException(
jobId, "Could not instantiate user-defined checkpoint hooks", e);
}
final Thread thread = Thread.currentThread();
final ClassLoader originalClassLoader = thread.getContextClassLoader();
thread.setContextClassLoader(classLoader);
try {
hooks = new ArrayList<>(hookFactories.length);
for (MasterTriggerRestoreHook.Factory factory : hookFactories) {
hooks.add(MasterHooks.wrapHook(factory.create(), classLoader));
}
} finally {
thread.setContextClassLoader(originalClassLoader);
}
}
final CheckpointCoordinatorConfiguration chkConfig =
snapshotSettings.getCheckpointCoordinatorConfiguration();
executionGraph.enableCheckpointing(
chkConfig,
hooks,
checkpointIdCounter,
completedCheckpointStore,
rootBackend,
rootStorage,
checkpointStatsTracker,
checkpointsCleaner);
}
// create all the metrics for the Execution Graph
metrics.gauge(RestartTimeGauge.METRIC_NAME, new RestartTimeGauge(executionGraph));
metrics.gauge(DownTimeGauge.METRIC_NAME, new DownTimeGauge(executionGraph));
metrics.gauge(UpTimeGauge.METRIC_NAME, new UpTimeGauge(executionGraph));
return executionGraph;
}
Summary of ExecutionGraph initialization:
1. Obtain the job information from the JobGraph
2. Initialize an empty ExecutionGraph object
3. Render the JobGraph as a JSON plan
4. Verify that every vertex in the JobGraph has an invokable class
5. Collect the JobGraph vertices into a topologically sorted list
6. Generate ExecutionVertices from each JobVertex according to its operator parallelism
7. Parse the checkpoint settings and build the checkpoint-related components
8. Register monitoring metrics for the ExecutionGraph
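Step 5 uses jobGraph.getVerticesSortedTopologicallyFromSources(). Conceptually that is a topological sort starting from the source vertices; the Kahn's-algorithm sketch below, over a hypothetical adjacency map of vertex names, illustrates the idea (it is not Flink's actual implementation):

```java
import java.util.*;

public class TopoSortSketch {
    // edges: producer vertex id -> list of consumer vertex ids
    static List<String> sortFromSources(Map<String, List<String>> edges) {
        // Count incoming edges per vertex
        Map<String, Integer> inDegree = new HashMap<>();
        edges.forEach((src, dsts) -> {
            inDegree.putIfAbsent(src, 0);
            for (String d : dsts) inDegree.merge(d, 1, Integer::sum);
        });
        // Vertices with no inputs are the sources
        Deque<String> sources = new ArrayDeque<>();
        inDegree.forEach((v, deg) -> { if (deg == 0) sources.add(v); });
        // Repeatedly emit a source and release its consumers
        List<String> sorted = new ArrayList<>();
        while (!sources.isEmpty()) {
            String v = sources.poll();
            sorted.add(v);
            for (String d : edges.getOrDefault(v, List.of())) {
                if (inDegree.merge(d, -1, Integer::sum) == 0) sources.add(d);
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
                "source", List.of("map"),
                "map", List.of("sink"));
        System.out.println(sortFromSources(edges)); // [source, map, sink]
    }
}
```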
The core step is converting the JobVertices of the JobGraph into ExecutionVertices; let's step into the following code:
// The core work: create an ExecutionJobVertex for each JobVertex and, according to its parallelism, fan it out into multiple ExecutionVertices
executionGraph.attachJobGraph(sortedTopology);
Step into attachJobGraph():
@Override
public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
assertRunningInJobMasterMainThread();
LOG.debug(
"Attaching {} topologically sorted vertices to existing job graph with {} "
+ "vertices and {} intermediate results.",
topologiallySorted.size(),
tasks.size(),
intermediateResults.size());
final ArrayList<ExecutionJobVertex> newExecJobVertices =
new ArrayList<>(topologiallySorted.size());
final long createTimestamp = System.currentTimeMillis();
// Iterate over the JobGraph's vertices
for (JobVertex jobVertex : topologiallySorted) {
if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
this.isStoppable = false;
}
VertexParallelismInformation parallelismInfo =
parallelismStore.getParallelismInfo(jobVertex.getID());
// create the execution job vertex and attach it to the graph
// Create an ExecutionJobVertex for each JobVertex
// Step into the ExecutionJobVertex constructor to see how it is built
ExecutionJobVertex ejv =
new ExecutionJobVertex(
this,
jobVertex,
maxPriorAttemptsHistoryLength,
rpcTimeout,
createTimestamp,
parallelismInfo,
initialAttemptCounts.getAttemptCounts(jobVertex.getID()));
// In newer Flink versions (>= 1.13), ExecutionEdge was removed and replaced by ConsumedPartitionGroup and ConsumerVertexGroup
ejv.connectToPredecessors(this.intermediateResults);
ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
if (previousTask != null) {
throw new JobException(
String.format(
"Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
jobVertex.getID(), ejv, previousTask));
}
for (IntermediateResult res : ejv.getProducedDataSets()) {
IntermediateResult previousDataSet =
this.intermediateResults.putIfAbsent(res.getId(), res);
if (previousDataSet != null) {
throw new JobException(
String.format(
"Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
res.getId(), res, previousDataSet));
}
}
this.verticesInCreationOrder.add(ejv);
this.numVerticesTotal += ejv.getParallelism();
newExecJobVertices.add(ejv);
}
registerExecutionVerticesAndResultPartitions(this.verticesInCreationOrder);
// the topology assigning should happen before notifying new vertices to failoverStrategy
executionTopology = DefaultExecutionTopology.fromExecutionGraph(this);
partitionReleaseStrategy =
partitionReleaseStrategyFactory.createInstance(getSchedulingTopology());
}
1. Iterate over all JobVertices in the JobGraph
2. Create an ExecutionJobVertex for each JobVertex
3. Build the ConsumedPartitionGroup and ConsumerVertexGroup for each JobVertex
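Two bookkeeping details in the loop above are easy to miss: duplicate JobVertex IDs are rejected via Map.putIfAbsent, and numVerticesTotal accumulates the sum of all parallelisms (one ExecutionVertex per parallel subtask). A small illustration; the vertex ids and parallelisms are made up:

```java
import java.util.*;

public class AttachSketch {
    // Mimics the bookkeeping in attachJobGraph: reject duplicate vertex ids,
    // then sum the parallelisms to get the total ExecutionVertex count.
    static int attachAll(Map<String, Integer> vertexParallelisms) {
        Map<String, Integer> tasks = new HashMap<>(); // like this.tasks
        int numVerticesTotal = 0;
        for (Map.Entry<String, Integer> e : vertexParallelisms.entrySet()) {
            // Mimics: this.tasks.putIfAbsent(jobVertex.getID(), ejv)
            if (tasks.putIfAbsent(e.getKey(), e.getValue()) != null) {
                throw new IllegalStateException(
                        "Encountered two job vertices with ID " + e.getKey());
            }
            // Mimics: this.numVerticesTotal += ejv.getParallelism()
            numVerticesTotal += e.getValue();
        }
        return numVerticesTotal;
    }

    public static void main(String[] args) {
        System.out.println(attachAll(Map.of("source", 2, "map", 4, "sink", 1))); // 7
    }
}
```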
Building the ExecutionVertices (inside the ExecutionJobVertex constructor)
public ExecutionJobVertex(
InternalExecutionGraphAccessor graph,
JobVertex jobVertex,
int maxPriorAttemptsHistoryLength,
Time timeout,
long createTimestamp,
VertexParallelismInformation parallelismInfo,
SubtaskAttemptNumberStore initialAttemptCounts)
throws JobException {
if (graph == null || jobVertex == null) {
throw new NullPointerException();
}
this.graph = graph;
this.jobVertex = jobVertex;
// Obtain the operator parallelism information
this.parallelismInfo = parallelismInfo;
// verify that our parallelism is not higher than the maximum parallelism
if (this.parallelismInfo.getParallelism() > this.parallelismInfo.getMaxParallelism()) {
throw new JobException(
String.format(
"Vertex %s's parallelism (%s) is higher than the max parallelism (%s). Please lower the parallelism or increase the max parallelism.",
jobVertex.getName(),
this.parallelismInfo.getParallelism(),
this.parallelismInfo.getMaxParallelism()));
}
this.resourceProfile =
ResourceProfile.fromResourceSpec(jobVertex.getMinResources(), MemorySize.ZERO);
this.taskVertices = new ExecutionVertex[this.parallelismInfo.getParallelism()];
this.inputs = new ArrayList<>(jobVertex.getInputs().size());
// take the sharing group
this.slotSharingGroup = checkNotNull(jobVertex.getSlotSharingGroup());
this.coLocationGroup = jobVertex.getCoLocationGroup();
// create the intermediate results
// Initialize an empty IntermediateResult array, sized by the number of IntermediateDataSets the JobVertex produces
this.producedDataSets =
new IntermediateResult[jobVertex.getNumberOfProducedIntermediateDataSets()];
// Build one IntermediateResult for each produced IntermediateDataSet
for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);
// Build the IntermediateResult
this.producedDataSets[i] =
new IntermediateResult(
result.getId(),
this,
this.parallelismInfo.getParallelism(),
result.getResultType());
}
// create all task vertices
// Create one ExecutionVertex per parallel subtask
for (int i = 0; i < this.parallelismInfo.getParallelism(); i++) {
ExecutionVertex vertex =
new ExecutionVertex(
this,
i,
producedDataSets,
timeout,
createTimestamp,
maxPriorAttemptsHistoryLength,
initialAttemptCounts.getAttemptCount(i));
this.taskVertices[i] = vertex;
}
// sanity check for the double referencing between intermediate result partitions and
// execution vertices
for (IntermediateResult ir : this.producedDataSets) {
if (ir.getNumberOfAssignedPartitions() != this.parallelismInfo.getParallelism()) {
throw new RuntimeException(
"The intermediate result's partitions were not correctly assigned.");
}
}
final List<SerializedValue<OperatorCoordinator.Provider>> coordinatorProviders =
getJobVertex().getOperatorCoordinators();
if (coordinatorProviders.isEmpty()) {
this.operatorCoordinators = Collections.emptyList();
} else {
final ArrayList<OperatorCoordinatorHolder> coordinators =
new ArrayList<>(coordinatorProviders.size());
try {
for (final SerializedValue<OperatorCoordinator.Provider> provider :
coordinatorProviders) {
coordinators.add(
OperatorCoordinatorHolder.create(
provider, this, graph.getUserClassLoader()));
}
} catch (Exception | LinkageError e) {
IOUtils.closeAllQuietly(coordinators);
throw new JobException(
"Cannot instantiate the coordinator for operator " + getName(), e);
}
this.operatorCoordinators = Collections.unmodifiableList(coordinators);
}
// set up the input splits, if the vertex has any
try {
@SuppressWarnings("unchecked")
InputSplitSource<InputSplit> splitSource =
(InputSplitSource<InputSplit>) jobVertex.getInputSplitSource();
if (splitSource != null) {
Thread currentThread = Thread.currentThread();
ClassLoader oldContextClassLoader = currentThread.getContextClassLoader();
currentThread.setContextClassLoader(graph.getUserClassLoader());
try {
inputSplits =
splitSource.createInputSplits(this.parallelismInfo.getParallelism());
if (inputSplits != null) {
splitAssigner = splitSource.getInputSplitAssigner(inputSplits);
}
} finally {
currentThread.setContextClassLoader(oldContextClassLoader);
}
} else {
inputSplits = null;
}
} catch (Throwable t) {
throw new JobException(
"Creating the input splits caused an error: " + t.getMessage(), t);
}
}
1. Obtain the parallelism information
2. Initialize the IntermediateResult array, sized by the number of IntermediateDataSets the JobVertex produces
3. Build one IntermediateResult for each produced IntermediateDataSet
4. Create one ExecutionVertex per parallel subtask
5. Create the input splits, if the vertex has any, according to the parallelism
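To make the fan-out concrete: an ExecutionJobVertex with parallelism p that produces d IntermediateDataSets ends up with p ExecutionVertices, d IntermediateResults, and p partitions per result, i.e. p × d IntermediateResultPartitions in total (this is exactly what the sanity check on getNumberOfAssignedPartitions() verifies). A toy count with illustrative numbers:

```java
public class FanOutSketch {
    // For parallelism p and d produced IntermediateDataSets:
    static int executionVertices(int p, int d)   { return p; }     // one per subtask
    static int intermediateResults(int p, int d) { return d; }     // one per dataset
    static int resultPartitions(int p, int d)    { return p * d; } // p partitions per result

    public static void main(String[] args) {
        int p = 4, d = 2;
        System.out.println(executionVertices(p, d));   // 4
        System.out.println(intermediateResults(p, d)); // 2
        System.out.println(resultPartitions(p, d));    // 8
    }
}
```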
Looking at the loop that creates one ExecutionVertex per parallel subtask:
for (int i = 0; i < this.parallelismInfo.getParallelism(); i++) {
ExecutionVertex vertex =
new ExecutionVertex(
this,
i,
producedDataSets,
timeout,
createTimestamp,
maxPriorAttemptsHistoryLength,
initialAttemptCounts.getAttemptCount(i));
this.taskVertices[i] = vertex;
}
Step into the ExecutionVertex constructor:
public ExecutionVertex(
ExecutionJobVertex jobVertex,
int subTaskIndex,
IntermediateResult[] producedDataSets,
Time timeout,
long createTimestamp,
int maxPriorExecutionHistoryLength,
int initialAttemptCount) {
this.jobVertex = jobVertex;
this.subTaskIndex = subTaskIndex;
this.executionVertexId = new ExecutionVertexID(jobVertex.getJobVertexId(), subTaskIndex);
this.taskNameWithSubtask =
String.format(
"%s (%d/%d)",
jobVertex.getJobVertex().getName(),
subTaskIndex + 1,
jobVertex.getParallelism());
this.resultPartitions = new LinkedHashMap<>(producedDataSets.length, 1);
for (IntermediateResult result : producedDataSets) {
IntermediateResultPartition irp =
new IntermediateResultPartition(
result,
this,
subTaskIndex,
getExecutionGraphAccessor().getEdgeManager());
result.setPartition(subTaskIndex, irp);
resultPartitions.put(irp.getPartitionId(), irp);
}
this.priorExecutions = new EvictingBoundedList<>(maxPriorExecutionHistoryLength);
/** ExecutionVertex and Execution
The JobMaster wraps all the information needed to deploy a Task into a single object and sends it to the TaskExecutor;
when the TaskExecutor receives that object, it wraps it again into a Task object.
Deployment:
once the JobMaster has obtained a slot on the target node, it wraps the information needed to deploy the Task into an Execution,
then calls Execution.deploy() to perform the deployment.
Inside deploy(), an RPC request sends the necessary information to the TaskExecutor.
When the TaskExecutor receives it, it wraps the information into a Task object and starts that Task, completing the deployment.
*/
this.currentExecution =
new Execution(
getExecutionGraphAccessor().getFutureExecutor(),
this,
initialAttemptCount,
createTimestamp,
timeout);
getExecutionGraphAccessor().registerExecution(currentExecution);
this.timeout = timeout;
this.inputSplits = new ArrayList<>();
}
The ExecutionVertex constructor mainly does two things:
1. Partition each IntermediateResult by parallelism, producing one IntermediateResultPartition per subtask
2. Wrap the information the Task needs into an Execution
Once the JobMaster has obtained a slot on the corresponding node, it wraps all the information needed to deploy the Task into an Execution and calls Execution.deploy() to carry out the deployment. Inside that method, an RPC request sends the necessary information to the TaskExecutor, which wraps it into a Task object and starts it, completing the deployment.
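One small but visible detail from the constructor above: the subtask's display name is built with String.format("%s (%d/%d)", ...), so subtask index 0 of a 4-way "map" vertex shows up in logs and the web UI as "map (1/4)":

```java
public class TaskNameSketch {
    // Mirrors how taskNameWithSubtask is built in the ExecutionVertex constructor:
    // the 0-based subtask index becomes a 1-based position over the parallelism.
    static String taskNameWithSubtask(String vertexName, int subTaskIndex, int parallelism) {
        return String.format("%s (%d/%d)", vertexName, subTaskIndex + 1, parallelism);
    }

    public static void main(String[] args) {
        System.out.println(taskNameWithSubtask("map", 0, 4)); // map (1/4)
        System.out.println(taskNameWithSubtask("map", 3, 4)); // map (4/4)
    }
}
```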
Building the ConsumedPartitionGroup and ConsumerVertexGroup
The ConsumedPartitionGroup and ConsumerVertexGroup are initialized inside executionGraph.attachJobGraph; step into the ejv.connectToPredecessors(this.intermediateResults) call.
public void connectToPredecessors(
Map<IntermediateDataSetID, IntermediateResult> intermediateDataSets)
throws JobException {
// Get the JobVertex's inputs
List<JobEdge> inputs = jobVertex.getInputs();
if (LOG.isDebugEnabled()) {
LOG.debug(
String.format(
"Connecting ExecutionJobVertex %s (%s) to %d predecessors.",
jobVertex.getID(), jobVertex.getName(), inputs.size()));
}
for (int num = 0; num < inputs.size(); num++) {
JobEdge edge = inputs.get(num);
if (LOG.isDebugEnabled()) {
if (edge.getSource() == null) {
LOG.debug(
String.format(
"Connecting input %d of vertex %s (%s) to intermediate result referenced via ID %s.",
num,
jobVertex.getID(),
jobVertex.getName(),
edge.getSourceId()));
} else {
LOG.debug(
String.format(
"Connecting input %d of vertex %s (%s) to intermediate result referenced via predecessor %s (%s).",
num,
jobVertex.getID(),
jobVertex.getName(),
edge.getSource().getProducer().getID(),
edge.getSource().getProducer().getName()));
}
}
// fetch the intermediate result via ID. if it does not exist, then it either has not
// been created, or the order
// in which this method is called for the job vertices is not a topological order
IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
if (ires == null) {
throw new JobException(
"Cannot connect this job graph to the previous graph. No previous intermediate result found for ID "
+ edge.getSourceId());
}
this.inputs.add(ires);
// Build the ConsumedPartitionGroup and ConsumerVertexGroup
EdgeManagerBuildUtil.connectVertexToResult(this, ires, edge.getDistributionPattern());
}
}
1. Get the JobVertex's inputs
2. From those inputs, obtain the JobVertex's input edges (JobEdges)
3. For each input edge, look up the corresponding intermediate result (IntermediateResult) in the intermediateDataSets map
4. Build the ConsumedPartitionGroup and ConsumerVertexGroup from the IntermediateResult
Step into EdgeManagerBuildUtil.connectVertexToResult():
static void connectVertexToResult(
ExecutionJobVertex vertex,
IntermediateResult intermediateResult,
DistributionPattern distributionPattern) {
switch (distributionPattern) {
case POINTWISE:
connectPointwise(vertex.getTaskVertices(), intermediateResult);
break;
case ALL_TO_ALL:
connectAllToAll(vertex.getTaskVertices(), intermediateResult);
break;
default:
throw new IllegalArgumentException("Unrecognized distribution pattern.");
}
}
The switch dispatches on the two distribution patterns, POINTWISE and ALL_TO_ALL; let's take ALL_TO_ALL as the example:
private static void connectAllToAll(
ExecutionVertex[] taskVertices, IntermediateResult intermediateResult) {
// Iterate over the IntermediateResultPartitions to build a single ConsumedPartitionGroup
ConsumedPartitionGroup consumedPartitions =
ConsumedPartitionGroup.fromMultiplePartitions(
Arrays.stream(intermediateResult.getPartitions())
.map(IntermediateResultPartition::getPartitionId)
.collect(Collectors.toList()));
// Attach the ConsumedPartitionGroup to every consuming ExecutionVertex
for (ExecutionVertex ev : taskVertices) {
ev.addConsumedPartitionGroup(consumedPartitions);
}
// Build a ConsumerVertexGroup from all downstream ExecutionVertices
ConsumerVertexGroup vertices =
ConsumerVertexGroup.fromMultipleVertices(
Arrays.stream(taskVertices)
.map(ExecutionVertex::getID)
.collect(Collectors.toList()));
// Connect every IntermediateResultPartition to the downstream ConsumerVertexGroup
for (IntermediateResultPartition partition : intermediateResult.getPartitions()) {
partition.addConsumers(vertices);
}
}
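For contrast, the POINTWISE branch (connectPointwise, not shown here) links each consumer to only a contiguous slice of the partitions rather than to all of them. The sketch below is a simplified illustration of such a range mapping, not Flink's actual connectPointwise logic, which handles more corner cases:

```java
import java.util.*;

public class PointwiseSketch {
    // For consumer subtask `i` out of `consumers`, return the partition indices it reads.
    // Simplified rule: with at least as many partitions as consumers, each consumer
    // reads a contiguous range; otherwise each consumer reads a single partition.
    static List<Integer> partitionsFor(int i, int consumers, int partitions) {
        List<Integer> result = new ArrayList<>();
        if (partitions >= consumers) {
            int start = i * partitions / consumers;
            int end = (i + 1) * partitions / consumers;
            for (int p = start; p < end; p++) result.add(p);
        } else {
            result.add(i * partitions / consumers);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(partitionsFor(0, 2, 4)); // [0, 1]
        System.out.println(partitionsFor(1, 2, 4)); // [2, 3]
        System.out.println(partitionsFor(3, 4, 2)); // [1]
    }
}
```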
Summary:
1. Iterate over all IntermediateResultPartitions of the IntermediateResult and collect them into a single ConsumedPartitionGroup object
2. Connect every downstream ExecutionVertex of this IntermediateResult to that ConsumedPartitionGroup
3. Put all downstream ExecutionVertices of this IntermediateResult into a single ConsumerVertexGroup object
4. Connect every IntermediateResultPartition of this IntermediateResult to that ConsumerVertexGroup
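The payoff of this grouping is the complexity improvement mentioned at the start: before Flink 1.13, an ALL_TO_ALL connection materialized one ExecutionEdge per (partition, consumer) pair, i.e. P × C objects; with the shared groups, one ConsumedPartitionGroup of size P and one ConsumerVertexGroup of size C suffice, i.e. P + C entries. A toy comparison:

```java
public class EdgeCountSketch {
    // Old model: one ExecutionEdge per (IntermediateResultPartition, ExecutionVertex) pair.
    static long executionEdges(long producers, long consumers) {
        return producers * consumers;
    }

    // New model: one ConsumedPartitionGroup (size P) shared by all consumers,
    // plus one ConsumerVertexGroup (size C) shared by all partitions.
    static long groupEntries(long producers, long consumers) {
        return producers + consumers;
    }

    public static void main(String[] args) {
        long p = 10_000, c = 10_000;
        System.out.println(executionEdges(p, c)); // 100000000 -> O(n^2)
        System.out.println(groupEntries(p, c));   // 20000     -> O(n)
    }
}
```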