This system orchestrates the job execution lifecycle within `bclaw_runner`: data handling, sub-pipeline execution, and scatter-gather operations. It compiles workflows into AWS Step Functions state machines, expands scatter data for parallel execution, and provides abstractions for S3 interactions.
Components
Workflow Execution & Data Management
The overarching component that orchestrates the job execution lifecycle within `bclaw_runner`. It covers data handling (S3 interactions and caching) and data exchange for sub-pipelines and scatter-gather operations, and it acts as the central control for the runtime environment.
Subpipe Data Manager
This component is responsible for managing file transfers and job data for sub-pipelines. It handles the submission of data to a subpipe and the retrieval of results from it, primarily interacting with S3 for data storage.
Workflow Compiler
This component is responsible for compiling and defining AWS Step Functions state machines. It orchestrates various step types, including scatter-gather, batch, subpipe, native, chooser, and parallel steps, into a coherent workflow definition.
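As a rough illustration of what compiling steps into a state machine definition can look like, the sketch below chains named steps into a linear Amazon States Language document. The function `compile_steps` and the step names are hypothetical and are not taken from the project's source; real Task states would also carry `Parameters`, retries, and error handling, all omitted here.

```python
import json


def compile_steps(steps):
    """Chain (name, resource) pairs into a linear Amazon States Language dict.

    Hypothetical helper for illustration only; not the project's compiler.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for i, (name, resource) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource}
        if i + 1 < len(steps):
            # Every state except the last points at its successor.
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Step names are made up; the resource is the Step Functions Batch integration.
definition = compile_steps([
    ("Align", "arn:aws:states:::batch:submitJob.sync"),
    ("Report", "arn:aws:states:::batch:submitJob.sync"),
])
print(json.dumps(definition, indent=2))
```

The other step types (scatter-gather, subpipe, chooser, parallel) would each contribute more than one state, so a real compiler would likely merge per-step state dictionaries rather than build one state per step.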
Scatter-Gather Definition
This component defines the structure and steps for scatter-gather operations within the state machine. It includes the initial scatter step, the map step for parallel processing, and the final gather step to collect results.
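The scatter, map, and gather sequence described above could be expressed in Amazon States Language roughly as follows. This is a sketch under assumptions: the function name, the `.scatter`/`.map`/`.gather` naming scheme, the Lambda ARNs, and the `$.items`/`$.results` paths are all invented for illustration.

```python
def scatter_gather_states(name, scatter_arn, gather_arn, branch):
    """Return the three states of a scatter-gather block (hypothetical layout)."""
    scatter = f"{name}.scatter"
    mapper = f"{name}.map"
    gather = f"{name}.gather"
    return {
        # Scatter: a task that produces the list of work items.
        scatter: {
            "Type": "Task",
            "Resource": scatter_arn,
            "ResultPath": "$.items",
            "Next": mapper,
        },
        # Map: runs the branch once per item. Newer definitions use
        # "ItemProcessor" in place of "Iterator".
        mapper: {
            "Type": "Map",
            "ItemsPath": "$.items",
            "Iterator": branch,
            "ResultPath": "$.results",
            "Next": gather,
        },
        # Gather: a task that collects the per-item results.
        gather: {"Type": "Task", "Resource": gather_arn, "End": True},
    }


# The branch is itself a small state machine; names and ARNs are placeholders.
branch = {
    "StartAt": "Work",
    "States": {
        "Work": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "End": True,
        }
    },
}
states = scatter_gather_states(
    "Align",
    "arn:aws:lambda:us-east-1:123456789012:function:scatter",
    "arn:aws:lambda:us-east-1:123456789012:function:gather",
    branch,
)
```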
Subpipe Step Builder
This component specifically builds the individual steps involved in executing a subpipe within a larger state machine. It includes steps for submitting files, running the subpipe state machine, and retrieving output files.
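A minimal sketch of the three subpipe steps follows, assuming the nested state machine is started through the Step Functions `startExecution.sync` service integration. Whether the project uses this integration is an assumption; the function name, state names, and ARNs are hypothetical.

```python
def subpipe_states(name, submit_arn, subpipe_arn, retrieve_arn):
    """Return submit, run, and retrieve states for one subpipe (illustrative)."""
    submit = f"{name}.submit"
    run = f"{name}.run"
    retrieve = f"{name}.retrieve"
    return {
        # Copy the files the subpipe needs into its own storage location.
        submit: {"Type": "Task", "Resource": submit_arn, "Next": run},
        # Start the nested state machine and wait for it to finish.
        run: {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync",
            "Parameters": {"StateMachineArn": subpipe_arn, "Input.$": "$"},
            "Next": retrieve,
        },
        # Copy the subpipe's output files back to the parent workflow.
        retrieve: {"Type": "Task", "Resource": retrieve_arn, "End": True},
    }


# All ARNs below are placeholders.
sub_states = subpipe_states(
    "Annotate",
    "arn:aws:lambda:us-east-1:123456789012:function:submit-files",
    "arn:aws:states:us-east-1:123456789012:stateMachine:annotate",
    "arn:aws:lambda:us-east-1:123456789012:function:retrieve-files",
)
```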
Scatter Data Processor
This component is responsible for processing and expanding scatter data for parallel execution. It handles various data sources and formats, including static lists, job data references, file contents, and S3 globs, to prepare data for map steps.
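To make the expansion concrete, here is a small sketch that turns a scatter specification into one parameter set per parallel branch. It covers only two of the listed sources (static lists and glob patterns), matches globs against an in-memory list of keys instead of S3, and assumes branches are the Cartesian product of all scatter variables. The function name and the spec format are hypothetical.

```python
from fnmatch import fnmatchcase
from itertools import product


def expand_scatter(spec, available_keys):
    """Expand a scatter spec into a list of per-branch parameter dicts.

    Assumption: a string value is a glob pattern, any other value is a
    static list. Job data references and file contents are not handled.
    """
    names = list(spec)
    value_lists = []
    for name in names:
        source = spec[name]
        if isinstance(source, str):
            # Glob: keep every key that matches the pattern.
            values = sorted(k for k in available_keys if fnmatchcase(k, source))
        else:
            values = list(source)
        value_lists.append(values)
    # One branch per combination of values (assumed behavior).
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


keys = ["inputs/a.fastq", "inputs/b.fastq", "inputs/readme.txt"]
branches = expand_scatter({"sample": "inputs/*.fastq", "kmer": [21, 31]}, keys)
for b in branches:
    print(b)
```

With two matching files and two static values, this yields four branches, which a map step could then process in parallel.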
S3 Repository Abstraction
This foundational component provides a high-level abstraction for interacting with S3 buckets and prefixes as repositories. It simplifies file and path management, enabling consistent access and manipulation of data within the system.
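One way to picture such an abstraction is a small value object that pairs a bucket with a prefix and derives keys and URIs from file names. The class and method names below are hypothetical, and the sketch does path handling only, with no S3 calls.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    """A bucket plus prefix treated as a repository (illustrative only)."""

    bucket: str
    prefix: str

    @classmethod
    def from_uri(cls, uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
        bucket, _, prefix = uri[len("s3://"):].partition("/")
        return cls(bucket, prefix.strip("/"))

    def key_for(self, filename):
        # posixpath keeps forward slashes regardless of the host platform.
        return posixpath.join(self.prefix, filename)

    def uri_for(self, filename):
        return f"s3://{self.bucket}/{self.key_for(filename)}"

    def sub_repo(self, name):
        # A nested repository, e.g. one per step or per scatter branch.
        return Repo(self.bucket, self.key_for(name))


repo = Repo.from_uri("s3://my-bucket/runs/job-123/")
print(repo.uri_for("out.txt"))
print(repo.sub_repo("step1").uri_for("out.txt"))
```

Keeping the path logic in one place means callers pass file names around and never assemble S3 keys by hand.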
Runner Orchestrator
This is the primary control flow component for the `bclaw_runner` application. It manages the overall execution, including input/output handling, command execution, and error handling, orchestrating the various stages of a job run.
Runner S3 Data Handler
This component specifically handles all S3-related data operations for the `bclaw_runner`. It manages reading and writing job data, checking for previous runs, downloading inputs, and uploading outputs, ensuring data persistence and run state management.
Local Execution Environment
This component provides and manages the temporary local file system workspace for the `bclaw_runner`. It also handles the execution of shell commands within a containerized environment and manages the writing of job data files to the workspace.
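The workspace part of this component can be sketched with the standard library alone. The example below creates a temporary directory, writes a job data file into it, and removes everything on exit. The context manager name and the `job_data.json` file name are assumptions, and container execution is left out.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def workspace(job_data):
    """Yield a temporary directory that already contains the job data file.

    Hypothetical helper; the file name is an assumption.
    """
    with tempfile.TemporaryDirectory() as path:
        with open(os.path.join(path, "job_data.json"), "w") as fh:
            json.dump(job_data, fh)
        # The directory and its contents are deleted when the block exits.
        yield path


with workspace({"sample": "a"}) as ws:
    print(sorted(os.listdir(ws)))
```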
Runtime Utilities
This component is a collection of supporting utilities used at runtime. It includes functions for caching reference inputs, performing string substitutions for dynamic values, and running quality control checks on execution results.
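As an illustration of the string substitution utility, the sketch below replaces `${name}` placeholders in a command with values from a dictionary. The `${...}` syntax and the function name are assumptions and may differ from what the project uses.

```python
import re

# Matches ${name}, where name is made of letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute(template, values):
    """Replace each ${name} in template with str(values[name])."""

    def replace(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly instead of leaving a placeholder in the command.
            raise KeyError(f"no value for placeholder {key!r}")
        return str(values[key])

    return _PLACEHOLDER.sub(replace, template)


command = substitute("sort ${input} > ${output}", {"input": "a.txt", "output": "b.txt"})
print(command)
```

Raising on an unknown placeholder is a design choice of this sketch: it surfaces a misconfigured step before a malformed command reaches the shell.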