This graph illustrates the architecture of BayerCLAW, a system for orchestrating and executing complex bioinformatics workflows. The main flow starts in the `Job Orchestration & Workflow Definition` component, which handles initial job setup and routing and compiles high-level workflow definitions into AWS Step Functions state machine language. The compiled workflow is then passed to the `Workflow Execution & Data Management` component, which manages execution, data handling, and workflow patterns such as scatter-gather. The `Runner Execution Environment` provides the isolated environment in which commands run. Throughout the process, `Quality Control & Termination` performs quality control checks and manages the instance lifecycle, while `Common Utilities & Notifications` provides shared services for data manipulation and communication.
Components
Job Orchestration & Workflow Definition
Manages initial job setup and routing, and compiles high-level workflow definitions into executable AWS Step Functions state machine language. It also handles and validates each step type: batch, scatter-gather, parallel, sub-pipeline, native, and chooser.
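To make the compilation idea concrete, here is a minimal sketch of turning an ordered list of batch steps into a state machine definition in Amazon States Language. It is not BayerCLAW's compiler: the function name `compile_batch_steps` and the step fields `name` and `job_definition` are hypothetical, and only the simplest case (a linear chain of batch steps) is covered. The `Resource` string is the standard Step Functions integration for AWS Batch.

```python
def compile_batch_steps(steps, job_queue_arn):
    """Compile an ordered list of (hypothetical) batch step dicts into an
    Amazon States Language state machine definition.

    Each step dict is assumed to carry "name" and "job_definition" keys.
    """
    if not steps:
        raise ValueError("workflow must contain at least one step")

    names = [step["name"] for step in steps]
    if len(set(names)) != len(names):
        raise ValueError("step names must be unique")

    states = {}
    for index, step in enumerate(steps):
        state = {
            "Type": "Task",
            # Step Functions service integration for AWS Batch; the ".sync"
            # suffix makes the state wait until the Batch job finishes.
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": step["name"],
                "JobQueue": job_queue_arn,
                "JobDefinition": step["job_definition"],
            },
        }
        # Chain each state to its successor; the last state ends the machine.
        if index + 1 < len(steps):
            state["Next"] = names[index + 1]
        else:
            state["End"] = True
        states[step["name"]] = state

    return {"StartAt": names[0], "States": states}
```

Validation happens before any state is emitted, so a malformed definition fails at compile time rather than partway through a run.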
Workflow Execution & Data Management
The central control component of `bclaw_runner`. It manages the entire job execution lifecycle, including data handling (S3 repository interactions and caching), and supports data exchange for sub-pipelines and scatter-gather operations during execution.
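The scatter-gather pattern mentioned above can be illustrated with a small in-memory sketch. The names `scatter` and `gather` and the data shapes are assumptions made for illustration. In particular, expanding several scatter variables as a Cartesian product is a choice made here, not a documented BayerCLAW behavior, and no S3 access is involved.

```python
from itertools import product


def scatter(scatter_spec):
    """Expand {variable: [values]} into one parameter dict per branch.

    Assumption: multiple variables are combined as a Cartesian product.
    """
    keys = sorted(scatter_spec)
    value_lists = [scatter_spec[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def gather(branch_outputs):
    """Collect per-branch output dicts into {output_name: [value per branch]}."""
    manifest = {}
    for outputs in branch_outputs:
        for name, value in outputs.items():
            manifest.setdefault(name, []).append(value)
    return manifest


# Usage: one branch per input file, then a manifest of all branch outputs.
branches = scatter({"reads": ["a.fq", "b.fq"]})
outputs = [{"bam": branch["reads"].replace(".fq", ".bam")} for branch in branches]
manifest = gather(outputs)
```

The gathered manifest gives a downstream step a single object that lists every branch's output, which is the kind of data exchange the description refers to.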
Runner Execution Environment
Manages the runner's local workspace, runs user-defined commands within it, and launches child containers using Docker-in-Docker.
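As a rough illustration of running user-defined commands inside an isolated workspace, the sketch below uses a temporary directory and the standard `subprocess` module. The function name `run_in_workspace` is hypothetical, and the sketch deliberately leaves out the Docker-in-Docker part: commands run directly on the host shell rather than in a child container.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_workspace(commands):
    """Run shell commands in order inside a fresh temporary directory.

    Stops at the first command that exits non-zero. Returns a tuple of
    (exit_code, sorted names of files left in the workspace).
    """
    with tempfile.TemporaryDirectory() as workspace:
        exit_code = 0
        for command in commands:
            # cwd confines relative paths in the command to the workspace.
            exit_code = subprocess.run(command, shell=True, cwd=workspace).returncode
            if exit_code != 0:
                break
        # List outputs before the temporary directory is deleted.
        files = sorted(path.name for path in Path(workspace).iterdir())
    return exit_code, files
```

Because the workspace is created per call and deleted on exit, one job's files cannot leak into the next; a real runner would upload outputs before cleanup.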
Quality Control & Termination
Performs quality control checks both within the AWS Lambda environment and during job execution, and manages the graceful termination of compute instances, spot instances in particular.
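One way to picture a quality control check is as a set of conditions evaluated against a JSON result file produced by a step. The sketch below is an assumption-laden illustration, not BayerCLAW's implementation: the function name `failed_qc_conditions`, the condition syntax, and the use of `eval` are all choices made here.

```python
import json


def failed_qc_conditions(qc_result_json, conditions):
    """Return the conditions that evaluate to true for the given QC results.

    qc_result_json: JSON text holding a flat object of metric names and values.
    conditions: Python expressions over those metric names, e.g. "coverage < 20".
    """
    values = json.loads(qc_result_json)
    failed = []
    for condition in conditions:
        # Builtins are disabled to limit what an expression can reach, but
        # eval is not a security sandbox; only use trusted condition strings.
        if eval(condition, {"__builtins__": {}}, dict(values)):
            failed.append(condition)
    return failed


# Usage: a non-empty result would signal that the workflow should stop.
problems = failed_qc_conditions(
    '{"coverage": 12.5, "contamination": 0.01}',
    ["coverage < 20", "contamination > 0.05"],
)
```

Returning the list of triggered conditions, rather than a bare boolean, lets the caller report exactly which check stopped the job.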
Common Utilities & Notifications
Provides shared utility functions used across components: general string substitutions, repository operations, and readers for various file formats. It also generates and sends notifications about workflow state changes.
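The general substitution utility can be sketched as placeholder replacement in strings. The `${name}` placeholder syntax, the function name `substitute`, and the choice to raise on unknown keys are assumptions made for this example.

```python
import re

# Matches ${name}, where name may contain letters, digits, underscores, and dots.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def substitute(template, values):
    """Replace each ${key} in template with str(values[key]).

    Raises KeyError for a placeholder that has no value, so a typo in a
    workflow definition fails loudly instead of producing a bad path.
    """
    def replace(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key!r}")
        return str(values[key])

    return _PLACEHOLDER.sub(replace, template)


# Usage: build an output location from job-level and step-level values.
path = substitute(
    "s3://bucket/${job.sample}/${step}.bam",
    {"job.sample": "S1", "step": "align"},
)
```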