The `Training & Evaluation Pipeline` component in `chemicalx` is the central orchestrator for deep learning experiments, managing the entire lifecycle from data preparation to model evaluation. It follows the Pipeline architectural pattern, which gives every experiment the same structured, reproducible flow, and its modular design keeps each stage (data loading, batching, modeling, optimization, evaluation) independently reusable and extensible.
Components
Training & Evaluation Pipeline
This is the core orchestrator (`chemicalx.pipeline.pipeline`) that manages the end-to-end workflow for training, validating, and evaluating deep learning models. It initializes the dataset, model, optimizer, and loss function, executes the training loop, handles device placement, evaluates the trained model, and finally packages everything into a `Result` object.
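A minimal end-to-end run looks like the following sketch; the argument names follow the library's typical usage, but exact keywords such as `batch_size` and `epochs` should be checked against your installed version:

```python
from chemicalx import pipeline
from chemicalx.data import DrugCombDB
from chemicalx.models import DeepSynergy

# Load a benchmark dataset and pick a model whose channel sizes
# match the dataset's context and drug feature dimensions.
dataset = DrugCombDB()
model = DeepSynergy(context_channels=112, drug_channels=256)

# The pipeline wires batching, optimization, loss computation,
# device placement, and evaluation together, returning a Result.
results = pipeline(
    dataset=dataset,
    model=model,
    batch_size=5120,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    epochs=100,
)
```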
Result Class
The `Result` class (`chemicalx.pipeline.Result`) acts as a structured container for all outputs and metadata generated by a pipeline run. It encapsulates the trained model, predictions, losses, training/evaluation times, and calculated metrics, providing utility methods for summarization and persistence of experimental outcomes.
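Continuing the sketch above, a returned `Result` can be inspected and saved (the destination path here is illustrative):

```python
# Print a human-readable summary of metrics and timings.
results.summarize()

# Persist the run's outputs to a directory for later analysis.
results.save("~/experiment_results/")
```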
Dataset Loaders
This component (`chemicalx.data.datasetloader`) is responsible for loading and preparing raw datasets. It abstracts access to the different data sources (e.g., DrugComb, DrugbankDDI) and exposes the feature sets and labeled drug-pair triples that the rest of the pipeline consumes.
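A minimal sketch of loader usage (the download-and-cache behavior on first use is assumed to follow the loaders' typical pattern):

```python
from chemicalx.data import DrugCombDB

# First use typically fetches and caches the raw data; the loader
# then serves context features, drug features, and labeled triples.
dataset = DrugCombDB()
```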
Batch Generators
The `BatchGenerator` component (`chemicalx.data.batchgenerator`) iterates over the loaded data in mini-batches, packaging context features, drug features, and drug molecules into the format each model expects as input.
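A sketch of direct batch iteration follows; the constructor signature and the loader getters (`get_context_features`, `get_drug_features`, `get_labeled_triples`) are assumptions about the interface and may differ across versions:

```python
from chemicalx.data import BatchGenerator, DrugCombDB

dataset = DrugCombDB()

# Assumed constructor: boolean flags select which parts of each
# batch are materialized (context features, drug features, molecules).
generator = BatchGenerator(
    batch_size=1024,
    context_features=True,
    drug_features=True,
    drug_molecules=False,
    context_feature_set=dataset.get_context_features(),
    drug_feature_set=dataset.get_drug_features(),
    labeled_triples=dataset.get_labeled_triples(),
)

for batch in generator:
    # Each batch bundles the selected features for one mini-batch of
    # labeled drug pairs, ready to be fed to a model.
    pass
```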
Deep Learning Models
This component (`chemicalx.models.base.Model` and its subclasses such as `CASTER`, `DeepDDI`, etc.) represents the deep learning architectures used for training and inference. Each model defines its own forward pass over a batch of drug pairs, and its parameters are what the pipeline optimizes.
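The pipeline typically drives a model per batch as sketched here, continuing from the generator example above; the `unpack` helper, which maps a batch onto the model's forward arguments, is an assumption about the base `Model` interface:

```python
from chemicalx.models import DeepSynergy

# Constructor hyperparameters are model-specific; these values are
# illustrative and must match the dataset's feature dimensions.
model = DeepSynergy(context_channels=112, drug_channels=256)

# Assumed per-batch step: unpack() selects the batch fields this
# architecture consumes, and the forward pass scores each drug pair.
prediction = model(*model.unpack(batch))
```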
Optimizers (PyTorch)
This component, leveraging PyTorch's `torch.optim` module, is responsible for updating the parameters of the deep learning model during training based on the calculated gradients of the loss function.
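Internally this is the standard PyTorch update loop; a minimal sketch, continuing from the examples above (with `loss` computed as in the loss-function sketch below):

```python
import torch

# Adam is a common default; the learning rate here is illustrative.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One optimization step: clear stale gradients, backpropagate the
# loss, and apply the parameter update.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```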
Loss Functions (PyTorch)
This component, utilizing PyTorch's `torch.nn` module, quantifies the discrepancy between the model's predictions and the actual target values. The calculated loss is then used by the optimizer to adjust model parameters.
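For binary pair-scoring the objective is typically binary cross-entropy, continuing the sketches above (the default loss and the `labels` attribute name on the batch are assumptions):

```python
import torch

# Binary cross-entropy between predicted pair scores in [0, 1]
# and the 0/1 ground-truth labels.
loss_fn = torch.nn.BCELoss()
loss = loss_fn(prediction, batch.labels)
```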
Device Utilities
The `resolve_device` utility (`chemicalx.utils.resolve_device`) handles the placement of models and data onto the appropriate computing device (CPU or GPU), ensuring efficient execution of deep learning operations.
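A sketch of what such a helper typically does, not the verbatim implementation:

```python
import torch

def resolve_device(device=None):
    """Fall back to CUDA when available, otherwise use the CPU."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.device(device)

# Move the model (from the sketches above) onto the resolved device.
model = model.to(resolve_device())
```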
Metric Calculators
This component, primarily using functions from `sklearn.metrics` (e.g., `roc_auc_score`), is responsible for computing various evaluation metrics to assess the performance of the trained model on the test set.
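For instance, with `y_true` and `y_score` standing in for the collected test labels and predicted probabilities:

```python
from sklearn.metrics import roc_auc_score

# Area under the ROC curve over the test set; higher is better.
auc = roc_auc_score(y_true, y_score)
```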