`SRAgent`, an AI agent-based system for bioinformatics data curation and retrieval, follows a modular design with a clear separation of concerns. It takes an agent-oriented approach: AI agents and specialized tools are orchestrated to interact with external bioinformatics services and to manage internal data.
Components
User Interface (CLI)
The primary entry point for users, responsible for parsing commands, validating inputs, and initiating specific data curation and retrieval workflows. It provides the interactive command-line experience.
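As a rough illustration of command parsing and input validation, here is a minimal sketch using Python's standard `argparse`. The program name `sragent-sketch`, the `curate` subcommand, and the `--max-records` option are hypothetical and are not taken from the actual `SRAgent` CLI.

```python
import argparse


def positive_int(value: str) -> int:
    """Validate that a command-line value is a positive integer."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program and subcommand names, for illustration only.
    parser = argparse.ArgumentParser(prog="sragent-sketch")
    subcommands = parser.add_subparsers(dest="command", required=True)

    curate = subcommands.add_parser("curate", help="start a curation workflow")
    curate.add_argument("accession", help="dataset accession to process")
    curate.add_argument("--max-records", type=positive_int, default=10)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would start '{args.command}' for {args.accession}")
```

Each subcommand would hand its validated arguments to the corresponding workflow rather than doing any curation work itself.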
Agent Workflow Orchestrator
Orchestrates complex, multi-step data curation and retrieval processes. It defines the sequence of operations, coordinates the execution of various `Core AI Agents`, and manages their interactions to achieve high-level bioinformatics tasks.
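The sequencing idea can be sketched as a list of steps that each receive and return a shared state dictionary. This is an assumption about one possible shape of an orchestrator, not the project's actual implementation; the step functions, state keys, and placeholder accession are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


def run_workflow(steps: List[Step], state: State) -> State:
    """Run each step in order, passing the updated state along."""
    for step in steps:
        state = step(state)
    return state


# Hypothetical steps standing in for calls to individual agents.
def find_datasets(state: State) -> State:
    return {**state, "accessions": ["SRX0000001"]}


def annotate_datasets(state: State) -> State:
    annotated = [{"accession": a, "organism": "unknown"} for a in state["accessions"]]
    return {**state, "annotated": annotated}


result = run_workflow([find_datasets, annotate_datasets], {"query": "example"})
```

Keeping the sequence as plain data makes it straightforward to reorder, skip, or add steps without touching the agents themselves.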
Core AI Agents
Encapsulates the LLM-driven reasoning and decision-making logic for specialized bioinformatics tasks. Each agent leverages `External Service Integration` tools to perform atomic operations and contributes to the overall workflow. This component also handles formatting and displaying agent progress.
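To make the relationship between an agent, its tools, and progress reporting concrete, here is a minimal sketch. The LLM is replaced by a plain callable (`choose_tool`) so the example runs offline; the class name `ToolAgent` and the tool `lookup_organism` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolAgent:
    name: str
    tools: Dict[str, Callable[[str], str]]
    # Stand-in for LLM reasoning: maps a task description to a tool name.
    choose_tool: Callable[[str], str]
    progress: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        tool_name = self.choose_tool(task)
        self.progress.append(f"[{self.name}] calling {tool_name}")
        result = self.tools[tool_name](task)
        self.progress.append(f"[{self.name}] finished")
        return result


agent = ToolAgent(
    name="metadata",
    tools={"lookup_organism": lambda task: f"organism for {task}: unknown"},
    choose_tool=lambda task: "lookup_organism",
)
answer = agent.run("SRX0000001")
```

In a real agent the tool choice would come from the model's output, and the progress messages would be formatted for display rather than collected in a list.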
External Service Integration
Provides a standardized, abstracted interface to external bioinformatics APIs and services, wrapping each external interaction. The wrapped services include NCBI Entrez databases, SRA data processing utilities, Google Cloud BigQuery, and the UBERON tissue ontology service.
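One way to picture a standardized wrapper is a small abstract interface that every service client implements. The sketch below is an assumption about the general pattern, not the project's API: `ExternalService`, `InMemoryOntology`, and `fetch` are hypothetical names, and the ontology is an in-memory stand-in so the example makes no network calls.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ExternalService(ABC):
    """Common interface that callers use instead of a service-specific client."""

    @abstractmethod
    def fetch(self, identifier: str) -> Dict[str, str]:
        """Return a record for the given identifier."""


class InMemoryOntology(ExternalService):
    """Offline stand-in for an ontology lookup service."""

    def __init__(self, terms: Dict[str, str]) -> None:
        self._terms = dict(terms)

    def fetch(self, identifier: str) -> Dict[str, str]:
        if identifier not in self._terms:
            raise KeyError(f"unknown term: {identifier}")
        return {"id": identifier, "label": self._terms[identifier]}


ontology: ExternalService = InMemoryOntology({"UBERON:0002107": "liver"})
record = ontology.fetch("UBERON:0002107")
```

Because agents depend only on the interface, a stand-in like this can replace a live service in tests.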
Data Persistence Layer
Manages all persistent data storage and retrieval operations, primarily with the PostgreSQL database. It handles connection management, schema creation, data insertion, updates, and querying, abstracting the underlying database interactions from other components.
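The abstraction described here can be sketched as a small store class that hides connection handling, schema creation, and SQL from its callers. Note the swap: this sketch uses the standard-library `sqlite3` module in place of PostgreSQL so that it runs without a database server. The class `RecordStore`, the `records` table, and its columns are hypothetical.

```python
import sqlite3
from typing import Optional


class RecordStore:
    """Hides connection handling, schema creation, and SQL from callers."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        # Hypothetical schema, for illustration only.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "accession TEXT PRIMARY KEY, organism TEXT)"
        )

    def upsert(self, accession: str, organism: str) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO records (accession, organism) VALUES (?, ?)",
                (accession, organism),
            )

    def get_organism(self, accession: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT organism FROM records WHERE accession = ?", (accession,)
        ).fetchone()
        return row[0] if row else None


store = RecordStore()
store.upsert("SRX0000001", "unknown")
store.upsert("SRX0000001", "Homo sapiens")  # second write updates the same row
```

Other components would call methods like `upsert` and `get_organism` and never see the SQL or the driver, which is what lets the backing database change independently.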
System Utilities & Support Scripts
A collection of foundational shared helper functions, configuration management, and standalone scripts. This includes general data manipulation, command execution, initial data ingestion, format transformations, database administration, metadata enrichment, dataset discovery, and the evaluation framework.