CodeBoarding

KAZU is a Natural Language Processing (NLP) framework for biomedical text analysis. Documents flow through a configurable NLP pipeline that performs Named Entity Recognition (NER) and Entity Linking & Disambiguation. Core data models underpin every operation, and ontology management supplies the knowledge base that entities are linked against. The system also includes tooling for resource curation, model training and evaluation, and a web API for external integration, supported by general utilities and quality assurance mechanisms.

Components

Core Data Models

Defines fundamental data structures (documents, entities, sections, mappings, ontology resources) used across the KAZU system.
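As a rough illustration of how such structures can nest, the sketch below models a document that holds sections, sections that hold entities, and entities that hold ontology mappings. The class and field names are assumptions chosen for this example and are not taken from the KAZU source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mapping:
    """Link from an entity to one ontology record (hypothetical fields)."""
    source: str        # name of the ontology the identifier comes from
    idx: str           # identifier within that ontology
    confidence: float  # score assigned by whichever linker produced the mapping


@dataclass
class Entity:
    """A span of section text recognised as a named entity."""
    match: str
    entity_class: str
    start: int
    end: int
    mappings: list[Mapping] = field(default_factory=list)


@dataclass
class Section:
    """A named block of text, such as a title or an abstract."""
    name: str
    text: str
    entities: list[Entity] = field(default_factory=list)


@dataclass
class Document:
    """Top-level container passed between processing steps."""
    sections: list[Section] = field(default_factory=list)

    def all_entities(self) -> list[Entity]:
        # Flatten the entities of every section into a single list.
        return [e for s in self.sections for e in s.entities]


text = "Aspirin reduces fever."
section = Section(name="abstract", text=text)
section.entities.append(
    Entity(match=text[0:7], entity_class="drug", start=0, end=7)
)
doc = Document(sections=[section])
```

Keeping character offsets on the entity, rather than only the matched string, lets later steps recover the surrounding context from the section text.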

NLP Pipeline Management

Orchestrates the execution of NLP processing steps on documents and manages spaCy models within the pipeline.
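The orchestration idea can be sketched as an ordered list of steps, each of which receives the documents and returns them, possibly enriched. This is a minimal sketch under stated assumptions: `Pipeline`, `tokenise_step`, and `count_step` are hypothetical names, documents are plain dictionaries, and spaCy model management is left out entirely.

```python
from typing import Callable

# A step takes a list of documents and returns the (possibly modified) list.
Step = Callable[[list[dict]], list[dict]]


class Pipeline:
    """Runs a configurable sequence of steps over a batch of documents."""

    def __init__(self, steps: list[Step]) -> None:
        self.steps = list(steps)

    def __call__(self, docs: list[dict]) -> list[dict]:
        for step in self.steps:
            docs = step(docs)
        return docs


def tokenise_step(docs: list[dict]) -> list[dict]:
    # Hypothetical step: split each document's text on whitespace.
    for doc in docs:
        doc["tokens"] = doc["text"].split()
    return docs


def count_step(docs: list[dict]) -> list[dict]:
    # Hypothetical step: relies on the output of tokenise_step.
    for doc in docs:
        doc["n_tokens"] = len(doc["tokens"])
    return docs


pipeline = Pipeline([tokenise_step, count_step])
result = pipeline([{"text": "EGFR mutations drive tumour growth"}])
```

Because each step only depends on the shape of the documents it receives, steps can be reordered or swapped through configuration as long as their inputs are produced upstream.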

Named Entity Recognition (NER)

Identifies and extracts named entities from text using transformer models, rule-based approaches, and post-processing.

Entity Linking & Disambiguation

Links identified entities to external knowledge bases and disambiguates between potential links using dictionary, rule-based, and context-scoring strategies.
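To make the combination of dictionary lookup and context scoring concrete, the sketch below looks up candidate records for a mention and picks the one whose keywords overlap most with the surrounding text. Everything here is an assumption for illustration: the identifiers `EX:0001` and `EX:0002`, the keyword sets, and the word-overlap score are invented and do not reflect KAZU's actual strategies.

```python
# Hypothetical synonym dictionary: normalised mention -> candidate records.
# Each record is (identifier, preferred label, context keywords).
CANDIDATES = {
    "er": [
        ("EX:0001", "estrogen receptor",
         {"hormone", "breast", "tamoxifen", "receptor"}),
        ("EX:0002", "endoplasmic reticulum",
         {"organelle", "protein", "folding", "membrane"}),
    ],
}


def link(mention: str, context: str) -> "str | None":
    """Return the identifier of the best-scoring candidate, or None."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return None
    context_words = set(context.lower().split())
    # Context score: number of candidate keywords present in the context.
    # On a tie, max() keeps the first candidate in dictionary order.
    best = max(candidates, key=lambda c: len(c[2] & context_words))
    return best[0]


organelle_id = link("ER", "misfolded protein accumulates in the ER membrane")
receptor_id = link("ER", "tamoxifen blocks ER in breast tissue")
```

The dictionary step narrows the search to a handful of candidates, so the context score only has to separate those candidates rather than rank the whole ontology.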

Ontology Management

Manages the parsing, curation, and generation of synonyms for various ontologies, supporting external knowledge integration.

Model Training & Evaluation

Supports training, prediction, and evaluation of machine learning models, particularly for multi-label NER, including data handling and metric calculation.
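One common way to score multi-label predictions is micro-averaged precision, recall, and F1, where each token may carry several labels at once. The sketch below is an assumption about what such a metric calculation could look like; the function name `micro_prf` and the label names are hypothetical, and the source does not state which metrics KAZU reports.

```python
def micro_prf(
    gold: list[set[str]], pred: list[set[str]]
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-token label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels predicted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Three tokens; the second token carries two gold labels at once.
gold = [{"gene"}, {"gene", "disease"}, set()]
pred = [{"gene"}, {"disease"}, {"drug"}]
precision, recall, f1 = micro_prf(gold, pred)
```

Micro-averaging pools counts across all labels before dividing, so frequent entity classes weigh more heavily than rare ones.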

Resource Management Tools (KRT)

Offers interactive tools for managing and curating Kazu resources, including resource editing, conflict resolution, and ontology updates.

Web API Interface

Provides RESTful API endpoints for external applications to interact with the KAZU system, enabling NER and entity linking operations.

Shared Utilities

Collects reusable utility functions and helper classes used across KAZU, including string normalization, caching, and abbreviation detection.
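String normalization and caching often go together, since the same surface forms recur many times in a corpus. The following is a minimal sketch using only the Python standard library; the function name `normalise` and the specific rules (accent stripping, lowercasing, collapsing punctuation to spaces) are assumptions and not KAZU's own normalization logic.

```python
import re
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=10_000)
def normalise(text: str) -> str:
    """Return a lowercased, accent-free, punctuation-free form of text."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of characters outside a-z and 0-9 with one space.
    return re.sub(r"[^0-9a-z]+", " ", stripped.lower()).strip()


first = normalise("Paracétamol")
second = normalise("Paracétamol")  # served from the cache
interleukin = normalise("IL-6")
```

Note that this simple rule set discards characters that carry meaning in biomedical names, such as Greek letters, so a production normalizer would need entity-class-specific handling.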

Annotation & Quality Assurance

Provides tools for converting KAZU data for annotation and performing acceptance tests to ensure the quality and consistency of annotations and pipeline results.
