CodeBoarding

The NLP Pipeline Management subsystem in KAZU orchestrates the execution of Natural Language Processing (NLP) steps on documents and manages the spaCy models used within these pipelines. It covers the core pipeline execution flow, including error handling and performance monitoring, and provides the foundational utilities for loading and reloading spaCy models and for processing documents with them. The subsystem integrates the individual NLP processing steps, uses an in-memory database for fast synonym lookups, and interacts with the training and evaluation components as well as with ontology preprocessing, which enriches the knowledge base. It also supports pipeline testing and management within the Kazu Resource Tool (KRT) environment.

Components

Pipeline Orchestration

Manages the execution flow of documents through a series of NLP steps, handling pre-filtering, error management, and performance profiling. It is the central component for processing documents within KAZU.
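The orchestration pattern described above (pre-filtering, per-step error handling, per-step timing) can be illustrated with a minimal sketch. This is not KAZU's actual pipeline class: `Doc`, `SketchPipeline`, `skip_doc`, `timings` and `failures` are hypothetical names chosen for illustration, and the sketch uses only the Python standard library.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document object."""
    text: str
    entities: List[str] = field(default_factory=list)


# A step mutates the documents it is given (assumed convention for this sketch).
Step = Callable[[List[Doc]], None]


class SketchPipeline:
    """Runs named steps in order, recording failures and wall-clock timings."""

    def __init__(self, steps: Dict[str, Step],
                 skip_doc: Callable[[Doc], bool] = lambda doc: False):
        self.steps = steps
        self.skip_doc = skip_doc
        self.timings: Dict[str, float] = {}
        self.failures: Dict[str, str] = {}

    def __call__(self, docs: List[Doc]) -> List[Doc]:
        # Pre-filter: documents rejected here are returned untouched.
        to_process = [doc for doc in docs if not self.skip_doc(doc)]
        for name, step in self.steps.items():
            start = time.perf_counter()
            try:
                step(to_process)
            except Exception as exc:
                # A failing step is recorded; later steps still run.
                self.failures[name] = repr(exc)
            finally:
                self.timings[name] = time.perf_counter() - start
        return docs
```

Recording a failure instead of re-raising lets the remaining steps run on the same batch. Whether that trade-off is appropriate depends on how downstream steps cope with partially processed documents.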

SpaCy Pipeline Management

A foundational utility for managing and providing access to spaCy language models and custom pipeline components, including mechanisms for adding, retrieving, and reloading models.
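As a rough illustration of the add, retrieve and reload mechanics mentioned above, the sketch below keeps loaded models in a dictionary keyed by name. `ModelRegistry` and its methods are hypothetical and do not mirror KAZU's real interface. The loader is injected so the example runs without spaCy installed; in practice one would presumably pass `spacy.load`.

```python
from typing import Any, Callable, Dict


class ModelRegistry:
    """Caches loaded models by name and can reload them from their path."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader              # e.g. spacy.load (assumption)
        self._paths: Dict[str, str] = {}
        self._models: Dict[str, Any] = {}

    def add(self, name: str, path: str) -> None:
        # Remember the path so the model can be reloaded later.
        self._paths[name] = path
        self._models[name] = self._loader(path)

    def get(self, name: str) -> Any:
        # Returns the cached instance; raises KeyError for unknown names.
        return self._models[name]

    def reload(self, name: str) -> Any:
        # Replace the cached instance with a freshly loaded one.
        self._models[name] = self._loader(self._paths[name])
        return self._models[name]
```

Keeping the path next to the cached object is what makes a reload possible without the caller having to supply the location again.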

NLP Processing Steps

A collection of individual processing steps for Named Entity Recognition (NER) and entity linking, encompassing various model-based (e.g., LLM, HuggingFace, spaCy) and rule-based approaches.
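To make the rule-based end of that spectrum concrete, here is a minimal dictionary-matching NER sketch. It is an illustration only, not one of KAZU's steps: `dictionary_ner` and the term dictionary are hypothetical, and matching is done with the standard `re` module.

```python
import re
from typing import Dict, List, Tuple


def dictionary_ner(text: str, terms: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans for each whole-word, case-insensitive term match."""
    spans = []
    for term, label in terms.items():
        # re.escape keeps punctuation in a term from being read as regex syntax.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = "EGFR mutations drive lung cancer."
spans = dictionary_ner(text, {"EGFR": "gene", "lung cancer": "disease"})
```

A model-based step would produce spans of the same shape, which is what allows the different approaches to be combined in one pipeline.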

General Utilities

Provides common utility functions used across the KAZU system, such as path handling, simple document creation, and specialized NLP utilities like abbreviation detection and spaCy object mapping.

In-Memory Database

Manages an in-memory database primarily used for storing and retrieving synonym data, enabling efficient lookups and matching during NLP processing steps.
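The lookup idea can be sketched as a dictionary from normalised synonym strings to the identifiers that carry them. `SynonymIndex`, its normalisation rule and the `EX:` identifiers are assumptions made for this example and do not describe KAZU's actual storage layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class SynonymIndex:
    """Maps normalised synonym strings to the set of concept identifiers using them."""

    def __init__(self) -> None:
        self._by_synonym: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def normalise(text: str) -> str:
        # Lower-case and collapse runs of whitespace (assumed normalisation).
        return " ".join(text.lower().split())

    def add(self, concept_id: str, synonyms: Iterable[str]) -> None:
        for synonym in synonyms:
            self._by_synonym[self.normalise(synonym)].add(concept_id)

    def lookup(self, mention: str) -> Set[str]:
        # .get avoids creating empty entries for unknown mentions.
        return set(self._by_synonym.get(self.normalise(mention), ()))
```

Because one string can map to several identifiers, the index stores sets. Resolving such ambiguity is left to a later step.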

Model Training & Evaluation

Provides functionalities for training, predicting, and evaluating machine learning models, particularly for Named Entity Recognition (NER). It includes utilities for data handling, model wrapping, and metric calculation.
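For the metric calculation part, a common choice in NER evaluation is exact-match precision, recall and F1 over entity spans. The sketch below shows that computation under the assumption that an entity is a `(start, end, label)` tuple. `span_prf` is a hypothetical helper, not a KAZU function.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label), an assumed representation


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between gold and predicted spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 4, "gene"), (10, 15, "disease")}
predicted = {(0, 4, "gene"), (20, 25, "drug")}
precision, recall, f1 = span_prf(gold, predicted)  # one of two spans matches on each side
```

Exact matching is strict: a prediction with the right label but an offset boundary counts as both a false positive and a false negative.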

Ontology Preprocessing

Responsible for generating and expanding synonyms and variants from ontological data, often leveraging spaCy pipelines for linguistic analysis to enrich the knowledge base.
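A much simplified picture of variant generation is shown below. It applies plain string rules (hyphen handling and lower-casing) rather than the spaCy-based linguistic analysis the component is described as using, and `generate_variants` is a hypothetical name.

```python
from typing import Set


def generate_variants(term: str) -> Set[str]:
    """Expand a term into simple surface variants using string rules only."""
    variants = {term}
    if "-" in term:
        variants.add(term.replace("-", " "))  # "IL-6" -> "IL 6"
        variants.add(term.replace("-", ""))   # "IL-6" -> "IL6"
    # Add a lower-cased form of every variant collected so far.
    variants |= {variant.lower() for variant in variants}
    return variants
```

Each rule multiplies the number of strings to index, so real systems tend to restrict which rules apply to which kinds of term.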

KRT Pipeline Testing

Facilitates the testing and management of Kazu pipelines within the Kazu Resource Tool (KRT) environment, interacting with resource managers to load and test pipeline configurations.
