The Model Training & Evaluation subsystem in KAZU provides a comprehensive suite of functionality for developing, training, and assessing machine learning models, tailored to multi-label Named Entity Recognition (NER). Its main flow involves preparing raw data, training transformer-based models, persisting them, and then using them for prediction and evaluation. The subsystem also integrates with external tools such as Label Studio for annotation visualization, and leverages KAZU's internal pipeline steps for document processing, yielding a cohesive machine learning workflow.
Components
Training Orchestrator
Manages the entire lifecycle of model training, including initialization, data loading, optimization, evaluation, and model saving. It coordinates interactions with datasets, model architectures, and persistence mechanisms.
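The lifecycle described above can be sketched as a small orchestrator class. All names here (`Trainer`, `train_epoch`, `run`) are illustrative stand-ins, not KAZU's actual API; the point is the ordering of the phases it coordinates.

```python
# Minimal sketch of a training-orchestrator lifecycle. All names are
# hypothetical; a real orchestrator would run forward/backward passes,
# track optimizer state, and write checkpoints.

class Trainer:
    def __init__(self, model, train_data, eval_data, epochs=2):
        self.model = model
        self.train_data = train_data
        self.eval_data = eval_data
        self.epochs = epochs
        self.history = []  # records the phases, in order, for illustration

    def train_epoch(self):
        self.history.append("train")

    def evaluate(self):
        self.history.append("eval")
        return {"f1": 0.0}  # placeholder metric

    def save(self, path):
        self.history.append(f"save:{path}")

    def run(self, save_path):
        for _ in range(self.epochs):
            self.train_epoch()
            self.evaluate()  # evaluate after every epoch
        self.save(save_path)
        return self.history

history = Trainer(model=None, train_data=[], eval_data=[]).run("out/model")
```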
Data Preparation
Responsible for transforming raw documents and annotations into a format suitable for model training and evaluation. This includes tokenization, alignment of entities, and creation of multi-hot encoded labels, along with utilities for yielding documents from various sources.
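The entity-alignment and multi-hot encoding steps can be illustrated with a toy example. The whitespace tokenizer and the `gene`/`disease` label set below are assumptions for demonstration only; KAZU's own data preparation uses real tokenizers and its configured label classes.

```python
# Sketch of turning character-level entity annotations into per-token
# multi-hot label vectors. Tokenizer and class names are illustrative.

CLASSES = ["gene", "disease"]  # hypothetical label set

def tokenize_with_offsets(text):
    """Whitespace tokenizer that records (start, end) character offsets."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        tokens.append((word, start, start + len(word)))
        pos = start + len(word)
    return tokens

def multi_hot_labels(text, entities):
    """entities: list of (start, end, class_name) character spans.

    A token receives a 1 for every class whose span overlaps it, so
    overlapping entities naturally yield multi-label targets.
    """
    tokens = tokenize_with_offsets(text)
    labels = []
    for _, tok_start, tok_end in tokens:
        row = [0] * len(CLASSES)
        for ent_start, ent_end, cls in entities:
            if tok_start < ent_end and ent_start < tok_end:  # spans overlap
                row[CLASSES.index(cls)] = 1
        labels.append(row)
    return [t[0] for t in tokens], labels

tokens, labels = multi_hot_labels(
    "BRCA1 causes cancer",
    [(0, 5, "gene"), (13, 19, "disease"), (0, 5, "disease")],
)
```

Note how "BRCA1" ends up with two active labels, which is exactly the case multi-hot encoding exists to represent.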
Model Architectures
Defines the specific neural network models used for multi-label token classification. These models typically extend pre-trained Hugging Face Transformers models and are adapted to handle multi-label outputs.
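The key adaptation for multi-label output is scoring each class with an independent sigmoid rather than a softmax over classes (training then typically uses a per-class binary cross-entropy loss, e.g. PyTorch's BCEWithLogitsLoss). A dependency-free sketch of the prediction side, with made-up logits:

```python
import math

# Sketch of the multi-label prediction head: an independent sigmoid per
# class instead of a softmax, so one token can carry several classes.
# The logits and threshold below are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_multi_label(token_logits, threshold=0.5):
    """token_logits: per-token list of per-class raw scores."""
    return [
        [sigmoid(logit) >= threshold for logit in logits]
        for logits in token_logits
    ]

# Two tokens, two classes: the first token activates both classes at once.
preds = predict_multi_label([[2.0, 1.5], [-3.0, 0.2]])
```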
Model Persistence
Handles the saving and loading of trained models, their configurations, and associated artifacts, ensuring that trained models can be reused for prediction or further evaluation.
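A common pattern for this kind of persistence, mirrored loosely on the Hugging Face model-directory layout, is a config file saved next to the weights. The file names and the JSON stand-in for weights below are assumptions, not KAZU's exact on-disk format:

```python
import json
import os
import tempfile

# Sketch of a save/load round trip: config as JSON alongside a weights
# file. Real code would use torch.save or safetensors for the weights.

def save_model(path, config, weights):
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "config.json"), "w") as f:
        json.dump(config, f)
    with open(os.path.join(path, "weights.json"), "w") as f:
        json.dump(weights, f)

def load_model(path):
    with open(os.path.join(path, "config.json")) as f:
        config = json.load(f)
    with open(os.path.join(path, "weights.json")) as f:
        weights = json.load(f)
    return config, weights

with tempfile.TemporaryDirectory() as d:
    save_model(d, {"num_labels": 4}, {"bias": [0.0] * 4})
    config, weights = load_model(d)
```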
Prediction & Evaluation Execution
Manages the execution of prediction and evaluation tasks. It orchestrates the loading of trained models, processing of documents through the KAZU pipeline, generation of predictions, and calculation of performance metrics.
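The metric-calculation part can be sketched as span-level exact-match precision/recall/F1 over predicted versus gold (start, end, class) triples. This mirrors the standard NER evaluation scheme; KAZU's own metric computation may differ in detail:

```python
# Sketch of span-level evaluation: exact-match precision/recall/F1 over
# (start, end, class) spans. Example spans are made up.

def span_f1(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans that match exactly, class included
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}

metrics = span_f1(
    gold=[(0, 5, "gene"), (13, 19, "disease")],
    predicted=[(0, 5, "gene"), (6, 12, "disease")],
)
```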
Label Studio Integration
Provides functionality for interacting with Label Studio, an annotation tool. It supports creating and updating annotation views, so that model predictions can be visualized alongside gold annotations for human review.
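Label Studio consumes tasks as JSON with `data`, `annotations` (gold), and `predictions` sections, which is what makes the side-by-side review view possible. A sketch of building such a task; the `from_name`/`to_name` values must match the project's labeling config, and the example spans are made up:

```python
# Sketch of a Label Studio task carrying gold annotations and model
# predictions for the same text. Field values are illustrative.

def span_result(start, end, labels, from_name="ner", to_name="text"):
    return {
        "from_name": from_name,
        "to_name": to_name,
        "type": "labels",
        "value": {"start": start, "end": end, "labels": labels},
    }

def build_task(text, gold_spans, predicted_spans):
    return {
        "data": {"text": text},
        "annotations": [{"result": [span_result(*s) for s in gold_spans]}],
        "predictions": [{"result": [span_result(*s) for s in predicted_spans]}],
    }

task = build_task(
    "BRCA1 causes cancer",
    gold_spans=[(0, 5, ["gene"])],
    predicted_spans=[(0, 5, ["gene"]), (13, 19, ["disease"])],
)
```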
KAZU Pipeline Steps
Encompasses the KAZU pipeline steps designed for NER with Hugging Face Transformers models. These steps integrate the transformer models into the KAZU processing pipeline and convert their token-level outputs into structured KAZU Entity objects.
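The token-to-entity conversion is essentially tag-sequence decoding. A sketch using BIO tags, a common scheme for this step; the plain-dict "entity" here is illustrative, not KAZU's Entity class:

```python
# Sketch of decoding token-level BIO tags into entity spans, the kind of
# post-processing a transformer NER step performs before emitting
# structured entity objects.

def bio_to_entities(tokens, tags):
    """tokens: list of (text, start, end); tags: BIO labels like 'B-gene'."""
    entities, current = [], None
    for (text, start, end), tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = {"cls": tag[2:], "start": start, "end": end}
        elif tag.startswith("I-") and current and current["cls"] == tag[2:]:
            current["end"] = end  # extend the currently open entity
        else:  # 'O' tag or an I- tag with no matching open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = [("BRCA1", 0, 5), ("gene", 6, 10), ("causes", 11, 17), ("cancer", 18, 24)]
entities = bio_to_entities(tokens, ["B-gene", "I-gene", "O", "B-disease"])
```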
Configuration
Manages configuration settings for model training and evaluation, such as model paths, training parameters, and data directories.
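A schematic of such a configuration object as a dataclass. The field names and defaults are typical of training configs but are assumptions here, not KAZU's real schema (KAZU's configuration is generally driven by YAML files, e.g. via Hydra):

```python
from dataclasses import dataclass, field

# Illustrative training configuration; field names and defaults are
# hypothetical, shown only to make the kinds of settings concrete.

@dataclass
class TrainingConfig:
    model_path: str = "bert-base-cased"   # base model to fine-tune
    data_dir: str = "data/ner"            # prepared training/eval data
    output_dir: str = "outputs"           # where trained models are saved
    epochs: int = 3
    learning_rate: float = 3e-5
    label_classes: list = field(default_factory=lambda: ["gene", "disease"])

config = TrainingConfig(epochs=5)
```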