The Web API Interface provides RESTful API endpoints for external applications to interact with the KAZU system, enabling Natural Language Processing (NLP) operations such as Named Entity Recognition (NER) and entity linking. It handles incoming requests by managing request IDs, authenticating users via JWT, converting various document formats into KAZU's internal representation, and orchestrating the execution of NLP pipelines. The interface also includes utilities for integration with external annotation tools like Label Studio.
Components
Request Handling
This component processes incoming web requests, extracts information such as request IDs, and logs request details. The `AddRequestIdMiddleware` attaches a unique request ID to each request and copies it into the response headers, so that a request can be traced through the logs.
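The request-ID behaviour described above can be sketched as a dependency-free ASGI middleware. This is a minimal illustration, not KAZU's `AddRequestIdMiddleware`: the class name `RequestIdMiddlewareSketch`, the `x-request-id` header name, and the helpers `demo_app` and `run_demo` are all assumptions made for the example.

```python
import asyncio
import uuid

REQUEST_ID_HEADER = b"x-request-id"  # assumed header name, not taken from KAZU


class RequestIdMiddlewareSketch:
    """Hypothetical stand-in that tags each HTTP request with a fresh ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Leave non-HTTP traffic (lifespan, websocket) untouched.
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex.encode("ascii")
        # Expose the ID to downstream handlers and loggers.
        scope.setdefault("state", {})["request_id"] = request_id.decode("ascii")

        async def send_with_request_id(message):
            # Add the ID to the headers of the outgoing response.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((REQUEST_ID_HEADER, request_id))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_request_id)


async def demo_app(scope, receive, send):
    # Trivial downstream app: 200 response with an empty body.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b""})


def run_demo():
    """Drive the middleware once and return the response headers as a dict."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(RequestIdMiddlewareSketch(demo_app)(scope, receive, send))
    return dict(sent[0]["headers"])
```

Calling `run_demo()` returns a dictionary containing the `x-request-id` header, with a different value on each call.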
Authentication
This component handles user authentication using JSON Web Tokens (JWTs). The `JWTAuthenticationBackend` validates the provided token, extracts user information, and manages access control for different endpoints. It also generates a request ID for each incoming request and logs authentication-related events, including warnings for invalid tokens and missing authorization headers.
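To make the validation steps concrete, here is a minimal sketch of bearer-token checking using only the standard library. It assumes HS256-signed tokens with a shared secret and an optional `exp` claim; the source does not state which algorithm or claims `JWTAuthenticationBackend` uses, and the functions `sign_hs256` and `authenticate` are hypothetical names for this example.

```python
import base64
import hashlib
import hmac
import json
import logging
import time

logger = logging.getLogger("auth_sketch")


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that compact JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (used here only to create demo tokens)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def authenticate(authorization, secret: bytes):
    """Return the token's claims, or None if the header or token is unusable."""
    if not authorization or not authorization.startswith("Bearer "):
        logger.warning("missing or malformed Authorization header")
        return None
    token = authorization[len("Bearer "):]
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        logger.warning("token is not a well-formed JWT")
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        logger.warning("unsupported or missing signing algorithm")
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, signature):
        logger.warning("invalid token signature")
        return None
    if not isinstance(claims, dict):
        logger.warning("token payload is not a JSON object")
        return None
    exp = claims.get("exp")
    if isinstance(exp, (int, float)) and exp < time.time():
        logger.warning("token has expired")
        return None
    return claims
```

A production service would normally delegate this work to a maintained JWT library; the hand-rolled version is shown only so that each check (header present, well-formed token, signature, expiry) is visible.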
Document Conversion
This component converts web document formats into KAZU's internal Document representation. It handles both simple and sectioned web documents, and can also convert a single entity into a KAZU document, so that every input is ready for NLP processing.
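The three conversions can be pictured with the sketch below. It uses stand-in dataclasses (`DocumentSketch`, `SectionSketch`, `EntitySketch`) rather than KAZU's own classes, and the payload shapes (`text`, `sections`, `entity_class`) are assumptions for illustration, not the API's documented request schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntitySketch:
    match: str
    entity_class: str
    start: int
    end: int


@dataclass
class SectionSketch:
    name: str
    text: str
    entities: List[EntitySketch] = field(default_factory=list)


@dataclass
class DocumentSketch:
    sections: List[SectionSketch] = field(default_factory=list)


def from_simple(payload: dict) -> DocumentSketch:
    """A simple web document is one block of text, so it becomes one section."""
    return DocumentSketch([SectionSketch(name="text", text=payload["text"])])


def from_sectioned(payload: dict) -> DocumentSketch:
    """A sectioned web document maps section names to section texts."""
    return DocumentSketch(
        [SectionSketch(name=name, text=text) for name, text in payload["sections"].items()]
    )


def from_single_entity(payload: dict) -> DocumentSketch:
    """Wrap one entity string in a document, with the entity spanning the whole text."""
    text = payload["text"]
    entity = EntitySketch(
        match=text, entity_class=payload["entity_class"], start=0, end=len(text)
    )
    return DocumentSketch([SectionSketch(name="entity", text=text, entities=[entity])])
```

Whatever the input shape, the result is the same document type, which is what lets later pipeline stages ignore how the request arrived.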
Kazu Web API
This is the core component of the KAZU web application. It exposes the API endpoints for natural language processing tasks, orchestrates pipeline execution, handles the different request types (NER, linking, custom pipelines), and relies on the other components for logging, document conversion, and authentication.
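One way to picture how a single pipeline can serve NER-only, linking, and custom-pipeline requests is a runner that executes an ordered set of named steps and optionally restricts itself to a caller-supplied subset. Everything in this sketch is hypothetical: `PipelineSketch`, the step names, the toy regex-based NER step, and the made-up `TOY:` identifiers are illustrative and do not reflect KAZU's actual steps or endpoints.

```python
import re

TOY_IDS = {"aspirin": "TOY:0001", "ibuprofen": "TOY:0002"}  # made-up identifiers


def toy_ner(docs):
    """Toy NER step: tag known drug names by regular expression."""
    for doc in docs:
        for m in re.finditer(r"\b(aspirin|ibuprofen)\b", doc["text"], re.IGNORECASE):
            doc["entities"].append(
                {"match": m.group(0), "start": m.start(), "end": m.end(), "id": None}
            )
    return docs


def toy_linking(docs):
    """Toy linking step: look up each tagged entity in a tiny dictionary."""
    for doc in docs:
        for entity in doc["entities"]:
            entity["id"] = TOY_IDS.get(entity["match"].lower())
    return docs


class PipelineSketch:
    def __init__(self, steps):
        self.steps = dict(steps)  # insertion order is execution order

    def __call__(self, docs, step_names=None):
        if step_names is not None:
            unknown = set(step_names) - set(self.steps)
            if unknown:
                raise ValueError(f"unknown steps: {sorted(unknown)}")
        for name, step in self.steps.items():
            # Run every step, or only those the caller asked for.
            if step_names is None or name in step_names:
                docs = step(docs)
        return docs


pipeline = PipelineSketch({"ner": toy_ner, "linking": toy_linking})


def new_doc(text):
    return {"text": text, "entities": []}
```

Under these assumptions, an NER-only request would call `pipeline([new_doc(text)], step_names=["ner"])`, while a linking request would run the full pipeline by omitting `step_names`.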
Label Studio Utilities
This component provides utility functions for integrating with Label Studio, a data annotation tool. It includes functions for creating default Label Studio annotation views and for converting KAZU annotations to Label Studio format, which supports data labeling workflows.
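As a rough illustration of the two utilities, the sketch below builds a span-labeling view in Label Studio's XML configuration format and converts character-offset entities into a Label Studio task with pre-annotations. The function names `make_view` and `to_label_studio_task`, the `(start, end, label)` tuple input, and the control names `label` and `text` are assumptions for this example; KAZU's own helpers may be organised differently.

```python
import xml.etree.ElementTree as ET


def make_view(labels, from_name="label", to_name="text"):
    """Return a labeling config with one <Label> per entity class."""
    view = ET.Element("View")
    labels_el = ET.SubElement(view, "Labels", {"name": from_name, "toName": to_name})
    for label in labels:
        ET.SubElement(labels_el, "Label", {"value": label})
    # "$text" tells Label Studio to read the document from the task's "text" field.
    ET.SubElement(view, "Text", {"name": to_name, "value": "$text"})
    return ET.tostring(view, encoding="unicode")


def to_label_studio_task(text, entities, from_name="label", to_name="text"):
    """Convert (start, end, label) spans into a task with one prediction."""
    results = [
        {
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {
                "start": start,
                "end": end,
                "text": text[start:end],
                "labels": [label],
            },
        }
        for start, end, label in entities
    ]
    return {"data": {"text": text}, "predictions": [{"result": results}]}
```

The `from_name` and `to_name` values in each result must match the `name` attributes in the view, which is why both functions share the same defaults.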