Configuration API
Configuration classes and loading utilities.
Config Loaders
load_extraction_config
load_extraction_config(lister_config_cls: type[BaseFileListerConfig], reader_config_cls: type[BaseReaderConfig], converter_config_cls: type[BaseConverterConfig], extractor_config_cls: type[BaseExtractorConfig], extraction_exporter_config_cls: type[BaseExtractionExporterConfig], extraction_orchestrator_config_cls: type[ExtractionOrchestratorConfig] = ExtractionOrchestratorConfig, config_dir: Path = Path('config/yaml')) -> ExtractionPipelineConfig
Loads extraction configuration based on component filenames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `lister_config_cls` | `type[BaseFileListerConfig]` | The FileListerConfig subclass to use. | required |
| `reader_config_cls` | `type[BaseReaderConfig]` | The ReaderConfig subclass to use. | required |
| `converter_config_cls` | `type[BaseConverterConfig]` | The ConverterConfig subclass to use. | required |
| `extractor_config_cls` | `type[BaseExtractorConfig]` | The ExtractorConfig subclass to use. | required |
| `extraction_exporter_config_cls` | `type[BaseExtractionExporterConfig]` | The ExtractionExporterConfig subclass to use. | required |
| `extraction_orchestrator_config_cls` | `type[ExtractionOrchestratorConfig]` | The ExtractionOrchestratorConfig class to use. | `ExtractionOrchestratorConfig` |
| `config_dir` | `Path` | Directory containing the configs. | `Path('config/yaml')` |
Returns:
| Name | Type | Description |
|---|---|---|
| `ExtractionPipelineConfig` | `ExtractionPipelineConfig` | The fully validated configuration. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the config directory or mapping file is missing. |
Source code in src/document_extraction_tools/config/config_loader.py
load_evaluation_config
load_evaluation_config(test_data_loader_config_cls: type[BaseTestDataLoaderConfig], evaluator_config_classes: list[type[BaseEvaluatorConfig]], reader_config_cls: type[BaseReaderConfig], converter_config_cls: type[BaseConverterConfig], extractor_config_cls: type[BaseExtractorConfig], evaluation_exporter_config_cls: type[BaseEvaluationExporterConfig], evaluation_orchestrator_config_cls: type[EvaluationOrchestratorConfig] = EvaluationOrchestratorConfig, config_dir: Path = Path('config/yaml')) -> EvaluationPipelineConfig
Loads evaluation configuration based on component filenames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `test_data_loader_config_cls` | `type[BaseTestDataLoaderConfig]` | The TestDataLoaderConfig subclass to use. | required |
| `evaluator_config_classes` | `list[type[BaseEvaluatorConfig]]` | EvaluatorConfig subclasses to load using the top-level keys in evaluator.yaml. | required |
| `reader_config_cls` | `type[BaseReaderConfig]` | The ReaderConfig subclass to use. | required |
| `converter_config_cls` | `type[BaseConverterConfig]` | The ConverterConfig subclass to use. | required |
| `extractor_config_cls` | `type[BaseExtractorConfig]` | The ExtractorConfig subclass to use. | required |
| `evaluation_exporter_config_cls` | `type[BaseEvaluationExporterConfig]` | The EvaluationExporterConfig subclass to use. | required |
| `evaluation_orchestrator_config_cls` | `type[EvaluationOrchestratorConfig]` | The EvaluationOrchestratorConfig class to use. | `EvaluationOrchestratorConfig` |
| `config_dir` | `Path` | Directory containing the configs. | `Path('config/yaml')` |
Returns:
| Name | Type | Description |
|---|---|---|
| `EvaluationPipelineConfig` | `EvaluationPipelineConfig` | The fully validated configuration. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the config directory or mapping file is missing. |
Source code in src/document_extraction_tools/config/config_loader.py
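The `evaluator_config_classes` parameter is described as being matched against the top-level keys in evaluator.yaml. The following is a minimal sketch of that idea, not the library's implementation. It assumes each top-level key equals the config class name, uses a plain dict in place of the parsed YAML, and defines a stand-in `BaseEvaluatorConfig` so it runs with only pydantic installed. `AccuracyEvaluatorConfig`, `LatencyEvaluatorConfig`, and `build_evaluator_configs` are hypothetical names.

```python
from pydantic import BaseModel


class BaseEvaluatorConfig(BaseModel):
    """Stand-in for the library's base class so the sketch runs on its own."""


class AccuracyEvaluatorConfig(BaseEvaluatorConfig):  # hypothetical subclass
    threshold: float = 0.5


class LatencyEvaluatorConfig(BaseEvaluatorConfig):  # hypothetical subclass
    max_seconds: float


def build_evaluator_configs(parsed, config_classes):
    """Validate one top-level section of the parsed mapping per requested class."""
    configs = []
    for cls in config_classes:
        key = cls.__name__  # assumption: the top-level key is the class name
        if key not in parsed:
            raise KeyError(f"no top-level key {key!r} in evaluator config")
        configs.append(cls(**parsed[key]))
    return configs


# Stand-in for the mapping a YAML parser would return for evaluator.yaml.
parsed_yaml = {
    "AccuracyEvaluatorConfig": {"threshold": 0.8},
    "LatencyEvaluatorConfig": {"max_seconds": 2.5},
}

evaluators = build_evaluator_configs(
    parsed_yaml, [AccuracyEvaluatorConfig, LatencyEvaluatorConfig]
)
```

Each entry in `evaluators` is an instance of the class that was requested, so subclass-specific fields such as `threshold` stay available to the evaluator that consumes them.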
Pipeline Configs
These master config classes aggregate all component configurations for a pipeline.
ExtractionPipelineConfig
Bases: BaseModel
Master container for extraction pipeline component configurations.
This class aggregates the configurations for all pipeline components.
extraction_orchestrator
extraction_orchestrator: ExtractionOrchestratorConfig = Field(..., description='Configuration for orchestrating extraction execution.')
file_lister
reader
converter
converter: BaseConverterConfig = Field(..., description='Configuration for converting raw bytes into documents.')
extractor
extractor: BaseExtractorConfig = Field(..., description='Configuration for extracting structured data.')
extraction_exporter
extraction_exporter: BaseExtractionExporterConfig = Field(..., description='Configuration for exporting extracted data.')
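To illustrate how a master container with base-typed, required fields behaves, here is a reduced sketch with only two of the fields. It is not the library's class: `MiniPipelineConfig`, `PdfConverterConfig`, and `LlmExtractorConfig` are hypothetical, and the two base classes are stand-ins defined locally so the example needs only pydantic.

```python
from pydantic import BaseModel, Field, ValidationError


class BaseConverterConfig(BaseModel):
    """Stand-in for the library's base converter config."""


class BaseExtractorConfig(BaseModel):
    """Stand-in for the library's base extractor config."""


class PdfConverterConfig(BaseConverterConfig):  # hypothetical subclass
    dpi: int = 200


class LlmExtractorConfig(BaseExtractorConfig):  # hypothetical subclass
    temperature: float = 0.0


class MiniPipelineConfig(BaseModel):
    """Reduced analogue of ExtractionPipelineConfig with two component fields."""

    converter: BaseConverterConfig = Field(..., description="Converter settings.")
    extractor: BaseExtractorConfig = Field(..., description="Extractor settings.")


# Fields are typed with the base classes, but subclass instances are accepted
# and keep their own type and fields.
config = MiniPipelineConfig(
    converter=PdfConverterConfig(dpi=300),
    extractor=LlmExtractorConfig(),
)

# Fields declared with `...` are required, so omitting one fails validation.
try:
    MiniPipelineConfig(converter=PdfConverterConfig())
    missing_rejected = False
except ValidationError:
    missing_rejected = True
```

Because every field uses `Field(...)` with no default, a pipeline config cannot be constructed with a component missing, which is consistent with the loaders returning a "fully validated" object.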
EvaluationPipelineConfig
Bases: BaseModel
Master container for evaluation pipeline component configurations.
This class aggregates the configurations for all evaluation pipeline components.
evaluation_orchestrator
evaluation_orchestrator: EvaluationOrchestratorConfig = Field(..., description='Configuration for orchestrating evaluation execution.')
test_data_loader
test_data_loader: BaseTestDataLoaderConfig = Field(..., description='Configuration for loading evaluation examples.')
evaluators
evaluators: list[BaseEvaluatorConfig] = Field(..., description='Evaluator configurations to apply.')
reader
converter
converter: BaseConverterConfig = Field(..., description='Configuration for converting raw bytes into documents.')
extractor
extractor: BaseExtractorConfig = Field(..., description='Configuration for extracting structured data.')
evaluation_exporter
evaluation_exporter: BaseEvaluationExporterConfig = Field(..., description='Configuration for exporting evaluation results.')
Extraction Pipeline Configs
BaseFileListerConfig
Bases: BaseModel
Base config for File Listers.
Implementations should subclass this to add specific parameters.
BaseReaderConfig
Bases: BaseModel
Base config for Readers.
Implementations should subclass this to add specific parameters.
BaseConverterConfig
Bases: BaseModel
Base config for Converters.
Implementations should subclass this to add specific parameters.
BaseExtractorConfig
Bases: BaseModel
Base config for Extractors.
Implementations should subclass this to add specific parameters.
BaseExtractionExporterConfig
Bases: BaseModel
Base config for Extraction Exporters.
Implementations should subclass this to add specific parameters.
ExtractionOrchestratorConfig
Evaluation Pipeline Configs
BaseTestDataLoaderConfig
Bases: BaseModel
Base config for Test Data Loaders.
Implementations should subclass this to add specific parameters.
BaseEvaluatorConfig
Bases: BaseModel
Base config for Evaluators.
Implementations should subclass this to add specific parameters.
BaseEvaluationExporterConfig
Bases: BaseModel
Base config for Evaluation Exporters.
Implementations should subclass this to add specific parameters.
EvaluationOrchestratorConfig
Creating Custom Configs
Subclass the base config to add your fields:
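
    from document_extraction_tools.config import BaseExtractorConfig


    class MyExtractorConfig(BaseExtractorConfig):
        # Required: no default, so it must be supplied.
        model_name: str
        # Optional: defaults apply when the value is omitted.
        temperature: float = 0.0
        max_tokens: int = 4096
        api_key: str | None = None
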
Then create the corresponding YAML file: