Contributing¶
Contributions to Document Extraction Tools are welcome!
Getting Started¶
- Fork the repository
- Clone your fork:
- Install dependencies:
Development Workflow¶
Branch Naming¶
Use descriptive branch names with prefixes:
- feat/short-description - New features
- fix/short-description - Bug fixes
- docs/short-description - Documentation updates
- refactor/short-description - Code refactoring
- test/short-description - Test additions/updates
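The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: the function name `is_valid_branch_name` is hypothetical, and the rule that the description consists of lowercase words joined by hyphens is an assumption, since the source only gives the prefixes.

```python
import re

# Prefixes taken from the branch naming list above.
ALLOWED_PREFIXES = ("feat", "fix", "docs", "refactor", "test")

# Assumption: the description is lowercase alphanumeric words joined by hyphens.
_PREFIX_GROUP = "|".join(ALLOWED_PREFIXES)
BRANCH_PATTERN = re.compile(rf"(?:{_PREFIX_GROUP})/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Check whether a branch name follows the prefix/short-description convention.

    Args:
        name: The branch name to check.

    Returns:
        True if the name is an allowed prefix, a slash, and a hyphenated
        description; False otherwise.
    """
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_branch_name("feat/add-pdf-reader")` is accepted, while `"feature/add-pdf-reader"` and `"fix/"` are rejected.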
Running Tests¶
Linting and Formatting¶
Run pre-commit hooks before committing:
This runs:
- Ruff - Linting and formatting
- Type checking - Via pyright/mypy
Code Style¶
The project uses:
- Ruff for linting and formatting
- Google-style docstrings
- Type hints throughout
Example:
class BaseExtractor(ABC):
    """Abstract interface for data extraction."""

    def __init__(
        self,
        config: BaseExtractorConfig | ExtractionPipelineConfig | EvaluationPipelineConfig,
    ) -> None:
        """Initialize with a configuration object.

        Args:
            config: Component-specific config or full pipeline configuration.
        """
        if isinstance(config, (ExtractionPipelineConfig, EvaluationPipelineConfig)):
            self.pipeline_config = config
            self.config = config.extractor
        else:
            self.pipeline_config = None
            self.config = config

    @abstractmethod
    async def extract(
        self,
        document: Document,
        schema: type[ExtractionSchema],
        context: PipelineContext | None = None,
    ) -> ExtractionResult[ExtractionSchema]:
        """Extracts structured data from a Document to match the provided Schema.

        Args:
            document: The fully parsed document.
            schema: The Pydantic model class defining the target structure.
            context: Optional shared pipeline context.

        Returns:
            An ExtractionResult containing the extracted data.
        """
        pass
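To show how the same conventions (type hints, Google-style docstrings, an abstract async method) carry over to a concrete subclass, here is a small self-contained sketch. Every class in it (`Document`, `ExtractionResult`, `Invoice`, `SketchExtractor`, `TotalLineExtractor`) is a simplified stand-in invented for illustration. None of them is the project's real type, and the real schemas are Pydantic models rather than dataclasses.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Document:
    """Hypothetical stand-in for the project's Document type."""

    text: str


@dataclass
class ExtractionResult(Generic[T]):
    """Hypothetical stand-in for the project's result wrapper."""

    data: T


@dataclass
class Invoice:
    """Example target schema (a dataclass here, for self-containment)."""

    total: float


class SketchExtractor(ABC):
    """Simplified abstract extractor in the style of the example above."""

    @abstractmethod
    async def extract(self, document: Document, schema: type[T]) -> ExtractionResult[T]:
        """Extract data from a document into an instance of the schema.

        Args:
            document: The document to read.
            schema: The class describing the target structure.

        Returns:
            An ExtractionResult wrapping the populated schema instance.
        """


class TotalLineExtractor(SketchExtractor):
    """Toy extractor that reads a 'total: <number>' line from the text."""

    async def extract(self, document: Document, schema: type[T]) -> ExtractionResult[T]:
        """Find the first 'total' line and build the schema from it.

        Args:
            document: The document to read.
            schema: A class accepting a ``total`` keyword argument.

        Returns:
            An ExtractionResult wrapping the populated schema instance.

        Raises:
            ValueError: If the document has no 'total' line.
        """
        for line in document.text.splitlines():
            key, _, value = line.partition(":")
            if key.strip().lower() == "total":
                return ExtractionResult(data=schema(total=float(value)))
        raise ValueError("no 'total' line found in document")


if __name__ == "__main__":
    doc = Document(text="customer: ACME\ntotal: 42.5")
    result = asyncio.run(TotalLineExtractor().extract(doc, Invoice))
    print(result.data)
```

The abstract base cannot be instantiated directly, so subclasses must provide `extract` before they can be used.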
Pull Request Process¶
- Create a new branch from main
- Make your changes
- Run tests and linting:
- Commit with clear, descriptive messages
- Push to your fork
- Open a PR against main
- Fill out the PR template with:
  - Description of changes
  - Related issues
  - Testing performed
Reporting Issues¶
Open an issue on GitHub with:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Environment details (Python version, OS, etc.)
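To gather the environment details for an issue, a short standard-library script is enough. This is only a convenience sketch: the function name `environment_report` is hypothetical and the project does not prescribe a particular format.

```python
import platform


def environment_report() -> str:
    """Build a short, paste-ready summary of the local environment.

    Returns:
        A three-line string with the Python version, OS, and machine type.
    """
    lines = [
        f"Python: {platform.python_version()} ({platform.python_implementation()})",
        f"OS: {platform.platform()}",
        f"Machine: {platform.machine()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the printed lines into the issue alongside the steps to reproduce.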
Maintainers¶
Feel free to reach out if you have questions about contributing!