Scraper
omniread.pdf.scraper
Summary
PDF scraping implementation for OmniRead.
This module provides a PDF-specific scraper that coordinates PDF byte
retrieval via a client and normalizes the result into a Content object.
The scraper implements the core BaseScraper contract while delegating
all storage and access concerns to a BasePDFClient implementation.
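The delegation described above can be illustrated with a minimal, self-contained sketch. It does not import OmniRead: `PDFClientLike`, `ContentSketch`, `PDFScraperSketch`, and `InMemoryClient` are hypothetical stand-ins, and the client method name and the content fields are assumptions that may differ from the real `BasePDFClient` and `Content`.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional, Protocol


class PDFClientLike(Protocol):
    """Hypothetical stand-in for BasePDFClient: returns raw PDF bytes."""

    def fetch(self, source: Any) -> bytes: ...


@dataclass
class ContentSketch:
    """Hypothetical stand-in for Content; the real fields may differ."""

    raw: bytes
    source: str
    metadata: Mapping[str, Any] = field(default_factory=dict)


class PDFScraperSketch:
    """Mirrors the documented shape: a client goes in, a content object comes out."""

    def __init__(self, client: PDFClientLike) -> None:
        self._client = client

    def fetch(
        self, source: Any, metadata: Optional[Mapping[str, Any]] = None
    ) -> ContentSketch:
        # Retrieval is delegated entirely to the client.
        raw = self._client.fetch(source)
        # The scraper only normalizes the result into a content object.
        return ContentSketch(raw=raw, source=str(source), metadata=dict(metadata or {}))


class InMemoryClient:
    """Toy client that serves bytes from a dict instead of disk or network."""

    def __init__(self, documents: Mapping[str, bytes]) -> None:
        self._documents = dict(documents)

    def fetch(self, source: Any) -> bytes:
        return self._documents[source]


scraper = PDFScraperSketch(InMemoryClient({"report": b"%PDF-1.7 sample"}))
content = scraper.fetch("report", metadata={"origin": "example"})
print(content.raw[:8], content.metadata)
```

Because the scraper only sees the client interface, swapping the in-memory client for a file-based or network-based one would not require changes to the scraper itself.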
Classes
PDFScraper
Bases: BaseScraper
Scraper for PDF sources.
Notes
Responsibilities:
Constraints:
Initialize the PDF scraper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `BasePDFClient` | PDF client responsible for retrieving raw PDF bytes. | required |
Functions
fetch
Fetch a PDF document from the given source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Any` | Identifier of the PDF source as understood by the configured PDF client. | required |
| `metadata` | `Optional[Mapping[str, Any]]` | Optional metadata to attach to the returned content. | `None` |
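One common way to handle an optional `metadata` mapping with a `None` default is sketched below. The helper `normalize_metadata` is hypothetical, and whether `PDFScraper` copies the mapping or stores it as-is is an assumption, not something this page states.

```python
from typing import Any, Dict, Mapping, Optional


def normalize_metadata(metadata: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: copy the mapping, or start empty when it is None."""
    return dict(metadata) if metadata is not None else {}


caller_metadata = {"origin": "upload"}
attached = normalize_metadata(caller_metadata)
attached["page_count"] = 3  # mutating the copy leaves the caller's dict untouched

empty = normalize_metadata()  # omitted metadata becomes an empty dict
```

Copying at the boundary keeps later changes to the attached metadata from leaking back into the caller's mapping.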
Returns:
| Name | Type | Description |
|---|---|---|
| `Content` | `Content` | A |
Raises:
| Type | Description |
|---|---|
| `Exception` | Retrieval-specific errors raised by the PDF client. |
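Since retrieval errors originate in the client, callers typically catch the client's own exception types around `fetch`. The sketch below shows that propagation with stand-ins only: `MissingPDFError`, `FailingClient`, and `fetch_via` are hypothetical names, and the real client may raise different exception types.

```python
from typing import Any


class MissingPDFError(Exception):
    """Hypothetical retrieval error that a client might raise."""


class FailingClient:
    """Toy client that never finds the requested document."""

    def fetch(self, source: Any) -> bytes:
        raise MissingPDFError(f"no PDF found for {source!r}")


def fetch_via(client: Any, source: Any) -> bytes:
    """Stand-in for a scraper fetch: no try/except, so client errors pass through."""
    return client.fetch(source)


try:
    fetch_via(FailingClient(), "missing.pdf")
except MissingPDFError as exc:
    message = str(exc)  # the caller sees the client's error unchanged
else:
    message = ""

print(message)
```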