Scraper
omniread.core.scraper
Summary
Abstract scraping contracts for OmniRead.
This module defines the format-agnostic scraper interface responsible for acquiring raw content from external sources.
Scrapers are responsible for:
- Locating and retrieving raw content bytes
- Attaching minimal contextual metadata
- Returning normalized Content objects
Scrapers are explicitly NOT responsible for:
- Parsing or interpreting content
- Inferring structure or semantics
- Performing content-type specific processing
All interpretation must be delegated to parsers.
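The division of labor described above might look like the following minimal sketch. The `Content` dataclass and its field names (`raw`, `source`, `metadata`), `InMemoryScraper`, and `decode_text` are hypothetical stand-ins, since this page does not show OmniRead's actual definitions. Only the `fetch(source, metadata=None)` shape is taken from the documented interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for OmniRead's Content type (fields are assumed)."""
    raw: bytes
    source: str
    metadata: Mapping[str, Any] = field(default_factory=dict)


class BaseScraper(ABC):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def fetch(
        self, source: str, metadata: Optional[Mapping[str, Any]] = None
    ) -> Content:
        """Return raw bytes plus minimal metadata; never interpret them."""


class InMemoryScraper(BaseScraper):
    """Hypothetical scraper that serves bytes from a dict keyed by source."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store

    def fetch(
        self, source: str, metadata: Optional[Mapping[str, Any]] = None
    ) -> Content:
        # Retrieval only: no decoding, no structure inference.
        return Content(
            raw=self._store[source],
            source=source,
            metadata=dict(metadata or {}),
        )


def decode_text(content: Content, encoding: str = "utf-8") -> str:
    """Hypothetical parser step: interpretation happens here, not in the scraper."""
    return content.raw.decode(encoding)


scraper = InMemoryScraper({"mem://greeting": b"hello"})
content = scraper.fetch("mem://greeting")
text = decode_text(content)
```

Keeping the decode step outside the scraper means the same scraper can feed any parser, whatever the content type turns out to be.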
Classes
BaseScraper
Bases: ABC
Base interface for all scrapers.
Notes
Responsibilities:
Constraints:
Functions
fetch
abstractmethod
Fetch raw content from the given source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Location identifier (URL, file path, S3 URI, etc.). | required |
| metadata | Optional[Mapping[str, Any]] | Optional hints for the scraper (headers, auth, etc.). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Content | Content | Content object containing raw bytes and metadata. |
Raises:
| Type | Description |
|---|---|
| Exception | Retrieval-specific errors as defined by the implementation. |
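As an illustration of the `fetch` contract, here is a minimal sketch of one possible implementation that reads from the local filesystem. `LocalFileScraper`, the `Content` stand-in, and its field names are assumptions made for this example, not part of OmniRead. A real implementation would subclass `BaseScraper` from `omniread.core.scraper`. In this sketch the retrieval-specific error is simply the `OSError` subclass raised by the standard library.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for OmniRead's Content type (fields are assumed)."""
    raw: bytes
    source: str
    metadata: Mapping[str, Any] = field(default_factory=dict)


class LocalFileScraper:
    """Hypothetical scraper; a real one would inherit from BaseScraper."""

    def fetch(
        self, source: str, metadata: Optional[Mapping[str, Any]] = None
    ) -> Content:
        path = Path(source)
        # read_bytes() raises FileNotFoundError (an OSError) for a missing
        # path; this sketch lets that propagate as the retrieval error.
        raw = path.read_bytes()
        # Attach only minimal context: caller hints plus the byte count.
        merged = {**(metadata or {}), "size": len(raw)}
        return Content(raw=raw, source=source, metadata=merged)
```

A caller passes the location as `source` and any hints as `metadata`, then hands the returned bytes to a parser without the scraper ever inspecting them.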
Notes
Responsibilities: