Content
omniread.core.content
Summary
Canonical content models for OmniRead.
This module defines the format-agnostic content representation used across all parsers and scrapers in OmniRead.
The models defined here represent what was extracted, not how it was retrieved or parsed. Format-specific behavior and metadata must not alter the semantic meaning of these models.
Classes
Content
dataclass
Normalized representation of extracted content.
Notes
Responsibilities:
Attributes
content_type
class-attribute
instance-attribute
Optional MIME type of the content, if known.
metadata
class-attribute
instance-attribute
Optional, implementation-defined metadata associated with the content (e.g., headers, encoding hints, extraction notes).
source
instance-attribute
Identifier of the content origin (URL, file path, or logical name).
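As a rough illustration of the attributes listed above, the following is a minimal sketch of how such a dataclass might be shaped. The name `ContentSketch`, the field types, and the `None` defaults are assumptions made for this example. Only the three documented attributes are modeled, and the real `Content` class may define further fields.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ContentSketch:
    """Hypothetical stand-in for omniread.core.content.Content."""

    # Identifier of the content origin (URL, file path, or logical name).
    source: str
    # Optional MIME type; None when unknown (the default is an assumption).
    content_type: Optional[str] = None
    # Optional, implementation-defined metadata (the default is an assumption).
    metadata: Optional[Mapping[str, Any]] = None


# Content whose type and metadata are known.
page = ContentSketch(
    source="https://example.com/article",
    content_type="text/html",
    metadata={"encoding": "utf-8"},
)

# Content where only the origin is known.
unknown = ContentSketch(source="notes.txt")
```

Keeping `content_type` and `metadata` optional means a caller can describe where content came from without claiming anything about its format.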
ContentType
Bases: str, Enum
Supported MIME types for extracted content.
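Because the class derives from both `str` and `Enum`, its members can be compared directly with plain MIME strings. The sketch below shows that pattern only. The name `ContentTypeSketch` and its two members are hypothetical and do not reflect the actual members of `ContentType`.

```python
from enum import Enum


class ContentTypeSketch(str, Enum):
    """Hypothetical stand-in for ContentType; members are illustrative only."""

    HTML = "text/html"
    JSON = "application/json"


def is_html(mime: str) -> bool:
    """Return True when a plain MIME string matches the HTML member."""
    # The str mixin lets a member compare equal to its underlying string.
    return mime == ContentTypeSketch.HTML


# Look up a member from a raw MIME string.
parsed = ContentTypeSketch("application/json")
```

Looking up a value that is not a defined member raises `ValueError`, which is one way unsupported types could be rejected early.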
Notes
Guarantees: