Parser
omniread.pdf.parser
Summary
PDF parser base implementations for OmniRead.
This module defines the PDF-specific parser contract, extending the
format-agnostic BaseParser with constraints appropriate for PDF content.
PDF parsers are responsible for interpreting binary PDF data and producing structured representations suitable for downstream consumption.
Classes
PDFParser
Bases: BaseParser[T], Generic[T]
Base PDF parser.
Notes
Responsibilities:
Constraints:
Initialize the parser with content to be parsed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `content` | `Content` | Content instance to be parsed. | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the content type is not supported by this parser. |
Attributes
supported_types
class-attribute
instance-attribute
Set of content types supported by this parser (PDF only).
Functions
parse
abstractmethod
Parse PDF content into a structured output.
Returns:
| Name | Type | Description |
|---|---|---|
| `T` | `T` | Parsed representation of type `T`. |
Raises:
| Type | Description |
|---|---|
| `Exception` | Parsing-specific errors as defined by the implementation. |
Notes
Responsibilities:
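As a rough illustration of the abstract `parse` contract, the sketch below defines a concrete subclass that returns a typed result. It does not import OmniRead: `Content`, its `raw` field, and the reduced `PDFParser` base are stand-ins written for this example, and `HeaderParser` is a hypothetical subclass. Content-type validation is left out here to keep the focus on `parse`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Content:
    """Stand-in for OmniRead's Content; the `raw` field name is an assumption."""
    raw: bytes


class PDFParser(ABC, Generic[T]):
    """Reduced stand-in for the documented base class (type validation omitted)."""

    def __init__(self, content: Content) -> None:
        self.content = content

    @abstractmethod
    def parse(self) -> T:
        """Subclasses turn the binary PDF data into a value of type T."""


class HeaderParser(PDFParser[dict]):
    """Hypothetical subclass that reads only the PDF version header."""

    def parse(self) -> dict:
        first_line = self.content.raw.split(b"\n", 1)[0]
        if not first_line.startswith(b"%PDF-"):
            # A parsing-specific error chosen by this implementation.
            raise ValueError("missing %PDF- header")
        version = first_line[len(b"%PDF-"):].decode("ascii").strip()
        return {"version": version}


result = HeaderParser(Content(raw=b"%PDF-1.7\n...")).parse()
```

Because `parse` is abstract, the base class in this sketch cannot be instantiated directly; only subclasses that implement `parse` can.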
supports
Check whether this parser supports the content's type.
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if the content type is supported; False otherwise. |
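One plausible reading of how `supports`, `supported_types`, and the constructor's `ValueError` fit together is sketched below. Everything in it is an assumption made for illustration: the `ContentType` enum and its values, the `content_type` field, the `SketchPDFParser` class, and the `try_build` helper are not taken from OmniRead, and the real constructor may perform its check differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContentType(Enum):
    """Stand-in enum; the real content-type values may differ."""
    PDF = "application/pdf"
    HTML = "text/html"


@dataclass(frozen=True)
class Content:
    """Stand-in for OmniRead's Content; field names are assumptions."""
    raw: bytes
    content_type: ContentType


class SketchPDFParser:
    """Hypothetical parser showing only the type check, not parsing."""

    # Class-level set of accepted content types (PDF only).
    supported_types = frozenset({ContentType.PDF})

    def __init__(self, content: Content) -> None:
        self.content = content
        if not self.supports():
            raise ValueError(f"Unsupported content type: {content.content_type}")

    def supports(self) -> bool:
        # A plain membership test against supported_types.
        return self.content.content_type in self.supported_types


def try_build(content: Content) -> Optional[SketchPDFParser]:
    """Hypothetical helper: return a parser, or None if the type is rejected."""
    try:
        return SketchPDFParser(content)
    except ValueError:
        return None


pdf_parser = try_build(Content(b"%PDF-1.7\n", ContentType.PDF))
html_parser = try_build(Content(b"<html></html>", ContentType.HTML))
```

In this sketch a parser that was constructed successfully always reports `supports()` as true, since the constructor has already rejected any unsupported content.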