{ "module": "omniread.pdf.parser", "content": { "path": "omniread.pdf.parser", "docstring": "# Summary\n\nPDF parser base implementations for OmniRead.\n\nThis module defines the **PDF-specific parser contract**, extending the\nformat-agnostic `BaseParser` with constraints appropriate for PDF content.\n\nPDF parsers are responsible for interpreting binary PDF data and producing\nstructured representations suitable for downstream consumption.", "objects": { "Generic": { "name": "Generic", "kind": "alias", "path": "omniread.pdf.parser.Generic", "signature": "", "docstring": null }, "TypeVar": { "name": "TypeVar", "kind": "alias", "path": "omniread.pdf.parser.TypeVar", "signature": "", "docstring": null }, "abstractmethod": { "name": "abstractmethod", "kind": "alias", "path": "omniread.pdf.parser.abstractmethod", "signature": "", "docstring": null }, "ContentType": { "name": "ContentType", "kind": "class", "path": "omniread.pdf.parser.ContentType", "signature": "", "docstring": "Supported MIME types for extracted content.\n\nNotes:\n **Guarantees:**\n\n - This enum represents the declared or inferred media type of the\n content source.\n - It is primarily used for routing content to the appropriate\n parser or downstream consumer.", "members": { "HTML": { "name": "HTML", "kind": "attribute", "path": "omniread.pdf.parser.ContentType.HTML", "signature": "", "docstring": "HTML document content." }, "PDF": { "name": "PDF", "kind": "attribute", "path": "omniread.pdf.parser.ContentType.PDF", "signature": "", "docstring": "PDF document content." }, "JSON": { "name": "JSON", "kind": "attribute", "path": "omniread.pdf.parser.ContentType.JSON", "signature": "", "docstring": "JSON document content." }, "XML": { "name": "XML", "kind": "attribute", "path": "omniread.pdf.parser.ContentType.XML", "signature": "", "docstring": "XML document content." } } }, "BaseParser": { "name": "BaseParser", "kind": "class", "path": "omniread.pdf.parser.BaseParser", "signature": "", "docstring": "Base interface for all parsers.\n\nNotes:\n **Guarantees:**\n\n - A parser is a self-contained object that owns the `Content` it is\n responsible for interpreting.\n - Consumers may rely on early validation of content compatibility\n and type-stable return values from `parse()`.\n\n **Responsibilities:**\n\n - Implementations must declare supported content types via `supported_types`.\n - Implementations must raise parsing-specific exceptions from `parse()`.\n - Implementations must remain deterministic for a given input.", "members": { "supported_types": { "name": "supported_types", "kind": "attribute", "path": "omniread.pdf.parser.BaseParser.supported_types", "signature": "", "docstring": "Set of content types supported by this parser. An empty set indicates that the parser is content-type agnostic." }, "content": { "name": "content", "kind": "attribute", "path": "omniread.pdf.parser.BaseParser.content", "signature": "", "docstring": null }, "parse": { "name": "parse", "kind": "function", "path": "omniread.pdf.parser.BaseParser.parse", "signature": "", "docstring": "Parse the owned content into structured output.\n\nReturns:\n T:\n Parsed, structured representation.\n\nRaises:\n Exception:\n Parsing-specific errors as defined by the implementation.\n\nNotes:\n **Responsibilities:**\n\n - Implementations must fully consume the provided content and\n return a deterministic, structured output." }, "supports": { "name": "supports", "kind": "function", "path": "omniread.pdf.parser.BaseParser.supports", "signature": "", "docstring": "Check whether this parser supports the content's type.\n\nReturns:\n bool:\n True if the content type is supported; False otherwise." } } }, "T": { "name": "T", "kind": "attribute", "path": "omniread.pdf.parser.T", "signature": null, "docstring": null }, "PDFParser": { "name": "PDFParser", "kind": "class", "path": "omniread.pdf.parser.PDFParser", "signature": "", "docstring": "Base PDF parser.\n\nNotes:\n **Responsibilities:**\n\n - This class enforces PDF content-type compatibility and provides\n the extension point for implementing concrete PDF parsing strategies.\n\n **Constraints:**\n\n - Concrete implementations must define the output type `T` and\n implement the `parse()` method.", "members": { "supported_types": { "name": "supported_types", "kind": "attribute", "path": "omniread.pdf.parser.PDFParser.supported_types", "signature": null, "docstring": "Set of content types supported by this parser (PDF only)." }, "parse": { "name": "parse", "kind": "function", "path": "omniread.pdf.parser.PDFParser.parse", "signature": "", "docstring": "Parse PDF content into a structured output.\n\nReturns:\n T:\n Parsed representation of type `T`.\n\nRaises:\n Exception:\n Parsing-specific errors as defined by the implementation.\n\nNotes:\n **Responsibilities:**\n\n - Implementations must fully interpret the PDF binary payload and\n return a deterministic, structured output." } } } } } }