
Client

omniread.pdf.client

PDF client abstractions for OmniRead.


Summary

This module defines the client layer responsible for retrieving raw PDF bytes from a concrete backing store.

Clients provide low-level access to PDF binaries and are intentionally decoupled from scraping and parsing logic. They do not perform validation, interpretation, or content extraction.

Typical backing stores include:

- Local filesystems
- Object storage (S3, GCS, etc.)
- Network file systems
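To illustrate this layering, the sketch below keeps retrieval inside a client and puts any validation in the calling code. `StaticPDFClient` and `load_pdf` are hypothetical names used only for illustration. The base class is a minimal stand-in that mirrors the documented `fetch` signature, not OmniRead's actual source. The `%PDF-` prefix check is an assumed, deliberately simple example of a caller-side sanity check.

```python
from abc import ABC, abstractmethod
from typing import Any


class BasePDFClient(ABC):
    """Minimal stand-in mirroring the documented contract."""

    @abstractmethod
    def fetch(self, source: Any) -> bytes: ...


class StaticPDFClient(BasePDFClient):
    """Hypothetical client that always returns the same payload (illustration only)."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def fetch(self, source: Any) -> bytes:
        # Pure retrieval: no validation or parsing happens in the client.
        return self._payload


def load_pdf(client: BasePDFClient, source: Any) -> bytes:
    """Hypothetical caller-side helper: retrieval is delegated, validation stays here."""
    data = client.fetch(source)
    # Clients do not validate, so the caller checks for a PDF header itself.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{source!r} does not start with a PDF header")
    return data
```

Because `load_pdf` depends only on the abstract interface, the same helper works unchanged with any client, whether it reads from a local file, an object store, or a network share.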

Classes

BasePDFClient

Bases: ABC

Abstract client responsible for retrieving PDF bytes from a specific backing store (filesystem, S3, FTP, etc.).

Notes

Responsibilities:

Implementations must:
- accept a source identifier appropriate to the backing store;
- return the full PDF binary payload;
- raise retrieval-specific errors on failure.
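Here is a minimal sketch of how a backing store could meet these responsibilities. A plain dict stands in for object storage. `InMemoryPDFClient` and `PDFNotFoundError` are hypothetical names and are not part of OmniRead. The base class reproduces only the documented abstract `fetch` signature.

```python
from abc import ABC, abstractmethod
from typing import Any


class BasePDFClient(ABC):
    """Stand-in for the documented abstract client."""

    @abstractmethod
    def fetch(self, source: Any) -> bytes:
        """Return the full PDF payload for `source`; raise a retrieval error on failure."""


class PDFNotFoundError(LookupError):
    """Hypothetical retrieval-specific error (not an OmniRead type)."""


class InMemoryPDFClient(BasePDFClient):
    """Hypothetical client backed by a dict, standing in for object storage."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    def fetch(self, source: str) -> bytes:
        try:
            # Return the stored payload untouched: no validation or parsing.
            return self._objects[source]
        except KeyError:
            # Translate the backend's lookup failure into a retrieval-specific error.
            raise PDFNotFoundError(f"no PDF stored under key {source!r}") from None


client = InMemoryPDFClient({"reports/q1.pdf": b"%PDF-1.7 ..."})
print(client.fetch("reports/q1.pdf"))  # b'%PDF-1.7 ...'
```

Calling `BasePDFClient()` directly raises `TypeError`, because `fetch` is abstract. Only concrete subclasses can be instantiated.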
Functions
fetch abstractmethod
fetch(source: Any) -> bytes

Fetch raw PDF bytes from the given source.

Parameters:

Name | Type | Description | Default
source | Any | Identifier of the PDF location, such as a file path, object storage key, or remote reference. | required

Returns:

Type | Description
bytes | Raw PDF bytes.

Raises:

Type | Description
Exception | Retrieval-specific errors defined by the implementation.

FileSystemPDFClient

Bases: BasePDFClient

PDF client that reads from the local filesystem.

Notes

Guarantees:

- This client reads PDF files directly from disk and returns their raw binary contents.
Functions
fetch
fetch(path: Path) -> bytes

Read a PDF file from the local filesystem.

Parameters:

Name | Type | Description | Default
path | Path | Filesystem path to the PDF file. | required

Returns:

Type | Description
bytes | Raw PDF bytes.

Raises:

Type | Description
FileNotFoundError | If the path does not exist.
ValueError | If the path exists but is not a file.
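The documented behavior of `fetch` can be approximated with `pathlib`. This is a sketch, not the actual OmniRead implementation. It assumes existence is checked before the file-type check, because `Path.is_file()` also returns `False` for missing paths and could not otherwise distinguish the two documented errors. The base class is a minimal stand-in.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class BasePDFClient(ABC):
    """Minimal stand-in for the documented abstract client."""

    @abstractmethod
    def fetch(self, source: Any) -> bytes: ...


class FileSystemPDFClient(BasePDFClient):
    """Sketch of the documented filesystem behavior."""

    def fetch(self, path: Path) -> bytes:
        # Missing path -> FileNotFoundError, as documented.
        if not path.exists():
            raise FileNotFoundError(f"PDF not found: {path}")
        # Existing path that is not a regular file (e.g. a directory) -> ValueError.
        if not path.is_file():
            raise ValueError(f"Path is not a file: {path}")
        # Return the raw bytes unchanged; no validation or parsing happens here.
        return path.read_bytes()
```

Typical usage would be `FileSystemPDFClient().fetch(Path("report.pdf"))`, where `report.pdf` is a placeholder file name.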