@@ -896,26 +896,26 @@
<div class="doc doc-contents first">
<p>PDF format implementation for OmniRead.</p>
<hr />
<h4 id="omniread.pdf--summary">Summary</h4>
<h3 id="omniread.pdf--summary">Summary</h3>
<p>PDF format implementation for OmniRead.</p>
<p>This package provides <strong>PDF-specific implementations</strong> of the core OmniRead
contracts defined in <code>omniread.core</code>.</p>
<p>Unlike HTML, PDF handling requires an explicit client layer for document
access. This package therefore includes:
- PDF clients for acquiring raw PDF data
- PDF scrapers that coordinate client access
- PDF parsers that extract structured content from PDF binaries</p>
access. This package therefore includes:</p>
<ul>
<li>PDF clients for acquiring raw PDF data.</li>
<li>PDF scrapers that coordinate client access.</li>
<li>PDF parsers that extract structured content from PDF binaries.</li>
</ul>
<p>Public exports from this package represent the supported PDF pipeline
and are safe for consumers to import directly when working with PDFs.</p>
<hr />
<h4 id="omniread.pdf--public-api">Public API</h4>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>FileSystemPDFClient
PDFScraper
PDFParser
</code></pre></div></td></tr></table></div>
<h3 id="omniread.pdf--public-api">Public API</h3>
<ul>
<li><code>FileSystemPDFClient</code></li>
<li><code>PDFScraper</code></li>
<li><code>PDFParser</code></li>
</ul>
<hr />
@@ -951,7 +951,9 @@ PDFParser
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Guarantees:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- This client reads PDF files directly from the disk and returns their raw binary contents
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This client reads PDF files directly from the disk and returns
their raw binary contents.
</code></pre></div></td></tr></table></div>
</details>
@@ -1093,7 +1095,7 @@ PDFParser
<div class="doc doc-contents ">
<p class="doc doc-class-bases">
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../omniread/core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.pdf.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.pdf.parser.T">T</span>]</code></p>
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.pdf.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.pdf.parser.T">T</span>]</code></p>
<p>Base PDF parser.</p>
@@ -1102,10 +1104,14 @@ PDFParser
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class enforces PDF content-type compatibility and provides the extension point for implementing concrete PDF parsing strategies
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class enforces PDF content-type compatibility and provides
the extension point for implementing concrete PDF parsing strategies.
</code></pre></div></td></tr></table></div>
<p><strong>Constraints:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete implementations must: Define the output type `T`, implement the `parse()` method
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete implementations must define the output type `T` and
implement the `parse()` method.
</code></pre></div></td></tr></table></div>
</details>
<p>Initialize the parser with content to be parsed.</p>
@@ -1125,7 +1131,7 @@ PDFParser
<tr class="doc-section-item">
<td><code>content</code></td>
<td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../core/content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">
@@ -1268,7 +1274,9 @@ PDFParser
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the PDF binary payload and return a deterministic, structured output
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the PDF binary payload and
return a deterministic, structured output.
</code></pre></div></td></tr></table></div>
</details>
</div>
@@ -1339,7 +1347,7 @@ PDFParser
<div class="doc doc-contents ">
<p class="doc doc-class-bases">
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
<p>Scraper for PDF sources.</p>
@@ -1349,11 +1357,15 @@ PDFParser
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output into Content
- Preserves caller-provided metadata
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output
into `Content`.
- Preserves caller-provided metadata.
</code></pre></div></td></tr></table></div>
<p><strong>Constraints:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper: Does not perform parsing or interpretation, does not assume a specific storage backend
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not perform parsing or interpretation.
- Does not assume a specific storage backend.
</code></pre></div></td></tr></table></div>
</details>
<p>Initialize the PDF scraper.</p>
@@ -1470,7 +1482,7 @@ PDFParser
<tbody>
<tr class="doc-section-item">
<td><code>Content</code></td> <td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../core/content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">