@@ -894,9 +894,8 @@
<div class="doc doc-contents first">
<p>PDF scraping implementation for OmniRead.</p>
<hr />
<h4 id="omniread.pdf.scraper--summary">Summary</h4>
<h3 id="omniread.pdf.scraper--summary">Summary</h3>
<p>PDF scraping implementation for OmniRead.</p>
<p>This module provides a PDF-specific scraper that coordinates PDF byte
retrieval via a client and normalizes the result into a <code>Content</code> object.</p>
<p>The scraper implements the core <code>BaseScraper</code> contract while delegating
@@ -927,7 +926,7 @@ all storage and access concerns to a <code>BasePDFClient</code> implementation.<
<div class="doc doc-contents ">
<p class="doc doc-class-bases">
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
<p>Scraper for PDF sources.</p>
@@ -937,11 +936,15 @@ all storage and access concerns to a <code>BasePDFClient</code> implementation.<
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output into Content
- Preserves caller-provided metadata
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output
into `Content`.
- Preserves caller-provided metadata.
</code></pre></div></td></tr></table></div>
<p><strong>Constraints:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper: Does not perform parsing or interpretation, does not assume a specific storage backend
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not perform parsing or interpretation.
- Does not assume a specific storage backend.
</code></pre></div></td></tr></table></div>
</details>
<p>Initialize the PDF scraper.</p>
@@ -1058,7 +1061,7 @@ all storage and access concerns to a <code>BasePDFClient</code> implementation.<
<tbody>
<tr class="doc-section-item">
<td><code>Content</code></td> <td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../core/content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">