@@ -989,25 +989,26 @@
<div class="doc doc-contents first">
<p>Core domain contracts for OmniRead.</p>
<hr />
<h4 id="omniread.core--summary">Summary</h4>
<h3 id="omniread.core--summary">Summary</h3>
<p>Core domain contracts for OmniRead.</p>
<p>This package defines the <strong>format-agnostic domain layer</strong> of OmniRead.
It exposes canonical content models and abstract interfaces that are
implemented by format-specific modules (HTML, PDF, etc.).</p>
<p>Public exports from this package are considered <strong>stable contracts</strong> and
are safe for downstream consumers to depend on.</p>
<p>Submodules:
- content: Canonical content models and enums
- parser: Abstract parsing contracts
- scraper: Abstract scraping contracts</p>
<p>Submodules:</p>
<ul>
<li><code>content</code>: Canonical content models and enums.</li>
<li><code>parser</code>: Abstract parsing contracts.</li>
<li><code>scraper</code>: Abstract scraping contracts.</li>
</ul>
<p>Format-specific behavior must not be introduced at this layer.</p>
<hr />
<h4 id="omniread.core--public-api">Public API</h4>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>Content
ContentType
</code></pre></div></td></tr></table></div>
<h3 id="omniread.core--public-api">Public API</h3>
<ul>
<li><code>Content</code></li>
<li><code>ContentType</code></li>
</ul>
<hr />
@@ -1045,15 +1046,19 @@ ContentType
<summary>Notes</summary>
<p><strong>Guarantees:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- A parser is a self-contained object that owns the Content it is responsible for interpreting
- Consumers may rely on early validation of content compatibility and type-stable return values from `parse()`
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- A parser is a self-contained object that owns the `Content` it is
responsible for interpreting.
- Consumers may rely on early validation of content compatibility
and type-stable return values from `parse()`.
</code></pre></div></td></tr></table></div>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must declare supported content types via `supported_types`
- Implementations must raise parsing-specific exceptions from `parse()`
- Implementations must remain deterministic for a given input
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must declare supported content types via `supported_types`.
- Implementations must raise parsing-specific exceptions from `parse()`.
- Implementations must remain deterministic for a given input.
</code></pre></div></td></tr></table></div>
</details>
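<p>The parser guarantees and responsibilities listed in these notes can be illustrated with a minimal sketch. Only <code>Content</code>, <code>ContentType</code>, <code>supported_types</code>, and <code>parse()</code> appear in the documentation; the field names, enum members, <code>BaseParser</code>, <code>ParseError</code>, and <code>TitleParser</code> below are hypothetical stand-ins and may not match the real OmniRead definitions.</p>

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, Set, TypeVar

T = TypeVar("T")


class ContentType(str, Enum):
    # Hypothetical members; the real enum values are not shown in the docs.
    HTML = "text/html"
    PDF = "application/pdf"


@dataclass(frozen=True)
class Content:
    # Hypothetical field names for the payload, its origin, and its type.
    raw: bytes
    source: str
    content_type: Optional[ContentType] = None


class ParseError(Exception):
    """Hypothetical parsing-specific exception."""


class BaseParser(ABC, Generic[T]):
    # Implementations declare which content types they accept.
    supported_types: Set[ContentType] = set()

    def __init__(self, content: Content) -> None:
        # Early validation: reject incompatible content at construction time.
        if content.content_type not in self.supported_types:
            raise ValueError(f"unsupported content type: {content.content_type!r}")
        # The parser owns the Content it is responsible for interpreting.
        self.content = content

    @abstractmethod
    def parse(self) -> T:
        """Consume the content and return a deterministic, structured result."""


class TitleParser(BaseParser[str]):
    """Toy implementation: extracts the text of the first <title> element."""

    supported_types = {ContentType.HTML}

    def parse(self) -> str:
        text = self.content.raw.decode("utf-8", errors="replace")
        start = text.find("<title>")
        end = text.find("</title>")
        if start == -1 or end == -1 or end < start:
            raise ParseError("no <title> element found")
        return text[start + len("<title>"):end].strip()
```

<p>In this sketch, incompatible content fails in the constructor rather than in <code>parse()</code>, which is one way to provide the early validation the notes describe.</p>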
<p>Initialize the parser with content to be parsed.</p>
@@ -1073,7 +1078,7 @@ ContentType
<tr class="doc-section-item">
<td><code>content</code></td>
<td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">
@@ -1216,7 +1221,9 @@ ContentType
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully consume the provided content and return a deterministic, structured output
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully consume the provided content and
return a deterministic, structured output.
</code></pre></div></td></tr></table></div>
</details>
</div>
@@ -1298,13 +1305,21 @@ ContentType
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- A scraper is responsible ONLY for fetching raw content (bytes) from a source. It must not interpret or parse it
- A scraper is a stateless acquisition component that retrieves raw content from a source and returns it as a `Content` object
- Scrapers define how content is obtained, not what the content means
- Implementations may vary in transport mechanism, authentication strategy, retry and backoff behavior
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><code>- A scraper is responsible ONLY for fetching raw content (bytes)
from a source. It must not interpret or parse it.
- A scraper is a stateless acquisition component that retrieves raw
content from a source and returns it as a `Content` object.
- Scrapers define how content is obtained, not what the content means.
- Implementations may vary in transport mechanism, authentication
strategy, retry and backoff behavior.
</code></pre></div></td></tr></table></div>
<p><strong>Constraints:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must not parse content, modify content semantics, or couple scraping logic to a specific parser
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must not parse content, modify content semantics,
or couple scraping logic to a specific parser.
</code></pre></div></td></tr></table></div>
</details>
@@ -1358,7 +1373,7 @@ ContentType
</td>
<td>
<div class="doc-md-description">
<p>Location identifier (URL, file path, S3 URI, etc.)</p>
<p>Location identifier (URL, file path, S3 URI, etc.).</p>
</div>
</td>
<td>
@@ -1372,7 +1387,7 @@ ContentType
</td>
<td>
<div class="doc-md-description">
<p>Optional hints for the scraper (headers, auth, etc.)</p>
<p>Optional hints for the scraper (headers, auth, etc.).</p>
</div>
</td>
<td>
@@ -1394,7 +1409,7 @@ ContentType
<tbody>
<tr class="doc-section-item">
<td><code>Content</code></td> <td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">
@@ -1432,7 +1447,9 @@ ContentType
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must retrieve the content referenced by `source` and return it as raw bytes wrapped in a `Content` object
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must retrieve the content referenced by `source`
and return it as raw bytes wrapped in a `Content` object.
</code></pre></div></td></tr></table></div>
</details>
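<p>As a rough illustration of the scraper contract described in these notes, the sketch below reads a local file and wraps the raw bytes in a <code>Content</code> object without interpreting them. The method name <code>fetch</code>, the <code>metadata</code> parameter name, the <code>Content</code> fields, and the <code>BaseScraper</code> and <code>FileScraper</code> classes are all assumptions made for the example; only <code>source</code> and <code>Content</code> are named in the documentation.</p>

```python
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Content:
    # Hypothetical field names; the real model may differ.
    raw: bytes
    source: str
    metadata: Optional[Mapping[str, Any]] = None


class BaseScraper(ABC):
    """Hypothetical abstract scraper: defines how content is obtained."""

    @abstractmethod
    def fetch(self, source: str, *, metadata: Optional[Mapping[str, Any]] = None) -> Content:
        """Retrieve the content referenced by `source` as raw bytes."""


class FileScraper(BaseScraper):
    """Stateless scraper for local file paths; it never parses the payload."""

    def fetch(self, source: str, *, metadata: Optional[Mapping[str, Any]] = None) -> Content:
        # Read bytes verbatim; decoding and interpretation belong to a parser.
        raw = Path(source).read_bytes()
        return Content(raw=raw, source=source, metadata=metadata)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "page.html"
        path.write_bytes(b"<html><title>Hi</title></html>")
        content = FileScraper().fetch(str(path))
        print(len(content.raw), content.source)
```

<p>A network-backed scraper would keep the same shape and differ only in transport, authentication, and retry behavior.</p>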
</div>
@@ -1473,8 +1490,12 @@ ContentType
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- A `Content` instance represents a raw content payload along with minimal contextual metadata describing its origin and type
- This class is the primary exchange format between Scrapers, Parsers, and Downstream consumers
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- A `Content` instance represents a raw content payload along with
minimal contextual metadata describing its origin and type.
- This class is the primary exchange format between scrapers,
parsers, and downstream consumers.
</code></pre></div></td></tr></table></div>
</details>
@@ -1615,8 +1636,12 @@ ContentType
<summary>Notes</summary>
<p><strong>Guarantees:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This enum represents the declared or inferred media type of the content source
- It is primarily used for routing content to the appropriate parser or downstream consumer
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- This enum represents the declared or inferred media type of the
content source.
- It is primarily used for routing content to the appropriate
parser or downstream consumer.
</code></pre></div></td></tr></table></div>
</details>
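<p>The routing role of <code>ContentType</code> mentioned in these notes might look roughly like the sketch below. The enum members, the <code>Content</code> fields, the handler functions, and the <code>ROUTES</code> table are hypothetical and exist only to show a lookup keyed on content type.</p>

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict


class ContentType(str, Enum):
    # Hypothetical members; the real enum may define different values.
    HTML = "text/html"
    JSON = "application/json"
    PDF = "application/pdf"


@dataclass(frozen=True)
class Content:
    # Hypothetical field names.
    raw: bytes
    source: str
    content_type: ContentType


def handle_html(content: Content) -> str:
    # Placeholder handler: a real one would hand off to an HTML parser.
    return content.raw.decode("utf-8", errors="replace")


def handle_json(content: Content) -> Any:
    # json.loads accepts bytes directly.
    return json.loads(content.raw)


# Routing table keyed on the declared or inferred media type.
ROUTES: Dict[ContentType, Callable[[Content], Any]] = {
    ContentType.HTML: handle_html,
    ContentType.JSON: handle_json,
}


def route(content: Content) -> Any:
    """Dispatch content to the handler registered for its type."""
    try:
        handler = ROUTES[content.content_type]
    except KeyError:
        raise LookupError(f"no handler for {content.content_type.value}") from None
    return handler(content)
```

<p>Keeping the dispatch table outside the scraper and parser classes keeps routing decisions separate from both acquisition and interpretation.</p>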