updated mcp

2026-03-08 17:57:34 +05:30
parent 9191de9dff
commit 0e49f02c4c
167 changed files with 7632 additions and 98942 deletions

@@ -914,20 +914,23 @@
<div class="doc doc-contents first">
<p>HTML scraping implementation for OmniRead.</p>
<hr />
<h4 id="omniread.html.scraper--summary">Summary</h4>
<h3 id="omniread.html.scraper--summary">Summary</h3>
<p>HTML scraping implementation for OmniRead.</p>
<p>This module provides an HTTP-based scraper for retrieving HTML documents.
It implements the core <code>BaseScraper</code> contract using <code>httpx</code> as the transport
layer.</p>
<p>This scraper is responsible for:
- Fetching raw HTML bytes over HTTP(S)
- Validating response content type
- Attaching HTTP metadata to the returned content</p>
<p>This scraper is not responsible for:
- Parsing or interpreting HTML
- Retrying failed requests
- Managing crawl policies or rate limiting</p>
<p>This scraper is responsible for:</p>
<ul>
<li>Fetching raw HTML bytes over HTTP(S)</li>
<li>Validating response content type</li>
<li>Attaching HTTP metadata to the returned content</li>
</ul>
<p>This scraper is not responsible for:</p>
<ul>
<li>Parsing or interpreting HTML</li>
<li>Retrying failed requests</li>
<li>Managing crawl policies or rate limiting</li>
</ul>
@@ -954,21 +957,29 @@ layer.</p>
<div class="doc doc-contents ">
<p class="doc doc-class-bases">
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
<p>Base HTML scraper using httpx.</p>
<p>Base HTML scraper using <code>httpx</code>.</p>
<details class="notes" open>
<summary>Notes</summary>
<p><strong>Responsibilities:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns them as raw content wrapped in a `Content` object
- Fetches raw bytes and metadata only. The scraper uses `httpx.Client` for HTTP requests, enforces an HTML content type, preserves HTTP response metadata
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns
them as raw content wrapped in a `Content` object.
- Fetches raw bytes and metadata only.
- The scraper uses `httpx.Client` for HTTP requests, enforces an
HTML content type, and preserves HTTP response metadata.
</code></pre></div></td></tr></table></div>
<p><strong>Constraints:</strong></p>
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not: Parse HTML, perform retries or backoff, handle non-HTML responses
<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not: Parse HTML, perform retries or backoff,
handle non-HTML responses.
</code></pre></div></td></tr></table></div>
</details>
<p>Initialize the HTML scraper.</p>
@@ -1127,7 +1138,7 @@ layer.</p>
<tbody>
<tr class="doc-section-item">
<td><code>Content</code></td> <td>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../omniread/core/content/#omniread.core.content.Content">Content</a></code>
<code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../core/content/#omniread.core.content.Content">Content</a></code>
</td>
<td>
<div class="doc-md-description">