updated mcp

2026-03-08 17:57:34 +05:30
parent 9191de9dff
commit 0e49f02c4c
167 changed files with 7632 additions and 98942 deletions
--- a/libs/omniread/site/html/index.html
+++ b/libs/omniread/site/html/index.html
@@ -902,26 +902,29 @@

    <div class="doc doc-contents first">

-      <p>HTML format implementation for OmniRead.</p>
-<hr />
-<h4 id="omniread.html--summary">Summary</h4>
+      <h3 id="omniread.html--summary">Summary</h3>
+<p>HTML format implementation for OmniRead.</p>
 <p>This package provides <strong>HTML-specific implementations</strong> of the core OmniRead
 contracts defined in <code>omniread.core</code>.</p>
-<p>It includes:
- HTML parsers that interpret HTML content
- HTML scrapers that retrieve HTML documents</p>
-<p>This package:
- Implements, but does not redefine, core contracts
- May contain HTML-specific behavior and edge-case handling
- Produces canonical content models defined in <code>omniread.core.content</code></p>
+<p>It includes:</p>
+<ul>
+<li>HTML parsers that interpret HTML content.</li>
+<li>HTML scrapers that retrieve HTML documents.</li>
+</ul>
+<p>Key characteristics:</p>
+<ul>
+<li>Implements, but does not redefine, core contracts.</li>
+<li>May contain HTML-specific behavior and edge-case handling.</li>
+<li>Produces canonical content models defined in <code>omniread.core.content</code>.</li>
+</ul>
 <p>Consumers should depend on <code>omniread.core</code> interfaces wherever possible and
 use this package only when HTML-specific behavior is required.</p>
 <hr />
-<h4 id="omniread.html--public-api">Public API</h4>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>HTMLScraper
-HTMLParser
-</code></pre></div></td></tr></table></div>
+<h3 id="omniread.html--public-api">Public API</h3>
+<ul>
+<li><code>HTMLScraper</code></li>
+<li><code>HTMLParser</code></li>
+</ul>
 <hr />


@@ -949,7 +952,7 @@ HTMLParser

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../omniread/core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>


      <p>Base HTML parser.</p>
@@ -959,14 +962,24 @@ HTMLParser
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior, including DOM parsing via BeautifulSoup and reusable extraction helpers
- Provides reusable helpers for HTML extraction. Concrete parsers must explicitly define the return type
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior,
+  including DOM parsing via BeautifulSoup and reusable extraction helpers.
+- Provides reusable helpers for HTML extraction. Concrete parsers must
+  explicitly define the return type.
 </code></pre></div></td></tr></table></div>
 <p><strong>Guarantees:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Characteristics: Accepts only HTML content, owns a parsed BeautifulSoup DOM tree, provides pure helper utilities for common HTML structures
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Accepts only HTML content.
+- Owns a parsed BeautifulSoup DOM tree.
+- Provides pure helper utilities for common HTML structures.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement the `parse()` method
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement
+  the `parse()` method.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML parser.</p>
@@ -986,7 +999,7 @@ HTMLParser
          <tr class="doc-section-item">
            <td><code>content</code></td>
            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -1120,7 +1133,9 @@ HTMLParser
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a deterministic, structured output
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a
+  deterministic, structured output.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -1336,8 +1351,10 @@ Dictionary containing extracted metadata.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document
- This includes: Document title, `&lt;meta&gt;` tag name/property → content mappings
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document.
+- This includes the document title and mappings from `&lt;meta&gt;` tag
+  name/property to content.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -1484,21 +1501,29 @@ A list of rows, where each row is a list of cell text values.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>


-      <p>Base HTML scraper using httpx.</p>
+      <p>Base HTML scraper using <code>httpx</code>.</p>


 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns them as raw content wrapped in a `Content` object
- Fetches raw bytes and metadata only. The scraper uses `httpx.Client` for HTTP requests, enforces an HTML content type, preserves HTTP response metadata
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span>
+<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns
+  them as raw content wrapped in a `Content` object.
+- Fetches raw bytes and metadata only.
+- The scraper uses `httpx.Client` for HTTP requests, enforces an
+  HTML content type, and preserves HTTP response metadata.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not: Parse HTML, perform retries or backoff, handle non-HTML responses
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not parse HTML, perform retries or backoff,
+  or handle non-HTML responses.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML scraper.</p>
@@ -1657,7 +1682,7 @@ A list of rows, where each row is a list of cell text values.</p>
      <tbody>
          <tr class="doc-section-item">
 <td><code>Content</code></td>            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
--- a/libs/omniread/site/html/parser/index.html
+++ b/libs/omniread/site/html/parser/index.html
@@ -1034,15 +1034,16 @@

    <div class="doc doc-contents first">

-      <p>HTML parser base implementations for OmniRead.</p>
-<hr />
-<h4 id="omniread.html.parser--summary">Summary</h4>
+      <h3 id="omniread.html.parser--summary">Summary</h3>
+<p>HTML parser base implementations for OmniRead.</p>
 <p>This module provides reusable HTML parsing utilities built on top of
 the abstract parser contracts defined in <code>omniread.core.parser</code>.</p>
-<p>It supplies:
- Content-type enforcement for HTML inputs
- BeautifulSoup initialization and lifecycle management
- Common helper methods for extracting structured data from HTML elements</p>
+<p>It supplies:</p>
+<ul>
+<li>Content-type enforcement for HTML inputs.</li>
+<li>BeautifulSoup initialization and lifecycle management.</li>
+<li>Common helper methods for extracting structured data from HTML elements.</li>
+</ul>
 <p>Concrete parsers must subclass <code>HTMLParser</code> and implement the <code>parse()</code> method
 to return a structured representation appropriate for their use case.</p>

@@ -1071,7 +1072,7 @@ to return a structured representation appropriate for their use case.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../../omniread/core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="../../core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>


      <p>Base HTML parser.</p>
@@ -1081,14 +1082,24 @@ to return a structured representation appropriate for their use case.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior, including DOM parsing via BeautifulSoup and reusable extraction helpers
- Provides reusable helpers for HTML extraction. Concrete parsers must explicitly define the return type
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior,
+  including DOM parsing via BeautifulSoup and reusable extraction helpers.
+- Provides reusable helpers for HTML extraction. Concrete parsers must
+  explicitly define the return type.
 </code></pre></div></td></tr></table></div>
 <p><strong>Guarantees:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Characteristics: Accepts only HTML content, owns a parsed BeautifulSoup DOM tree, provides pure helper utilities for common HTML structures
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Accepts only HTML content.
+- Owns a parsed BeautifulSoup DOM tree.
+- Provides pure helper utilities for common HTML structures.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement the `parse()` method
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement
+  the `parse()` method.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML parser.</p>
@@ -1108,7 +1119,7 @@ to return a structured representation appropriate for their use case.</p>
          <tr class="doc-section-item">
            <td><code>content</code></td>
            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -1242,7 +1253,9 @@ to return a structured representation appropriate for their use case.</p>
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a deterministic, structured output
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a
+  deterministic, structured output.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -1458,8 +1471,10 @@ Dictionary containing extracted metadata.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document
- This includes: Document title, `&lt;meta&gt;` tag name/property → content mappings
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document.
+- This includes the document title and mappings from `&lt;meta&gt;` tag
+  name/property to content.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
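Note on the parser docs above: the constraint that concrete subclasses fix the output type `T` and implement `parse()` can be sketched as follows. This is only an illustration under stated assumptions. `Content`, `BaseParser`, and `MetadataParser` here are hypothetical stand-ins written for this note, not OmniRead's actual classes, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html.parser import HTMLParser as StdlibHTMLParser
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Content:
    """Hypothetical stand-in for omniread.core.content.Content."""
    raw: bytes
    content_type: str
    metadata: dict[str, Any] = field(default_factory=dict)


class BaseParser(ABC, Generic[T]):
    """Hypothetical stand-in for omniread.core.parser.BaseParser."""

    def __init__(self, content: Content) -> None:
        self.content = content

    @abstractmethod
    def parse(self) -> T:
        """Return a structured representation of the content."""


class _MetaCollector(StdlibHTMLParser):
    """Collects the <title> text and <meta> name/property to content pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.meta[key] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class MetadataParser(BaseParser[dict[str, Any]]):
    """Concrete parser: fixes T to a dict and implements parse()."""

    def __init__(self, content: Content) -> None:
        # Accept only HTML content, mirroring the documented guarantee.
        if not content.content_type.startswith("text/html"):
            raise ValueError("MetadataParser accepts only HTML content")
        super().__init__(content)

    def parse(self) -> dict[str, Any]:
        collector = _MetaCollector()
        collector.feed(self.content.raw.decode("utf-8", errors="replace"))
        collector.close()
        return {"title": collector.title.strip(), "meta": collector.meta}
```

Calling `MetadataParser(content).parse()` on an HTML `Content` returns a plain dictionary, and constructing it with a non-HTML content type raises `ValueError`. A real subclass would presumably rely on the BeautifulSoup tree owned by `HTMLParser` rather than a hand-written collector.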
--- a/libs/omniread/site/html/scraper/index.html
+++ b/libs/omniread/site/html/scraper/index.html
@@ -914,20 +914,23 @@

    <div class="doc doc-contents first">

-      <p>HTML scraping implementation for OmniRead.</p>
-<hr />
-<h4 id="omniread.html.scraper--summary">Summary</h4>
+      <h3 id="omniread.html.scraper--summary">Summary</h3>
+<p>HTML scraping implementation for OmniRead.</p>
 <p>This module provides an HTTP-based scraper for retrieving HTML documents.
 It implements the core <code>BaseScraper</code> contract using <code>httpx</code> as the transport
 layer.</p>
-<p>This scraper is responsible for:
- Fetching raw HTML bytes over HTTP(S)
- Validating response content type
- Attaching HTTP metadata to the returned content</p>
-<p>This scraper is not responsible for:
- Parsing or interpreting HTML
- Retrying failed requests
- Managing crawl policies or rate limiting</p>
+<p>This scraper is responsible for:</p>
+<ul>
+<li>Fetching raw HTML bytes over HTTP(S).</li>
+<li>Validating response content type.</li>
+<li>Attaching HTTP metadata to the returned content.</li>
+</ul>
+<p>This scraper is not responsible for:</p>
+<ul>
+<li>Parsing or interpreting HTML.</li>
+<li>Retrying failed requests.</li>
+<li>Managing crawl policies or rate limiting.</li>
+</ul>



@@ -954,21 +957,29 @@ layer.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="../../core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>


-      <p>Base HTML scraper using httpx.</p>
+      <p>Base HTML scraper using <code>httpx</code>.</p>


 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns them as raw content wrapped in a `Content` object
- Fetches raw bytes and metadata only. The scraper uses `httpx.Client` for HTTP requests, enforces an HTML content type, preserves HTTP response metadata
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span>
+<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns
+  them as raw content wrapped in a `Content` object.
+- Fetches raw bytes and metadata only.
+- The scraper uses `httpx.Client` for HTTP requests, enforces an
+  HTML content type, and preserves HTTP response metadata.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not: Parse HTML, perform retries or backoff, handle non-HTML responses
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not parse HTML, perform retries or backoff,
+  or handle non-HTML responses.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML scraper.</p>
@@ -1127,7 +1138,7 @@ layer.</p>
      <tbody>
          <tr class="doc-section-item">
 <td><code>Content</code></td>            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="../../core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
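Note on the scraper docs above: the "enforces an HTML content type" responsibility might look roughly like the check below. This is a minimal sketch, not the library's code. The helper name `is_html_content_type` and the choice to accept only `text/html` are assumptions made for illustration; the diff does not show how `HTMLScraper` performs the check.

```python
from typing import Optional


def is_html_content_type(header_value: Optional[str]) -> bool:
    """Return True when a Content-Type header value denotes HTML.

    Hypothetical helper: parameters such as "; charset=utf-8" are ignored
    and the media type is compared case-insensitively.
    """
    if not header_value:
        return False
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

A scraper could apply such a check to the response's `Content-Type` header before wrapping the raw bytes in a `Content` object, and raise an error otherwise, which would be consistent with the documented constraint that non-HTML responses are not handled.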