updated mcp

2026-03-08 17:57:34 +05:30
parent 9191de9dff
commit 0e49f02c4c
167 changed files with 7632 additions and 98942 deletions
--- a/libs/omniread/site/index.html
+++ b/libs/omniread/site/index.html
@@ -298,7 +298,8 @@
    </span>
  </a>
  
-</li>
+    <nav class="md-nav" aria-label="Installation">
+      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#omniread--quick-start" class="md-nav__link">
@@ -307,6 +308,11 @@
    </span>
  </a>
  
+</li>
+        
+      </ul>
+    </nav>
+  
 </li>
        
          <li class="md-nav__item">
@@ -316,6 +322,15 @@
    </span>
  </a>
  
+</li>
+        
+          <li class="md-nav__item">
+  <a href="#omniread--core-philosophy" class="md-nav__link">
+    <span class="md-ellipsis">
+      Core Philosophy
+    </span>
+  </a>
+  
 </li>
        
          <li class="md-nav__item">
@@ -1237,7 +1252,8 @@
    </span>
  </a>
  
-</li>
+    <nav class="md-nav" aria-label="Installation">
+      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#omniread--quick-start" class="md-nav__link">
@@ -1246,6 +1262,11 @@
    </span>
  </a>
  
+</li>
+        
+      </ul>
+    </nav>
+  
 </li>
        
          <li class="md-nav__item">
@@ -1255,6 +1276,15 @@
    </span>
  </a>
  
+</li>
+        
+          <li class="md-nav__item">
+  <a href="#omniread--core-philosophy" class="md-nav__link">
+    <span class="md-ellipsis">
+      Core Philosophy
+    </span>
+  </a>
+  
 </li>
        
          <li class="md-nav__item">
@@ -1746,105 +1776,118 @@

    <div class="doc doc-contents first">

-      <p>OmniRead — format-agnostic content acquisition and parsing framework.</p>
-<hr />
-<h4 id="omniread--summary">Summary</h4>
-<p>OmniRead provides a <strong>cleanly layered architecture</strong> for fetching, parsing,
+      <h3 id="omniread--summary">Summary</h3>
+<p><code>OmniRead</code> — format-agnostic content acquisition and parsing framework.</p>
+<p><code>OmniRead</code> provides a <strong>cleanly layered architecture</strong> for fetching, parsing,
 and normalizing content from heterogeneous sources such as HTML documents
 and PDF files.</p>
 <p>The library is structured around three core concepts:</p>
 <ol>
-<li><strong>Content</strong>: A canonical, format-agnostic container representing raw content bytes and minimal contextual metadata.</li>
-<li><strong>Scrapers</strong>: Components responsible for <em>acquiring</em> raw content from a source (HTTP, filesystem, object storage, etc.). Scrapers never interpret content.</li>
-<li><strong>Parsers</strong>: Components responsible for <em>interpreting</em> acquired content and converting it into structured, typed representations.</li>
+<li><strong><code>Content</code></strong>: A canonical, format-agnostic container representing raw content
+    bytes and minimal contextual metadata.</li>
+<li><strong><code>Scrapers</code></strong>: Components responsible for <em>acquiring</em> raw content from a
+    source (HTTP, filesystem, object storage, etc.). <code>Scrapers</code> never interpret
+    content.</li>
+<li><strong><code>Parsers</code></strong>: Components responsible for <em>interpreting</em> acquired content and
+    converting it into structured, typed representations.</li>
 </ol>
-<p>OmniRead deliberately separates these responsibilities to ensure:
- Clear boundaries between IO and interpretation
- Replaceable implementations per format
- Predictable, testable behavior</p>
-<hr />
-<h4 id="omniread--installation">Installation</h4>
-<p>Install OmniRead using pip:</p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>pip install omniread
-</code></pre></div></td></tr></table></div>
-<p>Or with Poetry:</p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>poetry add omniread
-</code></pre></div></td></tr></table></div>
+<p><code>OmniRead</code> deliberately separates these responsibilities to ensure:</p>
+<ul>
+<li>Clear boundaries between IO and interpretation.</li>
+<li>Replaceable implementations per format.</li>
+<li>Predictable, testable behavior.</li>
+</ul>
+<h3 id="omniread--installation">Installation</h3>
+<p>Install <code>OmniRead</code> using pip:</p>
+<div class="language-bash highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"><a href="#__codelineno-0-1">1</a></span></pre></div></td><td class="code"><div><pre><span></span><code><span id="__span-0-1"><a id="__codelineno-0-1" name="__codelineno-0-1"></a>pip<span class="w"> </span>install<span class="w"> </span>omniread
+</span></code></pre></div></td></tr></table></div>
+<p>Install <code>OmniRead</code> using Poetry:
+<div class="language-bash highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"><a href="#__codelineno-1-1">1</a></span></pre></div></td><td class="code"><div><pre><span></span><code><span id="__span-1-1"><a id="__codelineno-1-1" name="__codelineno-1-1"></a>poetry<span class="w"> </span>add<span class="w"> </span>omniread
+</span></code></pre></div></td></tr></table></div></p>
 <hr />
 <h4 id="omniread--quick-start">Quick start</h4>
-<p>HTML example:</p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"> 1</span>
-<span class="normal"> 2</span>
-<span class="normal"> 3</span>
-<span class="normal"> 4</span>
-<span class="normal"> 5</span>
-<span class="normal"> 6</span>
-<span class="normal"> 7</span>
-<span class="normal"> 8</span>
-<span class="normal"> 9</span>
-<span class="normal">10</span>
-<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><code>from omniread import HTMLScraper, HTMLParser

-scraper = HTMLScraper()
-content = scraper.fetch(&quot;https://example.com&quot;)

-class TitleParser(HTMLParser[str]):
-    def parse(self) -&gt; str:
-        return self._soup.title.string
-
-parser = TitleParser(content)
-title = parser.parse()
-</code></pre></div></td></tr></table></div>
-<p>PDF example:</p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"> 1</span>
-<span class="normal"> 2</span>
-<span class="normal"> 3</span>
-<span class="normal"> 4</span>
-<span class="normal"> 5</span>
-<span class="normal"> 6</span>
-<span class="normal"> 7</span>
-<span class="normal"> 8</span>
-<span class="normal"> 9</span>
-<span class="normal">10</span>
-<span class="normal">11</span>
-<span class="normal">12</span>
-<span class="normal">13</span>
-<span class="normal">14</span></pre></div></td><td class="code"><div><pre><span></span><code>from omniread import FileSystemPDFClient, PDFScraper, PDFParser
-from pathlib import Path
-
-client = FileSystemPDFClient()
-scraper = PDFScraper(client=client)
-content = scraper.fetch(Path(&quot;document.pdf&quot;))
-
-class TextPDFParser(PDFParser[str]):
-    def parse(self) -&gt; str:
-        # implement PDF text extraction
-        ...
-
-parser = TextPDFParser(content)
-result = parser.parse()
-</code></pre></div></td></tr></table></div>
-<hr />
-<h4 id="omniread--public-api">Public API</h4>
+<details class="example" open>
+  <summary>Example</summary>
+  <p>HTML example:
+    <div class="language-python highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"><a href="#__codelineno-0-1"> 1</a></span>
+<span class="normal"><a href="#__codelineno-0-2"> 2</a></span>
+<span class="normal"><a href="#__codelineno-0-3"> 3</a></span>
+<span class="normal"><a href="#__codelineno-0-4"> 4</a></span>
+<span class="normal"><a href="#__codelineno-0-5"> 5</a></span>
+<span class="normal"><a href="#__codelineno-0-6"> 6</a></span>
+<span class="normal"><a href="#__codelineno-0-7"> 7</a></span>
+<span class="normal"><a href="#__codelineno-0-8"> 8</a></span>
+<span class="normal"><a href="#__codelineno-0-9"> 9</a></span>
+<span class="normal"><a href="#__codelineno-0-10">10</a></span>
+<span class="normal"><a href="#__codelineno-0-11">11</a></span></pre></div></td><td class="code"><div><pre><span></span><code><span id="__span-0-1"><a id="__codelineno-0-1" name="__codelineno-0-1"></a><span class="kn">from</span><span class="w"> </span><span class="nn">omniread</span><span class="w"> </span><span class="kn">import</span> <span class="n">HTMLScraper</span><span class="p">,</span> <span class="n">HTMLParser</span>
+</span><span id="__span-0-2"><a id="__codelineno-0-2" name="__codelineno-0-2"></a>
+</span><span id="__span-0-3"><a id="__codelineno-0-3" name="__codelineno-0-3"></a><span class="n">scraper</span> <span class="o">=</span> <span class="n">HTMLScraper</span><span class="p">()</span>
+</span><span id="__span-0-4"><a id="__codelineno-0-4" name="__codelineno-0-4"></a><span class="n">content</span> <span class="o">=</span> <span class="n">scraper</span><span class="o">.</span><span class="n">fetch</span><span class="p">(</span><span class="s2">&quot;https://example.com&quot;</span><span class="p">)</span>
+</span><span id="__span-0-5"><a id="__codelineno-0-5" name="__codelineno-0-5"></a>
+</span><span id="__span-0-6"><a id="__codelineno-0-6" name="__codelineno-0-6"></a><span class="k">class</span><span class="w"> </span><span class="nc">TitleParser</span><span class="p">(</span><span class="n">HTMLParser</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
+</span><span id="__span-0-7"><a id="__codelineno-0-7" name="__codelineno-0-7"></a>    <span class="k">def</span><span class="w"> </span><span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
+</span><span id="__span-0-8"><a id="__codelineno-0-8" name="__codelineno-0-8"></a>        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_soup</span><span class="o">.</span><span class="n">title</span><span class="o">.</span><span class="n">string</span>
+</span><span id="__span-0-9"><a id="__codelineno-0-9" name="__codelineno-0-9"></a>
+</span><span id="__span-0-10"><a id="__codelineno-0-10" name="__codelineno-0-10"></a><span class="n">parser</span> <span class="o">=</span> <span class="n">TitleParser</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
+</span><span id="__span-0-11"><a id="__codelineno-0-11" name="__codelineno-0-11"></a><span class="n">title</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse</span><span class="p">()</span>
+</span></code></pre></div></td></tr></table></div></p>
+<p>PDF example:
+    <div class="language-python highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal"><a href="#__codelineno-1-1"> 1</a></span>
+<span class="normal"><a href="#__codelineno-1-2"> 2</a></span>
+<span class="normal"><a href="#__codelineno-1-3"> 3</a></span>
+<span class="normal"><a href="#__codelineno-1-4"> 4</a></span>
+<span class="normal"><a href="#__codelineno-1-5"> 5</a></span>
+<span class="normal"><a href="#__codelineno-1-6"> 6</a></span>
+<span class="normal"><a href="#__codelineno-1-7"> 7</a></span>
+<span class="normal"><a href="#__codelineno-1-8"> 8</a></span>
+<span class="normal"><a href="#__codelineno-1-9"> 9</a></span>
+<span class="normal"><a href="#__codelineno-1-10">10</a></span>
+<span class="normal"><a href="#__codelineno-1-11">11</a></span>
+<span class="normal"><a href="#__codelineno-1-12">12</a></span>
+<span class="normal"><a href="#__codelineno-1-13">13</a></span>
+<span class="normal"><a href="#__codelineno-1-14">14</a></span></pre></div></td><td class="code"><div><pre><span></span><code><span id="__span-1-1"><a id="__codelineno-1-1" name="__codelineno-1-1"></a><span class="kn">from</span><span class="w"> </span><span class="nn">omniread</span><span class="w"> </span><span class="kn">import</span> <span class="n">FileSystemPDFClient</span><span class="p">,</span> <span class="n">PDFScraper</span><span class="p">,</span> <span class="n">PDFParser</span>
+</span><span id="__span-1-2"><a id="__codelineno-1-2" name="__codelineno-1-2"></a><span class="kn">from</span><span class="w"> </span><span class="nn">pathlib</span><span class="w"> </span><span class="kn">import</span> <span class="n">Path</span>
+</span><span id="__span-1-3"><a id="__codelineno-1-3" name="__codelineno-1-3"></a>
+</span><span id="__span-1-4"><a id="__codelineno-1-4" name="__codelineno-1-4"></a><span class="n">client</span> <span class="o">=</span> <span class="n">FileSystemPDFClient</span><span class="p">()</span>
+</span><span id="__span-1-5"><a id="__codelineno-1-5" name="__codelineno-1-5"></a><span class="n">scraper</span> <span class="o">=</span> <span class="n">PDFScraper</span><span class="p">(</span><span class="n">client</span><span class="o">=</span><span class="n">client</span><span class="p">)</span>
+</span><span id="__span-1-6"><a id="__codelineno-1-6" name="__codelineno-1-6"></a><span class="n">content</span> <span class="o">=</span> <span class="n">scraper</span><span class="o">.</span><span class="n">fetch</span><span class="p">(</span><span class="n">Path</span><span class="p">(</span><span class="s2">&quot;document.pdf&quot;</span><span class="p">))</span>
+</span><span id="__span-1-7"><a id="__codelineno-1-7" name="__codelineno-1-7"></a>
+</span><span id="__span-1-8"><a id="__codelineno-1-8" name="__codelineno-1-8"></a><span class="k">class</span><span class="w"> </span><span class="nc">TextPDFParser</span><span class="p">(</span><span class="n">PDFParser</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
+</span><span id="__span-1-9"><a id="__codelineno-1-9" name="__codelineno-1-9"></a>    <span class="k">def</span><span class="w"> </span><span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
+</span><span id="__span-1-10"><a id="__codelineno-1-10" name="__codelineno-1-10"></a>        <span class="c1"># implement PDF text extraction</span>
+</span><span id="__span-1-11"><a id="__codelineno-1-11" name="__codelineno-1-11"></a>        <span class="o">...</span>
+</span><span id="__span-1-12"><a id="__codelineno-1-12" name="__codelineno-1-12"></a>
+</span><span id="__span-1-13"><a id="__codelineno-1-13" name="__codelineno-1-13"></a><span class="n">parser</span> <span class="o">=</span> <span class="n">TextPDFParser</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
+</span><span id="__span-1-14"><a id="__codelineno-1-14" name="__codelineno-1-14"></a><span class="n">result</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse</span><span class="p">()</span>
+</span></code></pre></div></td></tr></table></div></p>
+</details>      <hr />
+<h3 id="omniread--public-api">Public API</h3>
 <p>This module re-exports the <strong>recommended public entry points</strong> of OmniRead.
 Consumers are encouraged to import from this namespace rather than from
 format-specific submodules directly, unless advanced customization is
 required.</p>
-<p><strong>Core:</strong>
- Content
- ContentType</p>
-<p><strong>HTML:</strong>
- HTMLScraper
- HTMLParser</p>
-<p><strong>PDF:</strong>
- FileSystemPDFClient
- PDFScraper
- PDFParser</p>
-<p><strong>Core Philosophy:</strong>
-<code>OmniRead</code> is designed as a <strong>decoupled content engine</strong>:
-1. <strong>Separation of Concerns</strong>: Scrapers <em>fetch</em>, Parsers <em>interpret</em>. Neither knows about the other.
-2. <strong>Normalized Exchange</strong>: All components communicate via the <code>Content</code> model, ensuring a consistent contract.
-3. <strong>Format Agnosticism</strong>: The core logic is independent of whether the input is HTML, PDF, or JSON.</p>
+<ul>
+<li><code>Content</code>: Canonical content model.</li>
+<li><code>ContentType</code>: Supported media types.</li>
+<li><code>HTMLScraper</code>: HTTP-based HTML acquisition.</li>
+<li><code>HTMLParser</code>: Base parser for HTML DOM interpretation.</li>
+<li><code>FileSystemPDFClient</code>: Local filesystem PDF access.</li>
+<li><code>PDFScraper</code>: PDF-specific content acquisition.</li>
+<li><code>PDFParser</code>: Base parser for PDF binary interpretation.</li>
+</ul>
+<hr />
+<h3 id="omniread--core-philosophy">Core Philosophy</h3>
+<p><code>OmniRead</code> is designed as a <strong>decoupled content engine</strong>:</p>
+<ol>
+<li><strong>Separation of Concerns</strong>: Scrapers <em>fetch</em>, Parsers <em>interpret</em>. Neither
+   knows about the other.</li>
+<li><strong>Normalized Exchange</strong>: All components communicate via the <code>Content</code> model,
+   ensuring a consistent contract.</li>
+<li><strong>Format Agnosticism</strong>: The core logic is independent of whether the input
+   is HTML, PDF, or JSON.</li>
+</ol>
 <hr />


@@ -1884,8 +1927,12 @@ required.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- A `Content` instance represents a raw content payload along with minimal contextual metadata describing its origin and type
- This class is the primary exchange format between Scrapers, Parsers, and Downstream consumers
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- A `Content` instance represents a raw content payload along with
+  minimal contextual metadata describing its origin and type.
+- This class is the primary exchange format between scrapers,
+  parsers, and downstream consumers.
 </code></pre></div></td></tr></table></div>
 </details>

@@ -2026,8 +2073,12 @@ required.</p>
  <summary>Notes</summary>
  <p><strong>Guarantees:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This enum represents the declared or inferred media type of the content source
- It is primarily used for routing content to the appropriate parser or downstream consumer
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- This enum represents the declared or inferred media type of the
+  content source.
+- It is primarily used for routing content to the appropriate
+  parser or downstream consumer.
 </code></pre></div></td></tr></table></div>
 </details>

@@ -2169,7 +2220,9 @@ required.</p>
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Guarantees:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- This client reads PDF files directly from the disk and returns their raw binary contents
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This client reads PDF files directly from the disk and returns
+  their raw binary contents.
 </code></pre></div></td></tr></table></div>
 </details>

@@ -2311,7 +2364,7 @@ required.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="omniread/core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.html.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.html.parser.T">T</span>]</code></p>


      <p>Base HTML parser.</p>
@@ -2321,14 +2374,24 @@ required.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior, including DOM parsing via BeautifulSoup and reusable extraction helpers
- Provides reusable helpers for HTML extraction. Concrete parsers must explicitly define the return type
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class extends the core `BaseParser` with HTML-specific behavior,
+  including DOM parsing via BeautifulSoup and reusable extraction helpers.
+- Provides reusable helpers for HTML extraction. Concrete parsers must
+  explicitly define the return type.
 </code></pre></div></td></tr></table></div>
 <p><strong>Guarantees:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Characteristics: Accepts only HTML content, owns a parsed BeautifulSoup DOM tree, provides pure helper utilities for common HTML structures
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Accepts only HTML content.
+- Owns a parsed BeautifulSoup DOM tree.
+- Provides pure helper utilities for common HTML structures.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement the `parse()` method
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete subclasses must define the output type `T` and implement
+  the `parse()` method.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML parser.</p>
@@ -2348,7 +2411,7 @@ required.</p>
          <tr class="doc-section-item">
            <td><code>content</code></td>
            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -2482,7 +2545,9 @@ required.</p>
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a deterministic, structured output
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the HTML DOM and return a
+  deterministic, structured output.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -2698,8 +2763,10 @@ Dictionary containing extracted metadata.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document
- This includes: Document title, `&lt;meta&gt;` tag name/property → content mappings
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Extract high-level metadata from the HTML document.
+- This includes the document title and `&lt;meta&gt;` tag
+  name/property-to-content mappings.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -2846,21 +2913,29 @@ A list of rows, where each row is a list of cell text values.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>


-      <p>Base HTML scraper using httpx.</p>
+      <p>Base HTML scraper using <code>httpx</code>.</p>


 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns them as raw content wrapped in a `Content` object
- Fetches raw bytes and metadata only. The scraper uses `httpx.Client` for HTTP requests, enforces an HTML content type, preserves HTTP response metadata
+<span class="normal">2</span>
+<span class="normal">3</span>
+<span class="normal">4</span>
+<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><code>- This scraper retrieves HTML documents over HTTP(S) and returns
+  them as raw content wrapped in a `Content` object.
+- Fetches raw bytes and metadata only.
+- The scraper uses `httpx.Client` for HTTP requests, enforces an
+  HTML content type, and preserves HTTP response metadata.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not: Parse HTML, perform retries or backoff, handle non-HTML responses
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not parse HTML, perform retries or backoff,
+  or handle non-HTML responses.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the HTML scraper.</p>
@@ -3019,7 +3094,7 @@ A list of rows, where each row is a list of cell text values.</p>
      <tbody>
          <tr class="doc-section-item">
 <td><code>Content</code></td>            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -3160,7 +3235,7 @@ A list of rows, where each row is a list of cell text values.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="omniread/core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.pdf.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.pdf.parser.T">T</span>]</code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.parser.BaseParser" href="core/parser/#omniread.core.parser.BaseParser">BaseParser</a>[<span title="omniread.pdf.parser.T">T</span>]</code>, <code><span title="typing.Generic">Generic</span>[<span title="omniread.pdf.parser.T">T</span>]</code></p>


      <p>Base PDF parser.</p>
@@ -3169,10 +3244,14 @@ A list of rows, where each row is a list of cell text values.</p>
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class enforces PDF content-type compatibility and provides the extension point for implementing concrete PDF parsing strategies
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- This class enforces PDF content-type compatibility and provides
+  the extension point for implementing concrete PDF parsing strategies.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete implementations must: Define the output type `T`, implement the `parse()` method
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Concrete implementations must define the output type `T` and
+  implement the `parse()` method.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the parser with content to be parsed.</p>
@@ -3192,7 +3271,7 @@ A list of rows, where each row is a list of cell text values.</p>
          <tr class="doc-section-item">
            <td><code>content</code></td>
            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -3335,7 +3414,9 @@ A list of rows, where each row is a list of cell text values.</p>
 <details class="notes" open>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the PDF binary payload and return a deterministic, structured output
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Implementations must fully interpret the PDF binary payload and
+  return a deterministic, structured output.
 </code></pre></div></td></tr></table></div>
 </details>
    </div>
@@ -3406,7 +3487,7 @@ A list of rows, where each row is a list of cell text values.</p>

    <div class="doc doc-contents ">
            <p class="doc doc-class-bases">
-              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="omniread/core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>
+              Bases: <code><a class="autorefs autorefs-internal" title="omniread.core.scraper.BaseScraper" href="core/scraper/#omniread.core.scraper.BaseScraper">BaseScraper</a></code></p>


      <p>Scraper for PDF sources.</p>
@@ -3416,11 +3497,15 @@ A list of rows, where each row is a list of cell text values.</p>
  <summary>Notes</summary>
  <p><strong>Responsibilities:</strong></p>
 <div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
-<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output into Content
- Preserves caller-provided metadata
+<span class="normal">2</span>
+<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><code>- Delegates byte retrieval to a PDF client and normalizes output
+  into `Content`.
+- Preserves caller-provided metadata.
 </code></pre></div></td></tr></table></div>
 <p><strong>Constraints:</strong></p>
-<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper: Does not perform parsing or interpretation, does not assume a specific storage backend
+<div class="language-text highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span></span><span class="normal">1</span>
+<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><code>- The scraper does not perform parsing or interpretation.
+- The scraper does not assume a specific storage backend.
 </code></pre></div></td></tr></table></div>
 </details>
      <p>Initialize the PDF scraper.</p>
@@ -3537,7 +3622,7 @@ A list of rows, where each row is a list of cell text values.</p>
      <tbody>
          <tr class="doc-section-item">
 <td><code>Content</code></td>            <td>
-                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="omniread/core/content/#omniread.core.content.Content">Content</a></code>
+                  <code><a class="autorefs autorefs-internal" title="omniread.core.content.Content" href="core/content/#omniread.core.content.Content">Content</a></code>
            </td>
            <td>
              <div class="doc-md-description">
@@ -3590,9 +3675,7 @@ A list of rows, where each row is a list of cell text values.</p>

    </div>

-</div><ul>
-<li><a href="omniread/">Omniread</a></li>
-</ul>
+</div>