omniread

Author	SHA1	Message	Date
Vishesh 'ironeagle' Bangotra	7f1b0d9c10	docs: add contract-oriented docstrings across core, html, and pdf layers - docs(core): document Content and ContentType canonical models - docs(core): define BaseParser contract and parsing semantics - docs(core): define BaseScraper contract and acquisition semantics - docs(html): document HTML package purpose and scope - docs(html): add HTMLParser base with DOM helpers and contracts - docs(html): add HTTP-based HTMLScraper with content-type enforcement - docs(pdf): document PDF package structure and public pipeline - docs(pdf): add BasePDFClient abstraction and filesystem implementation - docs(pdf): add PDFParser base contract for binary parsing - docs(pdf): add PDFScraper coordinating client and Content normalization - docs(api): expand top-level omniread module with install instructions and examples	2026-01-09 15:51:22 +05:30
Vishesh 'ironeagle' Bangotra	b2173f3ef0	refactor(tests): use omniread public API instead of internal module imports - Replace deep imports with top-level omniread exports in tests - Ensure tests validate only the supported public API surface - Align HTML and PDF tests with documented library usage	2026-01-02 19:02:20 +05:30
Vishesh 'ironeagle' Bangotra	de67c7b0b1	feat(pdf): add PDF client, scraper, parser, and end-to-end tests - Introduce PDF submodule with client, scraper, and generic parser - Add filesystem PDF client and test-only mock routing - Add end-to-end PDF scrape → parse tests with typed output - Mirror HTML module architecture for consistency - Expose PDF primitives via omniread public API	2026-01-02 18:59:36 +05:30
Vishesh 'ironeagle' Bangotra	390eb22e1b	moved html mocks to html sub folder and updated conftest.py to read from new location with better path and endpoint handling	2026-01-02 18:44:26 +05:30
Vishesh 'ironeagle' Bangotra	358abc9b36	feat(api): expose core and html primitives via top-level package exports - Re-export Content and ContentType from omniread.core - Re-export HTMLScraper and HTMLParser from omniread.html - Define explicit __all__ for stable public API surface	2026-01-02 18:36:29 +05:30
Vishesh 'ironeagle' Bangotra	07293e4651	feat(testing): add end-to-end HTML scraping and parsing tests with typed parsers - Add smart httpx MockTransport routing based on endpoint paths - Render HTML fixtures via Jinja templates populated from JSON data - Introduce explicit, typed HTML parsers for semantic and table-based content - Add end-to-end tests covering scraper → content → parser → Pydantic models - Enforce explicit output contracts and avoid default dict-based parsing	2026-01-02 18:31:34 +05:30
Vishesh 'ironeagle' Bangotra	fa14a79ec9	simple test case	2026-01-02 18:20:03 +05:30
Vishesh 'ironeagle' Bangotra	55245cf241	added validation for content type	2026-01-02 18:19:47 +05:30
Vishesh 'ironeagle' Bangotra	202329e190	refactor(html-scraper): normalize Content-Type and inject httpx client - Inject httpx.Client for testability and reuse - Validate and normalize Content-Type header before returning Content - Emit ContentType.HTML instead of raw header strings - Avoid per-request client creation - Preserve metadata while allowing caller overrides	2026-01-02 18:08:46 +05:30
Vishesh 'ironeagle' Bangotra	f59024ddd5	added pydantic	2026-01-02 18:08:37 +05:30
Vishesh 'ironeagle' Bangotra	32ee43e77a	omni read basic modules	2025-12-31 14:28:50 +05:30
Vishesh 'ironeagle' Bangotra	c0959cb8d1	init commit	2025-12-31 13:00:10 +05:30

12 Commits
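
Commits 202329e190 and 55245cf241 describe validating and normalizing the Content-Type header so that the scraper emits ContentType.HTML instead of a raw header string. A minimal sketch of that idea follows, using only the standard library. It is not omniread's actual code: the enum's string values, the PDF member, and the helper name normalize_content_type are all assumptions made for illustration.

```python
from enum import Enum


class ContentType(str, Enum):
    # Assumed members and values; the log only confirms that ContentType.HTML exists.
    HTML = "text/html"
    PDF = "application/pdf"


def normalize_content_type(header_value: str) -> ContentType:
    """Hypothetical helper: map a raw Content-Type header to a ContentType member."""
    # Drop parameters such as "; charset=utf-8", then trim and lowercase the media type.
    media_type = header_value.split(";", 1)[0].strip().lower()
    try:
        return ContentType(media_type)
    except ValueError:
        # Reject anything that is not a known media type instead of passing it through.
        raise ValueError(f"unsupported Content-Type: {header_value!r}") from None
```

Under these assumptions, a header such as "Text/HTML; charset=utf-8" resolves to ContentType.HTML, while an unrecognized media type raises ValueError rather than reaching a parser.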