Commit Graph

15 Commits

SHA1 Message Date
fc29f49d41 Python file to generate docs; useful for PyCharm on Windows 2026-01-09 15:52:27 +05:30
3d6655084f added doc packages in requirements.txt 2026-01-09 15:52:14 +05:30
5af411020c docs: add mkdocs configuration and API documentation structure
- docs(mkdocs): add mkdocs.yml with material theme and plugin configuration
- docs(mkdocs): configure navigation for core, html, and pdf modules
- docs(docs): add documentation root and homepage
- docs(docs): add core contracts documentation pages
- docs(docs): add html implementation documentation pages
- docs(docs): add pdf implementation documentation pages
- docs(docs): wire mkdocstrings directives for API reference rendering
2026-01-09 15:51:54 +05:30
7f1b0d9c10 docs: add contract-oriented docstrings across core, html, and pdf layers
- docs(core): document Content and ContentType canonical models
- docs(core): define BaseParser contract and parsing semantics
- docs(core): define BaseScraper contract and acquisition semantics
- docs(html): document HTML package purpose and scope
- docs(html): add HTMLParser base with DOM helpers and contracts
- docs(html): add HTTP-based HTMLScraper with content-type enforcement
- docs(pdf): document PDF package structure and public pipeline
- docs(pdf): add BasePDFClient abstraction and filesystem implementation
- docs(pdf): add PDFParser base contract for binary parsing
- docs(pdf): add PDFScraper coordinating client and Content normalization
- docs(api): expand top-level omniread module with install instructions and examples
2026-01-09 15:51:22 +05:30
b2173f3ef0 refactor(tests): use omniread public API instead of internal module imports
- Replace deep imports with top-level omniread exports in tests
- Ensure tests validate only the supported public API surface
- Align HTML and PDF tests with documented library usage
2026-01-02 19:02:20 +05:30
de67c7b0b1 feat(pdf): add PDF client, scraper, parser, and end-to-end tests
- Introduce PDF submodule with client, scraper, and generic parser
- Add filesystem PDF client and test-only mock routing
- Add end-to-end PDF scrape → parse tests with typed output
- Mirror HTML module architecture for consistency
- Expose PDF primitives via omniread public API
2026-01-02 18:59:36 +05:30
390eb22e1b moved html mocks to html sub folder and updated conftest.py to read from new location with better path and endpoint handling 2026-01-02 18:44:26 +05:30
358abc9b36 feat(api): expose core and html primitives via top-level package exports
- Re-export Content and ContentType from omniread.core
- Re-export HTMLScraper and HTMLParser from omniread.html
- Define explicit __all__ for stable public API surface
2026-01-02 18:36:29 +05:30
07293e4651 feat(testing): add end-to-end HTML scraping and parsing tests with typed parsers
- Add smart httpx MockTransport routing based on endpoint paths
- Render HTML fixtures via Jinja templates populated from JSON data
- Introduce explicit, typed HTML parsers for semantic and table-based content
- Add end-to-end tests covering scraper → content → parser → Pydantic models
- Enforce explicit output contracts and avoid default dict-based parsing
2026-01-02 18:31:34 +05:30
fa14a79ec9 simple test case 2026-01-02 18:20:03 +05:30
55245cf241 added validation for content type 2026-01-02 18:19:47 +05:30
202329e190 refactor(html-scraper): normalize Content-Type and inject httpx client
- Inject httpx.Client for testability and reuse
- Validate and normalize Content-Type header before returning Content
- Emit ContentType.HTML instead of raw header strings
- Avoid per-request client creation
- Preserve metadata while allowing caller overrides
2026-01-02 18:08:46 +05:30
f59024ddd5 added pydantic 2026-01-02 18:08:37 +05:30
32ee43e77a omni read basic modules 2025-12-31 14:28:50 +05:30
c0959cb8d1 init commit 2025-12-31 13:00:10 +05:30
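
Note on 202329e190 and 55245cf241: these commits describe validating and normalizing the Content-Type header so that callers receive a ContentType member rather than a raw header string, with caller-supplied metadata taking precedence. The repository code is not shown in this log, so the following is only a minimal sketch of that idea using the standard library. The enum members, the Content fields, and the helper names normalize_content_type and build_content are assumptions made for illustration and may not match omniread's actual definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Mapping, Optional


class ContentType(str, Enum):
    # Hypothetical members; the real omniread enum may differ.
    HTML = "text/html"
    PDF = "application/pdf"


@dataclass(frozen=True)
class Content:
    # Hypothetical shape of the canonical model named in the log.
    raw: bytes
    source: str
    content_type: ContentType
    metadata: Dict[str, Any] = field(default_factory=dict)


def normalize_content_type(header: Optional[str]) -> ContentType:
    """Map a raw Content-Type header to a ContentType member."""
    if not header:
        raise ValueError("missing Content-Type header")
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = header.split(";", 1)[0].strip().lower()
    try:
        return ContentType(media_type)
    except ValueError:
        raise ValueError(f"unsupported Content-Type: {header!r}") from None


def build_content(
    raw: bytes,
    source: str,
    headers: Mapping[str, str],
    metadata: Optional[Mapping[str, Any]] = None,
) -> Content:
    """Validate the header, then merge metadata so caller values win."""
    # Assumes lowercase header keys in this plain-mapping sketch.
    header = headers.get("content-type")
    content_type = normalize_content_type(header)
    merged: Dict[str, Any] = {"content_type_header": header}
    merged.update(metadata or {})  # caller overrides take precedence
    return Content(raw=raw, source=source, content_type=content_type, metadata=merged)
```

In this sketch an unsupported or missing header raises ValueError before any Content is built, which is one plausible reading of "content-type enforcement"; the actual scraper may use a different exception type or policy.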