feat(pdf): add PDF client, scraper, parser, and end-to-end tests
- Introduce PDF submodule with client, scraper, and generic parser
- Add filesystem PDF client and test-only mock routing
- Add end-to-end PDF scrape → parse tests with typed output
- Mirror HTML module architecture for consistency
- Expose PDF primitives via omniread public API
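For orientation, the scrape → parse flow named above could look roughly like the sketch below. It is not the omniread API: `FileSystemPDFClient`, `PDFText`, `parse_text`, and `scrape_and_parse` are hypothetical names chosen for illustration, and the parser is a regex over uncompressed content streams rather than a real PDF parser.

```python
import re
from dataclasses import dataclass
from pathlib import Path


class FileSystemPDFClient:
    """Hypothetical client: returns the raw bytes of a PDF on local disk."""

    def fetch(self, path: Path) -> bytes:
        return Path(path).read_bytes()


@dataclass
class PDFText:
    """Hypothetical typed parser output: strings shown via the Tj operator."""

    strings: list[str]


# Matches a literal string operand followed by Tj, e.g. b"(Hello) Tj".
# Escape sequences inside the string are captured as-is, not decoded.
_TJ = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def parse_text(raw: bytes) -> PDFText:
    """Extract Tj operands; only works for uncompressed content streams."""
    if not raw.startswith(b"%PDF-"):
        raise ValueError("not a PDF: missing %PDF- header")
    return PDFText([m.group(1).decode("latin-1") for m in _TJ.finditer(raw)])


def scrape_and_parse(path: Path) -> PDFText:
    """End-to-end: fetch bytes through the client, then parse them."""
    return parse_text(FileSystemPDFClient().fetch(path))
```

Under these assumptions, `scrape_and_parse(Path("tests/mocks/pdf/simple.pdf"))` would return a `PDFText` whose `strings` holds the text drawn by the fixture's content stream.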
tests/mocks/pdf/simple.pdf (normal file, 32 lines)
@@ -0,0 +1,32 @@
%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R >>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 12 Tf
72 720 Td
(Simple PDF Test) Tj
ET
endstream
endobj
xref
0 5
0000000000 65535 f 
0000000010 00000 n 
0000000061 00000 n 
0000000116 00000 n 
0000000203 00000 n 
trailer
<< /Size 5 /Root 1 0 R >>
startxref
300
%%EOF