updated docs strings and added README.md

2026-03-08 17:59:53 +05:30
parent 0453fdd88a
commit c541577788
46 changed files with 863 additions and 681 deletions

View File

@@ -1,10 +1,8 @@
"""
# Summary
Mail Intake — provider-agnostic, read-only email ingestion framework.
---
## Summary
Mail Intake is a **contract-first library** designed to ingest, parse, and
normalize email data from external providers (such as Gmail) into clean,
provider-agnostic domain models.
@@ -12,109 +10,126 @@ provider-agnostic domain models.
The library is intentionally structured around clear layers, each exposed
as a first-class module at the package root:
- adapters: provider-specific access (e.g. Gmail)
- auth: authentication providers and credential lifecycle management
- credentials: credential persistence abstractions and implementations
- parsers: extraction and normalization of message content
- ingestion: orchestration and high-level ingestion workflows
- models: canonical, provider-agnostic data representations
- config: explicit global configuration
- exceptions: library-defined error hierarchy
- `adapters`: Provider-specific access (e.g., Gmail).
- `auth`: Authentication providers and credential lifecycle management.
- `credentials`: Credential persistence abstractions and implementations.
- `parsers`: Extraction and normalization of message content.
- `ingestion`: Orchestration and high-level ingestion workflows.
- `models`: Canonical, provider-agnostic data representations.
- `config`: Explicit global configuration.
- `exceptions`: Library-defined error hierarchy.
The package root acts as a **namespace**, not a facade. Consumers are
expected to import functionality explicitly from the appropriate module.
---
## Installation
# Installation
Install using pip:
pip install mail-intake
```bash
pip install mail-intake
```
Or with Poetry:
poetry add mail-intake
```bash
poetry add mail-intake
```
Mail Intake is pure Python and has no runtime dependencies beyond those
required by the selected provider (for example, Google APIs for Gmail).
---
## Quick start
# Quick Start
Minimal Gmail ingestion example (local development):
from mail_intake.ingestion import MailIntakeReader
from mail_intake.adapters import MailIntakeGmailAdapter
from mail_intake.auth import MailIntakeGoogleAuth
from mail_intake.credentials import PickleCredentialStore
```python
from mail_intake.ingestion import MailIntakeReader
from mail_intake.adapters import MailIntakeGmailAdapter
from mail_intake.auth import MailIntakeGoogleAuth
from mail_intake.credentials import PickleCredentialStore
store = PickleCredentialStore(path="token.pickle")
store = PickleCredentialStore(path="token.pickle")
auth = MailIntakeGoogleAuth(
credentials_path="credentials.json",
store=store,
scopes=["https://www.googleapis.com/auth/gmail.readonly"],
)
auth = MailIntakeGoogleAuth(
credentials_path="credentials.json",
store=store,
scopes=["https://www.googleapis.com/auth/gmail.readonly"],
)
adapter = MailIntakeGmailAdapter(auth_provider=auth)
reader = MailIntakeReader(adapter)
adapter = MailIntakeGmailAdapter(auth_provider=auth)
reader = MailIntakeReader(adapter)
for message in reader.iter_messages("from:recruiter@example.com"):
print(message.subject, message.from_email)
for message in reader.iter_messages("from:recruiter@example.com"):
print(message.subject, message.from_email)
```
Iterating over threads:
for thread in reader.iter_threads("subject:Interview"):
print(thread.normalized_subject, len(thread.messages))
```python
for thread in reader.iter_threads("subject:Interview"):
print(thread.normalized_subject, len(thread.messages))
```
---
## Architecture
# Architecture
Mail Intake is designed to be extensible via **public contracts** exposed
through its modules:
- Users MAY implement their own mail adapters by subclassing ``adapters.MailIntakeAdapter``
- Users MAY implement their own authentication providers by subclassing ``auth.MailIntakeAuthProvider[T]``
- Users MAY implement their own credential persistence layers by implementing ``credentials.CredentialStore[T]``
- Users MAY implement their own mail adapters by subclassing
`adapters.MailIntakeAdapter`.
- Users MAY implement their own authentication providers by subclassing
`auth.MailIntakeAuthProvider[T]`.
- Users MAY implement their own credential persistence layers by implementing
`credentials.CredentialStore[T]`.
Users SHOULD NOT subclass built-in adapter implementations. Built-in
adapters (such as Gmail) are reference implementations and may change
internally without notice.
**Design Guarantees:**
- Read-only access: no mutation of provider state
- Provider-agnostic domain models
- Explicit configuration and dependency injection
- No implicit global state or environment reads
- Deterministic, testable behavior
- Distributed-safe authentication design
- Read-only access: no mutation of provider state.
- Provider-agnostic domain models.
- Explicit configuration and dependency injection.
- No implicit global state or environment reads.
- Deterministic, testable behavior.
- Distributed-safe authentication design.
Mail Intake favors correctness, clarity, and explicitness over convenience
shortcuts.
**Core Philosophy:**
`Mail Intake` is built as a **contract-first ingestion pipeline**:
1. **Layered Decoupling**: Adapters handle transport, Parsers handle format normalization, and Ingestion orchestrates.
2. **Provider Agnosticism**: Domain models and core logic never depend on provider-specific (e.g., Gmail) API internals.
3. **Stateless Workflows**: The library functions as a read-only pipe, ensuring side-effect-free ingestion.
1. **Layered Decoupling**: Adapters handle transport, Parsers handle format
normalization, and Ingestion orchestrates.
2. **Provider Agnosticism**: Domain models and core logic never depend on
provider-specific (e.g., Gmail) API internals.
3. **Stateless Workflows**: The library functions as a read-only pipe, ensuring
side-effect-free ingestion.
---
## Public API
# Public API
The supported public API consists of the following top-level modules:
- mail_intake.ingestion
- mail_intake.adapters
- mail_intake.auth
- mail_intake.credentials
- mail_intake.parsers
- mail_intake.models
- mail_intake.config
- mail_intake.exceptions
- `mail_intake.ingestion`
- `mail_intake.adapters`
- `mail_intake.auth`
- `mail_intake.credentials`
- `mail_intake.parsers`
- `mail_intake.models`
- `mail_intake.config`
- `mail_intake.exceptions`
Classes and functions should be imported explicitly from these modules.
No individual symbols are re-exported at the package root.

View File

@@ -1,19 +1,18 @@
"""
# Summary
Mail provider adapter implementations for Mail Intake.
---
## Summary
This package contains **adapter-layer implementations** responsible for
interfacing with external mail providers and exposing a normalized,
provider-agnostic contract to the rest of the system.
Adapters in this package:
- Implement the `MailIntakeAdapter` interface
- Encapsulate all provider-specific APIs and semantics
- Perform read-only access to mail data
- Return provider-native payloads without interpretation
- Implement the `MailIntakeAdapter` interface.
- Encapsulate all provider-specific APIs and semantics.
- Perform read-only access to mail data.
- Return provider-native payloads without interpretation.
Provider-specific logic **must not leak** outside of adapter implementations.
All parsing, normalization, and transformation must be handled by downstream
@@ -21,10 +20,10 @@ components.
---
## Public API
# Public API
MailIntakeAdapter
MailIntakeGmailAdapter
- `MailIntakeAdapter`
- `MailIntakeGmailAdapter`
---
"""

View File

@@ -1,10 +1,8 @@
"""
# Summary
Mail provider adapter contracts for Mail Intake.
---
## Summary
This module defines the **provider-agnostic adapter interface** used for
read-only mail ingestion.
@@ -24,13 +22,13 @@ class MailIntakeAdapter(ABC):
Notes:
**Guarantees:**
- discover messages matching a query
- retrieve full message payloads
- retrieve full thread payloads
- Discover messages matching a query.
- Retrieve full message payloads.
- Retrieve full thread payloads.
**Lifecycle:**
- adapters are intentionally read-only and must not mutate provider state
- Adapters are intentionally read-only and must not mutate provider state.
"""
@abstractmethod
@@ -49,15 +47,18 @@ class MailIntakeAdapter(ABC):
Notes:
**Guarantees:**
- Implementations must yield dictionaries containing at least ``message_id`` and ``thread_id``
- Implementations must yield dictionaries containing at least
`message_id` and `thread_id`.
Example:
Typical yield:
{
"message_id": "...",
"thread_id": "..."
}
```python
{
"message_id": "...",
"thread_id": "..."
}
```
"""
raise NotImplementedError
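Review note: the adapter contract above can be illustrated with a minimal in-memory sketch. The abstract base class here is a stand-in, and the method names `iter_message_refs` and `fetch_message` are assumptions, since the diff does not show the real signatures. Only the yielded shape (`message_id` and `thread_id`) comes from the docstring.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List


class MailIntakeAdapter(ABC):
    """Stand-in for the real contract; method names here are assumptions."""

    @abstractmethod
    def iter_message_refs(self, query: str) -> Iterator[Dict[str, str]]:
        raise NotImplementedError

    @abstractmethod
    def fetch_message(self, message_id: str) -> Dict[str, Any]:
        raise NotImplementedError


class InMemoryAdapter(MailIntakeAdapter):
    """Hypothetical read-only adapter backed by a list of raw payloads."""

    def __init__(self, payloads: List[Dict[str, Any]]):
        self._payloads = {p["id"]: p for p in payloads}

    def iter_message_refs(self, query: str) -> Iterator[Dict[str, str]]:
        # A naive substring match stands in for a provider-side search.
        for payload in self._payloads.values():
            if query in payload["subject"]:
                yield {"message_id": payload["id"], "thread_id": payload["thread"]}

    def fetch_message(self, message_id: str) -> Dict[str, Any]:
        # Return the provider-native payload without interpretation.
        return self._payloads[message_id]


adapter = InMemoryAdapter([
    {"id": "m1", "thread": "t1", "subject": "Interview schedule"},
    {"id": "m2", "thread": "t2", "subject": "Newsletter"},
])
refs = list(adapter.iter_message_refs("Interview"))
```

An adapter like this never mutates its backing data, which mirrors the read-only guarantee stated in the docstring.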

View File

@@ -1,17 +1,16 @@
"""
# Summary
Gmail adapter implementation for Mail Intake.
---
## Summary
This module provides a **Gmail-specific implementation** of the
`MailIntakeAdapter` contract.
It is the only place in the codebase where:
- `googleapiclient` is imported
- Gmail REST API semantics are known
- Low-level `.execute()` calls are made
- `googleapiclient` is imported.
- Gmail REST API semantics are known.
- Low-level `.execute()` calls are made.
All Gmail-specific behavior must be strictly contained within this module.
"""
@@ -37,15 +36,15 @@ class MailIntakeGmailAdapter(MailIntakeAdapter):
Notes:
**Responsibilities:**
- This class is the ONLY place where googleapiclient is imported
- Gmail REST semantics are known
- .execute() is called
- This class is the ONLY place where `googleapiclient` is imported.
- Gmail REST semantics are known.
- `.execute()` is called.
**Constraints:**
- Must remain thin and imperative
- Must not perform parsing or interpretation
- Must not expose Gmail-specific types beyond this class
- Must remain thin and imperative.
- Must not perform parsing or interpretation.
- Must not expose Gmail-specific types beyond this class.
"""
def __init__(

View File

@@ -1,31 +1,31 @@
"""
# Summary
Authentication provider implementations for Mail Intake.
---
## Summary
This package defines the **authentication layer** used by mail adapters
to obtain provider-specific credentials.
It exposes:
- A stable, provider-agnostic authentication contract
- Concrete authentication providers for supported platforms
- A stable, provider-agnostic authentication contract.
- Concrete authentication providers for supported platforms.
Authentication providers:
- Are responsible for credential acquisition and lifecycle management
- Are intentionally decoupled from adapter logic
- May be extended by users to support additional providers
- Are responsible for credential acquisition and lifecycle management.
- Are intentionally decoupled from adapter logic.
- May be extended by users to support additional providers.
Consumers should depend on the abstract interface and use concrete
implementations only where explicitly required.
---
## Public API
# Public API
MailIntakeAuthProvider
MailIntakeGoogleAuth
- `MailIntakeAuthProvider`
- `MailIntakeGoogleAuth`
---
"""

View File

@@ -1,10 +1,8 @@
"""
# Summary
Authentication provider contracts for Mail Intake.
---
## Summary
This module defines the **authentication abstraction layer** used by mail
adapters to obtain provider-specific credentials.
@@ -30,15 +28,17 @@ class MailIntakeAuthProvider(ABC, Generic[T]):
Notes:
**Responsibilities:**
- Acquire credentials from an external provider
- Refresh or revalidate credentials as needed
- Handle authentication-specific failure modes
- Coordinate with credential persistence layers where applicable
- Acquire credentials from an external provider.
- Refresh or revalidate credentials as needed.
- Handle authentication-specific failure modes.
- Coordinate with credential persistence layers where applicable.
**Constraints:**
- Mail adapters must treat returned credentials as opaque and provider-specific
- Mail adapters rely only on the declared credential type expected by the adapter
- Mail adapters must treat returned credentials as opaque and
provider-specific.
- Mail adapters rely only on the declared credential type expected
by the adapter.
"""
@abstractmethod
@@ -48,7 +48,7 @@ class MailIntakeAuthProvider(ABC, Generic[T]):
Returns:
T:
Credentials of type ``T`` suitable for immediate use by the
Credentials of type `T` suitable for immediate use by the
corresponding mail adapter.
Raises:
@@ -59,8 +59,10 @@ class MailIntakeAuthProvider(ABC, Generic[T]):
Notes:
**Guarantees:**
- This method is synchronous by design
- Represents the sole entry point through which adapters obtain authentication material
- Implementations must either return credentials of the declared type ``T`` that are valid at the time of return or raise an exception
- This method is synchronous by design.
- Represents the sole entry point through which adapters obtain
authentication material.
- Implementations must either return credentials of the declared
type `T` that are valid at the time of return or raise an exception.
"""
raise NotImplementedError
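Review note: a minimal sketch of the "return valid credentials or raise" guarantee described above. The base class is a stand-in, `get_credentials` is an assumed method name, and `RuntimeError` stands in for whatever exception the library actually raises.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class MailIntakeAuthProvider(ABC, Generic[T]):
    """Stand-in for the real contract; `get_credentials` is an assumed name."""

    @abstractmethod
    def get_credentials(self) -> T:
        raise NotImplementedError


class StaticTokenAuth(MailIntakeAuthProvider[str]):
    """Hypothetical provider that returns a pre-issued token."""

    def __init__(self, token: str):
        self._token = token

    def get_credentials(self) -> str:
        # Either return usable credentials or raise; never return a placeholder.
        if not self._token:
            raise RuntimeError("no token configured")
        return self._token


auth = StaticTokenAuth("example-token")
token = auth.get_credentials()
```

Parameterizing the provider with `str` lets an adapter declare the exact credential type it expects while still treating the value as opaque.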

View File

@@ -1,18 +1,17 @@
"""
# Summary
Google authentication provider implementation for Mail Intake.
---
## Summary
This module provides a **Google OAuth-based authentication provider**
used primarily for Gmail access.
It encapsulates all Google-specific authentication concerns, including:
- Credential loading and persistence
- Token refresh handling
- Interactive OAuth flow initiation
- Coordination with a credential persistence layer
- Credential loading and persistence.
- Token refresh handling.
- Interactive OAuth flow initiation.
- Coordination with a credential persistence layer.
No Google authentication details should leak outside this module.
"""
@@ -40,14 +39,15 @@ class MailIntakeGoogleAuth(MailIntakeAuthProvider):
Notes:
**Responsibilities:**
- Load cached credentials from a credential store when available
- Refresh expired credentials when possible
- Initiate an interactive OAuth flow only when required
- Persist refreshed or newly obtained credentials via the store
- Load cached credentials from a credential store when available.
- Refresh expired credentials when possible.
- Initiate an interactive OAuth flow only when required.
- Persist refreshed or newly obtained credentials via the store.
**Guarantees:**
- This class is synchronous by design and maintains a minimal internal state
- This class is synchronous by design and maintains minimal
internal state.
"""
def __init__(
@@ -79,7 +79,7 @@ class MailIntakeGoogleAuth(MailIntakeAuthProvider):
Returns:
Credentials:
A ``google.oauth2.credentials.Credentials`` instance suitable
A `google.oauth2.credentials.Credentials` instance suitable
for use with Google API clients.
Raises:
@@ -90,10 +90,10 @@ class MailIntakeGoogleAuth(MailIntakeAuthProvider):
Notes:
**Lifecycle:**
- Load cached credentials from the configured credential store
- Refresh expired credentials when possible
- Perform an interactive OAuth login as a fallback
- Persist valid credentials for future use
- Load cached credentials from the configured credential store.
- Refresh expired credentials when possible.
- Perform an interactive OAuth login as a fallback.
- Persist valid credentials for future use.
"""
creds = self.store.load()

View File

@@ -1,10 +1,8 @@
"""
# Summary
Global configuration models for Mail Intake.
---
## Summary
This module defines the **top-level configuration object** used to control
mail ingestion behavior across adapters, authentication providers, and
ingestion workflows.
@@ -20,16 +18,17 @@ from typing import Optional
@dataclass(frozen=True)
class MailIntakeConfig:
"""
Global configuration for mail-intake.
Global configuration for `mail-intake`.
Notes:
**Guarantees:**
- This configuration is intentionally explicit and immutable
- No implicit environment reads or global state
- Explicit configuration over implicit defaults
- No direct environment or filesystem access
- This model is safe to pass across layers and suitable for serialization
- This configuration is intentionally explicit and immutable.
- No implicit environment reads or global state.
- Explicit configuration over implicit defaults.
- No direct environment or filesystem access.
- This model is safe to pass across layers and suitable for
serialization.
"""
provider: str = "gmail"
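Review note: the immutability guarantee above follows from `@dataclass(frozen=True)`. The sketch below uses a stand-in class; only the `provider` field appears in the diff, and `max_results` is a hypothetical field added for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleConfig:
    """Stand-in config; only `provider` appears in the diff above."""

    provider: str = "gmail"
    max_results: int = 50  # hypothetical field, for illustration only


config = ExampleConfig()

try:
    config.provider = "outlook"  # frozen dataclasses reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a modified copy instead of mutating in place.
custom = replace(config, max_results=10)
```

Because instances cannot change after construction, the same configuration object can be passed across layers without defensive copying.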

View File

@@ -1,10 +1,8 @@
"""
# Summary
Credential persistence interfaces and implementations for Mail Intake.
---
## Summary
This package defines the abstractions and concrete implementations used
to persist authentication credentials across Mail Intake components.
@@ -14,20 +12,21 @@ credential acquisition, validation, and refresh, while implementations
within this package are responsible solely for storage and retrieval.
The package provides:
- A generic ``CredentialStore`` abstraction defining the persistence contract
- Local filesystem-based storage for development and single-node use
- Distributed, Redis-backed storage for production and scaled deployments
- A generic `CredentialStore` abstraction defining the persistence contract.
- Local filesystem-based storage for development and single-node use.
- Distributed, Redis-backed storage for production and scaled deployments.
Credential lifecycle management, interpretation, and security policy
decisions remain the responsibility of authentication providers.
---
## Public API
# Public API
CredentialStore
PickleCredentialStore
RedisCredentialStore
- `CredentialStore`
- `PickleCredentialStore`
- `RedisCredentialStore`
---
"""

View File

@@ -1,18 +1,16 @@
"""
# Summary
Local filesystem-based credential persistence for Mail Intake.
---
## Summary
This module provides a file-backed implementation of the
``CredentialStore`` abstraction using Python's ``pickle`` module.
`CredentialStore` abstraction using Python's `pickle` module.
The pickle-based credential store is intended for local development,
The `pickle`-based credential store is intended for local development,
single-node deployments, and controlled environments where credentials
do not need to be shared across processes or machines.
Due to the security and portability risks associated with pickle-based
Due to the security and portability risks associated with `pickle`-based
serialization, this implementation is not suitable for distributed or
untrusted environments.
"""
@@ -36,13 +34,14 @@ class PickleCredentialStore(CredentialStore[T]):
Notes:
**Guarantees:**
- Stores credentials on the local filesystem
- Uses pickle for serialization and deserialization
- Does not provide encryption, locking, or concurrency guarantees
- Stores credentials on the local filesystem.
- Uses `pickle` for serialization and deserialization.
- Does not provide encryption, locking, or concurrency guarantees.
**Constraints:**
- Credential lifecycle management, validation, and refresh logic are explicitly out of scope for this class
- Credential lifecycle management, validation, and refresh logic are
explicitly out of scope for this class.
"""
def __init__(self, path: str):
@@ -62,14 +61,16 @@ class PickleCredentialStore(CredentialStore[T]):
Returns:
Optional[T]:
An instance of type ``T`` if credentials are present and
successfully deserialized; otherwise ``None``.
An instance of type `T` if credentials are present and
successfully deserialized; otherwise `None`.
Notes:
**Guarantees:**
- If the credential file does not exist or cannot be successfully deserialized, this method returns ``None``
- The store does not attempt to validate or interpret the returned credentials
- If the credential file does not exist or cannot be successfully
deserialized, this method returns `None`.
- The store does not attempt to validate or interpret the
returned credentials.
"""
try:
with open(self.path, "rb") as fh:

View File

@@ -1,12 +1,10 @@
"""
# Summary
Redis-backed credential persistence for Mail Intake.
---
## Summary
This module provides a Redis-based implementation of the
``CredentialStore`` abstraction, enabling credential persistence
`CredentialStore` abstraction, enabling credential persistence
across distributed and horizontally scaled deployments.
The Redis credential store is designed for environments where
@@ -15,10 +13,11 @@ processes, containers, or nodes, such as container orchestration
platforms and microservice architectures.
Key characteristics:
- Distributed-safe, shared storage using Redis
- Explicit, caller-defined serialization and deserialization
- No reliance on unsafe mechanisms such as pickle
- Optional time-to-live (TTL) support for automatic credential expiry
- Distributed-safe, shared storage using Redis.
- Explicit, caller-defined serialization and deserialization.
- No reliance on unsafe mechanisms such as `pickle`.
- Optional time-to-live (TTL) support for automatic credential expiry.
This module is responsible solely for persistence concerns.
Credential validation, refresh, rotation, and acquisition remain the
@@ -35,7 +34,7 @@ T = TypeVar("T")
class RedisCredentialStore(CredentialStore[T]):
"""
Redis-backed implementation of ``CredentialStore``.
Redis-backed implementation of `CredentialStore`.
This store persists credentials in Redis and is suitable for
distributed and horizontally scaled deployments where credentials
@@ -44,13 +43,16 @@ class RedisCredentialStore(CredentialStore[T]):
Notes:
**Responsibilities:**
- This class is responsible only for persistence and retrieval
- It does not interpret, validate, refresh, or otherwise manage the lifecycle of the credentials being stored
- This class is responsible only for persistence and retrieval.
- It does not interpret, validate, refresh, or otherwise manage the
lifecycle of the credentials being stored.
**Guarantees:**
- The store is intentionally generic and delegates all serialization concerns to caller-provided functions
- This avoids unsafe mechanisms such as pickle and allows credential formats to be explicitly controlled and audited
- The store is intentionally generic and delegates all serialization
concerns to caller-provided functions.
- This avoids unsafe mechanisms such as `pickle` and allows
credential formats to be explicitly controlled and audited.
"""
def __init__(
@@ -92,14 +94,18 @@ class RedisCredentialStore(CredentialStore[T]):
Returns:
Optional[T]:
An instance of type ``T`` if credentials are present and
successfully deserialized; otherwise ``None``.
An instance of type `T` if credentials are present and
successfully deserialized; otherwise `None`.
Notes:
**Guarantees:**
- If no value exists for the configured key, or if the stored payload cannot be successfully deserialized, this method returns ``None``
- The store does not attempt to validate the returned credentials or determine whether they are expired or otherwise usable
- If no value exists for the configured key, or if the stored
payload cannot be successfully deserialized, this method
returns `None`.
- The store does not attempt to validate the returned
credentials or determine whether they are expired or
otherwise usable.
"""
raw = self.redis.get(self.key)
if not raw:

View File

@@ -1,14 +1,12 @@
"""
# Summary
Credential persistence abstractions for Mail Intake.
---
## Summary
This module defines the generic persistence contract used to store and
retrieve authentication credentials across Mail Intake components.
The ``CredentialStore`` abstraction establishes a strict separation
The `CredentialStore` abstraction establishes a strict separation
between credential *lifecycle management* and credential *storage*.
Authentication providers are responsible for acquiring, validating,
refreshing, and revoking credentials, while concrete store
@@ -30,21 +28,23 @@ T = TypeVar("T")
class CredentialStore(ABC, Generic[T]):
"""
Abstract base class defining a generic persistence interface for
authentication credentials.
Abstract base class defining a generic persistence interface.
It is used to store authentication credentials across different backends.
Notes:
**Responsibilities:**
- Provide persistent storage separating life-cycle management from storage mechanics
- Keep implementation focused only on persistence
- Provide persistent storage, keeping lifecycle management separate
from storage mechanics.
- Keep implementation focused only on persistence.
**Constraints:**
- The store is intentionally agnostic to:
- The concrete credential type being stored
- The serialization format used to persist credentials
- The underlying storage backend or durability guarantees
- The concrete credential type being stored.
- The serialization format used to persist credentials.
- The underlying storage backend or durability guarantees.
"""
@abstractmethod
@@ -54,14 +54,17 @@ class CredentialStore(ABC, Generic[T]):
Returns:
Optional[T]:
An instance of type ``T`` if credentials are available and
loadable; otherwise ``None``.
An instance of type `T` if credentials are available and
loadable; otherwise `None`.
Notes:
**Guarantees:**
- Implementations should return ``None`` when no credentials are present or when stored credentials cannot be successfully decoded or deserialized
- The store must not attempt to validate, refresh, or otherwise interpret the returned credentials
- Implementations should return `None` when no credentials are
present or when stored credentials cannot be successfully
decoded or deserialized.
- The store must not attempt to validate, refresh, or otherwise
interpret the returned credentials.
"""
@abstractmethod
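Review note: a minimal sketch of a custom store that follows the `load` guarantee above (return `None` when nothing is stored or the data cannot be decoded). The base class is a stand-in; `load` appears elsewhere in this diff, while `save` and `clear` are assumed names.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class CredentialStore(ABC, Generic[T]):
    """Stand-in for the real contract; `save` and `clear` are assumed names."""

    @abstractmethod
    def load(self) -> Optional[T]: ...

    @abstractmethod
    def save(self, credentials: T) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...


class JsonFileCredentialStore(CredentialStore[Dict[str, str]]):
    """Hypothetical store that persists a plain dict as JSON."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> Optional[Dict[str, str]]:
        # Missing or undecodable data yields None, as the contract asks.
        try:
            with open(self.path, "r", encoding="utf-8") as fh:
                return json.load(fh)
        except (FileNotFoundError, json.JSONDecodeError):
            return None

    def save(self, credentials: Dict[str, str]) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(credentials, fh)

    def clear(self) -> None:
        if os.path.exists(self.path):
            os.remove(self.path)


store = JsonFileCredentialStore(os.path.join(tempfile.mkdtemp(), "creds.json"))
before = store.load()
store.save({"token": "example"})
after = store.load()
store.clear()
```

The store only reads and writes bytes. It makes no attempt to decide whether the stored token is expired, which stays with the authentication provider.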

View File

@@ -34,7 +34,8 @@ class MailIntakeAuthError(MailIntakeError):
Notes:
**Lifecycle:**
- Raised when authentication providers are unable to acquire, refresh, or persist valid credentials
- Raised when authentication providers are unable to acquire,
refresh, or persist valid credentials.
"""
@@ -45,7 +46,8 @@ class MailIntakeAdapterError(MailIntakeError):
Notes:
**Lifecycle:**
- Raised when a provider adapter encounters API errors, transport failures, or invalid provider responses
- Raised when a provider adapter encounters API errors, transport
failures, or invalid provider responses.
"""
@@ -56,5 +58,6 @@ class MailIntakeParsingError(MailIntakeError):
Notes:
**Lifecycle:**
- Raised when raw provider payloads cannot be interpreted or normalized into internal domain models
- Raised when raw provider payloads cannot be interpreted or
normalized into internal domain models.
"""

View File

@@ -1,10 +1,8 @@
"""
# Summary
Mail ingestion orchestration for Mail Intake.
---
## Summary
This package contains **high-level ingestion components** responsible for
coordinating mail retrieval, parsing, normalization, and model construction.
@@ -12,19 +10,20 @@ It represents the **top of the ingestion pipeline** and is intended to be the
primary interaction surface for library consumers.
Components in this package:
- Are provider-agnostic
- Depend only on adapter and parser contracts
- Contain no provider-specific API logic
- Expose read-only ingestion workflows
- Are provider-agnostic.
- Depend only on adapter and parser contracts.
- Contain no provider-specific API logic.
- Expose read-only ingestion workflows.
Consumers are expected to construct a mail adapter and pass it to the
ingestion layer to begin processing messages and threads.
---
## Public API
# Public API
MailIntakeReader
- `MailIntakeReader`
---
"""

View File

@@ -1,18 +1,17 @@
"""
# Summary
High-level mail ingestion orchestration for Mail Intake.
---
## Summary
This module provides the primary, provider-agnostic entry point for
reading and processing mail data.
It coordinates:
- Mail adapter access
- Message and thread iteration
- Header and body parsing
- Normalization and model construction
- Mail adapter access.
- Message and thread iteration.
- Header and body parsing.
- Normalization and model construction.
No provider-specific logic or API semantics are permitted in this layer.
"""
@@ -36,12 +35,18 @@ class MailIntakeReader:
Notes:
**Responsibilities:**
- This class is the primary entry point for consumers of the Mail Intake library
- It orchestrates the full ingestion pipeline: Querying the adapter for message references, fetching raw provider messages, parsing and normalizing message data, constructing domain models
- This class is the primary entry point for consumers of the
Mail Intake library.
- It orchestrates the full ingestion pipeline:
- Querying the adapter for message references.
- Fetching raw provider messages.
- Parsing and normalizing message data.
- Constructing domain models.
**Constraints:**
- This class is intentionally: Provider-agnostic, stateless beyond iteration scope, read-only
- This class is intentionally provider-agnostic, stateless beyond
iteration scope, and read-only.
"""
def __init__(self, adapter: MailIntakeAdapter):
@@ -87,13 +92,14 @@ class MailIntakeReader:
An iterator of `MailIntakeThread` instances.
Raises:
MailIntakeParsingError:
`MailIntakeParsingError`:
If a message cannot be parsed.
Notes:
**Guarantees:**
- Messages are grouped by `thread_id` and yielded as complete thread objects containing all associated messages
- Messages are grouped by `thread_id` and yielded as complete
thread objects containing all associated messages.
"""
threads: Dict[str, MailIntakeThread] = {}
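Review note: the grouping guarantee above can be sketched with plain dictionaries. This is an illustration under assumptions, not the reader's actual implementation; messages are represented as dicts and `group_by_thread` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator, List


def group_by_thread(
    messages: Iterable[Dict[str, str]],
) -> Iterator[List[Dict[str, str]]]:
    """Collect every message first, then yield one complete group per thread."""
    threads: Dict[str, List[Dict[str, str]]] = {}
    for message in messages:
        threads.setdefault(message["thread_id"], []).append(message)
    # Yield only after the input is exhausted, so each group is complete.
    yield from threads.values()


grouped = list(group_by_thread([
    {"message_id": "m1", "thread_id": "t1"},
    {"message_id": "m2", "thread_id": "t2"},
    {"message_id": "m3", "thread_id": "t1"},
]))
```

Yielding complete threads means nothing is emitted until all matching messages have been read, so memory use grows with the size of the result set.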

View File

@@ -1,27 +1,26 @@
"""
# Summary
Domain models for Mail Intake.
---
## Summary
This package defines the **canonical, provider-agnostic data models**
used throughout the Mail Intake ingestion pipeline.
Models in this package:
- Represent fully parsed and normalized mail data
- Are safe to persist, serialize, and index
- Contain no provider-specific payloads or API semantics
- Serve as stable inputs for downstream processing and analysis
- Represent fully parsed and normalized mail data.
- Are safe to persist, serialize, and index.
- Contain no provider-specific payloads or API semantics.
- Serve as stable inputs for downstream processing and analysis.
These models form the core internal data contract of the library.
---
## Public API
# Public API
MailIntakeMessage
MailIntakeThread
- `MailIntakeMessage`
- `MailIntakeThread`
---
"""

View File

@@ -1,10 +1,8 @@
"""
# Summary
Message domain models for Mail Intake.
---
## Summary
This module defines the **canonical, provider-agnostic representation**
of an individual email message as used internally by the Mail Intake
ingestion pipeline.
@@ -26,12 +24,14 @@ class MailIntakeMessage:
Notes:
**Guarantees:**
- This model represents a fully parsed and normalized email message
- It is intentionally provider-agnostic and suitable for persistence, indexing, and downstream processing
- This model represents a fully parsed and normalized email message.
- It is intentionally provider-agnostic and suitable for
persistence, indexing, and downstream processing.
**Constraints:**
- No provider-specific identifiers, payloads, or API semantics should appear in this model
- No provider-specific identifiers, payloads, or API semantics
should appear in this model.
"""
message_id: str

View File

@@ -1,10 +1,8 @@
"""
# Summary
Thread domain models for Mail Intake.
---
## Summary
This module defines the **canonical, provider-agnostic representation**
of an email thread as used internally by the Mail Intake ingestion pipeline.
@@ -27,9 +25,11 @@ class MailIntakeThread:
Notes:
**Guarantees:**
- A thread groups multiple related messages under a single subject and participant set
- It is designed to support reasoning over conversational context such as job applications, interviews, follow-ups, and ongoing discussions
- This model is provider-agnostic and safe to persist
- A thread groups multiple related messages under a single subject
and participant set.
- It is designed to support reasoning over conversational context
such as job applications, interviews, follow-ups, and ongoing discussions.
- This model is provider-agnostic and safe to persist.
"""
thread_id: str
@@ -68,9 +68,9 @@ class MailIntakeThread:
Notes:
**Responsibilities:**
- Appends the message to the thread
- Tracks unique participants
- Updates the last activity timestamp
- Appends the message to the thread.
- Tracks unique participants.
- Updates the last activity timestamp.
"""
self.messages.append(message)
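
The three responsibilities listed for `add_message` can be illustrated with a minimal, self-contained sketch. Only `thread_id`, `message_id`, and `messages` appear in this diff; the `from_email`, `timestamp`, `participants`, and `last_activity_at` names below are assumptions made for illustration, not the library's confirmed fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class MessageSketch:
    """Stand-in for MailIntakeMessage; field names other than message_id are assumed."""
    message_id: str
    from_email: str
    timestamp: datetime


@dataclass
class ThreadSketch:
    """Stand-in for MailIntakeThread; participants/last_activity_at are assumed names."""
    thread_id: str
    messages: List[MessageSketch] = field(default_factory=list)
    participants: Set[str] = field(default_factory=set)
    last_activity_at: Optional[datetime] = None

    def add_message(self, message: MessageSketch) -> None:
        # 1. Append the message to the thread.
        self.messages.append(message)
        # 2. Track unique participants (a set ignores repeats).
        if message.from_email:
            self.participants.add(message.from_email)
        # 3. Keep the most recent timestamp, even if messages arrive out of order.
        if self.last_activity_at is None or message.timestamp > self.last_activity_at:
            self.last_activity_at = message.timestamp
```

Using a set for participants keeps the uniqueness check implicit; comparing timestamps rather than overwriting is one possible design choice for out-of-order delivery.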

View File

@@ -1,34 +1,34 @@
"""
# Summary
Message parsing utilities for Mail Intake.
---
## Summary
This package contains **provider-aware but adapter-agnostic parsing helpers**
used to extract and normalize structured information from raw mail payloads.
Parsers in this package are responsible for:
- Interpreting provider-native message structures
- Extracting meaningful fields such as headers, body text, and subjects
- Normalizing data into consistent internal representations
- Interpreting provider-native message structures.
- Extracting meaningful fields such as headers, body text, and subjects.
- Normalizing data into consistent internal representations.
This package does not:
- Perform network or IO operations
- Contain provider API logic
- Construct domain models directly
- Perform network or IO operations.
- Contain provider API logic.
- Construct domain models directly.
Parsing functions are designed to be composable and are orchestrated by the
ingestion layer.
---
## Public API
# Public API
extract_body
parse_headers
extract_sender
normalize_subject
- `extract_body`
- `parse_headers`
- `extract_sender`
- `normalize_subject`
---
"""

View File

@@ -1,4 +1,6 @@
"""
# Summary
Message body extraction utilities for Mail Intake.
This module contains helper functions for extracting a best-effort
@@ -24,13 +26,16 @@ def _decode_base64(data: str) -> str:
omit padding and use non-standard characters.
Args:
data: URL-safe base64-encoded string.
data (str):
URL-safe base64-encoded string.
Returns:
Decoded UTF-8 text with replacement for invalid characters.
str:
Decoded UTF-8 text with replacement for invalid characters.
Raises:
MailIntakeParsingError: If decoding fails.
MailIntakeParsingError:
If decoding fails.
"""
try:
padded = data.replace("-", "+").replace("_", "/")
@@ -45,14 +50,17 @@ def _extract_from_part(part: Dict[str, Any]) -> Optional[str]:
Extract text content from a single MIME part.
Supports:
- text/plain
- text/html (converted to plain text)
- `text/plain`
- `text/html` (converted to plain text)
Args:
part: MIME part dictionary from a provider payload.
part (Dict[str, Any]):
MIME part dictionary from a provider payload.
Returns:
Extracted plain-text content, or None if unsupported or empty.
Optional[str]:
Extracted plain-text content, or `None` if unsupported or empty.
"""
mime_type = part.get("mimeType")
body = part.get("body", {})
@@ -79,16 +87,19 @@ def extract_body(payload: Dict[str, Any]) -> str:
Extract the best-effort message body from a Gmail payload.
Priority:
1. text/plain
2. text/html (stripped to text)
1. `text/plain`
2. `text/html` (stripped to text)
3. Single-part body
4. empty string (if nothing usable found)
4. Empty string (if nothing usable found)
Args:
payload: Provider-native message payload dictionary.
payload (Dict[str, Any]):
Provider-native message payload dictionary.
Returns:
Extracted plain-text message body.
str:
Extracted plain-text message body.
"""
if not payload:
return ""
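
The priority order documented for `extract_body` can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's implementation: it assumes a Gmail-style payload with `parts`, `mimeType`, and `body.data` keys, it does not recurse into nested multipart parts, and it strips HTML with the standard library's `html.parser` rather than whatever the library uses. The helper names are hypothetical.

```python
import base64
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class _TextCollector(HTMLParser):
    """Collects text nodes from HTML and ignores the tags themselves."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _decode(data: str) -> str:
    # Providers may omit base64 padding, so restore it before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _text_from_part(part: Dict[str, Any]) -> Optional[str]:
    data = part.get("body", {}).get("data")
    if not data:
        return None
    text = _decode(data)
    if part.get("mimeType") == "text/html":
        collector = _TextCollector()
        collector.feed(text)
        collector.close()
        # Join the text nodes and collapse runs of whitespace.
        text = " ".join("".join(collector.chunks).split())
    return text.strip() or None


def extract_body_sketch(payload: Dict[str, Any]) -> str:
    if not payload:
        return ""
    parts = payload.get("parts") or []
    # Priorities 1 and 2: prefer text/plain, then fall back to text/html.
    for wanted in ("text/plain", "text/html"):
        for part in parts:
            if part.get("mimeType") == wanted:
                text = _text_from_part(part)
                if text:
                    return text
    # Priority 3: single-part message with the body directly on the payload.
    data = payload.get("body", {}).get("data")
    if data:
        return _decode(data).strip()
    # Priority 4: nothing usable found.
    return ""
```

Scanning the parts once per MIME type, rather than taking the first decodable part, is what makes `text/plain` win even when the HTML part is listed first.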

View File

@@ -1,10 +1,8 @@
"""
# Summary
Message header parsing utilities for Mail Intake.
---
## Summary
This module provides helper functions for normalizing and extracting
useful information from provider-native message headers.
@@ -21,7 +19,7 @@ def parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
Args:
raw_headers (List[Dict[str, str]]):
List of header dictionaries, each containing ``name`` and ``value`` keys.
List of header dictionaries, each containing `name` and `value` keys.
Returns:
Dict[str, str]:
@@ -30,23 +28,27 @@ def parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
Notes:
**Guarantees:**
- Provider payloads (such as Gmail) typically represent headers as a list of name/value mappings
- This function normalizes them into a case-insensitive dictionary keyed by lowercase header names
- Provider payloads (such as Gmail) typically represent headers as a
list of name/value mappings.
- This function normalizes them into a case-insensitive dictionary
keyed by lowercase header names.
Example:
Typical usage:
Input:
[
{"name": "From", "value": "John Doe <john@example.com>"},
{"name": "Subject", "value": "Re: Interview Update"},
]
Output:
{
"from": "John Doe <john@example.com>",
"subject": "Re: Interview Update",
}
```python
# Input:
[
    {"name": "From", "value": "John Doe <john@example.com>"},
    {"name": "Subject", "value": "Re: Interview Update"},
]
# Output:
{
    "from": "John Doe <john@example.com>",
    "subject": "Re: Interview Update",
}
```
"""
headers: Dict[str, str] = {}
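
The normalization described in the `parse_headers` docstring could look roughly like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, entries without a `name` are skipped, and a repeated header simply overwrites the earlier value, none of which is confirmed by this diff.

```python
from typing import Dict, List


def parse_headers_sketch(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
    headers: Dict[str, str] = {}
    for header in raw_headers or []:
        name = header.get("name")
        if not name:
            # Assumed behavior: ignore malformed entries with no header name.
            continue
        # Lowercasing the key is what makes later lookups case-insensitive.
        headers[name.lower()] = header.get("value", "")
    return headers
```

With this shape, callers look up `headers.get("from")` regardless of whether the provider sent `From`, `FROM`, or `from`.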
@@ -68,22 +70,24 @@ def extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
Args:
headers (Dict[str, str]):
Normalized header dictionary as returned by :func:`parse_headers`.
Normalized header dictionary as returned by `parse_headers()`.
Returns:
Tuple[str, Optional[str]]:
A tuple ``(email, name)`` where ``email`` is the sender email address and ``name`` is the display name, or ``None`` if unavailable.
A tuple `(email, name)` where `email` is the sender email address
and `name` is the display name, or `None` if unavailable.
Notes:
**Responsibilities:**
- This function parses the ``From`` header and attempts to extract sender email address and optional human-readable display name
- This function parses the `From` header and attempts to extract
sender email address and optional human-readable display name.
Example:
Typical values:
``"John Doe <john@example.com>"`` -> ``("john@example.com", "John Doe")``
``"john@example.com"`` -> ``("john@example.com", None)``
- `"John Doe <john@example.com>"` -> `("john@example.com", "John Doe")`
- `"john@example.com"` -> `("john@example.com", None)`
"""
from_header = headers.get("from")
if not from_header:
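
The two example mappings in the `extract_sender` docstring can be reproduced with the standard library's `email.utils.parseaddr`. Whether the module actually uses `parseaddr` is not shown in this diff, and the return value for a missing `From` header (`("", None)` here) is an assumption; the function name is hypothetical.

```python
from email.utils import parseaddr
from typing import Dict, Optional, Tuple


def extract_sender_sketch(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    from_header = headers.get("from")
    if not from_header:
        # Assumed fallback when the header is absent or empty.
        return "", None
    # parseaddr returns (display_name, address); either may be an empty string.
    name, email = parseaddr(from_header)
    # Fall back to the raw header if no address could be parsed out of it.
    return (email or from_header), (name or None)
```

Note that `parseaddr` returns the display name first, so the sketch swaps the order to match the documented `(email, name)` tuple.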

View File

@@ -1,10 +1,8 @@
"""
# Summary
Subject line normalization utilities for Mail Intake.
---
## Summary
This module provides helper functions for normalizing email subject lines
to enable reliable thread-level comparison and grouping.
@@ -36,14 +34,15 @@ def normalize_subject(subject: str) -> str:
Notes:
**Responsibilities:**
- Strips common prefixes such as ``Re:``, ``Fwd:``, and ``FW:``
- Repeats prefix stripping to handle stacked prefixes
- Collapses excessive whitespace
- Preserves original casing (no lowercasing)
- Strips common prefixes such as `Re:`, `Fwd:`, and `FW:`.
- Repeats prefix stripping to handle stacked prefixes.
- Collapses excessive whitespace.
- Preserves original casing (no lowercasing).
**Guarantees:**
- This function is intentionally conservative and avoids aggressive transformations that could alter the semantic meaning of the subject
- This function is intentionally conservative and avoids aggressive
transformations that could alter the semantic meaning of the subject.
"""
if not subject:
return ""
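
The responsibilities listed for `normalize_subject` can be sketched as below. The regular expression and the function name are assumptions for illustration; the diff does not show the pattern the library uses.

```python
import re

# Hypothetical pattern: one leading "Re:", "Fwd:", or "FW:" prefix, any casing.
_PREFIX_RE = re.compile(r"^\s*(?:re|fwd|fw)\s*:\s*", re.IGNORECASE)


def normalize_subject_sketch(subject: str) -> str:
    if not subject:
        return ""
    normalized = subject
    # Strip one prefix at a time until nothing changes, to handle stacked prefixes.
    while True:
        stripped = _PREFIX_RE.sub("", normalized, count=1)
        if stripped == normalized:
            break
        normalized = stripped
    # Collapse runs of whitespace; casing is deliberately left untouched.
    return " ".join(normalized.split())
```

Because the pattern requires a colon after the prefix, a subject such as `Report: Q3` is left alone, which is in keeping with the conservative behavior the docstring describes.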