Parsers
mail_intake.parsers
Message parsing utilities for Mail Intake.
This package contains provider-aware but adapter-agnostic parsing helpers used to extract and normalize structured information from raw mail payloads.
Parsers in this package are responsible for:
- Interpreting provider-native message structures
- Extracting meaningful fields such as headers, body text, and subjects
- Normalizing data into consistent internal representations
This package does not:
- Perform network or IO operations
- Contain provider API logic
- Construct domain models directly
Parsing functions are designed to be composable and are orchestrated by the ingestion layer.
extract_body
extract_body(payload: Dict[str, Any]) -> str
Extract the best-effort message body from a Gmail payload.
Priority:
1. text/plain
2. text/html (stripped to text)
3. Single-part body
4. Empty string (if nothing usable found)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| payload | Dict[str, Any] | Provider-native message payload dictionary. | required |
Returns:
| Type | Description |
|---|---|
| str | Extracted plain-text message body. |
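The priority order above can be illustrated with a minimal sketch. This is not the package's implementation: the name `extract_body_sketch` and the helpers `_decode`, `_walk`, and `_strip_html` are hypothetical, and the sketch assumes the Gmail API convention of nested `parts` whose `body.data` field is URL-safe base64. The regex-based HTML stripping and the whitespace trimming are simplifying assumptions.

```python
import base64
import re
from html import unescape
from typing import Any, Dict, Iterator


def _decode(data: str) -> str:
    # Gmail body data is URL-safe base64; restore padding if it was dropped.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _walk(part: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    # Yield the part itself, then every nested part, depth-first.
    yield part
    for child in part.get("parts", []):
        yield from _walk(child)


def _strip_html(html: str) -> str:
    # Crude tag removal (assumption): replace tags with spaces, collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def extract_body_sketch(payload: Dict[str, Any]) -> str:
    parts = list(_walk(payload))
    # Priorities 1 and 2: prefer text/plain anywhere in the tree, then text/html.
    for mime, convert in (("text/plain", str.strip), ("text/html", _strip_html)):
        for part in parts:
            data = part.get("body", {}).get("data")
            if part.get("mimeType") == mime and data:
                return convert(_decode(data))
    # Priority 3: fall back to a top-level single-part body of any type.
    data = payload.get("body", {}).get("data")
    # Priority 4: empty string when nothing usable was found.
    return _decode(data).strip() if data else ""
```

Collecting all parts first and then scanning once per MIME type keeps the preference order explicit: a text/plain part nested deep in the tree still wins over a text/html part that appears earlier.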
extract_sender
extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]
Extract sender email and optional display name from headers.
This function parses the From header and attempts to extract:
- Sender email address
- Optional human-readable display name
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| headers | Dict[str, str] | Normalized header dictionary as returned by :func: | required |
Returns:
| Type | Description |
|---|---|
| Tuple[str, Optional[str]] | A tuple of the sender email address and the optional display name. |
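As a rough illustration of this behavior, the standard library's `email.utils.parseaddr` already splits a From value into display name and address. The sketch below is an assumption about how such a helper could look, not the package's code: the name `extract_sender_sketch` is hypothetical, and it assumes the header dictionary uses the lowercase key "from".

```python
from email.utils import parseaddr
from typing import Dict, Optional, Tuple


def extract_sender_sketch(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    # Assumption: header names were lowercased upstream, so the key is "from".
    name, email = parseaddr(headers.get("from", ""))
    # parseaddr returns an empty string when no display name is present;
    # map that to None to match the documented Optional[str].
    return email, (name or None)
```

A missing From header yields `("", None)` in this sketch; whether the real function returns an empty string or raises in that case is not stated in the source.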
Examples:
"John Doe <john@example.com>" → ("john@example.com", "John Doe")
"john@example.com" → ("john@example.com", None)
normalize_subject
normalize_subject(subject: str) -> str
Normalize an email subject for thread-level comparison.
Operations:
- Strips common prefixes such as Re:, Fwd:, and FW:
- Repeats prefix stripping to handle stacked prefixes
- Collapses excessive whitespace
- Preserves original casing (no lowercasing)
This function is intentionally conservative and avoids aggressive transformations that could alter the semantic meaning of the subject.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject | str | Raw subject line from a message header. | required |
Returns:
| Type | Description |
|---|---|
| str | Normalized subject string suitable for thread grouping. |
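The operations listed for this function can be sketched with a single anchored regular expression applied repeatedly. The name `normalize_subject_sketch` and the exact prefix pattern are assumptions made for illustration; the real function may recognize a different set of prefixes.

```python
import re

# Matches one leading reply/forward prefix, case-insensitively (assumed set).
_PREFIX = re.compile(r"^\s*(?:re|fwd|fw)\s*:\s*", re.IGNORECASE)


def normalize_subject_sketch(subject: str) -> str:
    previous = None
    # Strip prefixes until the string stops changing, to handle stacked prefixes.
    while previous != subject:
        previous = subject
        subject = _PREFIX.sub("", subject)
    # Collapse runs of whitespace; casing is left untouched.
    return re.sub(r"\s+", " ", subject).strip()
```

Because the pattern requires a colon directly after the prefix (allowing only whitespace in between), a subject such as "Regarding: budget" is left alone, which fits the conservative intent described above.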
parse_headers
parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]
Convert a list of Gmail-style headers into a normalized dict.
Provider payloads (such as Gmail) typically represent headers as a list of name/value mappings. This function normalizes them into a dictionary keyed by lowercase header names, so lookups do not depend on the original header casing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_headers | List[Dict[str, str]] | List of header dictionaries, each containing "name" and "value" keys. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | Dictionary mapping lowercase header names to stripped values. |
Example
Input:

    [
        {"name": "From", "value": "John Doe <john@example.com>"},
        {"name": "Subject", "value": "Re: Interview Update"},
    ]

Output:

    {
        "from": "John Doe <john@example.com>",
        "subject": "Re: Interview Update",
    }
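A minimal sketch of this conversion, under stated assumptions, might look as follows. The name `parse_headers_sketch` is hypothetical, entries without a name are skipped, and when a header name repeats the last value wins; the source does not say how the real function treats either case.

```python
from typing import Dict, List


def parse_headers_sketch(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
    parsed: Dict[str, str] = {}
    for header in raw_headers:
        # Lowercase the name so lookups do not depend on the original casing.
        name = header.get("name", "").strip().lower()
        if not name:
            continue  # Assumption: ignore entries that carry no header name.
        # Assumption: a repeated header overwrites the earlier value.
        parsed[name] = header.get("value", "").strip()
    return parsed
```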