
Parsers

mail_intake.parsers

Message parsing utilities for Mail Intake.


Summary

This package contains provider-aware but adapter-agnostic parsing helpers used to extract and normalize structured information from raw mail payloads.

Parsers in this package are responsible for:

- Interpreting provider-native message structures
- Extracting meaningful fields such as headers, body text, and subjects
- Normalizing data into consistent internal representations

This package does not:

- Perform network or IO operations
- Contain provider API logic
- Construct domain models directly

Parsing functions are designed to be composable and are orchestrated by the ingestion layer.


Public API

extract_body
parse_headers
extract_sender
normalize_subject
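
As a rough illustration of how an ingestion layer might compose these helpers, the sketch below chains the four parsers over a Gmail-style message dictionary. The `summarize_message` function, its result keys, and the choice to pass the parsers in as arguments are all hypothetical; the parsers are injected only to keep the example self-contained.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Headers = Dict[str, str]


def summarize_message(
    message: Dict[str, Any],
    parse_headers: Callable[[List[Dict[str, str]]], Headers],
    extract_sender: Callable[[Headers], Tuple[str, Optional[str]]],
    normalize_subject: Callable[[str], str],
    extract_body: Callable[[Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Hypothetical ingestion-layer helper that chains the four parsers."""
    # Gmail message resources keep headers and MIME parts under "payload".
    payload = message.get("payload", {})
    headers = parse_headers(payload.get("headers", []))
    email, name = extract_sender(headers)
    # A plain dict is returned here; building domain models is left to the caller.
    return {
        "sender_email": email,
        "sender_name": name,
        "thread_subject": normalize_subject(headers.get("subject", "")),
        "body": extract_body(payload),
    }
```

In real use the four callables would presumably be the functions exported by `mail_intake.parsers`.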

Functions

extract_body

extract_body(payload: Dict[str, Any]) -> str

Extract the best-effort message body from a Gmail payload.

Priority:

1. text/plain
2. text/html (stripped to text)
3. Single-part body
4. Empty string (if nothing usable found)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| payload | Dict[str, Any] | Provider-native message payload dictionary. | required |

Returns:

| Type | Description |
|------|-------------|
| str | Extracted plain-text message body. |
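
The priority order above could be implemented roughly as follows. This is a minimal sketch, not the package's actual code: it assumes the Gmail API payload shape (`mimeType`, `parts`, and base64url-encoded `body.data`), and the names `extract_body_sketch`, `_walk`, `_decode`, and `_TextCollector` are hypothetical. The HTML stripping is deliberately naive and keeps the contents of `<script>` and `<style>` elements.

```python
import base64
from html.parser import HTMLParser
from typing import Any, Dict, Iterator, List


class _TextCollector(HTMLParser):
    """Collects text nodes and drops tags (a naive HTML-to-text step)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _decode(part: Dict[str, Any]) -> str:
    """Decode one part's base64url body data, or return '' if absent."""
    data = part.get("body", {}).get("data")
    if not data:
        return ""
    # Restore padding in case it was stripped from the encoded string.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _walk(part: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a part and all nested sub-parts, depth first."""
    yield part
    for child in part.get("parts", []):
        yield from _walk(child)


def _first_of_type(payload: Dict[str, Any], mime_type: str) -> str:
    """Return the first non-empty decoded body with the given MIME type."""
    for part in _walk(payload):
        if part.get("mimeType") == mime_type:
            text = _decode(part)
            if text:
                return text
    return ""


def extract_body_sketch(payload: Dict[str, Any]) -> str:
    # 1. Prefer text/plain anywhere in the MIME tree.
    plain = _first_of_type(payload, "text/plain")
    if plain:
        return plain.strip()
    # 2. Fall back to text/html, reduced to whitespace-collapsed text.
    html = _first_of_type(payload, "text/html")
    if html:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        return " ".join("".join(collector.chunks).split())
    # 3. Single-part body; 4. empty string when nothing is usable.
    return _decode(payload).strip()
```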

extract_sender

extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]

Extract sender email and optional display name from headers.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| headers | Dict[str, str] | Normalized header dictionary as returned by parse_headers. | required |

Returns:

| Type | Description |
|------|-------------|
| Tuple[str, Optional[str]] | A tuple (email, name) where email is the sender email address and name is the display name, or None if unavailable. |

Notes

Responsibilities:

- This function parses the ``From`` header and attempts to extract sender email address and optional human-readable display name

Example

Typical values:

``"John Doe <john@example.com>"`` -> ``("john@example.com", "John Doe")``
``"john@example.com"`` -> ``("john@example.com", None)``
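
One way to get the behavior shown in these examples is the standard library's `email.utils.parseaddr`. The sketch below is an assumption about the approach, not the package's confirmed implementation; the name `extract_sender_sketch` is hypothetical, and returning an empty email string when the ``From`` header is missing is this sketch's choice rather than documented behavior.

```python
from email.utils import parseaddr
from typing import Dict, Optional, Tuple


def extract_sender_sketch(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    # Keys are lowercase because the dict comes from parse_headers.
    # parseaddr returns (display_name, address); either may be "".
    name, email = parseaddr(headers.get("from", ""))
    # Map an empty display name to None, matching the documented return type.
    return email, (name or None)


print(extract_sender_sketch({"from": "John Doe <john@example.com>"}))
print(extract_sender_sketch({"from": "john@example.com"}))
```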

normalize_subject

normalize_subject(subject: str) -> str

Normalize an email subject for thread-level comparison.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| subject | str | Raw subject line from a message header. | required |

Returns:

| Type | Description |
|------|-------------|
| str | Normalized subject string suitable for thread grouping. |

Notes

Responsibilities:

- Strips common prefixes such as ``Re:``, ``Fwd:``, and ``FW:``
- Repeats prefix stripping to handle stacked prefixes
- Collapses excessive whitespace
- Preserves original casing (no lowercasing)

Guarantees:

- This function is intentionally conservative and avoids aggressive transformations that could alter the semantic meaning of the subject
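
The responsibilities listed above can be sketched with a single anchored regular expression applied in a loop. This is a minimal illustration under stated assumptions: the name `normalize_subject_sketch` is hypothetical, and matching prefixes case-insensitively (so that ``RE:`` and ``fw:`` are also stripped) is a guess based on the mixed-case prefixes in the list, not confirmed behavior.

```python
import re

# Matches one leading reply/forward prefix such as "Re:", "Fwd:", or "FW:".
_PREFIX = re.compile(r"^\s*(?:re|fwd|fw)\s*:\s*", re.IGNORECASE)


def normalize_subject_sketch(subject: str) -> str:
    # Strip prefixes repeatedly so stacked ones ("Re: Fwd: ...") are all removed.
    while True:
        stripped = _PREFIX.sub("", subject, count=1)
        if stripped == subject:
            break
        subject = stripped
    # Collapse runs of whitespace; casing is left untouched.
    return " ".join(subject.split())


print(normalize_subject_sketch("Re: Fwd:  Interview   Update"))  # Interview Update
```

Because the pattern requires a colon directly after the prefix, a subject such as "Regarding: plan" is left unchanged.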

parse_headers

parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]

Convert a list of Gmail-style headers into a normalized dict.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| raw_headers | List[Dict[str, str]] | List of header dictionaries, each containing name and value keys. | required |

Returns:

| Type | Description |
|------|-------------|
| Dict[str, str] | Dictionary mapping lowercase header names to stripped values. |

Notes

Guarantees:

- Provider payloads (such as Gmail) typically represent headers as a list of name/value mappings
- This function normalizes them into a case-insensitive dictionary keyed by lowercase header names

Example

Typical usage:

Input:
    [
        {"name": "From", "value": "John Doe <john@example.com>"},
        {"name": "Subject", "value": "Re: Interview Update"},
    ]

Output:
    {
        "from": "John Doe <john@example.com>",
        "subject": "Re: Interview Update",
    }
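
The input and output above could be reproduced with a short loop. This is a sketch under assumptions, not the package's actual code: the name `parse_headers_sketch` is hypothetical, and both skipping entries without a name and letting the last value win for repeated headers are choices made here that the documentation does not specify.

```python
from typing import Dict, List


def parse_headers_sketch(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
    headers: Dict[str, str] = {}
    for entry in raw_headers:
        name = entry.get("name", "").strip().lower()
        if not name:
            # Assumption: entries without a usable name are ignored.
            continue
        # Assumption: a repeated header overwrites the earlier value.
        headers[name] = entry.get("value", "").strip()
    return headers


raw = [
    {"name": "From", "value": "John Doe <john@example.com>"},
    {"name": "Subject", "value": "Re: Interview Update"},
]
print(parse_headers_sketch(raw))
```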