Parsers

mail_intake.parsers

Summary

Message parsing utilities for Mail Intake.

This package contains provider-aware but adapter-agnostic parsing helpers used to extract and normalize structured information from raw mail payloads.

Parsers in this package are responsible for:

  • Interpreting provider-native message structures.
  • Extracting meaningful fields such as headers, body text, and subjects.
  • Normalizing data into consistent internal representations.

This package does not:

  • Perform network or IO operations.
  • Contain provider API logic.
  • Construct domain models directly.

Parsing functions are designed to be composable and are orchestrated by the ingestion layer.


Public API

  • extract_body
  • parse_headers
  • extract_sender
  • normalize_subject

Functions

extract_body

extract_body(payload: Dict[str, Any]) -> str

Extract the best-effort message body from a Gmail payload.

Priority:

  1. text/plain
  2. text/html (stripped to text)
  3. Single-part body
  4. Empty string (if nothing usable found)
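The priority order above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `sketch_extract_body`, `_walk`, `_decode`, and `_TextCollector` are hypothetical names. It assumes the Gmail payload layout in which each part carries a `mimeType`, an optional `body.data` field holding base64url-encoded content, and optional nested `parts`. The HTML-to-text step is deliberately crude: it keeps every text node, including any script or style content.

```python
import base64
from html.parser import HTMLParser
from typing import Any, Dict, Iterator, List


class _TextCollector(HTMLParser):
    """Collect text nodes and ignore tags (a crude HTML-to-text step)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _decode(part: Dict[str, Any]) -> str:
    """Decode a part's base64url body data; return "" when it is absent."""
    data = part.get("body", {}).get("data")
    if not data:
        return ""
    # Restore padding in case it was stripped from the encoded string.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _walk(part: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a part followed by all nested sub-parts, depth first."""
    yield part
    for child in part.get("parts", []):
        yield from _walk(child)


def sketch_extract_body(payload: Dict[str, Any]) -> str:
    """Hypothetical sketch of the documented priority order."""
    parts = list(_walk(payload))

    # 1. Prefer the first non-empty text/plain part.
    for part in parts:
        if part.get("mimeType") == "text/plain":
            text = _decode(part)
            if text:
                return text

    # 2. Fall back to text/html, stripped to text with whitespace collapsed.
    for part in parts:
        if part.get("mimeType") == "text/html":
            html = _decode(part)
            if html:
                collector = _TextCollector()
                collector.feed(html)
                collector.close()
                return " ".join("".join(collector.chunks).split())

    # 3. Single-part body of any other type, or
    # 4. an empty string when nothing usable is found.
    return _decode(payload)
```

Walking the tree first and then filtering by MIME type keeps the priority order independent of how deeply the provider nests the parts.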

Parameters:

Name     Type            Description                                  Default
payload  Dict[str, Any]  Provider-native message payload dictionary.  required

Returns:

Name  Type  Description
str   str   Extracted plain-text message body.

extract_sender

extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]

Extract sender email and optional display name from headers.

Parameters:

Name     Type            Description                                                   Default
headers  Dict[str, str]  Normalized header dictionary as returned by parse_headers().  required

Returns:

Type                       Description
Tuple[str, Optional[str]]  A tuple (email, name) where email is the sender email address and name is the display name, or None if unavailable.

Notes

Responsibilities:

- This function parses the `From` header and attempts to extract
  the sender email address and an optional human-readable display name.

Example

Typical values:

  • "John Doe <john@example.com>" -> ("john@example.com", "John Doe")
  • "john@example.com" -> ("john@example.com", None)
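One way to obtain the values shown above is the standard library's `email.utils.parseaddr`. The sketch below is an assumption about how such a helper could look, not the package's actual code; `sketch_extract_sender` is a hypothetical name, and it assumes the header dictionary uses the lowercase key `from`, as produced by parse_headers().

```python
from email.utils import parseaddr
from typing import Dict, Optional, Tuple


def sketch_extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch: split the From header into (email, name)."""
    # parseaddr returns (display_name, address); both are "" when missing.
    name, email = parseaddr(headers.get("from", ""))
    # Map an empty display name to None to match the documented contract.
    return email, (name or None)


print(sketch_extract_sender({"from": "John Doe <john@example.com>"}))
print(sketch_extract_sender({"from": "john@example.com"}))
```

In this sketch a missing `From` header yields an empty email string and `None` for the name; whether the real function behaves the same way is not stated in this reference.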

normalize_subject

normalize_subject(subject: str) -> str

Normalize an email subject for thread-level comparison.

Parameters:

Name     Type  Description                              Default
subject  str   Raw subject line from a message header.  required

Returns:

Name  Type  Description
str   str   Normalized subject string suitable for thread grouping.

Notes

Responsibilities:

- Strips common prefixes such as `Re:`, `Fwd:`, and `FW:`.
- Repeats prefix stripping to handle stacked prefixes.
- Collapses excessive whitespace.
- Preserves original casing (no lowercasing).

Guarantees:

- This function is intentionally conservative and avoids aggressive
  transformations that could alter the semantic meaning of the subject.
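The responsibilities listed above can be illustrated with a small regular-expression loop. This is a sketch under stated assumptions rather than the package's implementation: `sketch_normalize_subject` is a hypothetical name, the prefix set is limited to the three prefixes named above, and matching them case-insensitively is an assumption.

```python
import re

# Assumed prefix set: Re:, Fwd:, FW: (matched case-insensitively).
_PREFIX = re.compile(r"^\s*(?:re|fwd|fw):\s*", re.IGNORECASE)


def sketch_normalize_subject(subject: str) -> str:
    """Hypothetical sketch: strip stacked reply/forward prefixes."""
    result = subject
    while True:
        # Remove at most one leading prefix per pass.
        stripped = _PREFIX.sub("", result, count=1)
        if stripped == result:
            break  # No prefix left to strip.
        result = stripped
    # Collapse runs of whitespace; casing is left untouched.
    return " ".join(result.split())


print(sketch_normalize_subject("Re: Fwd:  Interview   Update"))
```

Because the pattern requires a colon directly after the prefix, a subject such as "Regarding: budget" is left unchanged, which keeps the sketch in line with the conservative behavior described above.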

parse_headers

parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]

Convert a list of Gmail-style headers into a normalized dict.

Parameters:

Name         Type                  Description                                                        Default
raw_headers  List[Dict[str, str]]  List of header dictionaries, each containing name and value keys.  required

Returns:

Type            Description
Dict[str, str]  Dictionary mapping lowercase header names to stripped values.

Notes

Guarantees:

- Provider payloads (such as Gmail) typically represent headers as a
  list of name/value mappings.
- This function normalizes them into a case-insensitive dictionary
  keyed by lowercase header names.

Example

Typical usage:

Input:
    [
        {"name": "From", "value": "John Doe <john@example.com>"},
        {"name": "Subject", "value": "Re: Interview Update"},
    ]

Output:
    {
        "from": "John Doe <john@example.com>",
        "subject": "Re: Interview Update",
    }
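The input-to-output mapping above could be produced by a loop like the one below. It is a minimal sketch, not the package's code: `sketch_parse_headers` is a hypothetical name, and two behaviors are assumptions that this reference does not specify, namely that entries without a name are skipped and that the last value wins when a header name repeats.

```python
from typing import Dict, List


def sketch_parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical sketch: list of name/value mappings to a lowercase dict."""
    headers: Dict[str, str] = {}
    for header in raw_headers:
        name = header.get("name")
        if not name:
            continue  # Assumption: ignore entries without a header name.
        # Lowercase the key and strip the value; a repeated name overwrites.
        headers[name.strip().lower()] = (header.get("value") or "").strip()
    return headers


raw = [
    {"name": "From", "value": "John Doe <john@example.com>"},
    {"name": "Subject", "value": "Re: Interview Update"},
]
print(sketch_parse_headers(raw))
```

Lowercasing the keys once at parse time means later lookups, such as the `from` key used for sender extraction, do not need to handle provider-specific capitalization.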