Parsers

mail_intake.parsers

Message parsing utilities for Mail Intake.

This package contains provider-aware but adapter-agnostic parsing helpers used to extract and normalize structured information from raw mail payloads.

Parsers in this package are responsible for:

- Interpreting provider-native message structures
- Extracting meaningful fields such as headers, body text, and subjects
- Normalizing data into consistent internal representations

This package does not:

- Perform network or IO operations
- Contain provider API logic
- Construct domain models directly

Parsing functions are designed to be composable and are orchestrated by the ingestion layer.

extract_body

extract_body(payload: Dict[str, Any]) -> str

Extract the best-effort message body from a Gmail payload.

Priority:

1. text/plain
2. text/html (stripped to text)
3. Single-part body
4. Empty string (if nothing usable is found)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| payload | Dict[str, Any] | Provider-native message payload dictionary. | required |

Returns:

| Type | Description |
| --- | --- |
| str | Extracted plain-text message body. |
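The priority order above can be illustrated with a minimal sketch. This is not the package's implementation: `extract_body_sketch` and its helpers are hypothetical names, the payload layout (`mimeType`, `parts`, and URL-safe base64 in `body.data`) is assumed to follow the Gmail API message format, and the regex-based HTML stripping is a deliberately crude stand-in for whatever the real function does.

```python
import base64
import re
from html import unescape
from typing import Any, Dict, Optional


def _decode(data: str) -> str:
    # Assumption: body data is URL-safe base64, possibly without padding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _find_part(payload: Dict[str, Any], mime_type: str) -> Optional[str]:
    # Depth-first search for the first part of the given MIME type that has data.
    if payload.get("mimeType") == mime_type:
        data = payload.get("body", {}).get("data")
        if data:
            return _decode(data)
    for part in payload.get("parts", []):
        found = _find_part(part, mime_type)
        if found is not None:
            return found
    return None


def _strip_html(html: str) -> str:
    # Crude tag removal for illustration only; not a robust HTML-to-text converter.
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def extract_body_sketch(payload: Dict[str, Any]) -> str:
    # 1. Prefer text/plain anywhere in the part tree.
    plain = _find_part(payload, "text/plain")
    if plain is not None:
        return plain.strip()
    # 2. Fall back to text/html, stripped to text.
    html = _find_part(payload, "text/html")
    if html is not None:
        return _strip_html(html)
    # 3. Fall back to a single-part body on the payload itself.
    data = payload.get("body", {}).get("data")
    if data:
        return _decode(data).strip()
    # 4. Nothing usable found.
    return ""
```

Searching the whole part tree for text/plain before considering text/html means a plain-text alternative wins even when it appears after the HTML part in a multipart message.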

extract_sender

extract_sender(headers: Dict[str, str]) -> Tuple[str, Optional[str]]

Extract sender email and optional display name from headers.

This function parses the From header and attempts to extract:

- Sender email address
- Optional human-readable display name

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| headers | Dict[str, str] | Normalized header dictionary as returned by parse_headers. | required |

Returns:

| Type | Description |
| --- | --- |
| Tuple[str, Optional[str]] | A tuple (email, name) where email is the sender email address and name is the display name, or None if unavailable. |

Examples:

"John Doe <john@example.com>" → ("john@example.com", "John Doe")
"john@example.com" → ("john@example.com", None)
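One way to get this behavior is with the standard library's `email.utils.parseaddr`. The sketch below is an assumption about how such a function could be written, not the package's actual code; `extract_sender_sketch` is a hypothetical name, and returning an empty email string when the From header is missing is also an assumption.

```python
from email.utils import parseaddr
from typing import Dict, Optional, Tuple


def extract_sender_sketch(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    # Headers are assumed to be keyed by lowercase names, as parse_headers produces.
    # parseaddr returns (display_name, address); both are "" when parsing fails.
    name, email = parseaddr(headers.get("from", ""))
    # Map an empty display name to None to match the documented return contract.
    return email, (name or None)
```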

normalize_subject

normalize_subject(subject: str) -> str

Normalize an email subject for thread-level comparison.

Operations:

- Strips common prefixes such as Re:, Fwd:, and FW:
- Repeats prefix stripping to handle stacked prefixes
- Collapses excessive whitespace
- Preserves original casing (no lowercasing)

This function is intentionally conservative and avoids aggressive transformations that could alter the semantic meaning of the subject.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| subject | str | Raw subject line from a message header. | required |

Returns:

| Type | Description |
| --- | --- |
| str | Normalized subject string suitable for thread grouping. |
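The operations listed for this function could be sketched as follows. `normalize_subject_sketch` is a hypothetical name, and two details are assumptions rather than documented behavior: prefixes are matched case-insensitively, and only Re, Fwd, and FW are recognized.

```python
import re

# Assumption: only these three prefixes are stripped, matched case-insensitively.
_PREFIX = re.compile(r"^\s*(?:re|fwd|fw)\s*:\s*", re.IGNORECASE)


def normalize_subject_sketch(subject: str) -> str:
    result = subject
    # Strip one leading prefix at a time until none remain (handles "Re: Fwd: ...").
    while True:
        stripped = _PREFIX.sub("", result, count=1)
        if stripped == result:
            break
        result = stripped
    # Collapse whitespace runs to single spaces; the casing is left untouched.
    return re.sub(r"\s+", " ", result).strip()
```

Because casing is preserved, callers that want case-insensitive thread grouping would still need to compare the normalized strings with something like `str.casefold()`.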

parse_headers

parse_headers(raw_headers: List[Dict[str, str]]) -> Dict[str, str]

Convert a list of Gmail-style headers into a normalized dict.

Provider payloads (such as Gmail) typically represent headers as a list of name/value mappings. This function normalizes them into a dictionary keyed by lowercase header names, so lookups are case-insensitive as long as callers use lowercase keys.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| raw_headers | List[Dict[str, str]] | List of header dictionaries, each containing name and value keys. | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, str] | Dictionary mapping lowercase header names to stripped values. |

Example

Input:

    [
        {"name": "From", "value": "John Doe <john@example.com>"},
        {"name": "Subject", "value": "Re: Interview Update"},
    ]

Output:

    {
        "from": "John Doe <john@example.com>",
        "subject": "Re: Interview Update",
    }
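The normalization described for parse_headers can be sketched in a few lines. `parse_headers_sketch` is a hypothetical name; skipping entries without a name and letting the last duplicate header win are assumptions, since the documentation does not say how either case is handled.

```python
from typing import Dict, List


def parse_headers_sketch(raw_headers: List[Dict[str, str]]) -> Dict[str, str]:
    headers: Dict[str, str] = {}
    for header in raw_headers:
        # Lowercase the name so lookups do not depend on the provider's casing.
        name = header.get("name", "").strip().lower()
        if not name:
            continue  # Assumption: entries without a name are skipped.
        # Assumption: when a header repeats, the last value overwrites earlier ones.
        headers[name] = header.get("value", "").strip()
    return headers
```

The resulting dictionary is the shape that extract_sender expects, so a typical caller would look up `headers["from"]` or `headers["subject"]` using lowercase keys.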