AMD: Answering Machine Detection Engine

In outbound calling, 30-40% of calls reach voicemail. Every second an agent waits for a machine to finish its greeting is wasted labor. The AMD Engine classifies answering entities in under 3 seconds using a four-feature decision tree — detecting humans, voicemail systems, beep tones, and unknown entities with explainable confidence scores.


1. The Outbound Calling Problem

Outbound contact centers face a persistent efficiency challenge: a significant fraction of calls connect to answering machines rather than humans. Without AMD:

  • Agents wait 5-15 seconds listening to voicemail greetings before realizing no human is present
  • False positives (human classified as machine) cause premature hangups and lost customers
  • False negatives (machine classified as human) waste agent time on voicemail recordings
  • Beep detection failures mean agents miss the opportunity to leave automated messages

Traditional AMD systems rely on single features (usually silence duration or greeting length) and take 5-8 seconds to reach a verdict — by which time, a human caller has already said “hello” and heard silence.


2. Four-Feature Decision Architecture

The AMD Engine extracts four orthogonal features from the first 3 seconds of audio:

Initial Silence
  • Human: typically 0-500ms
  • Voicemail: typically 1000-3000ms
  • Threshold: 1500ms separates human from machine
  • Measured via energy-level gate

Greeting Duration
  • Human: 0.5-2.0 seconds ("hello", "hi there")
  • Machine: 3.0-8.0 seconds ("you have reached...")
  • Threshold: 2500ms for the human/machine boundary
  • Measured from speech onset to first pause

Speech Cadence
  • speech_continuity: flow vs. fragmentation
  • cadence_regularity: rhythm uniformity
  • Human: irregular, natural pauses
  • Machine: regular, recorded cadence

Beep Detection
  • Spectral analysis of the 440-1000Hz band
  • beep_tone_detected: boolean flag
  • beep_frequency_hz: dominant frequency
  • A beep is a definitive machine indicator
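The beep feature can be sketched as a narrowband scan of the 440-1000Hz band. This is an illustrative helper, not the engine's actual extractor: the function names and the 0.5 energy-concentration cutoff are assumptions; only the band and the two output fields come from the feature description above.

```python
# Hypothetical beep-detection sketch: scan the 440-1000 Hz band with the
# Goertzel algorithm and flag a beep when one bin dominates the band energy.
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Power of a single frequency bin (Goertzel algorithm)."""
    omega = 2 * math.pi * freq_hz / sample_rate
    coeff = 2 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def detect_beep(samples, sample_rate=8000, band=(440, 1000), step_hz=20):
    """Return (beep_tone_detected, beep_frequency_hz)."""
    powers = {f: goertzel_power(samples, sample_rate, f)
              for f in range(band[0], band[1] + 1, step_hz)}
    dominant = max(powers, key=powers.get)
    total = sum(powers.values()) or 1.0
    # A pure tone concentrates most of the band energy in one bin.
    is_beep = powers[dominant] / total > 0.5
    return is_beep, float(dominant)

# Synthetic 1 kHz beep, 100 ms at 8 kHz:
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(800)]
print(detect_beep(tone))  # -> (True, 1000.0)
```

Speech and background noise spread energy across many bins, so the concentration test rejects them; a voicemail beep passes and its dominant bin becomes `beep_frequency_hz`.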

3. Decision Tree Classifier

The four features feed into a priority-ordered decision tree:

AMD Decision Tree

1. Beep Check
   If beep_tone_detected = true, verdict = machine (voicemail with beep). Confidence: 0.95+. This is the strongest single indicator.
2. Machine Features
   If initial_silence > 1500ms AND greeting_duration > 2500ms AND cadence_regularity > 0.7, verdict = machine. Confidence based on feature margin.
3. Human Features
   If initial_silence < 800ms AND greeting_duration < 2000ms AND speech_continuity < 0.6, verdict = human. Confidence based on feature margin.
4. Fallback
   If no clear classification, verdict = unknown. Low confidence. Flag for human review or retry.
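The four steps above can be sketched as a single function. The thresholds are the ones quoted in the steps; the function name and the margin-to-confidence formula are illustrative assumptions, not the kernel's actual amd_classifier.py.

```python
# Minimal sketch of the priority-ordered AMD decision tree.
# Thresholds come from the steps above; confidence math is illustrative.
def classify(silence_ms, greeting_ms, continuity, regularity, beep):
    if beep:  # 1. Beep check: strongest single indicator
        return "machine", 0.95, "beep_tone_detected=true"
    if silence_ms > 1500 and greeting_ms > 2500 and regularity > 0.7:
        # 2. Machine features: confidence grows with the feature margin
        margin = min((silence_ms - 1500) / 1500, (greeting_ms - 2500) / 2500)
        return "machine", min(0.9, 0.6 + margin), (
            f"initial_silence={silence_ms}ms exceeds human threshold of 1500ms")
    if silence_ms < 800 and greeting_ms < 2000 and continuity < 0.6:
        # 3. Human features
        margin = min((800 - silence_ms) / 800, (2000 - greeting_ms) / 2000)
        return "human", min(0.9, 0.6 + margin), (
            f"initial_silence={silence_ms}ms within human range")
    # 4. Fallback: no clear classification
    return "unknown", 0.3, "no clear classification"

print(classify(1800, 4500, 0.9, 0.85, False))  # machine branch
print(classify(200, 1200, 0.4, 0.3, False))    # human branch
```

Note the ordering matters: the beep check short-circuits everything else, so a recording with a beep never falls through to the ambiguous feature rules.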

Why a Decision Tree?

The AMD classifier deliberately uses a rule-based decision tree rather than a neural network for three reasons:

  1. Explainability — every verdict comes with a human-readable reason (“initial_silence=1800ms exceeds human threshold of 1500ms”)
  2. Calibration — thresholds can be tuned per carrier, region, or campaign without retraining
  3. Determinism — the same audio always produces the same verdict, critical for audit compliance

4. AMD Contracts

from dataclasses import dataclass

@dataclass(frozen=True)
class AmdFeatures:
    initial_silence_ms: int       # Silence after connect
    greeting_duration_ms: int     # First speech segment length
    speech_continuity: float      # 0.0-1.0 flow score
    cadence_regularity: float     # 0.0-1.0 rhythm uniformity
    beep_tone_detected: bool      # Frequency-domain beep flag
    beep_frequency_hz: float      # Dominant beep frequency
    background_noise_level: float # Ambient noise floor
    total_duration_ms: int        # Total analyzed duration

@dataclass(frozen=True)
class AmdResult:
    result_id: str
    correlation_id: str
    ts_ms: int
    verdict: str       # "human" | "machine" | "beep" | "unknown"
    confidence: float  # 0.0 - 1.0
    features: AmdFeatures
    reason: str        # Human-readable explanation

5. SignalOS Integration

AMD results automatically convert to SignalOS signals:

AMD Signal Flow

1. Audio Ingest
   Twilio/RTP/SIPREC adapter receives the first audio frames after call connect
2. Feature Extraction
   AMD feature extractor analyzes initial silence, greeting, cadence, and beep in the first 3 seconds
3. Classification
   Decision tree produces AmdResult with verdict, confidence, and reason
4. Signal Emission
   amd_to_signal() converts the result to a SignalEnvelope (kind=amd_event, domain=stream_io)
5. Governance Pipeline
   Signal enters the SignalOS bus for risk grading, routing, and downstream action
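Step 4 can be sketched as follows. Only kind="amd_event" and domain="stream_io" come from the flow above; SignalEnvelope's field list and this function body are assumptions for illustration, not the real SignalOS API.

```python
# Hypothetical sketch of the amd_to_signal() conversion.
from dataclasses import dataclass
import time
import uuid

@dataclass(frozen=True)
class SignalEnvelope:
    signal_id: str
    correlation_id: str
    ts_ms: int
    kind: str
    domain: str
    payload: dict

def amd_to_signal(verdict, confidence, reason, correlation_id):
    """Wrap an AMD verdict in a bus-ready envelope (illustrative)."""
    return SignalEnvelope(
        signal_id=str(uuid.uuid4()),
        correlation_id=correlation_id,
        ts_ms=int(time.time() * 1000),
        kind="amd_event",
        domain="stream_io",
        payload={"verdict": verdict,
                 "confidence": confidence,
                 "reason": reason},
    )

sig = amd_to_signal("machine", 0.95, "beep_tone_detected=true", "call-123")
```

The correlation_id ties the envelope back to the call session, so downstream governance can join the AMD verdict with the rest of the session's signals.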

Downstream Actions

Verdict   Action
human     Route call to next available agent immediately
machine   Optionally leave pre-recorded message after beep
beep      Trigger automated message playback
unknown   Hold briefly, retry classification, or route to agent
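The routing table above amounts to a simple dispatch on the verdict string. Handler names here are illustrative placeholders, not a real SignalOS API:

```python
# Verdict-to-action dispatch table (handler names are illustrative).
ACTIONS = {
    "human": "route_to_agent",
    "machine": "leave_message_after_beep",
    "beep": "play_automated_message",
    "unknown": "hold_and_retry",
}

def dispatch(verdict: str) -> str:
    # Unrecognized verdicts fall back to the safe "unknown" path.
    return ACTIONS.get(verdict, ACTIONS["unknown"])

print(dispatch("human"))  # -> route_to_agent
```

Keeping the mapping in data rather than branching logic makes it easy to retune per campaign, matching the calibration rationale in section 3.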

6. Evidence Pack Integration

AMD results are automatically included in the session evidence pack:

  • AMD Card: verdict, confidence, all four feature values, decision reason
  • Timeline: AMD classification timestamp relative to call connect
  • Audit Trail: feature thresholds used, calibration version, carrier info

This ensures that every AMD decision is traceable and auditable for compliance.
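The three evidence items above could be assembled into a card shaped roughly like this. The key names and builder function are assumptions for illustration, not the evidence pack's actual schema:

```python
# Illustrative shape of the AMD evidence card (hypothetical schema).
def build_amd_card(result, thresholds, calibration_version, carrier):
    return {
        "verdict": result["verdict"],
        "confidence": result["confidence"],
        "features": result["features"],       # all four feature values
        "reason": result["reason"],
        "classified_at_ms": result["ts_ms"],  # relative to call connect
        "audit": {
            "thresholds": thresholds,
            "calibration_version": calibration_version,
            "carrier": carrier,
        },
    }

card = build_amd_card(
    {"verdict": "machine", "confidence": 0.95,
     "features": {"initial_silence_ms": 1800}, "reason": "beep",
     "ts_ms": 2100},
    thresholds={"initial_silence_ms": 1500},
    calibration_version="v3",
    carrier="example-carrier",
)
```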


7. Performance Characteristics

Metric                                   Value
Classification latency                   < 3 seconds from call connect
Beep detection accuracy                  > 98% (strongest single feature)
Human/machine accuracy                   > 92% (four-feature fusion)
False positive rate (human as machine)   < 3%
Kernel tests                             12 (classifier + contracts)
Frozen contracts                         2 (AmdFeatures, AmdResult)
Pure domain modules                      1 (amd_classifier.py)
Gate violations                          0