AMD: Answering Machine Detection Engine

In outbound calling, 30-40% of calls reach voicemail. Every second an agent waits for a machine to finish its greeting is wasted labor. The AMD Engine classifies answering entities in under 3 seconds using a four-feature decision tree — detecting humans, voicemail systems, beep tones, and unknown entities with explainable confidence scores.


1. The Outbound Calling Problem

Outbound contact centers face a persistent efficiency challenge: a significant fraction of calls connect to answering machines rather than humans. Without AMD:

  • Agents wait 5-15 seconds listening to voicemail greetings before realizing no human is present
  • False positives (human classified as machine) cause premature hangups and lost customers
  • False negatives (machine classified as human) waste agent time on voicemail recordings
  • Beep detection failures mean agents miss the opportunity to leave automated messages

Traditional AMD systems rely on single features (usually silence duration or greeting length) and take 5-8 seconds to reach a verdict — by which time, a human caller has already said “hello” and heard silence.


2. Four-Feature Decision Architecture

The AMD Engine extracts four orthogonal features from the first 3 seconds of audio:

Initial Silence
  • Human: typically 0-500ms
  • Voicemail: typically 1000-3000ms
  • Threshold: 1500ms separates human from machine
  • Measured via energy-level gate

Greeting Duration
  • Human: 0.5-2.0 seconds ("hello", "hi there")
  • Machine: 3.0-8.0 seconds ("you have reached...")
  • Threshold: 2500ms for the human/machine boundary
  • Measured from speech onset to first pause

Speech Cadence
  • speech_continuity: flow vs. fragmentation
  • cadence_regularity: rhythm uniformity
  • Human: irregular, natural pauses
  • Machine: regular, recorded cadence

Beep Detection
  • Spectral analysis of the 440-1000Hz band
  • beep_tone_detected: boolean flag
  • beep_frequency_hz: dominant frequency
  • A beep is a definitive machine indicator
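The beep feature can be sketched as a narrowband scan of the 440-1000Hz band. This is an illustrative helper, not the engine's actual extractor: the function names and the 0.5 energy-concentration cutoff are assumptions; only the band and the two output fields come from the feature description above.

```python
# Hypothetical beep-detection sketch: scan the 440-1000 Hz band with the
# Goertzel algorithm and flag a beep when one bin dominates the band energy.
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Power of a single frequency bin (Goertzel algorithm)."""
    omega = 2 * math.pi * freq_hz / sample_rate
    coeff = 2 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def detect_beep(samples, sample_rate=8000, band=(440, 1000), step_hz=20):
    """Return (beep_tone_detected, beep_frequency_hz)."""
    powers = {f: goertzel_power(samples, sample_rate, f)
              for f in range(band[0], band[1] + 1, step_hz)}
    dominant = max(powers, key=powers.get)
    total = sum(powers.values()) or 1.0
    # A pure tone concentrates most of the band energy in one bin.
    is_beep = powers[dominant] / total > 0.5
    return is_beep, float(dominant)

# Synthetic 1 kHz beep, 100 ms at 8 kHz:
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(800)]
print(detect_beep(tone))  # -> (True, 1000.0)
```

Speech and background noise spread energy across many bins, so the concentration test rejects them; a voicemail beep passes and its dominant bin becomes `beep_frequency_hz`.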

3. Decision Tree Classifier

The four features feed into a priority-ordered decision tree:

AMD Decision Tree

1. Beep Check
   If beep_tone_detected = true, verdict = machine (voicemail with beep). Confidence: 0.95+. This is the strongest single indicator.
2. Machine Features
   If initial_silence > 1500ms AND greeting_duration > 2500ms AND cadence_regularity > 0.7, verdict = machine. Confidence based on feature margin.
3. Human Features
   If initial_silence < 800ms AND greeting_duration < 2000ms AND speech_continuity < 0.6, verdict = human. Confidence based on feature margin.
4. Fallback
   If no clear classification, verdict = unknown. Low confidence. Flag for human review or retry.
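The four steps above can be sketched as a single function. The thresholds are the ones quoted in the steps; the function name and the margin-to-confidence formula are illustrative assumptions, not the kernel's actual amd_classifier.py.

```python
# Minimal sketch of the priority-ordered AMD decision tree.
# Thresholds come from the steps above; confidence math is illustrative.
def classify(silence_ms, greeting_ms, continuity, regularity, beep):
    if beep:  # 1. Beep check: strongest single indicator
        return "machine", 0.95, "beep_tone_detected=true"
    if silence_ms > 1500 and greeting_ms > 2500 and regularity > 0.7:
        # 2. Machine features: confidence grows with the feature margin
        margin = min((silence_ms - 1500) / 1500, (greeting_ms - 2500) / 2500)
        return "machine", min(0.9, 0.6 + margin), (
            f"initial_silence={silence_ms}ms exceeds human threshold of 1500ms")
    if silence_ms < 800 and greeting_ms < 2000 and continuity < 0.6:
        # 3. Human features
        margin = min((800 - silence_ms) / 800, (2000 - greeting_ms) / 2000)
        return "human", min(0.9, 0.6 + margin), (
            f"initial_silence={silence_ms}ms within human range")
    # 4. Fallback: no clear classification
    return "unknown", 0.3, "no clear classification"

print(classify(1800, 4500, 0.9, 0.85, False))  # machine branch
print(classify(200, 1200, 0.4, 0.3, False))    # human branch
```

Note the ordering matters: the beep check short-circuits everything else, so a recording with a beep never falls through to the ambiguous feature rules.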

Why a Decision Tree?

The AMD classifier deliberately uses a rule-based decision tree rather than a neural network for three reasons:

  1. Explainability — every verdict comes with a human-readable reason (“initial_silence=1800ms exceeds human threshold of 1500ms”)
  2. Calibration — thresholds can be tuned per carrier, region, or campaign without retraining
  3. Determinism — the same audio always produces the same verdict, critical for audit compliance

4. AMD Contracts

from dataclasses import dataclass

@dataclass(frozen=True)
class AmdFeatures:
    initial_silence_ms: int       # Silence after connect
    greeting_duration_ms: int     # First speech segment length
    speech_continuity: float      # 0.0-1.0 flow score
    cadence_regularity: float     # 0.0-1.0 rhythm uniformity
    beep_tone_detected: bool      # Frequency-domain beep flag
    beep_frequency_hz: float      # Dominant beep frequency
    background_noise_level: float # Ambient noise floor
    total_duration_ms: int        # Total analyzed duration

@dataclass(frozen=True)
class AmdResult:
    result_id: str
    correlation_id: str
    ts_ms: int
    verdict: str       # "human" | "machine" | "beep" | "unknown"
    confidence: float  # 0.0 - 1.0
    features: AmdFeatures
    reason: str        # Human-readable explanation

5. SignalOS Integration

AMD results automatically convert to SignalOS signals:

AMD Signal Flow

1. Audio Ingest
   Twilio/RTP/SIPREC adapter receives the first audio frames after call connect
2. Feature Extraction
   AMD feature extractor analyzes initial silence, greeting, cadence, and beep in the first 3 seconds
3. Classification
   Decision tree produces AmdResult with verdict, confidence, and reason
4. Signal Emission
   amd_to_signal() converts the result to a SignalEnvelope (kind=amd_event, domain=stream_io)
5. Governance Pipeline
   Signal enters the SignalOS bus for risk grading, routing, and downstream action
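Step 4 can be sketched as follows. Only kind="amd_event" and domain="stream_io" come from the flow above; SignalEnvelope's field list and this function body are assumptions for illustration, not the real SignalOS API.

```python
# Hypothetical sketch of the amd_to_signal() conversion.
from dataclasses import dataclass
import time
import uuid

@dataclass(frozen=True)
class SignalEnvelope:
    signal_id: str
    correlation_id: str
    ts_ms: int
    kind: str
    domain: str
    payload: dict

def amd_to_signal(verdict, confidence, reason, correlation_id):
    """Wrap an AMD verdict in a bus-ready envelope (illustrative)."""
    return SignalEnvelope(
        signal_id=str(uuid.uuid4()),
        correlation_id=correlation_id,
        ts_ms=int(time.time() * 1000),
        kind="amd_event",
        domain="stream_io",
        payload={"verdict": verdict,
                 "confidence": confidence,
                 "reason": reason},
    )

sig = amd_to_signal("machine", 0.95, "beep_tone_detected=true", "call-123")
```

The correlation_id ties the envelope back to the call session, so downstream governance can join the AMD verdict with the rest of the session's signals.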

Downstream Actions

Verdict   Action
human     Route call to next available agent immediately
machine   Optionally leave pre-recorded message after beep
beep      Trigger automated message playback
unknown   Hold briefly, retry classification, or route to agent
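The routing table above amounts to a simple dispatch on the verdict string. Handler names here are illustrative placeholders, not a real SignalOS API:

```python
# Verdict-to-action dispatch table (handler names are illustrative).
ACTIONS = {
    "human": "route_to_agent",
    "machine": "leave_message_after_beep",
    "beep": "play_automated_message",
    "unknown": "hold_and_retry",
}

def dispatch(verdict: str) -> str:
    # Unrecognized verdicts fall back to the safe "unknown" path.
    return ACTIONS.get(verdict, ACTIONS["unknown"])

print(dispatch("human"))  # -> route_to_agent
```

Keeping the mapping in data rather than branching logic makes it easy to retune per campaign, matching the calibration rationale in section 3.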

6. Evidence Pack Integration

AMD results are automatically included in the session evidence pack:

  • AMD Card: verdict, confidence, all four feature values, decision reason
  • Timeline: AMD classification timestamp relative to call connect
  • Audit Trail: feature thresholds used, calibration version, carrier info

This ensures that every AMD decision is traceable and auditable for compliance.
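The three evidence items above could be assembled into a card shaped roughly like this. The key names and builder function are assumptions for illustration, not the evidence pack's actual schema:

```python
# Illustrative shape of the AMD evidence card (hypothetical schema).
def build_amd_card(result, thresholds, calibration_version, carrier):
    return {
        "verdict": result["verdict"],
        "confidence": result["confidence"],
        "features": result["features"],       # all four feature values
        "reason": result["reason"],
        "classified_at_ms": result["ts_ms"],  # relative to call connect
        "audit": {
            "thresholds": thresholds,
            "calibration_version": calibration_version,
            "carrier": carrier,
        },
    }

card = build_amd_card(
    {"verdict": "machine", "confidence": 0.95,
     "features": {"initial_silence_ms": 1800}, "reason": "beep",
     "ts_ms": 2100},
    thresholds={"initial_silence_ms": 1500},
    calibration_version="v3",
    carrier="example-carrier",
)
```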


7. Performance Characteristics

Metric                                   Value
Classification latency                   < 3 seconds from call connect
Beep detection accuracy                  > 98% (strongest single feature)
Human/machine accuracy                   > 92% (four-feature fusion)
False positive rate (human as machine)   < 3%
Kernel tests                             12 (classifier + contracts)
Frozen contracts                         2 (AmdFeatures, AmdResult)
Pure domain modules                      1 (amd_classifier.py)
Gate violations                          0