Presentation Material
Abstract
As organizations add security layers to online interactions, attackers are targeting the voice channel to take over accounts. This has resulted in a 210% increase in voice fraud since 2013. This talk will outline the voice fraud landscape, including the profile of the attackers and the tools they use, such as ANI spoofing (Pepe), credential stuffing (Mr. Roboto), and voice distortion (Chipmunk). It will then show how to detect these attacks using a variety of acoustic analysis techniques, including features found in non-voiced audio (e.g., spectral characteristics), voiced audio (e.g., speaker recognition), and call signaling (e.g., ANI velocity). We will also look at the architecture and algorithm modifications required to do audio feature extraction and machine learning at scale, in a system that currently handles over 5 million call minutes every single day. Finally, we will talk about the open challenges that still exist in identifying these attacks.
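The abstract names ANI velocity as a call-signaling feature but does not define it. One plausible reading is the number of calls seen from a single ANI (calling number) within a recent time window. The sketch below illustrates that reading only. The class name `AniVelocityTracker`, the one-hour default window, and the assumption that calls arrive in timestamp order are all hypothetical and are not taken from the talk.

```python
from collections import defaultdict, deque


class AniVelocityTracker:
    """Counts calls per ANI inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=3600.0):
        self.window_seconds = window_seconds
        # Maps each ANI to the timestamps of its recent calls, oldest first.
        self._calls = defaultdict(deque)

    def record_call(self, ani, timestamp):
        """Record one call and return how many calls this ANI made in the window.

        Assumes timestamps (in seconds) are supplied in non-decreasing order.
        """
        recent = self._calls[ani]
        recent.append(timestamp)
        # Discard calls that are older than the window relative to this call.
        while timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent)
```

A detector could compare the returned count against a threshold, or feed it to a model alongside audio features. Either choice is a design decision that the abstract does not specify.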
AI Generated Summary
The talk presented research on detecting fraudulent calls to call centers, shifting from the original “Pin Drop” PhD thesis focus on call provenance to real-world fraud prevention. The core problem is the insecurity of the voice channel: 61% of account takeovers begin with a phone call, a figure predicted to rise to 75% by 2020, which makes this a multi-billion-dollar problem. Attackers exploit weak knowledge-based authentication and use techniques such as caller ID spoofing, VoIP gateways, burner phones, voice distortion, and voice synthesis to bypass defenses.
The presented solution, Pin Drop, analyzes acoustic signatures from call audio at scale. It extracts approximately 1,380 features from both voiced and non-voiced segments, examining frequency cut-offs, comfort noise (algorithmic background noise), packet loss patterns, vocal tract modeling, and codec dictionary artifacts. These features reveal device and network fingerprints, allowing detection of VoIP usage (responsible for 44% of attacks), voice modification, and synthetic speech. For instance, unnatural vocal tract transitions can indicate synthesis, while specific frequency distributions identify VoIP gateways.
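The summary does not say how a frequency cut-off is measured. As a rough illustration, one simple proxy is the spectral rolloff: the lowest frequency below which most of a frame's energy lies. A frame that has passed through a narrowband telephony codec tends to roll off well below a wideband one. The sketch below is not the speaker's method. The function names, the 99% energy threshold, and the naive DFT (used only to avoid external dependencies) are assumptions.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum via a naive DFT (bins 0..N/2).

    An FFT library would normally be used; the naive form keeps this
    sketch dependency-free.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def estimate_cutoff_hz(samples, sample_rate, energy_fraction=0.99):
    """Return the lowest bin frequency below which `energy_fraction`
    of the frame's spectral energy is contained (hypothetical helper)."""
    spectrum = power_spectrum(samples)
    total = sum(spectrum)
    if total == 0:
        return 0.0  # Silent frame: no meaningful cut-off.
    cumulative = 0.0
    for k, power in enumerate(spectrum):
        cumulative += power
        if cumulative >= energy_fraction * total:
            return k * sample_rate / len(samples)
    return sample_rate / 2
```

For example, a frame holding only a 500 Hz tone yields a much lower estimate than a frame that also contains a 3000 Hz tone. A production system would average such estimates over many frames and combine them with other features before drawing any conclusion about the call path.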
The system processes massive datasets, having analyzed 731 million calls from 131 million numbers across 87 million accounts, detecting 942,000 fraud events from 43,000 unique attackers. Fraud rates have increased 210% since 2013, from 1 in 2,900 calls to 1 in 638. The technology is deployed by eight of the top ten U.S. banks and major insurers, providing real-time agent-level alerts during calls. Key takeaways include the necessity of audio-based detection beyond phone number checks, the prevalence of VoIP in attacks, and the effectiveness of modeling non-voiced audio and vocal tract anomalies to combat evolving voice fraud at scale.