A group of academics has devised a "deep learning-based acoustic side-channel attack" that can classify, with 95% accuracy, laptop keystrokes recorded by a nearby phone.
"When trained on keystrokes recorded using the video conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium," researchers Joshua Harrison, Ehsan Toreini, and Maryam Mehrnezhad said in a new study published last week.
Side-channel attacks refer to a class of security exploits that aim to glean insights from a system by monitoring and measuring its physical effects during the processing of sensitive data. Some of the common observable effects include runtime behavior, power consumption, electromagnetic radiation, acoustics, and cache accesses.
No implementation is completely free of side channels, and practical attacks of this kind can have damaging consequences for user privacy and security, since a malicious actor could weaponize them to obtain passwords and other confidential data.
"The ubiquity of keyboard acoustic emanations makes them not only a readily available attack vector, but also prompts victims to underestimate (and therefore not try to hide) their output," the researchers said. "For example, when typing a password, people will regularly hide their screen but will do little to obfuscate their keyboard's sound."
To pull off the attack, the researchers first carried out experiments using 36 of the Apple MacBook Pro's keys (0-9, a-z), pressing each key 25 times in a row while varying the pressure and the finger used. The keystrokes were recorded both by a phone placed in close physical proximity to the laptop and over Zoom.
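As a rough illustration of the scale of that experiment, the short Python sketch below enumerates the 36 key labels and counts the resulting samples. The names KEYS, PRESSES_PER_KEY, and labels are hypothetical, and the assumption that each recording method (phone and Zoom) yields one sample per press is an inference from the description above, not a figure quoted from the study.

```python
import string

# The 36 keys described in the experiment: digits 0-9 and letters a-z.
KEYS = string.digits + string.ascii_lowercase
PRESSES_PER_KEY = 25

# One label per recorded press (assumed: one audio sample per press).
labels = [key for key in KEYS for _ in range(PRESSES_PER_KEY)]

print(len(KEYS), "keys,", len(labels), "labeled presses per recording method")
```

Under those assumptions, each recording method contributes 36 x 25 = 900 labeled keystrokes.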
The next phase entailed isolating the individual keystrokes and converting each one into a mel-spectrogram, on which a deep learning model called CoAtNet (pronounced "coat" nets and short for convolution and self-attention networks) was run to classify the keystroke images.
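The two preprocessing steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the researchers' code: the function names (isolate_keystrokes, mel_spectrogram, and the helpers), the energy threshold, the frame sizes, and the synthetic 1 kHz "keystrokes" are all hypothetical, and the mel scale uses the common 2595 * log10(1 + f / 700) formula. A naive DFT keeps the example dependency-free; a practical pipeline would likely use an FFT-based audio library. The CoAtNet classification stage is not shown.

```python
import cmath
import math

def isolate_keystrokes(samples, frame_len=256, threshold=0.01, min_gap_frames=2):
    """Return (start, end) sample ranges whose mean frame energy exceeds threshold."""
    n_frames = len(samples) // frame_len
    energies = [sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
                for i in range(n_frames)]
    segments, start, quiet = [], None, 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap_frames:  # enough silence: close the segment
                segments.append((start * frame_len, (i - quiet + 1) * frame_len))
                start, quiet = None, 0
    if start is not None:  # recording ended during or just after a keystroke
        segments.append((start * frame_len, (n_frames - quiet) * frame_len))
    return segments

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power in DFT bins 0..N/2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([(f - lo) / (mid - lo) if lo < f <= mid
                     else (hi - f) / (hi - mid) if mid < f < hi
                     else 0.0
                     for f in bin_hz])
    return bank

def mel_spectrogram(samples, sample_rate, n_fft=128, hop=64, n_mels=8):
    """Return frames[t][m]: energy in mel band m at time step t."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        power = power_spectrum(samples[start:start + n_fft])
        frames.append([sum(w * p for w, p in zip(row, power)) for row in bank])
    return frames

# Synthetic stand-in for a recording: silence with three short 1 kHz bursts.
SAMPLE_RATE = 8000
recording = [0.0] * SAMPLE_RATE
for burst_start in (1024, 3072, 5120):
    for n in range(512):
        recording[burst_start + n] = 0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)

segments = isolate_keystrokes(recording)
first_start, first_end = segments[0]
spec = mel_spectrogram(recording[first_start:first_end], SAMPLE_RATE)
peak_bands = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(segments, peak_bands)
```

Each isolated segment becomes a small time-by-frequency grid, which is the kind of "image" an image classifier can then be trained on.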
As countermeasures, the researchers recommend changing typing style, using randomized passwords rather than passwords containing full words, and adding randomly generated fake keystrokes to defend against voice call-based attacks.
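The randomized-password recommendation is straightforward to act on. The sketch below is one possible approach using Python's standard secrets module; the function name random_password, the 16-character length, and the letters-plus-digits alphabet are illustrative assumptions rather than guidance from the study.

```python
import secrets
import string

# Assumed alphabet: upper- and lowercase letters plus digits.
ALPHABET = string.ascii_letters + string.digits

def random_password(length=16):
    """Build a password from independently chosen random characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

password = random_password()
print(password)
```

Because every character is drawn independently, the result contains no dictionary words for an attacker to lean on when some keystrokes are misclassified.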