In January 2019, a critical flaw was reported in Apple's FaceTime group chats feature that made it possible for users to initiate a FaceTime video call and eavesdrop on targets by adding their own number as a third person in a group chat even before the person on the other end accepted the incoming call.
The vulnerability was deemed so severe that the iPhone maker removed the FaceTime group chats feature altogether before the issue was resolved in a subsequent iOS update.
Since then, a number of similar shortcomings have been discovered in multiple video chat apps such as Signal, JioChat, Mocha, Google Duo, and Facebook Messenger — all thanks to the work of Google Project Zero researcher Natalie Silvanovich.
"While [the Group FaceTime] bug was soon fixed, the fact that such a serious and easy to reach vulnerability had occurred due to a logic bug in a calling state machine — an attack scenario I had never seen considered on any platform — made me wonder whether other state machines had similar vulnerabilities as well," Silvanovich wrote in a Tuesday deep-dive of her year-long investigation.
How Does Signaling in WebRTC Work?
Although a majority of the messaging apps today rely on WebRTC for communication, the connections themselves are created by exchanging call set-up information using Session Description Protocol (SDP) between peers in what's called signaling, which typically works by sending an SDP offer from the caller's end, to which the callee responds with an SDP answer.
Put differently, when a user starts a WebRTC call to another user, a session description called an "offer" is created containing all the information necessary for setting up a connection: the kind of media being sent, its format, the transfer protocol used, and the endpoint's IP address and port, among others. The recipient then responds with an "answer," including a description of its endpoint.
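To make the contents of an offer concrete, here is a minimal sketch that pulls the media kind, port, transport protocol, payload formats, and address out of an SDP body. The sample offer is a simplified, hand-written illustration (real WebRTC offers carry many more attributes), the address comes from the 192.0.2.0/24 documentation range, and `MediaSection` and `parse_media_sections` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List

# Simplified, hand-written SDP body used only for illustration.
SAMPLE_OFFER = """\
v=0
o=- 46117317 2 IN IP4 192.0.2.10
s=-
t=0 0
c=IN IP4 192.0.2.10
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 99
a=rtpmap:99 H264/90000
"""


@dataclass
class MediaSection:
    kind: str           # "audio" or "video"
    port: int           # port the endpoint expects media on
    protocol: str       # transport protocol, e.g. "RTP/AVP"
    formats: List[str]  # payload type numbers listed on the m= line
    address: str        # connection address that applies to this section


def parse_media_sections(sdp: str) -> List[MediaSection]:
    """Collect one MediaSection per m= line, applying c= lines as they appear."""
    session_address = ""
    sections: List[MediaSection] = []
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not a "<type>=<value>" line
        key, value = line[0], line[2:]
        if key == "c":
            address = value.split()[2]
            if sections:
                sections[-1].address = address  # media-level override
            else:
                session_address = address       # session-level default
        elif key == "m":
            kind, port, protocol, *formats = value.split()
            sections.append(
                MediaSection(kind, int(port), protocol, formats, session_address)
            )
    return sections


if __name__ == "__main__":
    for section in parse_media_sections(SAMPLE_OFFER):
        print(section)
```

The answer sent back by the callee has the same shape, so the same parser would apply to it.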
The entire process is a state machine, which indicates "where in the process of signaling the exchange of offer and answer the connection currently is."
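A rough way to picture that state machine is as a table of allowed transitions. The sketch below models only a subset of the signaling states that WebRTC defines (`stable`, `have-local-offer`, `have-remote-offer`); the event names and the `SignalingStateMachine` class are assumptions made for this illustration, not part of any real API.

```python
class SignalingError(Exception):
    """Raised when an event is not valid in the current signaling state."""


# (current state, event) -> next state.
# State names follow a subset of WebRTC's signaling states; the event
# names are hypothetical labels for this sketch.
TRANSITIONS = {
    ("stable", "set_local_offer"): "have-local-offer",
    ("stable", "set_remote_offer"): "have-remote-offer",
    ("have-local-offer", "set_remote_answer"): "stable",
    ("have-remote-offer", "set_local_answer"): "stable",
}


class SignalingStateMachine:
    def __init__(self) -> None:
        self.state = "stable"

    def apply(self, event: str) -> str:
        """Move to the next state, rejecting events that are out of order."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise SignalingError(f"{event!r} is not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    caller = SignalingStateMachine()
    print(caller.apply("set_local_offer"))    # offer created and sent
    print(caller.apply("set_remote_answer"))  # answer received from callee
```

Rejecting every event that is not listed for the current state is the property that matters here: a logic bug in this kind of table is what lets a call progress further than it should.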
Also optionally included in the offer/answer exchange is the ability of the two peers to trade connection candidates with each other so as to negotiate the actual connection between them. Each candidate details a method that can be used to communicate, regardless of the network topology, under a framework that WebRTC uses called Interactive Connectivity Establishment (ICE).
Once the two peers agree upon a mutually-compatible candidate, that candidate's SDP is used by each peer to construct and open a connection, through which media then begins to flow.
In this way, both devices share with one another the information needed in order to exchange audio or video over the peer-to-peer connection. But before this relay can happen, the captured media data has to be attached to the connection using a feature called tracks.
While it's expected that callee consent is ensured ahead of audio or video transmission and that no data is shared until the receiver has interacted with the application to answer the call (i.e., before adding any tracks to the connection), Silvanovich observed behavior to the contrary.
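The expected behavior can be sketched as a callee-side guard that refuses to attach any track until the user has accepted the call. This is a simplified model under stated assumptions, not code from any of the affected apps: `CalleeSession`, `CallState`, and `ConsentError` are hypothetical names, and the `tracks` list stands in for a real peer connection.

```python
from enum import Enum, auto
from typing import List


class CallState(Enum):
    RINGING = auto()   # incoming call shown to the user, not yet answered
    ACCEPTED = auto()  # user has interacted with the app to answer
    ENDED = auto()


class ConsentError(Exception):
    """Raised when media would be shared without the callee's consent."""


class CalleeSession:
    def __init__(self) -> None:
        self.state = CallState.RINGING
        self.tracks: List[str] = []  # stand-in for tracks on a peer connection

    def user_accepts(self) -> None:
        """Record the user's explicit acceptance of the call."""
        if self.state is not CallState.RINGING:
            raise ConsentError("call is not ringing")
        self.state = CallState.ACCEPTED

    def add_track(self, kind: str) -> None:
        """Attach captured media only after the call has been accepted."""
        if self.state is not CallState.ACCEPTED:
            raise ConsentError(f"refusing to add {kind} track before acceptance")
        self.tracks.append(kind)


if __name__ == "__main__":
    session = CalleeSession()
    session.user_accepts()
    session.add_track("audio")
    print(session.tracks)
```

Placing the check inside `add_track` itself, rather than relying on callers to be in the right state, means a stray or out-of-order signaling message cannot reach the media path.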
Multiple Messaging Apps Affected
Not only did the flaws in the apps allow calls to be connected without interaction from the callee, but they also potentially permitted the caller to force a callee device to transmit audio or video data.
The common root cause? Logic bugs in the signaling state machines, which Silvanovich said "are a concerning and under-investigated attack surface of video conferencing applications."
- Signal (fixed in September 2019) - An audio call flaw in Signal's Android app made it possible for the caller to hear the callee's surroundings because the app didn't check whether the device receiving the connect message from the callee was the caller device.
- JioChat (fixed in July 2020) and Mocha (fixed in August 2020) - Adding candidates to the offers created by Reliance JioChat and Viettel's Mocha Android apps allowed a caller to force the target device to send audio (and video) without the user's consent. The flaws stemmed from the fact that the peer-to-peer connection had been set up even before the callee answered the call, thus increasing the "remote attack surface of WebRTC."
- Facebook Messenger (fixed in November 2020) - A vulnerability that could have allowed an attacker who is logged into the app to simultaneously initiate a call and send a specially crafted message to a target who is signed in to both the app and another Messenger client, such as the web browser, and begin receiving audio from the callee device.
- Google Duo (fixed in December 2020) - A race condition between disabling the video and setting up the connection that, in some situations, could cause the callee to leak video packets from unanswered calls.
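As an illustration of the kind of check that was missing in the Signal case, the sketch below only acts on a connect message when the local device is the caller. It is a simplified model based on the description above, not Signal's actual code; `Call`, `local_role`, and `handle_connect_message` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Call:
    local_role: str          # "caller" or "callee", from this device's view
    connected: bool = False  # True once audio is allowed to flow


def handle_connect_message(call: Call) -> bool:
    """Handle an incoming connect message.

    A connect message is expected to travel from the callee to the caller,
    so only a device acting as the caller should ever act on one.
    Returns True if the call was connected, False if the message was ignored.
    """
    if call.local_role != "caller":
        # A callee that receives a connect message is being asked to
        # connect a call its user never answered, so ignore it.
        return False
    call.connected = True
    return True


if __name__ == "__main__":
    callee_side = Call(local_role="callee")
    print(handle_connect_message(callee_side), callee_side.connected)
```

Without the role check, a modified client could send the connect message to the callee and have the call connect without any interaction from the user.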
Other messaging apps like Telegram and Viber were found to have none of the above flaws, although Silvanovich noted that significant reverse engineering challenges when analyzing Viber made the investigation "less rigorous" than the others.
"The majority of calling state machines I investigated had logic vulnerabilities that allowed audio or video content to be transmitted from the callee to the caller without the callee's consent," Silvanovich concluded. "This is clearly an area that is often overlooked when securing WebRTC applications."
"The majority of the bugs did not appear to be due to developer misunderstanding of WebRTC features. Instead, they were due to errors in how the state machines are implemented. That said, a lack of awareness of these types of issues was likely a factor," she added.
"It is also concerning to note that I did not look at any group calling features of these applications, and all the vulnerabilities reported were found in peer-to-peer calls. This is an area for future work that could reveal additional problems."