Google Unveils RETVec - Gmail's New Defense Against Spam and Malicious Emails
Nov 30, 2023
Machine Learning / Email Security
Google has revealed a new multilingual text vectorizer called RETVec (short for Resilient and Efficient Text Vectorizer) to help detect potentially harmful content such as spam and malicious emails in Gmail. "RETVec is trained to be resilient against character-level manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more," according to the project's description on GitHub. "The RETVec model is trained on top of a novel character encoder which can encode all UTF-8 characters and words efficiently." While huge platforms like Gmail and YouTube rely on text classification models to spot phishing attacks, inappropriate comments, and scams, threat actors are known to devise counter-strategies to bypass these defense measures. They have been observed resorting to adversarial text manipulations, which range from the use of homoglyphs to keyword stuffing to invisible characters. RETVec , which works on over 100 langua...