A malicious Python package on the Python Package Index (PyPI) repository has been found to use Unicode tricks to evade detection and deploy info-stealing malware.
The package in question, named onyxproxy, was uploaded to PyPI on March 15, 2023, and comes with capabilities to harvest and exfiltrate credentials and other valuable data. It has since been taken down, but not before attracting a total of 183 downloads.
According to software supply chain security firm Phylum, the package incorporates its malicious behavior in a setup script that's packed with thousands of seemingly legitimate code strings.
These strings mix bold and italic fonts, yet they remain readable and can still be parsed by the Python interpreter, which executes the stealer malware when the package is installed.
"An obvious and immediate benefit of this strange scheme is readability," the company noted. "Moreover, these visible differences do not prevent the code from running, which it does."
This is made possible by the use of Unicode variants of what appear to be the same characters (aka homoglyphs) to camouflage the code's true colors (e.g., self vs. 𝘴𝘦𝘭𝘧) among innocuous-looking functions and variables.
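To illustrate the mechanism, here is a minimal sketch of how Python treats such look-alike identifiers. It is not code from the onyxproxy package: the helper to_sans_italic and the variable name secret are hypothetical, chosen only for the demonstration. The sketch relies on the documented behavior that Python normalizes identifiers to NFKC form while parsing, so a name written in Mathematical Sans-Serif Italic letters resolves to its plain ASCII equivalent.

```python
import unicodedata


def to_sans_italic(word: str) -> str:
    """Map ASCII a-z onto Mathematical Sans-Serif Italic letters (U+1D622 onward).

    Hypothetical helper for this demonstration only.
    """
    return "".join(
        chr(0x1D622 + ord(c) - ord("a")) if "a" <= c <= "z" else c
        for c in word
    )


# Build a one-line program whose variable name uses the look-alike letters.
fancy_name = to_sans_italic("secret")
fancy_source = f"{fancy_name} = 42"

# A plain substring search for the ASCII name finds nothing in the source text.
ascii_name_found = "secret" in fancy_source

# NFKC normalization folds the look-alike letters back to ASCII.
normalized_name = unicodedata.normalize("NFKC", fancy_name)

# The interpreter applies the same normalization to identifiers when parsing,
# so the assignment lands under the ASCII name.
namespace = {}
exec(fancy_source, namespace)

print(ascii_name_found)      # False
print(normalized_name)       # secret
print(namespace["secret"])   # 42
```

The text of the program and the name the interpreter actually binds differ, which is the gap a plain string search falls into.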
The use of Unicode to inject vulnerabilities into source code was previously disclosed by Cambridge University researchers Nicholas Boucher and Ross Anderson in an attack technique dubbed Trojan Source.
What the method lacks in sophistication, it makes up for by producing a novel piece of obfuscated code, despite exhibiting telltale signs of copy-paste efforts from other sources.
The development highlights continued attempts on the part of threat actors to find new ways to slip through string-matching-based defenses, leveraging "how the Python interpreter handles Unicode to obfuscate their malware."
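As a rough sketch of why plain string matching misses such code, and of one possible countermeasure, the example below compares a naive substring scan with a scan run after NFKC normalization. The signature list SUSPICIOUS and all function names are assumptions made for illustration; this is not Phylum's detection logic or any real scanner's rule set. Normalizing the whole file also rewrites string literals and comments, so a real tool would likely work at the token level.

```python
import unicodedata

# Hypothetical signature list; real scanners use far richer rules.
SUSPICIOUS = ("__import__", "exec(", "subprocess")


def naive_match(source: str) -> list[str]:
    """Return the signatures found by a plain substring search."""
    return [sig for sig in SUSPICIOUS if sig in source]


def normalized_match(source: str) -> list[str]:
    """Fold compatibility characters to their plain forms, then search."""
    return naive_match(unicodedata.normalize("NFKC", source))


def has_compat_chars(source: str) -> bool:
    """Flag non-ASCII characters that NFKC would rewrite, a cheap warning sign."""
    return any(
        ord(ch) > 127 and unicodedata.normalize("NFKC", ch) != ch
        for ch in source
    )


# "exec" spelled with Mathematical Sans-Serif Italic letters, then "(payload)".
disguised = "\U0001D626\U0001D639\U0001D626\U0001D624(payload)"

print(naive_match(disguised))       # []
print(normalized_match(disguised))  # ['exec(']
print(has_compat_chars(disguised))  # True
```

The presence check in has_compat_chars is deliberately crude: it flags any character that normalization would change, which is unusual in ordinary Python source but can also occur legitimately in strings and comments.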
On a related note, Canadian cybersecurity company PyUp detailed the discovery of three new fraudulent Python packages – aiotoolbox, asyncio-proxy, and pycolorz – that were downloaded cumulatively over 1,000 times and designed to retrieve obfuscated code from a remote server.