SoReL-20M: A Huge Dataset of 20 Million Malware Samples Released Online
Dec 14, 2020
Cybersecurity firms Sophos and ReversingLabs on Monday jointly released the first-ever production-scale malware research dataset to be made available to the general public that aims to build effective defenses and drive industry-wide improvements in security detection and response. " SoReL-20M " (short for So phos- Re versing L abs – 20 M illion), as it's called, is a dataset containing metadata, labels, and features for 20 million Windows Portable Executable (.PE) files, including 10 million disarmed malware samples, with the goal of devising machine-learning approaches for better malware detection capabilities. "Open knowledge and understanding about cyber threats also leads to more predictive cybersecurity," Sophos AI group said. "Defenders will be able to anticipate what attackers are doing and be better prepared for their next move." Accompanying the release are a set of PyTorch and LightGBM -based machine learning models pre-trained...