undo-ransomware/ransomware-dataset

Ransomware samples dataset

Our ransomware dataset is based on VirusShare's collection of 33.9M samples. We used John Seymour's dataset containing the VirusTotal labels of all 33.2M samples from June 2012 to February 2019.

We downloaded the Raw dataset and filtered it for samples with ransom detections. These 456856 samples were then further filtered down to Windows executables using the VirusShare filetypes dataset. Filtering by filetype is mainly meant to remove the large number of browser-based HTML ransom demands, which are scary but harmless (in an up-to-date browser).
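
The two filtering steps can be sketched roughly as follows. This is a minimal illustration, not the scripts actually used for this dataset: the input shapes (a per-sample mapping from AV engine to detection label, and a per-sample libmagic-style file type string) are assumptions, the engine names are hypothetical, and matching the substring "ransom" in any engine's label is an assumed stand-in for "ransom detections".

```python
from typing import Mapping, Optional, Set

# Hypothetical input shapes (assumptions, not the real dataset formats):
#   labels:    sample hash -> {AV engine name: detection label, or None if clean}
#   filetypes: sample hash -> libmagic-style file type description


def ransom_detections(labels: Mapping[str, Mapping[str, Optional[str]]]) -> Set[str]:
    """Return hashes where at least one engine's label mentions 'ransom'."""
    return {
        sha
        for sha, scans in labels.items()
        if any(lbl and "ransom" in lbl.lower() for lbl in scans.values())
    }


def windows_executables(hashes: Set[str], filetypes: Mapping[str, str]) -> Set[str]:
    """Keep only samples whose recorded file type is a Windows PE binary.

    libmagic describes both PE32 and PE32+ files with a "PE32" prefix, so
    HTML ransom notes ("HTML document, ...") are dropped here.
    """
    return {h for h in hashes if filetypes.get(h, "").startswith("PE32")}


if __name__ == "__main__":
    labels = {
        "aaa": {"EngineA": "Trojan-Ransom.Win32.Foo", "EngineB": None},
        "bbb": {"EngineA": "HTML/Ransom.Gen"},
        "ccc": {"EngineA": "Trojan.Generic"},
    }
    filetypes = {
        "aaa": "PE32 executable (GUI) Intel 80386, for MS Windows",
        "bbb": "HTML document, ASCII text",
        "ccc": "PE32 executable (GUI) Intel 80386, for MS Windows",
    }
    ransom = ransom_detections(labels)
    print(sorted(ransom))  # ['aaa', 'bbb']
    print(sorted(windows_executables(ransom, filetypes)))  # ['aaa']
```

In the toy input, "bbb" is exactly the kind of HTML ransom demand the filetype step is meant to discard.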

The resulting 339594 samples were then classified using the AVClass malware labeling tool to group them by family. This yielded 23616 SINGLETONs (samples with generic names only), 1562 "families" containing only one sample, and 1671 ransomware families with 2 or more members. Filtering out the SINGLETONs leaves a base set of 315978 samples.
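
The counts above can be reproduced from per-sample family assignments with a small tally like the one below. It is a sketch that assumes AVClass's output has already been parsed into (hash, family) pairs and that generic-only samples carry a family name starting with "SINGLETON" (e.g. "SINGLETON:<hash>"); the toy hashes and the helper name `summarize_families` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_families(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count singletons and family sizes from (hash, family) pairs.

    Assumes samples with only generic labels are tagged with a family name
    starting with "SINGLETON" (e.g. "SINGLETON:<hash>").
    """
    singletons = 0
    family_sizes: Counter = Counter()
    for _sha, family in assignments:
        if family.startswith("SINGLETON"):
            singletons += 1
        else:
            family_sizes[family] += 1

    return {
        "singletons": singletons,
        "one_sample_families": sum(1 for n in family_sizes.values() if n == 1),
        "multi_sample_families": sum(1 for n in family_sizes.values() if n >= 2),
        # Base set: every sample that was assigned a real family name.
        "base_set": sum(family_sizes.values()),
    }


if __name__ == "__main__":
    pairs = [
        ("h1", "virlock"), ("h2", "virlock"), ("h3", "goldeneye"),
        ("h4", "goldeneye"), ("h5", "alcatraz"), ("h6", "SINGLETON:h6"),
    ]
    print(summarize_families(pairs))
    # {'singletons': 1, 'one_sample_families': 1, 'multi_sample_families': 2, 'base_set': 5}
```

Note that the base set keeps the 1-sample families and only drops the SINGLETONs, which matches the arithmetic in the text: 339594 - 23616 = 315978.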

Figure: ransomware family sizes (almost but not quite a power law)

To the surprise of absolutely no one, it's the usual long-tailed distribution. What is surprising is that the 2-sample families include some ransomware that made the news, e.g. GoldenEye, ZeroLocker and Bad Rabbit. The 1-sample families contain many generic names like 940677ecdf or aawj, but also well-known ransomware such as Alcatraz Locker.
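
One common way to look at a long tail like this is a rank-size view: sort families by size, then see how close log(size) against log(rank) comes to a straight line, which is what an exact power law would give. The sketch below assumes a mapping from family name to sample count (such as the `Counter` built during family grouping). The least-squares slope is only a rough summary of the curve, not a rigorous power-law fit, and the toy data is invented.

```python
import math
from typing import List, Mapping, Tuple


def rank_size(family_sizes: Mapping[str, int]) -> List[Tuple[int, int]]:
    """Return (rank, size) pairs, largest family first (rank 1 = head end)."""
    sizes = sorted(family_sizes.values(), reverse=True)
    return list(enumerate(sizes, start=1))


def loglog_slope(pairs: List[Tuple[int, int]]) -> float:
    """Least-squares slope of log(size) against log(rank).

    Needs at least two families. An exact power law is a straight line on
    log-log axes; this slope is only a rough summary of the curve.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(size) for _, size in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: sizes exactly 840 / rank, i.e. a perfect power law with exponent -1.
    toy = {f"fam{r}": 840 // r for r in range(1, 9)}
    print(round(loglog_slope(rank_size(toy)), 6))  # -1.0
```

On real family counts, the "almost but not quite" shows up as curvature: the head and tail bend away from any single straight line, so one slope describes the data only loosely.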

The head end (the largest families):

Zeus, Winwebsec, Virlock, ZeroAccess, PornoBlocker, …
