Intelligence and Security Informatics Data Sets

Data Infrastructure Building Blocks for ISI. A project of the University of Arizona (NSF #ACI-1443019), Drexel University, University of Virginia, University of Texas at Dallas, and University of Utah

AZSecure Hacker Assets Portal

The AZSecure Hacker Assets Portal is one of several projects within the Hacker Web program of the Artificial Intelligence Lab at the University of Arizona, directed by Dr. Hsinchun Chen. The program aims to provide hacker forum content and analysis for Scholarship-for-Service (SFS) education, research, training, and the development of cyber threat intelligence capabilities. The datasets provided here focus on attachments and source code examples collected from English, Russian, and Arabic hacker communities. The goal of this compiled collection is to give educators the means to facilitate research and derive insights about hacker assets and the hacker community. Please refer to the AZSecure Hacker Portal to search and visualize these assets.

  • Attachments dataset: Attachments are malicious files (exploits, binaries, etc.) attached to forum posts. Many of them can be used directly to carry out cyberattacks. This collection contains 14,865 links to attachments from Russian, English, and Arabic hacker forums, exchanged in the three forums with the largest number of attachments: Ashiyane, Opensc, and Tuts4you. The attachments cover a wide range of malicious hacking tools, such as keyloggers, Zeus malware, BlackPOS malware, DDoS attack tools, Remote Administration Tools (RATs), bots, crypters, and mobile malware. The attachments date from 5/30/2003 to 9/25/2016.
    • Suggested analytics: The data set can be used for dynamic malware analysis and for pinpointing the key hackers who share exploits in the hacker community.
    • Suggested techniques: Dynamic assessment of malware tools by running the downloaded attachments in a sandbox environment
    • Suggested tools: VirusTotal, Cuckoo Sandbox
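One simple way to approach the "key hackers" analysis suggested above is to count how many attachments each forum member has shared and rank members by that count. The sketch below is only an illustration: the record layout (the `author`, `forum`, and `link` fields) and the sample values are hypothetical and are not taken from the published dataset, so the field names would need to be adapted to the actual file.

```python
from collections import Counter


def rank_sharers(records, top_n=3):
    """Return the top_n (author, attachment_count) pairs, most prolific first."""
    # Skip records with a missing or empty author so they do not form a bogus group.
    counts = Counter(r["author"] for r in records if r.get("author"))
    return counts.most_common(top_n)


# Hypothetical placeholder records; the real dataset's columns may differ.
SAMPLE_RECORDS = [
    {"author": "user_a", "forum": "Opensc", "link": "attachment-1"},
    {"author": "user_b", "forum": "Ashiyane", "link": "attachment-2"},
    {"author": "user_a", "forum": "Opensc", "link": "attachment-3"},
    {"author": "user_c", "forum": "Tuts4you", "link": "attachment-4"},
    {"author": "user_a", "forum": "Tuts4you", "link": "attachment-5"},
    {"author": "user_b", "forum": "Ashiyane", "link": "attachment-6"},
    {"author": "", "forum": "Ashiyane", "link": "attachment-7"},
]

if __name__ == "__main__":
    for author, n_shared in rank_sharers(SAMPLE_RECORDS):
        print(f"{author}: {n_shared} attachments")
```

A raw count is only a first approximation of who is "key". Weighting by sandbox verdicts or by how often a post is downloaded or replied to would be natural refinements, if such fields are available.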


    Download size: 53.7 MB

  • Source codes dataset: Source code here means uncompiled code embedded in forum posts. This collection includes 15,582 source code snippets from Russian, English, and Arabic hacker forums, extracted from the four forums with the largest concentration of source code: Ashiyane, Opensc, Exelab, and Xeksec. Examples of assets in this collection include SQL injections, Zeus code, worms, and crypters. The source code snippets date from 2/7/2005 to 10/27/2016.
    • Suggested analytics: The data set can be used for static malware analysis, source code visualization, and identifying the key specialized hackers who are able to create tools in hacker forums.
    • Suggested techniques: Text mining, such as unsupervised topic modeling and self-organizing map (SOM) clustering, to gain insight into the programming languages used and the attack vectors of the provided source code assets
    • Suggested tools: VirusTotal, Cuckoo Sandbox, D3, Scikit-learn

    Download size: 38.8 MB