Data Infrastructure Building Blocks for ISI. A Project of the University of Arizona (NSF #ACI-1443019), Drexel University, University of Virginia, University of Texas at Dallas, and University of Utah
Intelligence and Security Informatics Data Sets
Internet Relay Chat (IRC) Channels
IRC channels serve as an anonymous medium for hackers and hacktivist groups to discuss and share knowledge. Unlike content on website-based platforms such as forums, historical conversations in an IRC channel are not archived, so IRC channels differ from other hacker community platforms in that they require real-time data collection and analysis. The Artificial Intelligence Lab at the University of Arizona has collected data from Anonops and Hacker, the main IRC channels affiliated with the well-known hacktivist group Anonymous, as well as from Ed, a channel not originally intended for hacker discussion but widely used by hackers. These datasets can help researchers understand hacker communication behaviors and proactively identify potential attack targets and emerging threats. Each dataset is accompanied by a ReadMe file that contains detailed information about it. File sizes are approximate.
- The Anonops IRC channel is affiliated with the Anonymous hacktivist group, which uses it to discuss a variety of topics, such as planning, coordinating, and sometimes announcing future attack targets. This makes the dataset especially useful for predictive and proactive analysis of hacktivist communities. The dataset contains 1,874,984 messages dating from September 2016 to May 2018. For more information, please refer to the ReadMe file.
- Suggested analytics: Identifying adept hacktivists via time-to-event analysis; proactive cyber threat prediction using time series
- Suggested techniques: Cox proportional hazards (survival analysis) model; time series analysis
- Suggested tools: R and scikit-learn
Anonops.zip (163 MB)
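The time series analysis suggested for Anonops starts from a message-volume series. The Python sketch below assumes each message is available as a hypothetical `(timestamp, nickname, text)` tuple with ISO 8601 timestamps (the actual layout is documented in the ReadMe). It buckets messages into daily counts and flags days whose volume rises well above the preceding week. This is a simple z-score heuristic for exploration, not a forecasting model and not the method used by the dataset's authors; `daily_counts` and `flag_spikes` are illustrative names.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from statistics import mean, pstdev

# ASSUMPTION (hypothetical format): each message is a (timestamp, nickname, text)
# tuple with an ISO 8601 timestamp. Check the dataset's ReadMe for the real layout.


def daily_counts(messages) -> list[tuple[date, int]]:
    """Count messages per calendar day, inserting 0 for days with no messages."""
    counts = Counter(datetime.fromisoformat(ts).date() for ts, _nick, _text in messages)
    if not counts:
        return []
    start, end = min(counts), max(counts)
    span = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(span)]
    return [(day, counts.get(day, 0)) for day in days]


def flag_spikes(series: list[tuple[date, int]], window: int = 7, z: float = 3.0) -> list[date]:
    """Return days whose count exceeds mean + z * std of the preceding `window` days.

    When the preceding window is constant (std == 0), any increase is flagged.
    """
    values = [count for _day, count in series]
    flagged = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        if values[i] > mean(past) + z * pstdev(past):
            flagged.append(series[i][0])
    return flagged
```

The resulting `(day, count)` series is the kind of input a dedicated time series model in R or scikit-learn would consume; flagged days are candidates for closer reading of the channel logs, not predictions in themselves.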
- The Hacker IRC channel is another medium known to facilitate the activities of the Anonymous hacktivist group. As with the Anonops IRC channel, monitoring this channel is important for gaining intelligence about hacktivists' future activities. The dataset contains 231,994 messages collected from September 2016 to May 2018.
Hacker.zip (29.5 MB)
- The Ed IRC channel was not originally intended for hacker discussion, but because of its popularity and anonymity, a significant number of hackers and hacktivists use it to communicate and share knowledge. Although hacking topics are less concentrated in this channel, the dataset is important for monitoring non-professional hackers and their interactions in order to prevent unsophisticated attacks. The dataset contains 829,457 messages dating from September 2016 to May 2018.
- Suggested analytics: Studying the interactions among non-professional hackers and identifying when they collaborate with sophisticated hackers
- Suggested techniques: Deep text classification techniques using stylometric features
- Suggested tools: TensorFlow, PyTorch, Keras, and scikit-learn
Ed.zip (51.8 MB)
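The deep classifiers suggested for Ed are too large for a short example, but the stylometric inputs they could use can be sketched. The Python below computes a handful of common, hand-picked stylometric features for each message (length, average word length, and punctuation, uppercase, digit, and type-token ratios). The feature set and the names `stylometric_features` and `feature_matrix` are illustrative assumptions, not the features used by the dataset's authors.

```python
import re
import string

# Runs of Unicode letters (no digits or underscores) count as words.
WORD_RE = re.compile(r"[^\W\d_]+")


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric features for a single chat message."""
    words = WORD_RE.findall(text)
    n_chars = len(text)
    n_words = len(words)

    def ratio(count: int, total: int) -> float:
        # Guard against empty messages so every feature stays defined.
        return count / total if total else 0.0

    return {
        "n_chars": float(n_chars),
        "n_words": float(n_words),
        "avg_word_len": ratio(sum(len(w) for w in words), n_words),
        "punct_ratio": ratio(sum(ch in string.punctuation for ch in text), n_chars),
        "upper_ratio": ratio(sum(ch.isupper() for ch in text), n_chars),
        "digit_ratio": ratio(sum(ch.isdigit() for ch in text), n_chars),
        "type_token_ratio": ratio(len({w.lower() for w in words}), n_words),
    }


def feature_matrix(messages: list[str]) -> list[list[float]]:
    """Stack per-message features into rows with a fixed (sorted) column order."""
    rows = [stylometric_features(m) for m in messages]
    columns = sorted(rows[0]) if rows else []
    return [[row[c] for c in columns] for row in rows]
```

Rows from `feature_matrix` could serve as a simple baseline input to a scikit-learn classifier, or be concatenated with learned text representations in a TensorFlow, Keras, or PyTorch model; labels for "sophisticated" versus "non-professional" authors would have to come from separate annotation.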