(129 MB zipped, 276 MB unzipped)


  • Spoof Financial Files: Contains 389 spoof financial websites (e.g., banks, PayPal, eBay). URLs came from PhishTank. These sites were mostly used in email-based phishing attacks. Warning: for each site, all files on the server were collected (text, images, code, etc.), so some folders may contain malware. Please review the DIBBs-ISI Malware Handling Protocol.


                         (159 MB zipped, 402 MB unzipped)

  • Legitimate Financial Files: Contains 50 legitimate financial websites (e.g., banks, PayPal, eBay, escrow sites). These can be paired with the Spoof Financial or Concocted Escrow websites for classification tasks.

​                     readme.txt

                          (180 MB zipped, 570 MB unzipped)


  • Concocted Pharmacy Websites: Contains 150 concocted pharmacies that used black hat SEO (i.e., link spam) to reach the top 100 in search engine rankings. URLs were verified through LegitScript.


                    (1.4 GB zipped, 8.1 GB unzipped)

  • Legitimate Pharmacy Websites: Contains 150 legitimate pharmacies. Legitimacy was verified through the National Association of Boards of Pharmacy (NABP) and LegitScript.


​                             (659 MB zipped, 2.6 GB unzipped)

  • Pharmacy Extracts: Contains text and URL extracts from the aforementioned 150 concocted and 150 legitimate pharmacies (derived using HTML parsing tools). The text and URL extract files are site-level, with all text for a given website appearing in a single row.

​                     readme.txt

​                         (107 MB zipped, 860 MB unzipped)


  • Phishing-Targeted Brands: Contains time series data from 2006 through 2015 for 178 prominent targeted brands, with URL and WHOIS information for each phishing attack. The data includes nearly 1.5 million attack URLs.

                         (86.5 MB zipped, 1.6 GB unzipped)


  • PhishMonger: Contains 171,360 phishing websites collected between November 2015 and September 2016. This research is ongoing, and more websites will be added to this portal as the researchers make them available to the public. CAUTION: THIS DATASET MAY CONTAIN MALWARE. Please review the DIBBs-ISI Malware Handling Protocol.






               Due to file size, some outputs were split into multiple parts. Download all parts into one folder and open the first file (001) to recombine them.
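If no archive tool is available to open the first part, the recombination step can be sketched in Python. This is a minimal sketch under stated assumptions, not the portal's documented procedure: it assumes the parts are plain byte-level splits named with numeric suffixes (`.001`, `.002`, ...), as the "(001)" above suggests, and the function name `recombine_parts` and the file names in the usage note are hypothetical. If the parts were produced in a tool-specific multi-volume format, use that tool instead.

```python
from pathlib import Path
import shutil


def recombine_parts(folder, base_name, out_path):
    """Concatenate base_name.001, base_name.002, ... into out_path.

    Assumes the parts are raw byte splits of one file (an assumption,
    not something the download page states).
    """
    folder = Path(folder)
    # Keep only files named exactly "<base_name>.<digits>".
    parts = [
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix[1:].isdigit()
        and p.name == base_name + p.suffix
    ]
    # Sort numerically so that .010 comes after .009.
    parts.sort(key=lambda p: int(p.suffix[1:]))
    if not parts:
        raise FileNotFoundError(f"no parts found for {base_name!r} in {folder}")
    # Refuse to build a truncated archive if a part is missing.
    numbers = [int(p.suffix[1:]) for p in parts]
    if numbers != list(range(1, len(parts) + 1)):
        raise ValueError(f"parts are not numbered 1..{len(parts)}: {numbers}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, low memory use
    return parts
```

For example, with hypothetical file names, `recombine_parts("downloads", "output_1-28.tar", "downloads/output_1-28.tar")` would join `output_1-28.tar.001`, `output_1-28.tar.002`, and so on. Because some of these datasets may contain malware, extract the recombined archive only in an environment that follows the DIBBs-ISI Malware Handling Protocol.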

               OUTPUT 1-28
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (10.2 GB zipped)
                   (1.1 GB zipped)

               OUTPUT 29-661
                   output_29-661.tar (12.7 GB compressed)

               OUTPUT 662-1323
                   (9.7 GB zipped)
                   (6.4 GB zipped)

               OUTPUT 1324-1980
                   (9.7 GB zipped)
                   (9.7 GB zipped)
                   (9.7 GB zipped)
                   (9.7 GB zipped)
                   (7 GB zipped)

               OUTPUT 1981-2506
                   (9.7 GB zipped)
                   (1.4 GB zipped)

               OUTPUT 2507-2895
                   (10 GB zipped)
                   (5 GB zipped)

               OUTPUT 2896-3649
                   (10 GB zipped)
                   (7.8 GB zipped)

               OUTPUT 3650-4044
                   (10 GB zipped)
                   (10 GB zipped)
                   (10 GB zipped)
                   (1.6 GB zipped)

               OUTPUT 4045-4831
                   (10 GB zipped)
                   (2.7 GB zipped)

               OUTPUT 4832-5512
                   (10 GB zipped)
                   (3.1 GB zipped)

               OUTPUT 5513-6278
                   output_5513-6278.tar (8.1 GB zipped)

               OUTPUT 6279-7095
                   output_6279-7095.tar (5.7 GB zipped)

               OUTPUT 7096-7697
                   (10 GB zipped)
                   (6.9 GB zipped)

               OUTPUT 7698-8409
                   output_7698-8409.tar (14.7 GB zipped)

Previous Research

See the Papers page for previously published research relating to phishing websites and other topics.

These phishing websites were collected by the University of Virginia during 2006-2015. They are presented here by type of institution or organization. Click on the link for the organization type of interest to download the compressed files:

Internet Phishing Websites


Data Infrastructure Building Blocks for ISI. A Project of the University of Arizona (NSF #ACI-1443019), Drexel University, University of Virginia, University of Texas at Dallas, and University of Utah

Intelligence and Security Informatics Data Sets