ESCROW

                     readme.txt

                     ConcoctedEscrow.zip          (129 MB zipped, 276 MB unzipped)


FINANCIAL

  • Spoof Financial Files: Contains 389 spoof financial websites (e.g., banks, PayPal, eBay). URLs came from PhishTank at http://www.phishtank.com/. These were mostly used in phishing email-based attacks. Warning: for each site, all files on the server were collected (text, images, code, etc.). Some folders may contain malware. Please review the DIBBs-ISI Malware Handling Protocol.

                     readme.txt

                     SpoofFinancial.zip              (159 MB zipped, 402 MB unzipped)


  • Legitimate Financial Files: Contains 50 legitimate financial websites (e.g., banks, PayPal, eBay, escrow sites). Can be paired with the Spoof Financial or Concocted Escrow websites for classification tasks.

                     readme.txt

                     LegitFinancial.zip               (180 MB zipped, 570 MB unzipped)
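
For the classification pairing mentioned above, one possible starting point is a list of labeled site folders. The sketch below is a minimal illustration and not part of the dataset. It assumes each archive unpacks to a folder holding one subfolder per website, which the readme.txt files may contradict, and the function name label_sites and the folder names in the usage line are hypothetical. It only lists folder names and opens no downloaded files, in keeping with the malware warning.

```python
from pathlib import Path


def label_sites(legit_root, spoof_root):
    """Return (site_folder, label) pairs: 0 = legitimate, 1 = spoof.

    Assumption (not confirmed by the source): each root directory holds
    one subfolder per collected website.
    """
    samples = []
    for root, label in ((legit_root, 0), (spoof_root, 1)):
        for site_dir in sorted(Path(root).iterdir()):
            # Skip loose files such as a readme; only folders count as sites.
            if site_dir.is_dir():
                samples.append((site_dir, label))
    return samples
```

Under those assumptions, label_sites("LegitFinancial", "SpoofFinancial") would return the legitimate site folders with label 0 followed by the spoof site folders with label 1.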


PHARMACY

  • Concocted Pharmacy Websites: Contains 150 concocted pharmacies that used black hat SEO (i.e., link spam) to reach the top 100 in search engine rankings. URLs were verified through LegitScript at http://www.legitscript.com/.

                     readme.txt

                     ConcoctedPharma.zip         (1.4 GB zipped, 8.1 GB unzipped)


  • Legitimate Pharmacy Websites: Contains 150 legitimate pharmacies. Legitimacy was verified through the National Association of Boards of Pharmacy (NABP) and LegitScript at http://www.legitscript.com/.

                     readme.txt

                     LegitPharma.zip                  (659 MB zipped, 2.6 GB unzipped)


  • Pharmacy Extracts: Contains text and URL extracts from the aforementioned 150 concocted and 150 legitimate pharmacies (derived using HTML parsing tools). The text and URL extract files are site-level, with all text for a given website appearing in a single row.

                     readme.txt

                     PharmaExtracts.zip              (107 MB zipped, 860 MB unzipped)


TARGETED BRANDS

  • Phishing-Targeted Brands: Contains time series data from 2006 through 2015 for 178 prominent targeted brands, with URL and Whois information for each phishing attack. The data includes nearly 1.5 million attack URLs.

                     readme.txt

                     TargetedBrands.zip             (86.5 MB zipped, 1.6 GB unzipped)


PHISHMONGER

  • PhishMonger: Contains ~252,000 phishing websites collected between November 2015 and May 2017. This research is ongoing, and more websites will be added to this portal as the researchers make them available to the public. CAUTION: THIS DATASET MAY CONTAIN MALWARE. Please review the DIBBs-ISI Malware Handling Protocol.

                     readme-PhishMonger.txt

                     readme2.txt

                     PhishMonger_Dobolyi_Abbasi_ISI-2016_preprint.pdf

                     IEEE_ISI_2016_Poster.pdf

                     IEEE_ISI_2016_Presentation_Short.pdf

                     Because of file size, some outputs are split into multiple parts. Download all parts of an output into one folder, then open the first part (001) to recombine them.

               OUTPUT 1-28

                     output_1-28.zip.001    (10.2 GB zipped)
                     output_1-28.zip.002    (10.2 GB zipped)
                     output_1-28.zip.003    (10.2 GB zipped)
                     output_1-28.zip.004    (10.2 GB zipped)
                     output_1-28.zip.005    (10.2 GB zipped)
                     output_1-28.zip.006    (10.2 GB zipped)
                     output_1-28.zip.007    (10.2 GB zipped)
                     output_1-28.zip.008    (10.2 GB zipped)
                     output_1-28.zip.009    (1.1 GB zipped)

               OUTPUT 29-661

                     output_29-661.tar    (12.7 GB compressed)

               OUTPUT 662-1323

                     output_662-1323.zip.001    (9.7 GB zipped)
                     output_662-1323.zip.002    (6.4 GB zipped)

               OUTPUT 1324-1980

                     output_1324-1980.zip.001    (9.7 GB zipped)
                     output_1324-1980.zip.002    (9.7 GB zipped)
                     output_1324-1980.zip.003    (9.7 GB zipped)
                     output_1324-1980.zip.004    (9.7 GB zipped)
                     output_1324-1980.zip.005    (7 GB zipped)

               OUTPUT 1981-2506

                     output_1981-2506.zip.001    (9.7 GB zipped)
                     output_1981-2506.zip.002    (1.4 GB zipped)

               OUTPUT 2507-2895

                     output_2507-2895.zip.001    (10 GB zipped)
                     output_2507-2895.zip.002    (5 GB zipped)

               OUTPUT 2896-3649

                     output_2896-3649.zip.001    (10 GB zipped)
                     output_2896-3649.zip.002    (7.8 GB zipped)

               OUTPUT 3650-4044

                     output_3650-4044.zip.001    (10 GB zipped)
                     output_3650-4044.zip.002    (10 GB zipped)
                     output_3650-4044.zip.003    (10 GB zipped)
                     output_3650-4044.zip.004    (1.6 GB zipped)

               OUTPUT 4045-4831

                     output_4045-4831.zip.001    (10 GB zipped)
                     output_4045-4831.zip.002    (2.7 GB zipped)

               OUTPUT 4832-5512

                     output_4832-5512.zip.001    (10 GB zipped)
                     output_4832-5512.zip.002    (3.1 GB zipped)

               OUTPUT 5513-6278

                     output_5513-6278.tar    (8.1 GB zipped)

               OUTPUT 6279-7095

                     output_6279-7095.tar    (5.7 GB zipped)

               OUTPUT 7096-7697

                     output_7096-7697.zip.001    (10 GB zipped)
                     output_7096-7697.zip.002    (6.9 GB zipped)

               OUTPUT 7698-8409

                     output_7698-8409.tar    (14.7 GB zipped)

               OUTPUT 8410-9154

                     output_8410-9154.tar    (15.9 GB zipped)

               OUTPUT 9155-9909

                     output_9155-9909.tar    (11.2 GB zipped)
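
The recombination step for the multi-part outputs can also be scripted. The sketch below is a minimal illustration and not the portal's own tooling. It assumes the numbered parts (.001, .002, ...) are plain byte-wise splits of a single archive, as 7-Zip-style split volumes are; if the parts were produced another way, use the archive tool instead. The function names find_parts and join_parts are hypothetical. The code checks that no part is missing, then concatenates the parts in order into one file, without extracting or opening any of the archived content.

```python
import re
import shutil
from pathlib import Path

# Matches a trailing three-digit part number such as ".001".
PART_SUFFIX = re.compile(r"\.(\d{3})$")


def find_parts(folder, base_name):
    """Return the numbered parts of base_name in folder, in part order.

    Raises FileNotFoundError if no parts exist or the numbering has gaps.
    """
    parts = []
    for path in Path(folder).iterdir():
        match = PART_SUFFIX.search(path.name)
        if match and path.name[: match.start()] == base_name:
            parts.append((int(match.group(1)), path))
    parts.sort()
    numbers = [number for number, _ in parts]
    if not numbers or numbers != list(range(1, len(numbers) + 1)):
        raise FileNotFoundError(
            f"missing or unexpected parts for {base_name}: found {numbers}"
        )
    return [path for _, path in parts]


def join_parts(folder, base_name, chunk_size=1024 * 1024):
    """Concatenate base_name.001, .002, ... into folder/base_name."""
    parts = find_parts(folder, base_name)
    target = Path(folder) / base_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so multi-gigabyte parts never sit in memory.
                shutil.copyfileobj(src, out, chunk_size)
    return target
```

For example, join_parts("downloads", "output_1-28.zip") would write downloads/output_1-28.zip, where "downloads" is a placeholder folder name. The joined file needs about as much free disk space as all of its parts combined.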
    

Previous Research

See the Papers page for previously published research relating to phishing websites and other topics.

These phishing websites were collected by the University of Virginia during 2006-2015. They are presented here by type of institution or organization. Click on the link for the organization type of interest to download the compressed files.

Internet Phishing Websites

 

Intelligence and Security Informatics Data Sets

AZSecure-data.org

Data Infrastructure Building Blocks for ISI. A Project of the University of Arizona (NSF #ACI-1443019), Drexel University, University of Virginia, University of Texas at Dallas, and University of Utah