Previous Research

See the Papers page for previously published research relating to phishing websites and other topics.

Internet Phishing Websites

Intelligence and Security Informatics Data Sets

AZSecure-data.org

Data Infrastructure Building Blocks for ISI. A project of the University of Arizona (NSF #ACI-1443019), Drexel University, University of Virginia, University of Texas at Dallas, and University of Utah.

These phishing websites were collected by the University of Virginia. They are presented here by type of organization. Click on the link for the organization type of interest to download the compressed files: 

ESCROW

                     readme.txt

                     ConcoctedEscrow.zip          (129 MB zipped, 276 MB unzipped)


FINANCIAL

  • Spoof Financial Files: Contains 389 spoof financial websites (e.g., banks, PayPal, eBay). URLs came from PhishTank at http://www.phishtank.com/. These were mostly used in email-based phishing attacks. Warning: for each site, all files on the server were collected (text, images, code, etc.). Some folders may contain malware. Please review the DIBBs-ISI Malware Handling Protocol.

                     readme.txt

                     SpoofFinancial.zip              (159 MB zipped, 402 MB unzipped)


  • Legitimate Financial Files: Contains 50 legitimate financial websites (e.g., banks, PayPal, eBay, escrow sites). These can be paired with the Spoof Financial or Concocted Escrow websites for classification tasks.

                     readme.txt

                     LegitFinancial.zip               (180 MB zipped, 570 MB unzipped)
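
The pairing for classification tasks could be set up along these lines. This is a minimal sketch, assuming each archive unpacks to a directory with one subfolder per website; that layout and the folder names in the usage note are hypothetical, so check each readme.txt for the actual structure. The function only lists directories and never opens the collected files, since the spoof archives may contain malware.

```python
from pathlib import Path


def label_sites(legit_root, spoof_root):
    """Return (site_directory, label) pairs: 0 = legitimate, 1 = spoof.

    Assumes (hypothetically) that every website sits in its own
    subfolder directly under the given root directory.
    """
    pairs = []
    for root, label in ((legit_root, 0), (spoof_root, 1)):
        # Sort for a reproducible ordering across runs.
        for site in sorted(Path(root).iterdir()):
            # Skip stray files such as a readme at the top level.
            if site.is_dir():
                pairs.append((site, label))
    return pairs
```

For example, `label_sites("LegitFinancial", "SpoofFinancial")` would yield one labeled entry per website folder, which can then be fed to whatever feature extraction the task calls for.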


PHARMACY

  • Concocted Pharmacy Websites: Contains 150 concocted pharmacies that used black hat SEO (i.e., link spam) to reach the top 100 in search engine rankings. URLs were verified through LegitScript at http://www.legitscript.com/.

                     readme.txt

                     ConcoctedPharma.zip         (1.4 GB zipped, 8.1 GB unzipped)


  • Legitimate Pharmacy Websites: Contains 150 legitimate pharmacies. Legitimacy was verified through the National Association of Boards of Pharmacy (NABP) and LegitScript at http://www.legitscript.com/.

                     readme.txt

                     LegitPharma.zip                  (659 MB zipped, 2.6 GB unzipped)


  • Pharmacy Extracts: Contains text and URL extracts from the 150 concocted and 150 legitimate pharmacies listed above (derived using HTML parsing tools). The text and URL extract files are site-level, with all text for a given website appearing in a single row.

                     readme.txt

                     PharmaExtracts.zip              (107 MB zipped, 860 MB unzipped)


TARGETED BRANDS

  • Phishing-Targeted Brands: Contains time series data from 2006 through 2015 for 178 prominent targeted brands, with URL and Whois information for each phishing attack. The data includes nearly 1.5 million attack URLs.

                     readme.txt

                     TargetedBrands.zip             (86.5 MB zipped, 1.6 GB unzipped)


PHISHMONGER

  • PhishMonger: Contains ~393,000 phishing websites collected between November 2015 and May 2018. Each output series represents one month of collection. This research is ongoing, and more websites will be added to this portal as the researchers make them available to the public. CAUTION: THIS DATASET MAY CONTAIN MALWARE. Please review the DIBBs-ISI Malware Handling Protocol. The PhishMonger tool is maintained at https://github.com/mcintirecba/phishmonger.

                     readme-PhishMonger.txt

                     readme2.txt

                     PhishMonger_Dobolyi_Abbasi_ISI-2016_preprint.pdf

                     IEEE_ISI_2016_Poster.pdf

                     IEEE_ISI_2016_Presentation_Short.pdf

               Due to file size, some outputs were separated into multiple parts. Download all parts into one folder and open the first file (001) to recombine.
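
If an archive tool that understands multi-part files is not at hand, the parts can often be joined by hand. The sketch below assumes the numbered parts (.001, .002, ...) are plain byte-wise splits of one archive, as 7-Zip's volume splitting produces; the listing does not state which tool created them, so treat that as an assumption. The function name `recombine_parts` is illustrative only.

```python
import glob
import shutil
from pathlib import Path


def recombine_parts(folder, base_name):
    """Join base_name.001, base_name.002, ... into a single file base_name.

    Assumes the parts are raw byte-wise splits of one archive.
    Returns the path of the recombined file.
    """
    folder = Path(folder)
    # Collect only files whose final suffix is purely numeric (.001, .002, ...).
    parts = sorted(
        (p for p in folder.glob(glob.escape(base_name) + ".*")
         if p.suffix[1:].isdigit()),
        key=lambda p: int(p.suffix[1:]),
    )
    if not parts:
        raise FileNotFoundError(f"no parts found for {base_name} in {folder}")
    # Refuse to build a corrupt archive if a part is missing.
    found = [int(p.suffix[1:]) for p in parts]
    if found != list(range(1, len(parts) + 1)):
        raise ValueError(f"parts are not contiguous from 001: {found}")
    target = folder / base_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so multi-gigabyte parts never sit in memory.
                shutil.copyfileobj(src, out)
    return target
```

A call such as `recombine_parts("downloads", "output_1-28.zip")` would write `downloads/output_1-28.zip`, which needs roughly as much free disk space again as the parts combined. The single-file .tar outputs need no recombining.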

               

               INDEX FILES 1-21113

                     index.zip    (69 GB zipped)

               OUTPUT 1-28

                     output_1-28.zip.001    (10.2 GB zipped)
                     output_1-28.zip.002    (10.2 GB zipped)
                     output_1-28.zip.003    (10.2 GB zipped)
                     output_1-28.zip.004    (10.2 GB zipped)
                     output_1-28.zip.005    (10.2 GB zipped)
                     output_1-28.zip.006    (10.2 GB zipped)
                     output_1-28.zip.007    (10.2 GB zipped)
                     output_1-28.zip.008    (10.2 GB zipped)
                     output_1-28.zip.009    (1.1 GB zipped)

               OUTPUT 29-661

                     output_29-661.tar    (12.7 GB)

               OUTPUT 662-1323

                     output_662-1323.zip.001    (9.7 GB zipped)
                     output_662-1323.zip.002    (6.4 GB zipped)

               OUTPUT 1324-1980

                     output_1324-1980.zip.001    (9.7 GB zipped)
                     output_1324-1980.zip.002    (9.7 GB zipped)
                     output_1324-1980.zip.003    (9.7 GB zipped)
                     output_1324-1980.zip.004    (9.7 GB zipped)
                     output_1324-1980.zip.005    (7 GB zipped)

               OUTPUT 1981-2506

                     output_1981-2506.zip.001    (9.7 GB zipped)
                     output_1981-2506.zip.002    (1.4 GB zipped)

               OUTPUT 2507-2895

                     output_2507-2895.zip.001    (10 GB zipped)
                     output_2507-2895.zip.002    (5 GB zipped)

               OUTPUT 2896-3649

                     output_2896-3649.zip.001    (10 GB zipped)
                     output_2896-3649.zip.002    (7.8 GB zipped)

               OUTPUT 3650-4044

                     output_3650-4044.zip.001    (10 GB zipped)
                     output_3650-4044.zip.002    (10 GB zipped)
                     output_3650-4044.zip.003    (10 GB zipped)
                     output_3650-4044.zip.004    (1.6 GB zipped)

               OUTPUT 4045-4831

                     output_4045-4831.zip.001    (10 GB zipped)
                     output_4045-4831.zip.002    (2.7 GB zipped)

               OUTPUT 4832-5512

                     output_4832-5512.zip.001    (10 GB zipped)
                     output_4832-5512.zip.002    (3.1 GB zipped)

               OUTPUT 5513-6278

                     output_5513-6278.tar    (8.1 GB)

               OUTPUT 6279-7095

                     output_6279-7095.tar    (5.7 GB)

               OUTPUT 7096-7697

                     output_7096-7697.zip.001    (10 GB zipped)
                     output_7096-7697.zip.002    (6.9 GB zipped)

               OUTPUT 7698-8409

                     output_7698-8409.tar    (14.7 GB)

               OUTPUT 8410-9154

                     output_8410-9154.tar    (15.9 GB)

               OUTPUT 9155-9909

                     output_9155-9909.tar    (11.2 GB)

               OUTPUT 9910-10589

                     output_9910-10589.tar    (11.4 GB)

               OUTPUT 10590-11375

                     output_10590-11375.tar    (5.6 GB)

               OUTPUT 11376-12183

                     output_11376-12183.tar    (7.6 GB)

               OUTPUT 12184-12895

                     output_12184-12895.tar    (7.2 GB)

               OUTPUT 12896-13519

                     output_12896-13519.tar    (11 GB)

               OUTPUT 13520-14395

                     output_13520-14395.tar    (22 GB)

               OUTPUT 14396-15085

                     output_14396-15085.tar    (14.3 GB)

               OUTPUT 15086-15506

                     output_15086-15506.tar    (31.2 GB)

               OUTPUT 15507-16276

                     output_15507-16276.tar    (31 GB)

               OUTPUT 16277-16951

                     output_16277-16951.tar    (12 GB)

               OUTPUT 16952-17583

                     output_16952-17583.tar    (25.5 GB)

               OUTPUT 17584-18313

                     output_17584-18313.tar    (22.1 GB)

               OUTPUT 18314-18822

                     output_18314-18822.tar    (13.5 GB)

               OUTPUT 18823-19665

                     output_18823-19665.tar    (29 GB)

               OUTPUT 19666-20256

                     output_19666-20256.tar    (18 GB)

               OUTPUT 20257-21113

                     output-20257-21113.tar    (24 GB)