Security Metrics

Organizations have little idea of how secure they are against varying degrees of attack. Even after expensive penetration testing that may reveal vulnerabilities, an organization faces the difficult process of finding a security product that improves its security posture. We propose a novel, scalable method for answering vital questions such as “How secure is an organization?” and “What is the most effective additional security product to purchase and deploy?” through empirical experiments. We conduct experiments that gather and generate data and that test security controls across layers and multiple attack vectors. See the papers below for details.

This data is made available to the research community for use in noncommercial research. For permission for other uses please contact Please cite any use of the data by crediting the Columbia University IDS Lab and referencing the associated paper. If you have any questions or run into difficulty in using the data please contact Nathaniel Boggs at

Measuring Drive-by Download Defense in Depth
Note that this data was collected in Spring 2012 and is therefore ill-suited for testing any current sensors.
Database containing emails, attack URLs, scans and detections: ../
Executable files collected (BitTorrent sync hash ~250MB): BYDXABTVZ32OFO3X5SZE4X6PVUN5PO4QV
Full network capture of each attack (BitTorrent sync hash ~1.5GB): BUDOXXGP7F5SJYT4DBLV3UJ5D7WBLGK2X

Synthetic Data Generation and Defense in Depth Measurement of Web Applications
VMs to run Wind Tunnel yourself (recommended): Coming soon …

Raw multilayer data generated and used in our example experiments. Together, the datasets total around 400GB. See the paper for details on how the datasets are generated. For Wind Tunnel datasets, see the attack window file for the start and end timestamps of each attack in order to find ground truth. All attacks were launched from Note that in our paper we did not use all the attacks. The first half of the attacks for each dataset were launched while the server was under load and were not used in the paper, as some syscalls were missed. The testlink and tikiwiki data padding attacks (any attack labeled 'padding' in the attack window files) were placeholders and were not used. For further details and questions email We are happy to work with you. We plan to release the full framework in the near future, which should be much more useful and easier to work with than raw data files.
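
The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual layout of the attack window file is not described here, so the sketch assumes a hypothetical CSV layout of "label,start,end" with timestamps in epoch seconds, and it assumes that "first half" means the first half of the windows in time order. The names AttackWindow, parse_windows, usable_windows and is_attack are made up for this example and are not part of the released data or framework.

```python
import csv
import io
from typing import List, NamedTuple


class AttackWindow(NamedTuple):
    label: str
    start: float  # assumed to be epoch seconds
    end: float    # assumed to be epoch seconds


def parse_windows(text: str) -> List[AttackWindow]:
    """Parse the ASSUMED 'label,start,end' CSV layout, sorted by start time."""
    rows = csv.reader(io.StringIO(text))
    windows = [
        AttackWindow(row[0].strip(), float(row[1]), float(row[2]))
        for row in rows
        if row  # skip blank lines
    ]
    return sorted(windows, key=lambda w: w.start)


def usable_windows(windows: List[AttackWindow]) -> List[AttackWindow]:
    """Keep only windows matching the selection described in the text.

    Drops the first half of the attacks (launched under load) and any
    attack labeled 'padding'. Counting the first half over all listed
    windows, padding included, is an assumption of this sketch.
    """
    second_half = windows[len(windows) // 2:]
    return [w for w in second_half if w.label.lower() != "padding"]


def is_attack(timestamp: float, windows: List[AttackWindow]) -> bool:
    """Ground-truth label: True if the timestamp falls inside any window."""
    return any(w.start <= timestamp <= w.end for w in windows)
```

A typical use would be to read one dataset's attack window file into a string, call parse_windows and then usable_windows, and label each captured event with is_attack on its timestamp. If the real file uses a different delimiter or timestamp format, only parse_windows should need to change.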

Run name - BitTorrent Sync Key
Wordpress usenet short - BRCNYLFYS72UN2TO6LAURUOF2FCEAEDDV
Wordpress wikipedia data - BRWE73HWWNKP5GB2TWZPZBI6PGW3EZZOE
Wordpress other server - BNRK7AZGWGE5IRYETCMUZD6754KN7GDKR