Our version of the data set is available here:
(Please right-click to download these files rather than opening them in the browser.)
The names and email addresses of the 184 employees.
(time, from, to) tuples, where "time" is the number of seconds
elapsed since January 1, 1970 (the Unix epoch), and "from" and
"to" are employee indices.
Please note that employee indices start at 0!
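As a minimal sketch, assuming the tuples are stored one per line as whitespace-separated integers (the file layout is an assumption, not documented here), the following Python reads them, converts "time" to a UTC date, and looks up the 0-based "from" and "to" indices in a list of employee names loaded in file order. The function names `parse_edges` and `describe` and the placeholder names are hypothetical.

```python
from datetime import datetime, timezone

def parse_edges(lines):
    """Parse whitespace-separated (time, from, to) lines into integer tuples.

    Assumes one tuple per line with integer columns (an assumption about
    the file layout); columns beyond the first three are ignored.
    """
    edges = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        t, src, dst = (int(x) for x in line.split()[:3])
        edges.append((t, src, dst))
    return edges

def describe(edge, names):
    """Render one edge with a UTC timestamp and employee names."""
    t, src, dst = edge
    when = datetime.fromtimestamp(t, tz=timezone.utc)  # seconds since 1970-01-01 UTC
    # Indices are 0-based, so index 0 is the first entry of the employee list.
    return f"{when:%Y-%m-%d %H:%M:%S} UTC: {names[src]} -> {names[dst]}"

if __name__ == "__main__":
    names = ["employee_a", "employee_b"]  # hypothetical placeholders, not real data
    for edge in parse_edges(["1000000000 0 1", "", "1000000060 1 0"]):
        print(describe(edge, names))
```

Forgetting the 0-based convention would silently attribute every message to the next employee in the list, which is why the lookup indexes `names` directly.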
(time, from, receiver, tag) tuples, where "tag" records how the
receiver was addressed: 0 for To, 1 for CC, and 2 for BCC.
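Because each tuple names a single receiver, a message sent to several people presumably appears as several rows. A small sketch of decoding the tag column, assuming the rows have already been parsed into integer 4-tuples; `TAG_NAMES`, `count_by_tag`, and `direct_recipients` are illustrative names, not part of the data set.

```python
from collections import Counter

# Receiver-type codes as described for the (time, from, receiver, tag) tuples.
TAG_NAMES = {0: "To", 1: "CC", 2: "BCC"}

def count_by_tag(rows):
    """Count rows per receiver type; an undocumented tag raises KeyError."""
    counts = Counter()
    for _time, _sender, _receiver, tag in rows:
        counts[TAG_NAMES[tag]] += 1
    return counts

def direct_recipients(rows):
    """Keep only rows whose receiver was on the To line (tag 0)."""
    return [row for row in rows if row[3] == 0]

if __name__ == "__main__":
    rows = [(1, 0, 1, 0), (1, 0, 2, 1), (1, 0, 3, 2), (2, 1, 0, 0)]  # toy rows
    print(count_by_tag(rows))
    print(direct_recipients(rows))
```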
(time, from, receiver, topic) tuples, where "topic" was assigned
by 3-means clustering of 3,120 messages randomly selected from all
125,409 messages, followed by nearest-neighbor (NN) classification
of the whole corpus.
(NB: Topic "0" denotes an outlier, e.g., a message body with too few
words or containing only meaningless numbers.)
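The cluster-then-classify idea above can be sketched in pure Python. This is an illustration under stated assumptions, not the procedure actually used: it assumes each message has already been turned into a numeric feature vector (the text does not say how), reads "NN classification" as 1-nearest-neighbor against the clustered sample, uses a deterministic farthest-point seeding for k-means, and approximates the outlier rule with a hypothetical minimum word count `min_words`. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point seeding; labels are 1..k."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return [lab + 1 for lab in labels]  # 0 stays free for the outlier topic

def nn_classify(vectors, ref_vectors, ref_labels):
    """Give each vector the label of its nearest reference vector (1-NN)."""
    return [ref_labels[min(range(len(ref_vectors)),
                           key=lambda r: sq_dist(v, ref_vectors[r]))]
            for v in vectors]

def assign_topics(vectors, word_counts, k=3, sample_size=3120, min_words=5, seed=0):
    """Cluster a random sample with k-means, then 1-NN-label the whole corpus."""
    usable = [i for i, n in enumerate(word_counts) if n >= min_words]
    sample = random.Random(seed).sample(usable, min(sample_size, len(usable)))
    sample_vecs = [vectors[i] for i in sample]
    sample_labels = kmeans(sample_vecs, k)
    labels = nn_classify(vectors, sample_vecs, sample_labels)
    # Messages with too few words are marked as outliers (topic 0).
    return [lab if n >= min_words else 0 for lab, n in zip(labels, word_counts)]

if __name__ == "__main__":
    vecs = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (0.0, 10.0), (0.1, 10.0), (5.0, 5.0)]
    print(assign_topics(vecs, [10, 10, 10, 10, 10, 10, 1], sample_size=100))
```

Clustering only a sample keeps the expensive step small, while the 1-NN pass still gives every message in the corpus a label.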
(time, from, receiver, LDC_topic) tuples, where "LDC_topic" is
assigned based on Michael W. Berry's 2001 Annotated (by Topic)
Enron Email Data Set. There are 32 topics.
(NB: Topic "0" denotes an outlier, e.g., a message body with too few
words or containing only meaningless numbers. Topic "-1" means that
no matching topic was found.)
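Since two codes are sentinels rather than topics, it helps to separate them before counting. A small sketch, assuming rows are parsed integer 4-tuples and, as a further assumption not stated above, that the 32 LDC topics are numbered 1 through 32; `split_by_ldc_topic` is a hypothetical helper.

```python
from collections import Counter

OUTLIER, NO_MATCH = 0, -1  # sentinel codes described for LDC_topic
NUM_LDC_TOPICS = 32        # assumption: real topics are numbered 1..32

def split_by_ldc_topic(rows):
    """Split (time, from, receiver, LDC_topic) rows into topic counts and sentinels."""
    topic_counts = Counter()
    outliers, unmatched = [], []
    for row in rows:
        topic = row[3]
        if topic == OUTLIER:
            outliers.append(row)
        elif topic == NO_MATCH:
            unmatched.append(row)
        elif 1 <= topic <= NUM_LDC_TOPICS:
            topic_counts[topic] += 1
        else:
            raise ValueError(f"unexpected LDC_topic code: {topic}")
    return topic_counts, outliers, unmatched

if __name__ == "__main__":
    rows = [(1, 0, 1, 5), (2, 0, 2, 5), (3, 1, 0, 0), (4, 1, 2, -1), (5, 2, 0, 32)]
    print(split_by_ldc_topic(rows))
```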
[2] C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park, "Scan Statistics on Enron Graphs," SIAM International Conference on Data Mining, Workshop on Link Analysis, Counterterrorism and Security, Newport Beach, California, April 23, 2005. (siamenron.pdf, the reprint)
[3] Gina Kolata, "Enron Offers an Unlikely Boost to E-Mail Surveillance," New York Times, Week in Review, May 22, 2005. ("Finding Patterns in Corporate Chatter")
[4] C.E. Priebe, "Scan Statistics on Enron Graphs," IPAM Summer Graduate School: Intelligent Extraction of Information from Graphs and High Dimensional Data, UCLA, July 11-29, 2005. (cepssg072105.pdf) (video presentation)
[5] C.E. Priebe, "Scan Statistics on Enron Graphs," 2005 Fall Department of Applied Mathematics and Statistics Seminars, The Johns Hopkins University, September 15, 2005. (cepssgams2005.pdf)
[6] Y. Park, C.E. Priebe, and D.J. Marchette, "Scan Statistics on Enron Hypergraphs," Interface 2008, Durham, North Carolina, May 21, 2008. (hgraph-interface08-handout.pdf)
[7] Y. Park, C.E. Priebe, and D.J. Marchette, "Anomaly Detection using Scan Statistics on Enron Graphs and Hypergraphs," The Satellite Workshop of the IASC 2008 Conference, Seoul, Korea, December 1-3, 2008. (iasc-handout.pdf)
[8] Y. Park, C.E. Priebe, D.J. Marchette, and A. Youssef, "Anomaly Detection using Scan Statistics on Time Series of Hypergraphs," Workshop on Link Analysis, Counterterrorism and Security at the SIAM International Conference on Data Mining, Sparks, Nevada, May 1-3, 2009. (hyperenron.pdf, the reprint)
[9] Y. Park, C.E. Priebe, and A. Youssef, "Anomaly Detection in Time Series of Graphs using Fusion of Invariants," IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No. 1, pp. 67-75, February 2013.
[10] H. Wang, M. Tang, Y. Park, and C.E. Priebe, "Locality statistics for anomaly detection in time series of graphs," IEEE Transactions on Signal Processing, Vol. 62, No. 3, pp. 703-717, February 2014.