On 8/17/2012, Ming Sun said: For my work, I regenerated the cosine IDMs from the WCH files using three different text features: mutual information (MI), latent semantic indexing (LSI) and term frequency-inverse document frequency (TF-IDF). I removed the stop words and stemmed the words before calculating those three text features. In the file names, lsi, mi and tfidf stand for the LSI, MI and TF-IDF text features respectively. Running 'print(load("wiki_selfidm_StopWordsRM_stem_lsi.RData"))' in R shows TE, TF, GE and GF, which are the IDMs for the English (E) and French (F) wiki documents using text (T) and graph (G) features. The mi and tfidf files contain the same four objects. GE and GF are the same in all three files, but TE and TF differ because they are computed from different text features.
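The claim that GE and GF match across the three files while TE and TF differ can be checked with a few lines of R. The following is only a minimal sketch: the helper names load_idms, compare_idms and make_toy_file are hypothetical, and the toy files it writes are stand-ins for the real .RData files, which are not reproduced here. It assumes each file holds the same set of object names.

```r
# Load an .RData file into its own environment so that objects with the
# same name (TE, TF, GE, GF) from different files do not overwrite each other.
load_idms <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  mget(loaded, envir = env)          # named list of the restored objects
}

# For each object name in the first file, report whether that object is
# identical in every other file.
compare_idms <- function(paths) {
  idms <- lapply(paths, load_idms)
  object_names <- names(idms[[1]])
  vapply(object_names, function(nm) {
    all(vapply(idms[-1],
               function(x) identical(x[[nm]], idms[[1]][[nm]]),
               logical(1)))
  }, logical(1))
}

# Toy stand-in files (hypothetical data, NOT the real wiki IDMs):
# the graph matrices are fixed, the text matrices depend on text_scale.
make_toy_file <- function(text_scale) {
  GE <- matrix(c(0, 1, 1, 0), nrow = 2)
  GF <- GE * 2
  TE <- GE * text_scale
  TF <- GF * text_scale
  path <- tempfile(fileext = ".RData")
  save(TE, TF, GE, GF, file = path)
  path
}

toy_paths <- c(lsi   = make_toy_file(0.1),
               mi    = make_toy_file(0.2),
               tfidf = make_toy_file(0.3))

same_across_files <- compare_idms(toy_paths)
print(same_across_files)
```

To run the check on the real data, pass the paths of the three .RData files to compare_idms in place of toy_paths. Loading into a separate environment per file matters here because all three files use the same four object names, so a plain load() into the workspace would keep only the last file's copies.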