American Journal of Mathematical and Computer Modelling
Volume 2, Issue 3, August 2017, Pages: 84-87
Received: Jan. 30, 2017;
Accepted: Feb. 18, 2017;
Published: Mar. 9, 2017
Himanshu Sharma, School of CSE, Jaipur National University, Jaipur, India
Arvind K. Sharma, Dept of CSI, University of Kota, Kota, Rajasthan, India
Topic models are now widely used to identify topics in text corpora. Topic modelling is a mechanism for extracting the common topics that occur across a collection of documents. Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms can help in developing new paradigms for searching, browsing, and summarizing large archives of text. This paper presents a survey of important topic modelling techniques and tools, with an emphasis on probabilistic topic models. Its primary aim is to help researchers who do not have a strong background in mathematics or statistics feel comfortable using topic modelling methods and tools in their research. In addition, the merits and demerits of the topic modelling methods are summarized.
Himanshu Sharma, Arvind K. Sharma, Study and Analysis of Topic Modelling Methods and Tools – A Survey, American Journal of Mathematical and Computer Modelling, Vol. 2, No. 3, 2017, pp. 84-87.