
2012/2013

A parallel Probabilistic Latent Semantic Analysis method on MapReduce platform*

2013 IEEE International Conference on Information and Automation, Yinchuan, Ningxia, China

Authors
Zhao Liang
Wenye Li
Yuxi Li

Abstract

Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical technique for analyzing relations in co-occurrence data, and it is widely used in automated information processing tasks. However, it involves non-trivial computation and is often difficult and time-consuming to train when the dataset is large. MapReduce is a computing framework designed by Google that provides a distributed solution to large-scale data analysis tasks using clusters of computers. In this work, we address the scalability problem of PLSA by proposing and implementing a parallel method to train PLSA under the MapReduce computing framework. Experimental results show that when the training dataset is large, learning the probability distributions of the PLSA model in parallel achieves almost linear speedups and thus provides a practical solution for large-scale data analysis applications.
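
The abstract does not spell out how the training is split into map and reduce tasks, so the following is only a minimal sketch of one plausible decomposition, not the authors' implementation. It assumes the standard asymmetric PLSA model, P(w|d) = sum over z of P(z|d) P(w|z), trained with EM: the map step runs the E-step for one document and emits expected counts, and the reduce step sums them by key before the M-step normalization. All function names (`init_params`, `map_doc`, `reduce_sum`, `em_iteration`, `log_likelihood`) are hypothetical, and everything runs in a single process to imitate the data flow rather than on a real cluster.

```python
import math
import random
from collections import defaultdict


def init_params(n_docs, vocab_size, n_topics, seed=0):
    """Random strictly positive starting values for P(w|z) and P(z|d)."""
    rng = random.Random(seed)

    def rand_dist(n):
        xs = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    p_w_z = [rand_dist(vocab_size) for _ in range(n_topics)]  # P(w|z)
    p_z_d = [rand_dist(n_topics) for _ in range(n_docs)]      # P(z|d)
    return p_w_z, p_z_d


def map_doc(d, counts, p_w_z, p_z_d):
    """Map step (assumed design): E-step for one document.

    counts maps word id -> n(d, w). Emits expected counts
    n(d, w) * P(z|d, w) under two kinds of keys.
    """
    for w, n in counts.items():
        post = [p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))]
        total = sum(post)
        for z, p in enumerate(post):
            c = n * p / total
            yield ("wz", z, w), c  # contributes to P(w|z)
            yield ("zd", d, z), c  # contributes to P(z|d)


def reduce_sum(pairs):
    """Reduce step: sum all values that share a key."""
    acc = defaultdict(float)
    for key, value in pairs:
        acc[key] += value
    return acc


def _normalize(row):
    s = sum(row)
    # Fall back to uniform for an all-zero row (e.g. an empty document).
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)


def em_iteration(docs, p_w_z, p_z_d, vocab_size):
    """One EM iteration: map over documents, reduce, then normalize."""
    n_topics = len(p_w_z)
    pairs = (kv for d, counts in enumerate(docs)
             for kv in map_doc(d, counts, p_w_z, p_z_d))
    sums = reduce_sum(pairs)
    new_w_z = [_normalize([sums.get(("wz", z, w), 0.0)
                           for w in range(vocab_size)])
               for z in range(n_topics)]
    new_z_d = [_normalize([sums.get(("zd", d, z), 0.0)
                           for z in range(n_topics)])
               for d in range(len(docs))]
    return new_w_z, new_z_d


def log_likelihood(docs, p_w_z, p_z_d):
    """Sum over observed (d, w) of n(d, w) * log P(w|d)."""
    ll = 0.0
    for d, counts in enumerate(docs):
        for w, n in counts.items():
            p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
            ll += n * math.log(p)
    return ll
```

Each call to `map_doc` reads only its own document plus the current parameters, which is the property that would let the E-step be spread across machines. In a real MapReduce job the parameters would have to be distributed to the workers at every iteration, a cost this sketch ignores.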

