Abstract

  With the rapid growth of text mining technology, knowledge discovery in text is attracting more and more researchers. Association rule mining, an essential problem in data mining, is widely used on text data. However, because the amount of information in many fields is exploding, serial association rule mining algorithms leave much to be desired. In this paper, we propose a new parallel algorithm for association rule mining on a text corpus, implemented with the MPI programming interface. The algorithm relies mainly on a distributed inverted hash index and a communication pattern based on chessboard decomposition to accelerate the generation of candidate itemsets. We implemented the algorithm and tested it on a computer cluster. The performance evaluation shows that the speedup can reach 16 when using 49 processes.

Full Text