Canonical labeling to improve compression approach to graph matching
[Thesis]
Mohammad R. Islam
Eberle, William
Tennessee Technological University
2016
63
Committee members: Kosa, Martha; Talbert, Doug
Place of publication: United States, Ann Arbor; ISBN=978-1-369-45317-1
M.S.
Computer Science
Tennessee Technological University
2016
Frequent itemset mining and sequence mining are well-established data mining approaches for discovering interesting patterns. More recently, however, research has focused on the challenge of discovering frequent patterns in structural data, that is, data in which there are relationships between entities. One potential solution is graph mining, where research has concentrated on creating efficient and scalable algorithms for frequent subgraph mining. Graph-based pattern mining is used in many applications, including chemistry, biology, and computer networks. With the rise of big data, however, research must focus even more on scalability in order to be practical in the real world. In this thesis, we introduce a new approach for discovering frequent subgraphs in large datasets using a hybrid of two of the more popular subgraph mining algorithms. We empirically evaluate our approach on two publicly available datasets, one of chemical compounds and the other of computer networking data. On both datasets, our algorithm discovers more meaningful frequent patterns than either of the two original algorithms.
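The abstract does not describe the specific canonical labeling scheme used in the thesis. As a minimal sketch of the general idea named in the title, assuming small simple undirected graphs with labeled vertices and edges, the hypothetical function `canonical_label` below computes a brute-force canonical form: the lexicographically smallest encoding of the labeled adjacency structure over all vertex orderings. Two graphs are isomorphic exactly when their canonical labels are equal, so candidate patterns can be matched by comparing labels instead of running a separate isomorphism test for each pair. This brute-force version costs n! time in the number of vertices and is only illustrative; it is not the thesis's method, and practical miners use much more efficient canonical forms.

```python
from itertools import permutations


def canonical_label(vertex_labels, edges):
    """Brute-force canonical label of a small, simple, undirected labeled graph.

    vertex_labels: list of vertex label strings, indexed by vertex id.
    edges: iterable of (u, v, edge_label) tuples with u != v.
    Returns the smallest (vertex labels, adjacency) encoding over all vertex
    orderings, so isomorphic graphs always receive identical labels.
    """
    n = len(vertex_labels)
    # Present edges are encoded as (1, label), absent ones as (0, ""),
    # so an absent edge can never be confused with any real edge label.
    adj = {}
    for u, v, lab in edges:
        adj[(u, v)] = (1, lab)
        adj[(v, u)] = (1, lab)

    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        vlabels = tuple(vertex_labels[p] for p in perm)
        # Upper triangle of the relabeled adjacency matrix, row by row.
        matrix = tuple(
            adj.get((perm[i], perm[j]), (0, ""))
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (vlabels, matrix)
        if best is None or code < best:
            best = code
    return best


if __name__ == "__main__":
    # C-C=O written with two different vertex numberings (isomorphic).
    g1 = (["C", "C", "O"], [(0, 1, "single"), (1, 2, "double")])
    g2 = (["O", "C", "C"], [(0, 1, "double"), (1, 2, "single")])
    # C-O=C: same vertex labels, different structure (not isomorphic).
    g3 = (["C", "O", "C"], [(0, 1, "single"), (1, 2, "double")])

    print(canonical_label(*g1) == canonical_label(*g2))  # True
    print(canonical_label(*g1) == canonical_label(*g3))  # False
```

Because the label is hashable, a frequency count over candidate substructures can simply use it as a dictionary key, so every occurrence of the same pattern, however its vertices happen to be numbered, falls into one bucket.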