Canonical labeling to improve compression approach to graph matching
General Material Designation
[Thesis]
First Statement of Responsibility
Mohammad R. Islam
Subsequent Statement of Responsibility
Eberle, William
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
Tennessee Technological University
Date of Publication, Distribution, etc.
2016
PHYSICAL DESCRIPTION
Specific Material Designation and Extent of Item
63
GENERAL NOTES
Text of Note
Committee members: Kosa, Martha; Talbert, Doug
NOTES PERTAINING TO PUBLICATION, DISTRIBUTION, ETC.
Text of Note
Place of publication: United States, Ann Arbor; ISBN=978-1-369-45317-1
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
M.S.
Discipline of degree
Computer Science
Body granting the degree
Tennessee Technological University
Text preceding or following the note
2016
SUMMARY OR ABSTRACT
Text of Note
Frequent itemset and sequence mining are well-established data mining approaches for discovering interesting patterns. More recently, however, research has turned to the challenge of discovering frequent patterns in structural data, that is, data in which entities are related to one another. One potential solution is graph mining, where research has focused on creating efficient and scalable algorithms for frequent subgraph mining. Graph-based pattern mining is used in many application areas, including chemistry, biology, and computer networks. However, with the rise of big data, research needs to focus even more on scalability in order to be practical in the real world. In this paper, we introduce a new approach for discovering frequent subgraphs in large datasets that is a hybrid of two of the more popular subgraph mining algorithms. We empirically evaluate our approach on two publicly available datasets, one representing chemical compounds and the other representing computer networking. On both datasets, our algorithm discovers more meaningful frequent patterns than the other two algorithms.