Department of Computer Science Faculty Scholarship and Creative Works

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data

Aparna Varde, Montclair State University
Elke A. Rundensteiner, Worcester Polytechnic Institute
Carolina Ruiz, Worcester Polytechnic Institute
Mohammed Maniruzzaman, Worcester Polytechnic Institute
Richard D. Sisson, Worcester Polytechnic Institute

Document Type

Conference Proceeding

Publication Date

12-1-2005

Abstract

In mining graphical data, the default Euclidean distance is often used as a notion of similarity. However, this does not adequately capture semantics in our targeted domains, whose graphical representations depict the results of scientific experiments. It is seldom known a priori which other distance metric best preserves semantics, which motivates the need to learn such a metric. A technique called LearnMet is proposed here to learn a domain-specific distance metric for graphical representations. The input to LearnMet is a training set of correct clusters of such graphs. LearnMet iteratively compares these correct clusters with those obtained from an arbitrary but fixed clustering algorithm. In the first iteration, a guessed metric is used for clustering. This metric is then refined using the error between the obtained and correct clusters until the error falls below a given threshold. LearnMet is evaluated rigorously in the Heat Treating domain, which motivated this research. Clusters obtained using the learned metric and clusters obtained using Euclidean distance are both compared against the correct clusters over a separate test set. Our results show that the learned metric yields better clusters.
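The iterative loop described in the abstract (guess a metric, cluster with a fixed algorithm, measure the error against the correct clusters, refine, repeat until the error is under a threshold) can be illustrated with a minimal Python sketch. This is not the authors' LearnMet implementation. It assumes, for illustration only, that the metric is a weighted sum of three hypothetical component distances (`euclidean`, `peak_distance`, `mean_distance`), that the fixed clustering algorithm is single-linkage agglomeration, that the error is the fraction of graph pairs grouped differently from the correct clusters, and that weights are refined by a simple heuristic with a hypothetical step size `lr`.

```python
import itertools
import numpy as np

# Hypothetical component distances between two graphs sampled at the same x positions.
# These are illustrative choices, not the components used in the paper.
def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def peak_distance(a, b):
    # Distance between the positions of each curve's maximum (a "critical point").
    return float(abs(int(np.argmax(a)) - int(np.argmax(b))))

def mean_distance(a, b):
    return float(abs(a.mean() - b.mean()))

COMPONENTS = [euclidean, peak_distance, mean_distance]

def component_matrices(graphs):
    """One pairwise distance matrix per component, each scaled to [0, 1]."""
    n = len(graphs)
    mats = []
    for dist in COMPONENTS:
        m = np.zeros((n, n))
        for i, j in itertools.combinations(range(n), 2):
            m[i, j] = m[j, i] = dist(graphs[i], graphs[j])
        top = m.max()
        mats.append(m / top if top > 0 else m)
    return np.stack(mats)  # shape (num_components, n, n)

def cluster(dist, k):
    """Assumed fixed clustering algorithm: single-linkage agglomeration to k clusters."""
    labels = list(range(len(dist)))
    while len(set(labels)) > k:
        best = None
        for i, j in itertools.combinations(range(len(dist)), 2):
            if labels[i] != labels[j] and (best is None or dist[i, j] < best[0]):
                best = (dist[i, j], labels[i], labels[j])
        _, keep, drop = best
        labels = [keep if lab == drop else lab for lab in labels]
    return labels

def pair_errors(obtained, correct):
    """Pairs wrongly merged (false positives) and wrongly split (false negatives)."""
    fp, fn = [], []
    for i, j in itertools.combinations(range(len(correct)), 2):
        same_obtained = obtained[i] == obtained[j]
        same_correct = correct[i] == correct[j]
        if same_obtained and not same_correct:
            fp.append((i, j))
        elif same_correct and not same_obtained:
            fn.append((i, j))
    return fp, fn

def learn_metric(graphs, correct, threshold=0.0, lr=0.5, max_iter=20):
    mats = component_matrices(graphs)
    k = len(set(correct))
    n_pairs = len(graphs) * (len(graphs) - 1) // 2
    weights = np.full(len(COMPONENTS), 1.0 / len(COMPONENTS))  # initial guessed metric
    best_w, best_err = weights.copy(), None
    for _ in range(max_iter):
        dist = np.tensordot(weights, mats, axes=1)  # D = sum_i w_i * D_i
        fp, fn = pair_errors(cluster(dist, k), correct)
        err = (len(fp) + len(fn)) / n_pairs
        if best_err is None or err < best_err:
            best_w, best_err = weights.copy(), err
        if err <= threshold:
            break
        # Raise weights of components that separate wrongly merged pairs,
        # lower weights of components that separate wrongly split pairs.
        signal = np.zeros(len(COMPONENTS))
        for i, j in fp:
            signal += mats[:, i, j]
        for i, j in fn:
            signal -= mats[:, i, j]
        weights = np.clip(weights + lr * signal / max(len(fp) + len(fn), 1), 1e-6, None)
        weights /= weights.sum()
    return best_w, best_err

if __name__ == "__main__":
    # Synthetic curves: two peak positions, three vertical offsets each.
    x = np.linspace(0, 1, 50)
    graphs = [np.exp(-((x - c) ** 2) / 0.01) + s for c in (0.2, 0.8) for s in (0.0, 0.3, 0.6)]
    weights, error = learn_metric(graphs, correct=[0, 0, 0, 1, 1, 1])
    print(weights, error)
```

Keeping the best weights seen so far means the returned error is never worse than that of the initial guessed metric; a real evaluation, as in the abstract, would then compare this metric with plain Euclidean distance on a separate test set.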

DOI

10.1145/1133890.1133904

Montclair State University Digital Commons Citation

Varde, Aparna; Rundensteiner, Elke A.; Ruiz, Carolina; Maniruzzaman, Mohammed; and Sisson, Richard D., "Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data" (2005). Department of Computer Science Faculty Scholarship and Creative Works. 373.
https://digitalcommons.montclair.edu/compusci-facpubs/373

This document is currently not available here.
