Knowledge Discovery and Measures of Interest is a reference book for knowledge discovery researchers, practitioners, and students. The knowledge discovery researcher will find that the material provides a theoretical foundation for measures of interest in data mining applications where diversity measures are used to rank summaries generated from databases. The knowledge discovery practitioner will find solid empirical evidence on which to base decisions regarding the choice of measures in data mining applications. The knowledge discovery student in a senior undergraduate or graduate course in databases and data mining will find the book a good introduction to the concepts and techniques of measures of interest.
In Knowledge Discovery and Measures of Interest, we study two closely related steps in any knowledge discovery system: the generation of discovered knowledge, and the interpretation and evaluation of discovered knowledge. In the generation step, we study data summarization, where a single dataset can be generalized in many different ways and to many different levels of granularity according to domain generalization graphs. In the interpretation and evaluation step, we study diversity measures as heuristics for ranking the interestingness of the summaries generated.
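As a rough illustration of these two steps, the sketch below generalizes one attribute of a toy dataset to three levels of granularity and scores each resulting summary with a diversity measure. Everything in it is an assumption made for illustration: the data, the city, province, and country hierarchy standing in for a domain generalization graph, the choice of Shannon entropy as the diversity measure, and the decision to rank higher scores first. None of it is taken from the book.

```python
from collections import Counter
from math import log2

# Hypothetical toy attribute values (not from the book).
rows = ["Vancouver", "Victoria", "Toronto", "Ottawa", "Toronto", "Vancouver"]

# Hypothetical generalization path for one attribute: city -> province -> country.
city_to_province = {"Vancouver": "BC", "Victoria": "BC", "Toronto": "ON", "Ottawa": "ON"}
province_to_country = {"BC": "Canada", "ON": "Canada"}


def generalize(values, mapping):
    """Replace each value with its more general parent in the hierarchy."""
    return [mapping[v] for v in values]


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the tuple-count distribution of a summary."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Generation step: one summary (value -> tuple count) per level of granularity.
province_values = generalize(rows, city_to_province)
country_values = generalize(province_values, province_to_country)
summaries = {
    "city": Counter(rows),
    "province": Counter(province_values),
    "country": Counter(country_values),
}

# Interpretation and evaluation step: score each summary, then rank.
# Ranking higher entropy first is an arbitrary choice made for this sketch.
scores = {name: shannon_entropy(counts) for name, counts in summaries.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Whether a higher or lower diversity score should count as more interesting depends on the measure and the application, so the sort direction here should be read as a placeholder.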
The objective of this work is to introduce and evaluate a technique for ranking the interestingness of discovered patterns in data. The work has four primary goals:
- To introduce domain generalization graphs for describing and guiding the generation of summaries from databases.
- To introduce and evaluate serial and parallel algorithms that traverse the domain generalization space described by the domain generalization graphs.
- To introduce and evaluate diversity measures as heuristic measures of interestingness for ranking summaries generated from databases.
- To develop the preliminary foundation for a theory of interestingness within the context of ranking summaries generated from databases.
Knowledge Discovery and Measures of Interest is suitable as a secondary text in a graduate level course and as a reference for researchers and practitioners in industry.