CIS 4523/5523: Knowledge Discovery and Data Mining
Spring 2024

Goals


The objective of the knowledge discovery and data mining process is to extract nontrivial, implicit, previously unknown, and potentially useful information from massive datasets. The course serves as an introduction to the fundamental techniques required to support this process. It is structured to give participants ample opportunity to learn about this research area and to explore promising research topics through hands-on experience.

Prerequisites


Basic knowledge of database systems; programming skills; basic statistics, graph theory, and linear algebra.

Texts


  • Tan, P.-N., Steinbach, M., Kumar, V., Karpatne, A.: Introduction to Data Mining, 2nd Edition, Pearson Education, 2018, ISBN: 0133128903 (required)
  • Aggarwal, C.: Data Mining: The Textbook, Springer, 2015, ISBN-13: 978-3319141411 (recommended)

Topics


  1. An overview of data mining tasks and techniques.
  2. Data:
    1. data types
    2. data quality
    3. data preprocessing: aggregation, sampling, dimensionality reduction, feature selection
  3. Similarities and distances:
    1. multidimensional data
    2. text similarity measures
    3. temporal similarity measures
    4. graph similarity measures
    5. supervised similarity functions
  4. Descriptive and Predictive Modeling:
    1. model functions (cluster analysis, summarization, classification, regression, anomaly detection)
    2. model representation (instance-based and rule-based classifiers; decision trees; probabilistic classifiers; density models; partitioning, hierarchical, density-based, grid-based, and model-based clustering algorithms; frequent pattern mining)
  5. Advanced topics:
    1. mining data streams
    2. mining time series
    3. mining spatial data
    4. mining discrete sequences
  6. Reading and research project presentations.
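
As a small illustration of topic 3 (similarities and distances), the sketch below computes two standard measures for multidimensional data and one simple text similarity measure. It is not taken from the course materials: the function names are hypothetical, the whitespace tokenization is a simplifying assumption, and only the Python standard library is used.

```python
import math


def euclidean_distance(x, y):
    """Euclidean (L2) distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the word sets of two strings.

    Assumption: words are lowercased and split on whitespace; real text
    pipelines usually apply more careful tokenization.
    """
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Two empty texts: treated as 0.0 here by convention (an assumption).
        return 0.0
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(cosine_similarity([1, 2], [2, 4]))
    print(jaccard_similarity("data mining", "data science"))
```

Euclidean distance depends on vector magnitude, while cosine similarity depends only on direction, which is one reason the latter is common for sparse, high-dimensional data such as text.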

Grading


Homework (30%), a midterm exam on March 21, 2024 (20%), reading and presentation assignments (20%), and a research project report due May 2, 2024, by 5:30 pm (30%).

Late Policy and Academic Honesty


Late homework is accepted under an automatic extension, with a 20% penalty per day. Discussing course material with fellow students is acceptable, but programs, experiments, and reports must be done individually.