Heterogeneity Meets Rarity: Mining Multi-Faceted Diamond

CIS Colloquium, Sep 12, 2012, 11:00AM – 12:00PM, Wachman 447

Jingrui He, IBM T.J. Watson Research Center

Many real-world problems exhibit both heterogeneity and rarity. Take insider threat detection across various social contexts as an example. While the target malicious insiders may make up only a very small portion of the entire population (i.e., rarity), each person can be characterized by rich features, such as social friendships, emails, instant messages, etc. (i.e., feature heterogeneity). Moreover, different types of insiders, though correlated, may exhibit different statistical characteristics (i.e., task heterogeneity). For such problems, how can we quickly identify an example from a new rare category? How can we leverage both feature heterogeneity and task heterogeneity to maximize learning performance?
In this talk, I will present our recent work on addressing these two challenges. For the challenge of rarity, I will introduce rare category analysis, e.g., how to detect rare examples with the help of a labeling oracle. For the challenge of heterogeneity, I will present a graph-based approach that takes both feature heterogeneity and task heterogeneity into consideration. I will also talk about how these techniques can be used in applications such as insider threat detection and traffic prediction.

Bio: Dr. Jingrui He is currently a research staff member at IBM T.J. Watson Research Center. She received her M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2008 and 2010 respectively, both in Machine Learning. Her research interests include developing scalable algorithms for heterogeneous learning, rare category analysis, and semi-supervised learning, with applications in social media analysis, traffic analysis, public safety, enterprise analytics, and virtual metrology in semiconductor manufacturing. She has published over 30 refereed articles and served on the organizing committees of ICML, KDD, etc.