Unsupervised Machine Learning Framework for Efficient Detection of Outliers from High Dimensional Datasets

Girish Reddy Ginni, Srinivasa Chakravarthi Lade

Abstract

Outlier detection is a fundamental task in data mining and machine learning applications. Outlier identification and clustering frequently go hand in hand, since clustering can also reveal outliers. Most existing research, however, treats outlier detection and clustering as two separate problems, and their close relationship remains largely unexplored. Exploiting this relationship could improve cluster quality as well as outlier detection, yielding a dual benefit. To this end, we propose an unsupervised machine learning (ML) framework for efficient detection of outliers from high-dimensional datasets. We define an objective function that improves cluster compactness, which in turn makes the outlier detection process more efficient. Further improving the clustering process through problem transformation and an enhanced K-Means results in an integrated approach that jointly achieves high-quality clustering and outlier identification. We propose an algorithm called Learning based Outlier Detection (LbOD). The novelty of the algorithm lies in its simultaneous treatment of the partition space, the objective function and cluster optimization. We built a prototype to evaluate the proposed framework and algorithm on their ability to discover outliers in multiple benchmark high-dimensional datasets. Our empirical study reveals that the LbOD algorithm outperforms many existing outlier detection methods.
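The abstract does not give the LbOD objective function or the details of its enhanced K-Means, so the sketch below does not implement LbOD. Instead it illustrates the general idea of jointly clustering and identifying outliers with a different, published technique, k-means-- (Chawla and Gionis, 2013): in each iteration every point is assigned to its nearest center, the `l` points farthest from their centers are flagged as outliers, and centers are recomputed from the remaining inliers only, so the within-cluster sum of squared errors over inliers serves as the compactness objective. The function names and the parameters `k`, `l` and `init` are hypothetical choices made for this example.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> Tuple[List[int], List[float]]:
    """Index of the nearest center and the squared distance to it, for every point."""
    labels, dists = [], []
    for p in points:
        d = [sq_dist(p, c) for c in centers]
        j = min(range(len(centers)), key=d.__getitem__)
        labels.append(j)
        dists.append(d[j])
    return labels, dists


def farthest(dists: List[float], l: int) -> Set[int]:
    """Indices of the l points with the largest distance to their nearest center."""
    return set(sorted(range(len(dists)), key=dists.__getitem__, reverse=True)[:l])


def kmeans_minus_minus(
    points: List[Point],
    k: int,
    l: int,
    init: Optional[List[Point]] = None,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Point], List[int], Set[int], float]:
    """Sketch of k-means-- (not LbOD): returns (centers, labels, outliers, inlier_sse).

    Points flagged as outliers receive label -1.
    """
    if not 0 < k <= len(points) - l:
        raise ValueError("need 0 < k <= len(points) - l")
    if init is None:
        # Random start; like plain K-Means this can land in a poor local optimum.
        init = random.Random(seed).sample(points, k)
    centers = [tuple(float(v) for v in c) for c in init]
    for _ in range(max_iter):
        labels, dists = assign(points, centers)
        outliers = farthest(dists, l)  # exclude the l worst-fitting points
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == j and i not in outliers]
            if members:  # mean of the inliers in cluster j
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:  # keep an emptied cluster's previous center
                new_centers.append(centers[j])
        if new_centers == centers:  # converged: assignments and outliers are now stable
            break
        centers = new_centers
    # Final labelling against the final centers; the objective counts inliers only.
    labels, dists = assign(points, centers)
    outliers = farthest(dists, l)
    sse = sum(d for i, d in enumerate(dists) if i not in outliers)
    labels = [-1 if i in outliers else j for i, j in enumerate(labels)]
    return centers, labels, outliers, sse


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
           (50.0, 50.0)]
    print(kmeans_minus_minus(pts, k=2, l=1, init=[(0.0, 0.0), (10.0, 10.0)]))
```

Because the `l` farthest points are left out before the centers are recomputed, a single extreme point such as `(50.0, 50.0)` cannot pull a center away from its cluster. This illustrates the interplay between cluster compactness and outlier detection that the abstract refers to. Unlike a fully automatic method, this sketch requires the number of outliers `l` to be fixed in advance.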