Clustering of Microarray Data Using ENDIST
Abstract
Microarray gene expression data shed light on genetic indicators that help distinguish healthy cells from malignant ones. Clustering, an unsupervised learning technique, divides data into groups according to how similar the items are. Clustering microarray data helps identify the expression profiles of genes, which in turn identifies genes with similar functions, such as upregulated and downregulated genes. A distance measure quantifies precisely how dissimilar two items are, which is fundamental to these analytical tasks and applications. This research proposes ENDIST, a novel distance measure for microarray datasets. The benchmark k-medoids algorithm is employed for the clustering tasks. ENDIST results are compared with those of other widely used distance measures, including the Manhattan and Euclidean distances. The datasets used are a central nervous system dataset, a lung cancer dataset, a brain cancer dataset, an endometrial cancer dataset, and a prostate cancer dataset. Preprocessing transforms the raw data into logarithmic values and then applies both binary and ternary discretization. The quality of the clusters obtained with ENDIST was validated using the Dunn index, demonstrating ENDIST's superior capability in identifying gene expression profiles.
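The pipeline summarized in the abstract (log transformation, discretization, k-medoids clustering, Dunn index validation) can be illustrated with a minimal Python sketch. The abstract does not define ENDIST, so the sketch uses only the two baseline measures it names, Manhattan and Euclidean. Everything else is an assumption made for illustration: the function names, the mean-centred thresholds and the `band` parameter used for discretization, the farthest-first initialization, the simplified alternating k-medoids (not full PAM), and the toy intensity values, which are not taken from any of the datasets.

```python
import math

def log_transform(matrix):
    """Replace each raw (positive) intensity with its base-2 logarithm."""
    return [[math.log2(v) for v in row] for row in matrix]

def discretize(row, levels=3, band=0.5):
    """Discretize one gene's log values around that gene's own mean.
    levels=2 gives {0, 1}; levels=3 gives {-1, 0, 1}.
    The thresholds are illustrative assumptions, not the paper's rule."""
    mean = sum(row) / len(row)
    if levels == 2:
        return [1 if v >= mean else 0 for v in row]
    return [1 if v > mean + band else -1 if v < mean - band else 0 for v in row]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, dist, max_iter=100):
    """Simplified alternating k-medoids with farthest-first initialization."""
    n = len(points)
    medoids = [0]
    while len(medoids) < k:
        # Next medoid: the point farthest from all medoids chosen so far.
        medoids.append(max(range(n), key=lambda i: min(
            dist(points[i], points[m]) for m in medoids)))

    def assign(meds):
        # Label each point with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist(p, points[meds[c]]))
                for p in points]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(members, key=lambda i: sum(
                dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids

def dunn_index(points, labels, dist):
    """Smallest between-cluster distance divided by largest cluster diameter.
    Requires at least two non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    max_diam = max(dist(a, b) for g in groups for a in g for b in g)
    min_sep = min(dist(a, b) for i, g in enumerate(groups)
                  for h in groups[i + 1:] for a in g for b in h)
    return min_sep / max_diam if max_diam > 0 else float("inf")

# Toy raw intensities: 4 genes (rows) x 4 samples (columns), made up here.
raw = [
    [100, 110, 800, 900],
    [100, 300, 800, 900],
    [900, 800, 100, 120],
    [900, 800, 300, 100],
]
ternary = [discretize(row, levels=3) for row in log_transform(raw)]
labels, medoids = k_medoids(ternary, k=2, dist=manhattan)
score_manhattan = dunn_index(ternary, labels, manhattan)
score_euclidean = dunn_index(ternary, labels, euclidean)
print(labels, score_manhattan, score_euclidean)
```

A higher Dunn index indicates compact, well-separated clusters, which is why it can be used to compare clusterings produced under different distance measures. A new measure such as ENDIST would be passed as the `dist` argument in the same way as `manhattan` or `euclidean`.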
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.