Please use this identifier to cite or link to this item: http://localhost:80/xmlui/handle/123456789/8632
Title: Feature Importance Genes from Breast Cancer Subtypes Classification Employing Machine Learning
Other Titles: (In) Russian Journal of Genetics
Authors: Bhowmick, Shib Sankar
Issue Date: 2023
Publisher: Springer
Series/Report no.: Vol. 59, Issue 10, pp. 110-122
Abstract: The heterogeneous nature of breast cancer necessitates exploring its molecular subtypes for the early prognosis and treatment of cancer patients. Recent advances in genomics have enabled the investigation of gene expression data in breast cancer research as an alternative to traditional methods. In this regard, projects such as The Cancer Genome Atlas (TCGA) have provided easy access to vast high-throughput sequencing gene expression data, including data for breast cancer. However, finding evidence that a set of genes is involved in a particular breast cancer subtype within such a large gene expression dataset is a demanding task. Here, we propose a classification model based on machine learning to uncover the significant genes associated with different breast cancer subtypes: Basal, human epidermal growth factor receptor 2, luminal A, and luminal B. RNA sequencing gene expression data from The Cancer Genome Atlas are used in this experiment to classify tumor and normal samples and to identify an optimal set of genes specific to each breast cancer subtype. Experimental results show that the average classification accuracy for different gene subsets ranges from 75.36% to 77.74%, depending on the breast cancer subtype and feature selection method. Additionally, the feature scoring mechanism introduced in our model ranks the Feature Importance genes as three*, four*, five*, and six*. In addition, Kaplan–Meier survival analysis, composite network analysis, and Gene Ontology analysis are conducted to highlight the biological significance of the Feature Importance genes. Given the classification results and the biological insight, we may conclude that the proposed model extracts a set of informative genes involved in breast cancer development, particularly in the Basal, human epidermal growth factor receptor 2, luminal A, and luminal B subtypes.
Description: https://doi.org/10.1134/S1022795423130021
URI: http://localhost:80/xmlui/handle/123456789/8632
Appears in Collections: Electronics and Communication Engineering (Publications)


