Stability of feature selection methods : a study of metrics across different gene expression datasets

Mungloo-Dilmohamud, Zahra (University of Mauritius, Reduit, Mauritius) ; Jaufeerally-Fakim, Yasmina (University of Mauritius, Reduit, Mauritius) ; Peña-Reyes, Carlos (School of Management and Engineering Vaud, HES-SO // University of Applied Sciences Western Switzerland)

Analysis of gene-expression data often requires selecting a subset of genes (features), and many feature selection (FS) methods have been devised for this purpose. However, FS methods often generate different lists of features for the same dataset, and users then have to choose which list to use. One approach to support this choice is to apply stability metrics to the generated lists and select a list on that basis. The aim of this study is to investigate the behavior of stability metrics applied to feature subsets generated by FS methods. The experiments explore a range of gene expression datasets, FS methods, and expected numbers of features in order to compare several stability metrics. The stability metrics are used to compare five feature selection methods (SVM, SAM, ReliefF, RFE + RF and LIMMA) on gene expression datasets from the EBI repository. Results show that the studied stability metrics display a high degree of variability. The reason for this is not yet clear and is being investigated further. The final objective of the research, namely to define how to select an FS method, is ongoing work whose partial findings are reported herein.
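
As a rough illustration of what a stability metric on feature lists can look like, the sketch below averages the pairwise Jaccard similarity between feature subsets obtained from repeated runs of one FS method. The abstract does not name the metrics studied, so the choice of the Jaccard index is an assumption made here for illustration only, and the gene identifiers and the function names `jaccard` and `pairwise_stability` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty selections are treated as identical (a convention, not a rule).
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_stability(subsets):
    """Mean Jaccard similarity over all unordered pairs of feature subsets.

    NOTE: an illustrative metric chosen for this sketch; it is not
    necessarily one of the metrics evaluated in the paper.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature lists from three runs of the same FS method.
runs = [
    ["G1", "G2", "G3"],
    ["G2", "G3", "G4"],
    ["G1", "G2", "G3"],
]
score = pairwise_stability(runs)
print(f"stability = {score:.3f}")
```

A score near 1 would indicate that the runs select nearly the same genes, while a score near 0 would indicate little overlap. Other metrics can rank the same lists differently, which is one way the variability mentioned in the abstract could arise.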


Conference Type:
full paper
Faculty:
Ingénierie et Architecture
School:
HEIG-VD
Institute:
IICT - Institut des Technologies de l'Information et de la Communication
Publisher:
Granada, Spain, 30 September-2nd October 2020
Date:
2020-09
Pagination:
11 p.
Published in:
Proceedings of 8th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2020: Bioinformatics and Biomedical Engineering, 30 September-2nd October 2020, Granada, Spain
Numeration (vol. no.):
2020, pp. 659-669
Series Statement:
Lecture Notes in Computer Science (LNCS), vol. 12108
ISBN:
978-3-030-45384-8

Note: The status of this file is: restricted


 Record created 2020-05-26, last modified 2020-06-11
