Article - Journal
1 (2020) Pages: 1-19
Journal: SN Computer Science

Classifying gene expression data is known to hold keys to solving fundamental problems in cancer studies. However, it is a complex task because of the large p, small n problem in gene expression data analysis, where the number of features p far exceeds the number of samples n. In this paper, we propose improvements for the large p, small n classification problem in the study of human cancer. First, a new method that enlarges the sample size with a generative adversarial network is proposed to improve classification algorithms. Second, we suggest a classification approach that applies an over-sampling technique to features extracted by a deep convolutional neural network. Numerical test results on fifty very-high-dimensional, low-sample-size gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories illustrate that the proposed models are more accurate than state-of-the-art classification models. In addition, we explore the performance of support vector machines, k nearest neighbors and random forests, all of which improve when our approaches are applied.
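
To make the over-sampling step concrete, the following is a minimal sketch and not the method used in the paper. It assumes SMOTE-style interpolation between nearest neighbors of the minority class, since the abstract does not name the specific over-sampling algorithm. The function name `oversample`, the parameter `k`, and the toy feature vectors (standing in for features extracted by a convolutional network) are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples for the minority class.

    Assumption: SMOTE-style interpolation. Each synthetic sample lies on
    the line segment between a randomly chosen minority sample and one of
    its k nearest minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [s for j, s in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda s: euclidean(base, s))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: three minority-class feature vectors.
minority = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.9, 2.2, 0.4]]
new_samples = oversample(minority, n_new=5)
```

Because each synthetic vector is a convex combination of two real samples, it stays inside the range spanned by the minority class in every dimension. The enlarged training set can then be passed to any classifier, such as the support vector machines, k nearest neighbors, or random forests mentioned in the abstract.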

