MACHINE LEARNING APPROACH FOR BREAST CANCER CLASSIFICATION
Abstract
Breast cancer is the most common cancer among women in Africa. This has led researchers to keep studying how to detect and treat breast cancer in women, especially older women, who are at higher risk. Achieving satisfactory cancer classification accuracy with the complete set of genes remains a great challenge, especially with microarray datasets, because of the high dimensionality, small sample size, and noise in gene expression data. Classification performance is sensitive to feature reduction, which makes it a critical step. One of the major difficulties in cancer studies is identifying the informative genes (features) among the thousands of others in the dataset. A large number of features (genes) relative to a small sample size, together with redundancy in the expression data, are the two main causes of poor classification accuracy in machine learning and data mining. Dimensionality reduction is therefore an active research area in pattern recognition, machine learning, data mining, and statistics. Its purpose is to improve classification performance by removing redundant or irrelevant features. Feature selection also helps reduce computation time and memory use, which have always been challenges in big data tasks. High dimensionality, noise, and outliers not only increase memory and time complexity but also degrade the performance of the algorithms. This paper aims to improve the low overall accuracy and to reduce the memory use and execution time of machine learning classification models; to that end, the proposed system employs InfoGain for dimensionality reduction and the Random Forest algorithm for classification.
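The abstract does not describe how InfoGain is computed, so the following is only a minimal sketch of information-gain feature ranking, not the authors' implementation. It assumes each continuous expression value is discretized by a binary split at the feature's median; the function names (`entropy`, `info_gain`, `select_top_k`) and the toy data are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of one feature.

    Assumption: the feature is discretized with a single binary split
    at its median value (the paper does not state the discretization).
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    # Weighted entropy of the two partitions after the split.
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest gain."""
    n_features = len(rows[0])
    gains = [info_gain([row[j] for row in rows], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]


# Toy data (made up): 6 samples, 3 "genes". Only gene 0 separates the classes.
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.2],
    [0.3, 4.0, 3.0],
    [0.9, 5.1, 2.9],
    [0.8, 1.1, 3.1],
    [0.7, 4.2, 1.1],
]
labels = ["benign", "benign", "benign",
          "malignant", "malignant", "malignant"]

best = select_top_k(rows, labels, k=1)
print(best)
```

In a pipeline like the one the abstract outlines, only the columns returned by `select_top_k` would be passed on to a Random Forest classifier, which is not shown here; that is where the savings in memory and execution time would come from.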