Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction
Applied and Computational Mathematics
Volume 7, Issue 4, August 2018, Pages: 212-216
Received: Oct. 17, 2018; Published: Oct. 18, 2018
Yixuan Li, School of Mathematics and Statistics, University of Sheffield, Sheffield, UK
Zixuan Chen, School of Information, Zhejiang University of Finance and Economics, Hangzhou, China
Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in women; tumours are classified as benign or malignant. Research on breast cancer and its prevention has attracted growing attention in recent years. At the same time, advances in data mining provide an effective way to extract useful information from complex databases, and the extracted information can support prediction, classification and clustering. To explore the relationship between breast cancer and a set of attributes, with the aim of reducing breast cancer mortality, this study applies five classification models, Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN) and Logistic Regression (LR), to two breast cancer datasets: the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Database (WBCD). Three indicators, prediction accuracy, F-measure and AUC, are used to compare the performance of the five models. Comparative experimental analysis shows that the random forest model achieves better performance and adaptability than the other four methods. These results suggest that the model has clinical and reference value in practical applications.
Keywords: Data Mining, Breast Cancer, Classification Models, Prediction
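The three indicators named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes binary labels with 1 as the positive (malignant) class, takes "F-measure" to mean the balanced F1 score, and computes AUC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one). The function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the positive class is labelled `positive` (1 by default).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Each (positive, negative) pair counts 1 if the positive case scores
    higher, 0.5 on a tie, and 0 otherwise. This is quadratic in the number
    of samples, which is acceptable for small illustrative datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Accuracy and F-measure are computed from hard class predictions, whereas AUC needs the continuous scores a classifier produces before thresholding, so the three indicators can rank models differently.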
To cite this article
Yixuan Li, Zixuan Chen, Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction, Applied and Computational Mathematics. Vol. 7, No. 4, 2018, pp. 212-216. doi: 10.11648/j.acm.20180704.15