Original Research

Open Access

Application and comparison of several machine learning methods in the prognosis of cervical cancer

  • Yawen Ling1,†
  • Weiwei Zhang2,†
  • Zhidong Li1
  • Xiaorong Pu1
  • Yazhou Ren1,3,*

1School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731 Chengdu, Sichuan, China

2Cancer Prevention and Treatment Institute of Chengdu, Department of Oncology, Chengdu Fifth People's Hospital/The Second Clinical Medical College, Affiliated Fifth People's Hospital of Chengdu University of Traditional Chinese Medicine, 611137 Chengdu, Sichuan, China

3Institute of Electronic and Information Engineering of UESTC in Guangdong, 523808 Dongguan, Guangdong, China

DOI: 10.22514/ejgo.2022.056 Vol. 43, Issue 6, December 2022, pp. 34-44

Submitted: 29 June 2022 Accepted: 02 August 2022

Published: 15 December 2022

*Corresponding Author(s): Yazhou Ren E-mail:

† These authors contributed equally.


Accurate prognosis of cervical cancer in the clinical setting is challenging because of the complexity of the causative factors. The widely used Cox proportional hazards model has drawbacks, such as the inability to fully use the available information and the possible failure to achieve the best fit, so several machine learning approaches have been developed in search of better prognostic prediction models. However, the applicability of these approaches is often limited because they rely on public databases. Therefore, for cervical cancer, there is a need to explore the practical value of machine learning in prognostic prediction. In this study, we applied several machine learning methods, including k-nearest neighbors (KNN), decision tree (DT), logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM), to predict patient survival using real-world pathological data of 216 patients collected from the Fifth People’s Hospital of Chengdu. The experimental results showed that these methods have promising application value in predicting the overall survival (OS) of patients with cervical cancer (KNN: F1-score = 0.95, Accuracy = 0.93; DT: F1-score = 0.94, Accuracy = 0.92; LR: F1-score = 0.92, Accuracy = 0.90; SVM: F1-score = 0.94, Accuracy = 0.92; RF: F1-score = 0.96, Accuracy = 0.95; XGBoost: F1-score = 0.96, Accuracy = 0.95; LightGBM: F1-score = 0.96, Accuracy = 0.95). Moreover, XGBoost and LightGBM provided importance scores for the clinical indicators associated with cervical cancer, from which their correlation with OS and progression-free survival (PFS) could be further obtained. Thus, the predictors of OS and PFS were successfully identified. Finally, the results were confirmed by the Cox proportional hazards model.
These results indicate that machine learning methods can accurately predict the OS of patients with cervical cancer. Moreover, the methods can be used to analyze the correlation between clinical indicators and OS or PFS, helping doctors make more accurate decisions in a clinical setting.
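For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score are computed for a binary outcome. It is not the authors' evaluation code: the function name `accuracy_and_f1`, the choice of 1 as the positive class, and the ten example labels are all assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Accuracy: fraction of all predictions that are correct.
    accuracy = (tp + tn) / len(y_true)

    # F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Hypothetical labels (not data from the study): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"Accuracy = {acc:.2f}, F1-score = {f1:.2f}")
```

Because F1 ignores true negatives, it can be higher than accuracy when the positive class is the majority, as in this toy example.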


Cervical cancer; Machine learning; Prognosis; Overall survival; Progression-free survival

Cite and Share

Yawen Ling, Weiwei Zhang, Zhidong Li, Xiaorong Pu, Yazhou Ren. Application and comparison of several machine learning methods in the prognosis of cervical cancer. European Journal of Gynaecological Oncology. 2022; 43(6): 34-44.


