Paper 1: Predicting Cervical Cancer Based on Behavioral Risk Factors
Abstract: Machine learning (ML) based predictive models are increasingly used across many fields because they can find patterns and capture complex relationships between variables in large datasets. In medicine, however, comprehensive datasets are hard to obtain for rare or emerging infections. It is therefore essential to develop a robust methodology and to select ML classifiers that still make reliable predictions on small, imbalanced datasets, so as to defend against emerging threats or infections. This paper uses behavioral risk factors to predict cervical cancer risk. To build a robust technique, we intentionally selected a small, imbalanced dataset and applied Adaptive Synthetic (ADASYN) sampling and hyperparameter tuning to improve predictive performance. Hyperparameter tuning, evaluated through 3-fold cross-validation, was used to optimize the Random Forest, XGBoost, and Voting Classifier models. All three models achieved an accuracy of 97.12%. Confusion matrix analysis further showed that the models identify cervical cancer cases with minimal misclassification. A comparison with previous work showed that our approach improves on it in both accuracy and precision. This study demonstrates the potential of ML models for early screening and risk assessment, even when working with limited datasets.
Keywords: Cervical cancer; random forest; voting classifier; Adaptive Synthetic Sampling (ADASYN); predictive model
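
The abstract names ADASYN as the oversampling step but does not describe how it works. The sketch below illustrates the core idea using only the Python standard library: minority points surrounded by more majority-class neighbors receive more synthetic samples, and each sample is an interpolation between a minority point and one of its minority neighbors. It is not the authors' implementation. The function name `adasyn_sketch`, its parameters (`k`, `beta`, `seed`), and the toy data are all assumptions made for illustration. In practice one would more likely use a library implementation such as the `ADASYN` class in imbalanced-learn.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Return synthetic minority points generated in the spirit of ADASYN.

    X is a list of feature lists, y a list of labels. All names and
    defaults here are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(X) - n_min
    # Number of synthetic points needed; beta=1.0 targets a balanced set.
    n_needed = int((n_maj - n_min) * beta)
    if n_min < 2 or n_needed <= 0:
        return []

    def nearest(i, pool):
        # Indices of the k points in pool closest to X[i], excluding i itself.
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty of each minority point: share of majority-class neighbors.
    ratios = [
        sum(y[j] != minority for j in nearest(i, range(len(X)))) / k
        for i in min_idx
    ]
    total = sum(ratios)
    if total == 0:
        return []  # no minority point has majority neighbors

    synthetic = []
    for i, r in zip(min_idx, ratios):
        # Harder points (more majority neighbors) receive more synthetic samples.
        n_i = round(r / total * n_needed)
        minority_neighbors = nearest(i, min_idx)
        for _ in range(n_i):
            j = rng.choice(minority_neighbors)
            lam = rng.random()
            # Interpolate between the point and a random minority neighbor.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data: 3 minority points (label 1) and 8 majority points (label 0).
X = [[0, 0], [1, 0], [0, 1],
     [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3], [5, 5], [5, 6], [6, 5]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
synthetic = adasyn_sketch(X, y)
print(len(synthetic), "synthetic minority points")
```

Because the per-point counts are rounded independently, the number of generated samples can differ slightly from the target. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.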