Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning

Anamika Gupta; Khushboo Thakkar; Veenu Bhasin; Aman Tiwari; Vibhor Mathur

doi:10.14569/IJACSA.2024.01501112

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 1, 2024.

Abstract: Hostile or aggressive behavior on an online platform by an individual or a group of people is termed cyberbullying. A bystander is someone who sees or knows about such incidents of cyberbullying. A bystander may take one of three roles: a defender, who intervenes and can mitigate the impact of bullying; an instigator, who sides with the bully and can add to the victim’s suffering; or an impartial onlooker, who remains neutral and observes the scenario without getting engaged. Studying bystanders’ roles can help in understanding how they shape the scale and progression of bullying incidents. However, the lack of data hinders research in this area. A dataset, CYBY23, of Twitter threads containing main tweets and the replies of bystanders was published on Kaggle in October 2023. The dataset includes extracted features related to the toxicity and sensitivity of the main tweets and reply tweets. The dataset’s authors used manual annotators to assign the labels of bystanders’ roles. Manually labeling bystanders’ roles is labor-intensive, which raises the need for an automatic labeling technique for identifying the bystander role. In this work, we aim to suggest an effective machine-learning model for the automatic labeling of bystanders. First, the dataset was re-sampled using SMOTE to balance it. Next, we experimented with 12 models using various feature engineering techniques. The best features were selected for further experimentation by removing highly correlated and less relevant features. The models were evaluated on accuracy, precision, recall, and F1 score. We found that the Random Forest Classifier (RFC) model with a selected set of features scored highest among all 12 models. The RFC model was further tested against various splits of training and test sets. The best results were achieved with a training set of 85% and a test set of 15%: 78.83% accuracy, 81.79% precision, 74.83% recall, and 79.45% F1 score. The automatic labeling proposed in this work will help in scaling the dataset, which will be useful for further studies related to cyberbullying.
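
The two preprocessing steps named in the abstract, SMOTE re-sampling and the removal of highly correlated features, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the neighbour count `k`, the correlation threshold of 0.9, and the use of Pearson correlation are all assumptions made for illustration, since the abstract does not state them.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=3, seed=0):
    """SMOTE-style balancing: add synthetic rows to every smaller class by
    interpolating between a sample and one of its k nearest same-class
    neighbours. k and seed are illustrative assumptions."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # no neighbour to interpolate with
        for _ in range(target - n):
            i = rng.randrange(len(members))
            base = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            others.sort(key=lambda m: math.dist(base, m))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # position along the segment base -> neighbour
            X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if its |correlation| with every feature already
    kept is below the threshold (0.9 is an assumed cut-off)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept
```

Calling `smote_oversample(X, y)` returns the original rows plus synthetic ones so that every class with at least two samples reaches the size of the largest class, and `drop_correlated({"name": [values, ...]})` returns the names of the features that survive the filter. Any feature names passed in are the caller's own; the abstract does not list the CYBY23 column names.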

Keywords: Bystanders; cyberbullying; machine learning; defender; instigator; impartial; toxicity; twitter

Anamika Gupta, Khushboo Thakkar, Veenu Bhasin, Aman Tiwari and Vibhor Mathur, “Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(1), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01501112

@article{Gupta2024,
title = {Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.01501112},
url = {http://dx.doi.org/10.14569/IJACSA.2024.01501112},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {1},
author = {Anamika Gupta and Khushboo Thakkar and Veenu Bhasin and Aman Tiwari and Vibhor Mathur}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
