Cyberbullying detection in Urdu language using machine learning
Khan, Sara
Publication Date
2023-01
Rights
© 2022 IEEE. Reproduced in accordance with the publisher's self-archiving policy.
Peer-Reviewed
Yes
Open Access status
openAccess
Accepted for publication
2022
Abstract
Cyberbullying has become a significant problem with the surge in social media use. The most basic way to prevent it on these platforms is to identify and remove offensive comments, but it is impractical for humans to read and remove every comment manually. Current research therefore focuses on using machine learning to detect and eliminate cyberbullying. Most of this work has been conducted on English text, and little to no work can be found for Urdu. This paper aims to detect cyberbullying in users' comments posted in Urdu on Twitter using machine learning and Natural Language Processing (NLP) techniques. To the best of our knowledge, cyberbullying detection on Urdu text comments has not been performed due to the lack of a publicly available standard Urdu dataset. In this paper, we created a dataset of offensive user-generated Urdu comments from Twitter. The comments in the dataset are classified into five categories. N-gram techniques are used to extract features at the character and word levels. Various supervised machine learning techniques are applied to the dataset to detect cyberbullying. Evaluation metrics such as precision, recall, accuracy and F1 score are used to analyse the performance of these techniques.
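The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the n-gram sizes, tokenisation, or classifiers used, so the function names, the whitespace tokeniser, the defaults `char_n=3` and `word_n=2`, and the one-class-versus-rest treatment of the five categories are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, splitting on whitespace (assumed tokeniser)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, char_n=3, word_n=2):
    """Count character and word n-grams in a single bag-of-n-grams feature map.

    The n values are illustrative defaults, not taken from the paper.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text, char_n))
    features.update("w:" + g for g in word_ngrams(text, word_n))
    return features


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and F1 with one label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    sample = "یہ ایک مثال ہے"  # neutral Urdu sentence: "this is an example"
    print(word_ngrams(sample, 2))
    print(ngram_features(sample).most_common(3))
    # Hypothetical labels and predictions, not results from the paper.
    print(binary_metrics([1, 0, 1, 1], [1, 1, 0, 1]))
```

Python strings are sequences of Unicode code points, so the character n-grams work on Urdu text without extra handling. For a five-category dataset, `binary_metrics` would be called once per category and the scores averaged, which is one common way to extend these metrics to more than two classes.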
Version
Accepted manuscript
Citation
Khan S and Qureshi A (2022) Cyberbullying detection in Urdu language using machine learning. From: 2022 International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE). 2-4 Dec 2022, Lahore, Pakistan.
Type
Conference paper