QUANTITATIVE SENTIMENT MINING FOR ROMAN URDU LANGUAGE
- School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan.
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Pakistan.
- Faculty of Information Technology, The University of Lahore, Lahore Campus, Pakistan.
Sentiment mining is a natural language processing task that classifies large volumes of web-based data according to the opinions people express. Most research has been carried out for resource-rich languages, while low-resource languages such as Roman Urdu still need considerable effort. In this paper, we classify Roman Urdu data by sentiment (positive/negative) using several feature selection techniques: chi-square, mutual information, and select-from-model. These techniques are evaluated on different n-gram variations over a dataset of 11k Roman Urdu reviews. The dataset contains sentences from different domains (drama, sports, politics, food, etc.), which are classified according to their nature. The experiments were performed with well-known machine learning and neural network classifiers. The machine learning classifiers are logistic regression, support vector machine, decision tree, random forest, multinomial naïve Bayes, and multilayer perceptron; the neural network classifiers are convolutional neural network, long short-term memory (LSTM), and bidirectional long short-term memory (Bi-LSTM). The results are evaluated for both character-level and word-level n-gram variations using accuracy and F1-score. Because Roman Urdu is written with a diverse range of spellings, we measured results for several n-gram variations. The highest scores, 91.8% accuracy and 91.7% F1-score, were achieved with Bi-LSTM. At the character level, we obtained 83.91% accuracy and 90.51% F1-score on the 4-gram variation; at the word level, 83.73% accuracy and 90.42% F1-score on the 1–4-gram variation. These results outperform the baseline results for Roman Urdu classification, which shows the impact of feature selection techniques on sentiment classification.
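As an illustration of the character-level and word-level n-gram features and of chi-square feature scoring mentioned above, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the four example reviews and their labels are invented for illustration, the function names are hypothetical, and the score is the textbook 2x2 contingency chi-square on feature presence, which may differ from the variant used in the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every character n-gram of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n_min=1, n_max=4):
    """Return every word n-gram with n_min <= n <= n_max (e.g. 1-4-grams)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def chi_square_scores(docs, labels, feature_fn):
    """Score each feature by the 2x2 chi-square of presence vs. binary label."""
    n = len(docs)
    n_pos = sum(labels)
    present = Counter()      # number of documents containing the feature
    present_pos = Counter()  # number of positive documents containing it
    for doc, label in zip(docs, labels):
        for feature in set(feature_fn(doc)):
            present[feature] += 1
            present_pos[feature] += label
    scores = {}
    for feature, total in present.items():
        a = present_pos[feature]  # present, positive
        b = total - a             # present, negative
        c = n_pos - a             # absent, positive
        d = n - n_pos - b         # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[feature] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Invented toy reviews (assumption, not taken from the paper's dataset).
docs = [
    "yeh drama bohat acha hai",
    "khana bilkul bekar tha",
    "match bohat acha tha",
    "service bekar hai",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

word_scores = chi_square_scores(docs, labels, word_ngrams)
char_scores = chi_square_scores(docs, labels, lambda t: char_ngrams(t, 4))

# Keep the highest-scoring word n-grams as the selected feature set.
top_words = sorted(word_scores, key=word_scores.get, reverse=True)[:5]
```

In this toy data, words that appear only in one class (such as "acha" or "bekar") receive the highest score, while words spread evenly across both classes (such as "hai") score zero. Character n-grams are scored the same way and can be more tolerant of the spelling variation found in Roman Urdu.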
