• Predicting the “helpfulness” of online consumer reviews

      Singh, J.P.; Irani, S.; Rana, Nripendra P.; Dwivedi, Y.K.; Saumya, S.; Kumar Roy, P. (2017-01)
      Online shopping is increasingly becoming people's first choice, as it is very convenient to choose products based on their reviews. Even moderately popular products attract thousands of reviews, which are constantly being posted on e-commerce sites. Such a large volume of constantly generated data can be considered a big data challenge for both online businesses and consumers, and it makes it difficult for buyers to go through all the reviews before making purchase decisions. In this research, we have developed machine-learning models that predict the helpfulness of consumer reviews using several textual features such as polarity, subjectivity, entropy, and reading ease. The model automatically assigns a helpfulness value to a review as soon as it is posted on the website, so that the review gets a fair chance of being viewed by other buyers. The results of this study will help buyers to write better reviews and thereby assist other buyers in making their purchase decisions, as well as help businesses to improve their websites.
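
Two of the textual features named in the abstract above, entropy and reading ease, can be illustrated with a short sketch. The abstract does not say how the authors computed them, so this assumes word-level Shannon entropy and the standard Flesch Reading Ease formula with a rough vowel-group syllable heuristic; the function names are hypothetical. Polarity and subjectivity normally require a sentiment lexicon and are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_entropy(text):
    """Shannon entropy (in bits) of the review's word distribution.

    Assumption: entropy is taken over word frequencies; the paper may
    define it differently.
    """
    words = tokenize(text)
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher values indicate easier text."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Feature values like these would form one row of the input to whatever helpfulness model is trained; the choice of model is not fixed by this sketch.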
    • Ranking online consumer reviews

      Saumya, S.; Singh, J.P.; Baabdullah, A.M.; Rana, Nripendra P.; Dwivedi, Y.K. (2018-05)
      Popular products attract hundreds or thousands of online reviews. Handling such a large volume of continuously generated content is a challenging task for buyers, sellers and researchers. The purpose of this study is to rank this overwhelming number of reviews using their predicted helpfulness scores. The helpfulness score is predicted from features extracted from the review text, the product description, and the product's customer question-answer data, using a random-forest classifier and a gradient boosting regressor. The system first classifies reviews as low or high quality with the random-forest classifier. Helpfulness scores are then predicted with the gradient boosting regressor, but only for the high-quality reviews. Scores for the low-quality reviews are not calculated because those reviews will never appear among the top k reviews; they are simply appended to the end of the review list on the review-listing website. The proposed system provides fair review placement on review listing pages and makes all high-quality reviews visible to customers at the top. The experimental results on data from two popular Indian e-commerce websites validate our claim, as 3–4 newer high-quality reviews are placed in the top ten reviews along with 5–6 older reviews based on review helpfulness. Our findings indicate that including features from product description data and customer question-answer data improves the prediction accuracy of the helpfulness score.
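
The two-stage ranking described in the abstract above can be sketched as follows. This is only an illustration of the control flow, not the authors' implementation: `is_high_quality` and `predict_helpfulness` are hypothetical callables standing in for the trained random-forest classifier and gradient boosting regressor, and `rank_reviews` is an assumed name.

```python
from typing import Callable, List, Sequence, TypeVar

Review = TypeVar("Review")


def rank_reviews(
    reviews: Sequence[Review],
    is_high_quality: Callable[[Review], bool],
    predict_helpfulness: Callable[[Review], float],
) -> List[Review]:
    """Rank reviews in two stages.

    Stage 1: split reviews into high and low quality (classifier stand-in).
    Stage 2: score only the high-quality reviews (regressor stand-in) and
    sort them by descending predicted helpfulness.
    Low-quality reviews are never scored; they are appended at the end
    in their original order.
    """
    high, low = [], []
    for review in reviews:
        (high if is_high_quality(review) else low).append(review)

    # Score each high-quality review exactly once, then sort by score.
    scored = [(predict_helpfulness(review), review) for review in high]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    return [review for _, review in scored] + low
```

Skipping the regressor for low-quality reviews saves computation, which matches the abstract's reasoning that those reviews cannot reach the top k anyway.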