A Deep Learning Approach to Sentiment Analysis of Hotel Reviews: Comparing BERT and LSTM Models

  Gunawan Wang
  Mustafa Musa Jaber

Abstract

Background of study: Online reviews strongly influence consumer behavior, especially in the hospitality industry, yet their sentiment is difficult to determine: reviews are subjective, writing styles vary widely, and the classes are noticeably imbalanced because positive reviews outnumber negative and neutral ones. Standard machine learning approaches tend to be biased toward the majority class and handle these problems poorly.
Aims and scope of paper: This study uses the BERT and LSTM deep learning models to classify hotel customer reviews into three categories: positive, neutral, and negative. It analyzes how well each model predicts sentiment and copes with class imbalance, and benchmarks both models with and without under-sampling.
Methods: A dataset of 20,000 TripAdvisor reviews was preprocessed by removing stop words and special characters, tokenization, stemming, and lemmatization. Star ratings were mapped to sentiment labels: 4-5 stars as positive, 3 stars as neutral, and 1-2 stars as negative. Random under-sampling was applied to the positive class to balance the dataset. The BERT (bert-base-uncased) and LSTM models were trained with an 80:20 train-validation split and evaluated with accuracy, precision, recall, and F1 score, using 5-fold cross-validation.
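The labeling and balancing steps in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `rating_to_label` and `undersample_majority` are hypothetical, and the balancing target (shrinking the positive class to the size of the largest remaining class) is an assumption, since the abstract does not state the exact target size.

```python
import random
from collections import Counter


def rating_to_label(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment label, following the abstract's scheme."""
    if stars in (4, 5):
        return "positive"
    if stars == 3:
        return "neutral"
    if stars in (1, 2):
        return "negative"
    raise ValueError(f"unexpected star rating: {stars}")


def undersample_majority(reviews, labels, majority="positive", seed=42):
    """Randomly drop majority-class examples so that class is no larger than
    the largest remaining class (ASSUMED balancing target)."""
    counts = Counter(labels)
    target = max(n for label, n in counts.items() if label != majority)
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    majority_idx = [i for i, label in enumerate(labels) if label == majority]
    keep = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    # Keep every minority example and only the sampled majority examples,
    # preserving the original order.
    kept = [i for i, label in enumerate(labels) if label != majority or i in keep]
    return [reviews[i] for i in kept], [labels[i] for i in kept]


# Toy example with hypothetical ratings (not data from the paper).
ratings = [5, 4, 5, 3, 1, 5, 2, 4, 5, 3]
reviews = [f"review {i}" for i in range(len(ratings))]
labels = [rating_to_label(r) for r in ratings]
balanced_reviews, balanced_labels = undersample_majority(reviews, labels)
print(Counter(labels))           # 6 positive, 2 neutral, 2 negative
print(Counter(balanced_labels))  # 2 of each class
```

Under-sampling is commonly applied only to the training portion of each split, so that validation metrics reflect the original class distribution; the abstract does not state which order the authors used. Libraries such as imbalanced-learn provide a ready-made `RandomUnderSampler` for the same purpose.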
Result: Without under-sampling, BERT achieved the best overall performance, with an accuracy of 0.86, an F1 score of 0.93 for the positive class, and an F1 score of 0.79 for the negative class. However, both models struggled with neutral sentiment (F1 score: BERT 0.43, LSTM 0.25). Under-sampling improved neutral-class recall (BERT: 0.79) but reduced overall accuracy (BERT: 0.73; LSTM: 0.67) and positive-class precision.
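The per-class precision, recall, and F1 scores reported above are computed one-vs-rest for each label. The sketch below is a plain-Python illustration; the helper names are hypothetical and the toy labels are invented, not the paper's predictions. It shows how a model can reach high overall accuracy while neutral recall stays low, mirroring the majority-class bias described in the Background.

```python
def per_class_scores(y_true, y_pred, labels=("positive", "neutral", "negative")):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy predictions: one neutral review is misread as positive.
y_true = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
y_pred = ["positive"] * 6 + ["positive", "neutral"] + ["negative"] * 2

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
for label, (p, r, f1) in per_class_scores(y_true, y_pred).items():
    print(f"{label:>8}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here accuracy is 0.90 even though neutral recall is only 0.50, and the single misclassified neutral review also lowers positive precision. scikit-learn's `classification_report` computes the same per-class quantities.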
Conclusion: BERT generally outperforms LSTM for hotel review sentiment analysis, particularly on imbalanced data. Under-sampling improves neutral recall but incurs significant trade-offs: because majority-class examples are discarded, overall accuracy and majority-class precision drop. Future work should explore advanced resampling (SMOTE, ADASYN) or other pretrained transformer models (RoBERTa, XLNet) for better class balance and neutral sentiment classification.

How to Cite
Wang, G., & Jaber, M. M. (2025). A Deep Learning Approach to Sentiment Analysis of Hotel Reviews: Comparing BERT and LSTM Models. International Journal of Advances in Artificial Intelligence and Machine Learning, 2(2), 67–75. https://doi.org/10.58723/ijaaiml.v2i2.403