Comparative Analysis of ChatGPT, GPT-4, and Microsoft Copilot Chatbots for GRE Test
Keywords:
technology-enhanced learning; Graduate Record Examination; ChatGPT; GPT-4; Microsoft Copilot
Abstract
This paper analyzes how well three artificial intelligence chatbots (Copilot, ChatGPT, and GPT-4) answer questions from standardized tests, focusing on the Graduate Record Examination (GRE). A total of 137 quantitative reasoning questions of different forms and 157 verbal questions across several categories were used to assess the chatbots’ capabilities. The performance of each chatbot is reported across the skills and question styles tested in the exam. The chatbots’ proficiency on image-based questions is also explored, and the uncertainty level of each chatbot is illustrated. The results show varying degrees of success. ChatGPT primarily makes arithmetic errors, whereas the largest share of errors made by Copilot and GPT-4 is conceptual. GPT-4 nevertheless achieved the highest success rates, particularly in tasks involving complex language understanding and image-based questions. The results highlight the ability of these chatbots to help examinees achieve a high score on the GRE, which encourages their use in test preparation. They also show the importance of preventing access to such chatbots when tests are conducted online, as they were during the COVID-19 pandemic, to ensure a fair environment for all test takers competing for higher education opportunities.
https://doi.org/10.26803/ijlter.23.6.15
License
Copyright (c) 2024 Mohammad Abu-Haifa, Bara'a Etawi, Huthaifa Alkhatatbeh, Ayman Ababneh
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published by IJLTER are licensed under a Creative Commons Attribution Non-Commercial No-Derivatives 4.0 International License (CCBY-NC-ND4.0).