The Classical Test or Item Response Measurement Theory: The Status of the Framework at the Examination Council of Lesotho


  • Musa Adekunle Ayanwale
  • Julia Chere-Masopha
  • Malebohang C. Morena


classical test theory; item response theory; Examination Council of Lesotho; item development; item analysis


Although the Examination Council of Lesotho (ECOL) carries a heavy workload of assessment tasks, its procedures for developing tests, analysing items, and compiling scores rely heavily on the classical test theory (CTT) measurement framework. CTT has been criticised for being test-oriented and sample-dependent and for assuming a linear relationship between latent variables and observed scores. This article presents an overview of CTT and item response theory (IRT) and of how they have been applied to standard assessment questions at ECOL. Both theories address measurement issues associated with commonly used assessment formats, such as multiple-choice, short-response, and constructed-response tests. A comprehensive search was conducted across multiple databases (such as Google Scholar, Scopus, Web of Science, and PubMed) using three search facets: item response theory, classical test theory, and Examination Council of Lesotho. The paper was developed theoretically from these electronic databases, the keywords, and the references identified in the retrieved articles, and the authors used the keywords to identify relevant documents across a wide variety of sources. General remarks are made on the effective application of each model in practice with respect to test development and psychometric activities. In conclusion, the study recommends that ECOL switch from CTT to modern test theory for test development and item analysis, a framework that offers multiple benefits.

