The Myers-Briggs Type Indicator (MBTI) is a scientifically validated personality assessment that reliably classifies people into types and predicts behavior and career fit.
The MBTI has poor test-retest reliability: roughly half of respondents get a different type when retested a few weeks later. Systematic reviews find it does not predict job performance, academic outcomes, or relationship success better than chance.
What changed?
By the early 1980s, the Myers-Briggs Type Indicator had become a fixture of American institutional life. Career counselors administered it to high school students choosing college majors. Human resources departments used it to sort job applicants and assemble teams. Management training programs built entire curricula around its four dichotomies: Introversion versus Extraversion, Sensing versus Intuition, Thinking versus Feeling, Judging versus Perceiving. Participants received a four-letter type, such as INFP or ESTJ, and with it a detailed personality portrait that many found strikingly accurate. The instrument felt scientific. It had the form of a validated assessment: standardized questions, a scoring key, published reliability data, a credentialed publisher.
The origins of the MBTI are not academic. Isabel Briggs Myers and her mother Katharine Cook Briggs developed the instrument beginning in the 1940s, drawing on the theoretical personality typology that Carl Jung had published in "Psychological Types" in 1921. Neither woman held a doctorate in psychology. The test was not developed through the standard process of academic validation, in which items are generated from theory, piloted on representative populations, refined against external criteria, and subjected to peer review before publication. It grew instead from a conviction that Jung's types were real and practically useful, and that a questionnaire could capture them.
When academic psychologists examined the instrument systematically, problems emerged. In 1989, Robert McCrae and Paul Costa published a study in the Journal of Personality examining the MBTI from the perspective of the five-factor model of personality, which had by then accumulated substantial empirical support. McCrae and Costa found that three of the MBTI's four scales corresponded reasonably well to established Big Five dimensions, but the correspondence was imperfect, and the fourth scale, Judging-Perceiving, did not map cleanly onto any well-validated construct. More damaging was what the analysis revealed about the instrument's core claim: that it sorted people into discrete types. The Big Five research tradition had found that personality traits are continuously distributed across populations. People do not cluster into groups; they fall along spectra. Forcing continuous distributions into binary categories discards real information and creates the illusion of categorical difference where none exists in the underlying data.
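The cost of a binary cutoff is easy to see in a toy case. The sketch below is illustrative only: the three respondents, their scores, the cutoff at zero, and the assignment of letters to sides are all hypothetical and do not come from any real MBTI scoring key.

```python
# Illustrative only: the scores, the cutoff, and the letters are hypothetical
# and are not taken from any real MBTI scoring key.

def dichotomize(score: float, cutoff: float = 0.0) -> str:
    """Collapse a continuous scale score into one of two letters."""
    return "E" if score >= cutoff else "I"

# Three hypothetical respondents measured on one continuous scale.
scores = {"Ana": -0.05, "Ben": 0.05, "Cal": 2.00}
letters = {name: dichotomize(value) for name, value in scores.items()}

# Distance between respondents on the underlying continuous scale.
gap_ana_ben = abs(scores["Ana"] - scores["Ben"])
gap_ben_cal = abs(scores["Ben"] - scores["Cal"])

print(letters)
print(f"Ana vs Ben: gap {gap_ana_ben:.2f}, "
      f"same letter: {letters['Ana'] == letters['Ben']}")
print(f"Ben vs Cal: gap {gap_ben_cal:.2f}, "
      f"same letter: {letters['Ben'] == letters['Cal']}")
```

Ana and Ben are nearly indistinguishable on the underlying scale yet receive opposite letters, while Ben and Cal are far apart yet receive the same one. The letter alone cannot tell either story.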
Test-retest reliability studies accumulated through the 1990s and produced an often-cited finding: when individuals retook the MBTI after intervals of a few weeks to a few months, roughly half received a different four-letter type. David Pittenger reviewed the psychometric literature in a 2005 paper in the Consulting Psychology Journal: Practice and Research and summarized the case against the instrument's validity: in controlled comparisons against simpler measures, the scales did not consistently predict job performance, academic outcomes, or relationship success at levels above chance. The types were memorable and felt meaningful, but predictive validity requires more than that.
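A rough calculation shows how a four-letter type can be unstable even when each scale looks respectable on its own. The sketch below is a simplified model, not a reanalysis of MBTI data. It assumes normally distributed scores, a cutoff at the mean, four statistically independent scales, and a per-scale test-retest correlation of 0.9. All of these are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random

def same_letter_probability(reliability: float) -> float:
    """Chance that two administrations of one scale fall on the same side
    of the cutoff, assuming bivariate-normal scores correlated at
    `reliability` and a cutoff at the mean (Sheppard's formula)."""
    return 0.5 + math.asin(reliability) / math.pi

def same_type_probability(reliability: float, n_scales: int = 4) -> float:
    """Chance that every letter matches on retest, assuming the scales
    are independent of one another (an assumption of this sketch)."""
    return same_letter_probability(reliability) ** n_scales

def simulate_same_type(reliability: float, n_people: int = 20_000,
                       n_scales: int = 4, seed: int = 0) -> float:
    """Monte Carlo check of the formula: a stable true score plus
    independent measurement noise on each administration."""
    rng = random.Random(seed)
    # Noise level chosen so the two observed scores correlate at `reliability`.
    noise_sd = math.sqrt((1.0 - reliability) / reliability)
    unchanged = 0
    for _ in range(n_people):
        stable = True
        for _ in range(n_scales):
            true_score = rng.gauss(0.0, 1.0)
            first = true_score + rng.gauss(0.0, noise_sd)
            second = true_score + rng.gauss(0.0, noise_sd)
            if (first >= 0.0) != (second >= 0.0):
                stable = False  # this letter flipped on retest
                break
        unchanged += stable
    return unchanged / n_people

ASSUMED_RELIABILITY = 0.9  # illustrative value, not a published MBTI figure
print(f"per-scale agreement: {same_letter_probability(ASSUMED_RELIABILITY):.3f}")
print(f"same four-letter type (formula): {same_type_probability(ASSUMED_RELIABILITY):.3f}")
print(f"same four-letter type (simulated): {simulate_same_type(ASSUMED_RELIABILITY):.3f}")
```

Under these assumptions each letter is stable about 86 percent of the time, but the chance that all four letters survive a retest is only about 54 percent. That is in the neighborhood of the "roughly half" figure, although real scales are neither perfectly normal nor independent, so the match should be read as suggestive only.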
The MBTI's commercial success insulated it from the usual feedback loops of academic science. The Myers-Briggs Company, which owns and publishes the instrument, has produced substantial internal validation research, and some of that research documents reasonable short-term reliability under controlled conditions. But the independent peer-reviewed literature consistently identified the same core weakness: a binary typology imposed on continuous data cannot recover the information it discards. The classification system is also self-sealing in a specific way. Because the sixteen types are described in language general enough to be recognizable to almost anyone, the sense of accuracy that respondents report reflects a well-documented psychological phenomenon, the Barnum (or Forer) effect, in which people accept vague, general personality descriptions as personally accurate. It does not confirm that the instrument has measured something real.