Mapping LLM Ideology with Multidimensional IRT
Tags: Psychometrics · LLM Evaluation · Bayesian Modeling
A Bayesian multidimensional Item Response Theory (IRT) model, implemented in Stan, that places large language models in a latent ideological space. It jointly estimates each model’s position (the ability parameter θ) along with each item’s discrimination (α) and difficulty (β). Validated against DW-NOMINATE, the standard human benchmark for political ideology, the model’s estimates reach a correlation of 0.98 on the primary dimension.
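To make the roles of θ, α, and β concrete, here is a minimal Python sketch of a response function for a multidimensional two-parameter logistic model. It is an illustration under stated assumptions, not the project's Stan code: the compensatory form logit = α·θ − β, the function name `response_prob`, and the example values are all hypothetical, and the actual parameterization may differ.

```python
import math


def response_prob(theta, alpha, beta):
    """Probability of a positive response under an assumed compensatory
    multidimensional 2PL model: sigmoid(alpha . theta - beta).

    theta: latent position of one model, one value per dimension
    alpha: discrimination of one item, one value per dimension
    beta:  scalar difficulty of that item
    """
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same number of dimensions")
    # Inner product of discrimination and position, shifted by difficulty.
    logit = sum(a * t for a, t in zip(alpha, theta)) - beta
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical two-dimensional example: two models at mirrored positions,
# one item that discriminates only on the first dimension.
left_model = [-1.0, 0.5]
right_model = [1.0, -0.5]
item_alpha = [1.5, 0.0]
item_beta = 0.0

p_left = response_prob(left_model, item_alpha, item_beta)
p_right = response_prob(right_model, item_alpha, item_beta)
```

With zero difficulty and mirrored positions, the two probabilities are complements of each other. A larger α on a dimension makes the item separate models more sharply along that dimension, while β shifts the point at which a response becomes more likely than not.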
A methodological finding: placing an improper uniform prior on the discrimination correlations recovers the latent structure far better than structured priors (≈56% correlation, versus ≈40% under an LKJ(0.1) prior and ≈10% with independent parameters). The flat prior lets the likelihood determine the correlation structure, with no shrinkage imposed by the prior.
Validation report (PDF) · Model report · Code on GitHub
This is the psychometric backbone of our paper When Models Refuse (arXiv:2508.21448).
