Mapping LLM Ideology with Multidimensional IRT

Tags: Psychometrics · LLM Evaluation · Bayesian Modeling

A Bayesian multidimensional Item Response Theory (IRT) model, implemented in Stan, that places large language models in a latent ideological space. It estimates each model’s position (the ability parameter θ) jointly with each item’s discrimination (α) and difficulty (β). Validated against DW-NOMINATE, the standard human benchmark for political ideology, the model reaches a correlation of 0.98 on the primary dimension.
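To make the roles of θ, α, and β concrete, here is a minimal Python sketch of a response function for a multidimensional two-parameter logistic model. It assumes the common compensatory form, P(y = 1) = logistic(α · θ − β). The project's Stan code may use a different parameterization, and the function name `m2pl_probability` is hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def m2pl_probability(theta, alpha, beta):
    """Probability of a positive response under a compensatory
    multidimensional 2PL model (an assumed form, not taken from the source).

    theta: latent position of the respondent (here, an LLM), one value per dimension
    alpha: item discrimination, one value per dimension
    beta:  item difficulty (scalar)
    """
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same number of dimensions")
    logit = sum(a * t for a, t in zip(alpha, theta)) - beta
    return _sigmoid(logit)


# Illustrative values only: a respondent at the origin facing an item of
# zero difficulty is equally likely to respond either way.
p_neutral = m2pl_probability([0.0, 0.0], [1.2, 0.4], 0.0)

# Moving along a dimension the item discriminates on raises the probability.
p_shifted = m2pl_probability([1.0, 0.0], [1.2, 0.4], 0.0)
```

In this form, a large α on one dimension means the item separates respondents sharply along that dimension, while β shifts the point at which a positive response becomes more likely than not.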

One methodological finding: an improper uniform prior on the discrimination correlations recovers the latent structure far better than structured priors (≈56% correlation, versus ≈40% for an LKJ(0.1) prior and ≈10% for independent parameters). The flat prior lets the likelihood alone determine the correlation structure, with no shrinkage from the prior.
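The difference between the priors can be illustrated with the unnormalized LKJ log-density, which is proportional to (η − 1) · log det(Ω) for a correlation matrix Ω. The Python sketch below restricts this to a 2×2 matrix, where det(Ω) = 1 − ρ². The two-dimensional case and the function names are assumptions made for illustration and are not taken from the project's Stan code.

```python
import math


def lkj_log_density_2d(rho, eta):
    """Unnormalized LKJ(eta) log-density for a 2x2 correlation matrix
    with off-diagonal entry rho (illustrative 2-D case only)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (eta - 1.0) * math.log(1.0 - rho * rho)


def flat_log_density(rho):
    """A flat prior adds nothing to the log-posterior, so the
    likelihood alone shapes the estimate of rho."""
    return 0.0


# With eta = 0.1 the prior contribution grows as |rho| approaches 1,
# so it pulls the posterior toward strong correlations.
weak = lkj_log_density_2d(0.1, 0.1)
strong = lkj_log_density_2d(0.9, 0.1)
```

With η = 1 the LKJ term vanishes for every ρ, while η below 1 rewards correlations near ±1 and η above 1 favors values near zero. A flat prior contributes a constant, which is why the correlation estimate is then driven by the data.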

Validation report (PDF) · Model report · Code on GitHub

This is the psychometric backbone of our paper When Models Refuse (arXiv:2508.21448).

Shariar Kabir
