Hi there! 👋

My name is Shariar (/ʃɑːriΛˆΙ‘Λr/ πŸ”Š). I am currently working on AI Safety and Reliability via Interpretability. I am particularly interested in how LLMs’ behavior evolves over longer contexts such as multi-turn interactions, how their internal mechanisms can be made interpretable, and how fairness can be ensured through targeted interventions.

I am actively seeking PhD opportunities in AI safety, interpretability, and reliability. If our research interests align or you’d like to collaborate, please feel free to reach out!

In Spring 2026, I joined SPAR to work on real-time automated mechanistic interpretability methods for AI safety, under the mentorship of Sriram Balasubramanian.

Previously, I was a research intern at the NLP Lab at UC Riverside under Prof. Yue Dong, where I was also fortunate to work with Prof. Kevin Esterling. I worked on behavioral evaluation of LLMs and explored how psychometric and Bayesian modeling techniques can quantify and explain complex social behaviors in LLMs.

Prior to that, I worked on inclusive AI systems for low-resource languages, including Bengali medical ASR and document understanding tools. Some of my earlier work applied ML to other domains, including cloud systems and bioinformatics.

I led the AI Research and Engineering team at Celloscope Ltd. I hold a BSc and MSc in Computer Science and Engineering from Bangladesh University of Engineering and Technology (BUET). My detailed CV can be found here.

News

Research

I am working on using mechanistic interpretability as a practical tool for AI safety: building methods that scale beyond toy settings and validating them on real model behaviors. I’m especially interested in using LLM agents to automate interpretability (autointerp), for example by turning circuit analysis from manual, single-prompt inspection into a scalable process.

  • Agents for autointerp: building agentic pipelines that discover, label, and validate circuits and features at scale, so interpretability keeps pace with model capability instead of lagging on isolated examples.
  • Interpretability for safety: locating and editing the causal mechanisms behind undesirable behaviors, such as systematic bias in sensitive domains and unstable reasoning, to move toward targeted, mechanism-level interventions.
  • Reliability under real use: LLMs shift stance and tone under minor prompt changes. I am interested in designing benchmarks that measure this stability and in building interpretable methods to improve it in multi-turn settings.

See my publications for details.