Shariar Kabir

AI Research Engineer · Dhaka, Bangladesh · +880-1832055656 · shariar1405076@gmail.com

I am a researcher in machine learning and natural language processing with a focus on understanding and improving the reliability of large language models. I currently work full-time as an AI Research Engineer at Celloscope Ltd. I completed my BSc in Computer Science and Engineering at Bangladesh University of Engineering and Technology (BUET). My work explores how LLM behavior evolves over longer contexts, such as multi-turn interactions, how the internal mechanisms of LLMs can be made interpretable, and how fairness can be ensured through principled interventions.

Previously, I worked on inclusive AI systems for low-resource languages, including Bengali medical ASR and document understanding tools. My long-term goal is to build methods that make AI systems not only capable but also transparent, stable, and socially aligned. My detailed CV can be found here.


Interests

Large foundation models (e.g., LLMs and VLMs) remain black boxes whose inner reasoning and long-term behavior are still poorly understood. Despite this, they are increasingly deployed for sensitive tasks such as mass persuasion and student education. My goal is to contribute to research that makes such models more reliable, transparent, and socially aligned. Specifically, I am interested in three intertwined directions:

  1. Robustness: Understanding and improving model behavior over extended interactions.
  2. Interpretability: Developing methods to explain the internal mechanisms and decision-making processes of large models.
  3. Fairness: Designing principled interventions to mitigate biases and ensure equitable treatment across diverse user groups.

Outside of my professional pursuits, I am curious by nature and enjoy learning in general. I am an avid reader with a keen interest in classic thrillers and philosophical novels. Music serves as both inspiration and solace for me, from the timeless tunes of classic rock to the soul-stirring melodies of Bengali classical music.

I love animals and have a soft spot for cats, not only for their elegance but also for their independent nature and curious spirit.

Experience

Research Intern

UCR NLP Lab (Prof. Yue Dong)

Working on methods to combine interpretability tools with fairness diagnostics from social science for designing an intervention that targets emergent activation circuits in LLMs responsible for particular behavioral tendencies.

  • Understanding LLMs’ response instability over longer context.
  • Mechanistic Interpretability of LLMs in Socio-Political Reasoning.
  • LLMs’ Social Epistemology using Bayesian Statistics.
Winter 2025 - Present

AI Research Engineer

Celloscope Limited

I led a team of six research engineers developing production-grade NLP and computer vision systems deployed across multiple industrial domains. Key projects I directed include:

  • Exercise Monitoring System for LG Nova, which used multimodal pose-estimation and language models to provide real-time feedback on workout form.
  • Resume Shortlister, a RAG-based ranking engine that matched the requirements from RFPs or job descriptions with candidate resumes using a hybrid approach combining rule-based filtering with semantic retrieval.
  • Drawing Checker, a vision system to automate design-error detection in engineering drawings through deep-learning-based object detection and geometric analysis.
September 2020 - Present

NLP and Data Scientist

MedAI Pvt. Limited (Part Time)

Extracting data-driven insights from medical data of Bangladesh and developing a smart healthcare platform that uses AI to deliver personalised healthcare services in local languages. Major contributions are:

  • Empowering Mental Health Support for Bengali Speakers through a Conversational AI chatbot.
  • Synthetic patient generator reflecting local demography.
  • Classifier for grouping patients' diseases using symptoms and other demographic data.
  • Training and serving of a voice-based collector of patients' symptoms.
  • Design and development of an audio data collection portal.
August 2021 - November 2024

DevOps

GRP, ICT Division

Automating the deployment process and monitoring of numerous microservices. Major contributions include:

  • Automation scripts for deploying web apps and microservices in Docker
  • Gateway configuration using NGINX reverse proxy
  • Document generation scripts from Google Sheets
May 2019 - August 2020

Education

Bangladesh University of Engineering and Technology (BUET)

Master of Science (part time)
Computer Science and Engineering

GPA (coursework): 3.54

Thesis: Dynamic Resource Allocation for Workloads in Serverless Architecture using Collaborative Filtering. Under the supervision of Professor Muhammad Abdullah Adnan.

Coursework: Bioinformatics Algorithms · Distributed Computing Systems · Data Mining · Data Management in the Cloud · Advanced Database Systems · Advanced Artificial Intelligence

April 2019 - October 2022

Bangladesh University of Engineering and Technology (BUET)

Bachelor of Science
Computer Science and Engineering

GPA: 3.53

Major: Artificial Intelligence

Thesis: Active Learning on Big Data. A study of how active learning can be applied to big data in a distributed cloud computing system. Under the supervision of Professor Muhammad Abdullah Adnan.

Coursework: Machine Learning · Pattern Recognition · Computer Graphics · Artificial Intelligence · Digital Image Processing · Data Structures · Database · Operating Systems · Software Development · Computer Architecture · Microprocessors and Microcontrollers · Computer Networks · Concrete Mathematics · Discrete Mathematics · Numerical Methods · Software Engineering and Information System Design · Compiler · Data Communication · Digital Logic Design · Structured Programming Language · Object Oriented Programming Language · Theory of Computation

February 2015 - April 2019


Publications

Please refer to my Google Scholar profile for a complete list of my publications.

Automatic Speech Recognition for Biomedical Data in Bengali Language. [PDF]
Shariar Kabir, Nazmun Nahar, Shyamasree Saha, Mamunur Rashid
arXiv preprint arXiv:2406.12931 (2024)

This paper presents the development of a prototype Automatic Speech Recognition (ASR) system specifically designed for Bengali biomedical data. Recent advancements in Bengali ASR are encouraging, but a lack of domain-specific data limits the creation of practical healthcare ASR models. This project bridges this gap by developing an ASR system tailored for Bengali medical terms like symptoms, severity levels, and diseases, encompassing two major dialects: Bengali and Sylheti.

SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction. [PDF]
Syed Monsur, Shariar Kabir, Sakib Chowdhury
In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 117–123, Singapore. Association for Computational Linguistics.

In this paper, we introduce SynthNID, a system that generates domain-specific document image data for training OCR-less end-to-end Key Information Extraction systems. We show that the generated data improves the performance of the extraction model on real datasets and that the system can easily be extended to generate other types of scanned documents for a wide range of document understanding tasks.


Awards & Achievements

Industry Coding Assessment

CodeSignal Industry Coding Assessment (ICA): 510/600 (≈ 722/850 GCA equivalent, top 15%)
2025

Global Health Equity Challenge Award

MIT Solve

AmarDoctor by MedAI has been selected as one of the six solvers out of 2200+ participants worldwide for its innovative approach to accessible healthcare.

2024


Research

NER From Chatbot User Messages

Extraction of named entities (NE) such as beneficiary names, transfer amount, account type, and account number. Instead of applying a single language model like BERT to all of these, we employed a mix of approaches: BERT, RegEx, and lookup tables. This minimized the amount of training and fine-tuning required, which is challenging for a low-resource language like Bengali. We used a BERT-based model for beneficiary name extraction, RegEx for transfer amount and account number extraction, and lookup tables for account type extraction.
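A minimal sketch of the rule-based half of this approach, using English keywords for readability. The patterns, the lookup table entries, and the function name are illustrative assumptions, not the production rules, and the BERT-based beneficiary name extractor is left out.

```python
import re

# Hypothetical lookup table mapping surface forms to canonical account types.
ACCOUNT_TYPES = {
    "savings": "SAVINGS",
    "current": "CURRENT",
    "fixed deposit": "FIXED_DEPOSIT",
}

# Illustrative patterns only; the real system's patterns are not described here.
AMOUNT_RE = re.compile(
    r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(?:taka|tk|bdt)\b", re.IGNORECASE
)
ACCOUNT_NO_RE = re.compile(r"\b(\d{10,17})\b")


def extract_entities(message: str) -> dict:
    """Extract amount, account number, and account type without a learned model."""
    entities = {}

    # RegEx: transfer amount followed by a currency word.
    amount = AMOUNT_RE.search(message)
    if amount:
        entities["amount"] = float(amount.group(1).replace(",", ""))

    # RegEx: a long run of digits is treated as the account number.
    account_no = ACCOUNT_NO_RE.search(message)
    if account_no:
        entities["account_no"] = account_no.group(1)

    # Lookup table: first known account-type phrase found in the message.
    lowered = message.lower()
    for phrase, label in ACCOUNT_TYPES.items():
        if phrase in lowered:
            entities["account_type"] = label
            break

    return entities
```

In a setup like this, only the beneficiary name needs a trained model, which keeps the labeled-data requirement small.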

Finetuning LLMs for Mental Health Counsel

Recognizing the inherent bias of most LLMs towards European languages and ethnicities and the scarcity of structured Bengali data, I initially focused on adapting open-source models like LLaMA using different parameter-efficient fine-tuning (PEFT) methods (e.g., adapter injection and LoRA). I ultimately fine-tuned LLaMA for Bengali mental health consultation using QLoRA, which resulted in a model that can be served with little GPU memory. This work has been pivotal in ensuring equitable access to healthcare technologies across diverse linguistic communities.

ASR System for Patient Symptoms [PPT]

An ASR system for understanding medical symptoms spoken by patients in Bengali. We trained the DeepSpeech model from scratch on audio collected from consenting users through our audio data collection portal, then fine-tuned it for noisy environments using the 13 domain augmentations provided by DeepSpeech. This model performed poorly whenever the user said out-of-vocabulary words. We therefore fine-tuned a Whisper (tiny) model, specifically the BanglaASR model, which was trained on the Bangla Mozilla Common Voice dataset. The resulting model achieves a word error rate (WER) of only 8%, which is largely due to the limited vocabulary of symptoms.
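For reference, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The function below is a minimal sketch of that standard definition; the function name and the whitespace tokenization are assumptions, and it is not the evaluation script used for the reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, dropping one word from a four-word reference gives a WER of 0.25.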

SynthCases Creator and Disease Classifier [PPT]

A recommendation system, built on ensemble classifiers, that suggests likely diseases from patients' symptoms. The classifier is trained on synthetic data generated to reflect real-world demography. The generator takes into account patients' risk factors, family history, and medical history. The classifier makes predictions through a multi-stage pipeline. First, it predicts the probability of each disease from the symptoms. Next, it uses a prevalence lookup table to filter the most probable diseases by ethnicity. Finally, it makes the prediction using the filtered diseases and the patient's risk factors.
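The three stages can be sketched roughly as follows. This is an illustrative assumption about how the stages fit together, not the actual implementation: the function name, the prevalence threshold, and the multiplicative risk weights are all hypothetical, and the first-stage symptom probabilities are taken as given.

```python
def predict_disease(symptom_probs, prevalence, ethnicity,
                    risk_weights, risk_factors, min_prevalence=0.01):
    """Return the most likely disease, or None if no candidate survives filtering.

    symptom_probs: {disease: probability} from the stage-1 symptom classifier.
    prevalence:    {(disease, ethnicity): prevalence} lookup table (hypothetical).
    risk_weights:  {(disease, risk_factor): multiplier} (hypothetical scoring rule).
    """
    # Stage 2: keep only diseases that are prevalent enough for this group.
    candidates = {
        disease: prob
        for disease, prob in symptom_probs.items()
        if prevalence.get((disease, ethnicity), 0.0) >= min_prevalence
    }
    if not candidates:
        return None

    # Stage 3: re-score the remaining diseases using the patient's risk factors.
    def score(disease):
        value = candidates[disease]
        for factor in risk_factors:
            value *= risk_weights.get((disease, factor), 1.0)
        return value

    return max(candidates, key=score)
```

Filtering before re-scoring means a disease with a high symptom score but negligible local prevalence never reaches the final step.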

Licence Plate Detection in CCTV Frames using YOLOv5 [PPT]

We fine-tuned the YOLOv5 model to detect the licence plates of different vehicles in Bangladesh in CCTV footage. Colab Notebook

Key Information Extraction (KIE) From NID using Donut [PPT]

We used the data generated by SynthNID to fine-tune the pretrained document transformer model (Donut) for the Key Information Extraction (KIE) task, training on a mix of real and synthetic data. Adding synthetic data gave a significant improvement in performance, especially in the Bengali fields.

Projects

Medical Code Classification via Linear Probing of LLM Activations

This project investigates multi-label medical code classification by training linear probes on Large Language Model (LLM) activations. We extract layer-wise attention head activations from medical-domain LLMs and use Ridge regression classifiers to predict relevant medical disciplines from clinical descriptions. The approach enables interpretable analysis of which model components are most informative for medical domain classification tasks.

Exercise Monitoring System

A system leveraging Vision-Language Models (VLMs) to assist users in performing exercises correctly by comparing their execution against reference videos of expert demonstrations. The system uses frame-level visual and motion comparison, integrated with language-based feedback, to generate natural language guidance that helps users improve their form and reduce the risk of injury.

Agrani Voice Banking Chatbot

Bangladesh's pioneering Voice-based AI Chatbot for seamless banking activities, serving hundreds of thousands of real users. Agrani Bank is one of the largest state-owned banks in Bangladesh, with a huge number of customers who have very little access to information. Agrani Voice Banking makes banking services accessible to everyone. It is powered by Bengali ASR and a finetuned NLU engine for natural language-driven fund transfers and inquiries. It can behave dynamically based on the input messages by the user.

Realtime Liveness Check

A liveness check for eKYC authentication that confirms the presence of a live person by analyzing real-time facial movements and blinking and by requiring the user to perform specific facial actions. Developed for mobile devices such as smartphones.

Audio Data Collection Portal

An audio data collection portal for a large user base, built with a React frontend and a Python Flask backend. Metadata is stored in PostgreSQL and audio objects in S3. User authentication and authorization are handled by AWS Cognito. Data can be collected based on priority or user specifics, which is useful for collecting medical recordings by filtering symptoms by age, gender, or audio count.

AI Service Gateway

A portal for showcasing AI services. Clients can use a demo version of each service. Authentication and authorization are built using Keycloak and the Google identity provider. New clients can sign up with their email and receive a limited credit for using the services.

Don't Drop The Bomb

This microcontroller project is a multiplayer game. It features two player-controlled bars on either side of two connected dot matrices. At its core is a single ATmega32 microcontroller. The controllers were built using MPU-6050 accelerometer and gyroscope sensors.

Ray Tracing

Ray Tracing is a rendering technique that can produce incredibly realistic lighting effects. It works by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects. In this project, I implemented a ray tracer that can render spheres, planes, and triangles with textures and shadows. Phong Lighting Model and Recursive Reflection are employed in this implementation.
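As a brief illustration of the Phong lighting model mentioned above, the sketch below computes the intensity at a single surface point for one light. The coefficient values and function names are assumptions for illustration, not taken from the project code, and shadows, textures, and recursive reflection are omitted.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def reflect(to_light, normal):
    """Mirror the unit direction to the light about the unit normal: r = 2(n.l)n - l."""
    d = dot(normal, to_light)
    return tuple(2 * d * n - l for n, l in zip(normal, to_light))


def phong_intensity(normal, to_light, to_viewer,
                    ka=0.1, kd=0.6, ks=0.3, shininess=32):
    """Ambient + diffuse + specular intensity for one light of unit strength."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)

    # Diffuse term: proportional to the cosine between normal and light direction.
    diffuse = max(dot(n, l), 0.0)

    # Specular term: only when the light is in front of the surface.
    specular = 0.0
    if diffuse > 0.0:
        r = reflect(l, n)
        specular = max(dot(r, v), 0.0) ** shininess

    return ka + kd * diffuse + ks * specular
```

When the light is behind the surface, only the ambient term `ka` remains, so surfaces facing away from the light stay dim rather than turning black.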

Curriculum Vitae