Shariar Kabir

AI Research Engineer · Dhaka, Bangladesh · +880-1832055656 · shariar1405076@gmail.com

I work full-time as an AI Research Engineer at Celloscope Ltd. and part-time as a Data Scientist at MedAI Ltd. I received my BSc in Computer Science and Engineering, with a major in Artificial Intelligence (AI), from Bangladesh University of Engineering and Technology (BUET). I have over 5 years of experience developing AI-based user-facing services. I also have extensive experience working with large-scale datasets and drawing insights from raw multimodal data using various statistical approaches. I am interested in learning and leveraging advanced machine learning (ML) techniques to solve real-world problems, and in refining modern ML models to be more inclusive and effective for diverse populations. Throughout my professional career, my main focus has been to make interaction between humans and computer applications easier and more intuitive using ML and NLP tools and algorithms. My detailed CV can be found here.


Experience

AI Research Engineer

Celloscope Limited

Building AI-based solutions in local languages for user-centric applications that save users time and reduce the complexity of everyday banking tasks. Major contributions include:

  • Bangla voice-based AI chatbot for banking applications.
  • OCR-based document key information extraction for eKYC.
  • OCR-free document key information extraction for eKYC.
  • Service showcase and usage monitoring systems.
  • Service user and usage tracking system.
September 2020 - Present

NLP and Data Scientist

MedAI Pvt. Limited (Part Time)

Extracting data-driven insights from Bangladeshi medical data and developing a smart healthcare platform that uses AI to deliver personalised healthcare services in local languages. Major contributions include:

  • Conversational AI chatbot providing mental health support for Bengali speakers.
  • Synthetic patient generator reflecting local demographics.
  • Disease classifier that uses patients' symptoms and demographic data.
  • Training and serving of a voice-based patient symptom collector.
  • Design and development of an audio data collection portal.
August 2021 - Present

DevOps

GRP, ICT Division

Automating the deployment process and monitoring of numerous microservices. Major contributions include:

  • Automation scripts for deploying web apps and microservices in Docker.
  • Gateway configuration using an NGINX reverse proxy.
  • Document generation scripts from Google Sheets.
May 2019 - August 2020

Education

Bangladesh University of Engineering and Technology (BUET)

Master of Science
Computer Science and Engineering

GPA (Predicted): 3.75

Thesis: Dynamic Resource Allocation for Workloads in Serverless Architecture using Collaborative Filtering. Under the supervision of Professor Muhammad Abdullah Adnan.

Coursework: Bioinformatics Algorithms · Distributed Computing Systems · Data Mining · Data Management in the Cloud · Advanced Database Systems · Advanced Artificial Intelligence

April 2019 - Present

Bangladesh University of Engineering and Technology (BUET)

Bachelor of Science
Computer Science and Engineering

GPA: 3.53

Major: Artificial Intelligence

Thesis: Active Learning on Big Data, a study of how active learning can be applied to big data in a distributed cloud computing system. Under the supervision of Professor Muhammad Abdullah Adnan.

Coursework: Machine Learning · Pattern Recognition · Computer Graphics · Artificial Intelligence · Digital Image Processing · Data Structures · Database · Operating Systems · Software Development · Computer Architecture · Microprocessors and Microcontrollers · Computer Networks · Concrete Mathematics · Discrete Mathematics · Numerical Methods · Software Engineering and Information System Design · Compiler · Data Communication · Digital Logic Design · Structured Programming Language · Object Oriented Programming Language · Theory of Computation

February 2015 - April 2019


Interests

I'm interested in working in human–computer interaction (HCI), machine learning, data science, natural language processing, and computer vision. Throughout my academic and professional journey, I have cultivated a deep fascination with the inner workings of large language models (LLMs) and their potential for solving complex problems. In recent years, many large transformer-based models have shown promising performance across various application domains. However, their usability remains limited in many important use cases due to high training costs, a lack of interpretability, and the inherent biases of pre-trained models. My main focus has therefore been to develop strategies that improve the performance of state-of-the-art (SOTA) models in specific scenarios.

Outside of my professional pursuits, I am an avid reader with a keen interest in classic thrillers and philosophical novels. Music serves as both inspiration and solace for me, from the timeless tunes of classic rock to the soul-stirring melodies of Bengali classical music.

Publications

SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction. [PDF]
Syed Monsur, Shariar Kabir, Sakib Chowdhury,
In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 117–123, Singapore. Association for Computational Linguistics.

In this paper, we introduce SynthNID, a system that generates domain-specific document image data for training OCR-free end-to-end Key Information Extraction systems. We show that the generated data improves the extraction model's performance on real datasets, and that the system can easily be extended to generate other types of scanned documents for a wide range of document understanding tasks.

Automatic Speech Recognition for Biomedical Data in Bengali Language. [PDF]
Shariar Kabir, Nazmun Nahar, Shyamasree Saha, Mamunur Rashid,
arXiv preprint arXiv:2406.12931 (2024)

This paper presents the development of a prototype Automatic Speech Recognition (ASR) system specifically designed for Bengali biomedical data. Recent advancements in Bengali ASR are encouraging, but a lack of domain-specific data limits the creation of practical healthcare ASR models. This project bridges this gap by developing an ASR system tailored for Bengali medical terms like symptoms, severity levels, and diseases, encompassing two major dialects: Bengali and Sylheti.


Awards & Achievements

Global Health Equity Challenge Award

MIT Solve

AmarDoctor by MedAI was selected as one of six solvers from more than 2,200 participants worldwide for its innovative approach to accessible healthcare.

2024


Research

NER From Chatbot User Messages

Extraction of named entities (NEs) such as beneficiary names, transfer amounts, account types, and account numbers. Instead of applying a single language model such as BERT to all of these, we employed a combination of approaches, including BERT, regular expressions (RegEx), and lookup tables. This minimized the training and fine-tuning required, which is challenging for a low-resource language like Bengali. We used a BERT-based model for beneficiary name extraction, RegEx for transfer amount and account number extraction, and lookup tables for account type extraction.
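
A minimal sketch of the rule-based part of this recipe follows. The regular expressions, the keyword lookup table, and the function name `extract_rule_based_entities` are hypothetical illustrations rather than the production rules, and the BERT-based beneficiary-name extractor is omitted.

```python
import re

# Normalise Bengali digits (U+09E6 to U+09EF) to ASCII so extracted values are consistent.
BENGALI_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

# Hypothetical patterns: an amount is a number followed by a currency word,
# and an account number is a standalone run of 10 to 17 digits.
AMOUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(?:taka|tk|টাকা)", re.IGNORECASE)
ACCOUNT_NO_RE = re.compile(r"\b\d{10,17}\b")

# Hypothetical lookup table from surface forms to canonical account types.
ACCOUNT_TYPES = {
    "savings": "SAVINGS",
    "সেভিংস": "SAVINGS",
    "current": "CURRENT",
    "চলতি": "CURRENT",
}


def extract_rule_based_entities(message: str) -> dict:
    """Extract amount, account number, and account type with RegEx and lookups."""
    text = message.translate(BENGALI_DIGITS)
    entities = {}
    amount = AMOUNT_RE.search(text)
    if amount:
        entities["amount"] = float(amount.group(1).replace(",", ""))
    account_no = ACCOUNT_NO_RE.search(text)
    if account_no:
        entities["account_no"] = account_no.group(0)
    lowered = text.lower()
    for keyword, account_type in ACCOUNT_TYPES.items():
        if keyword in lowered:  # first matching keyword wins
            entities["account_type"] = account_type
            break
    return entities
```

Splitting the work this way means only the open-ended entity (beneficiary names) needs a trained model, while closed-form entities are handled by rules that require no labelled Bengali data.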

Finetuning LLMs for Mental Health Counsel

Recognizing the inherent bias of most LLMs towards European languages and ethnicities, and the scarcity of structured Bengali data, I initially focused on refining open-source models such as LLaMA with different parameter-efficient fine-tuning (PEFT) methods (e.g., adapter injection and LoRA). I then successfully fine-tuned LLaMA for Bengali mental health consultation using QLoRA, which combines a 4-bit quantized base model with LoRA adapters, yielding a model that can be served with little GPU memory. This work helps extend access to healthcare technologies across diverse linguistic communities.
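
The memory savings behind LoRA and QLoRA come down to simple arithmetic, sketched below. The rank of 8, adapting only the query and value projections, and the nominal 7-billion-parameter size are illustrative assumptions, not the configuration used in this work; the 4096 hidden size and 32 layers match LLaMA-7B. The weight-memory figure ignores activations, optimizer state, and quantization constants.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen weight matrix W (d_out x d_in).

    The update is B @ A, with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    hidden, layers, rank = 4096, 32, 8         # LLaMA-7B-like shapes; rank is assumed
    per_matrix = lora_trainable_params(hidden, hidden, rank)
    trainable = per_matrix * 2 * layers        # query and value projections only (assumed)
    frozen_per_matrix = hidden * hidden
    print(f"LoRA params per matrix: {per_matrix:,} "
          f"({per_matrix / frozen_per_matrix:.2%} of {frozen_per_matrix:,})")
    print(f"Total trainable LoRA params: {trainable:,}")
    print(f"7B weights at 16-bit: {weight_memory_gb(7e9, 16):.1f} GB")
    print(f"7B weights at 4-bit:  {weight_memory_gb(7e9, 4):.1f} GB")
```

Under these assumptions each adapted 4096x4096 matrix gains 65,536 trainable parameters (0.39% of its 16,777,216 frozen weights), about 4.2 million in total, while 4-bit storage cuts the base weights from roughly 14 GB to 3.5 GB.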

ASR System for Patient Symptoms [PPT]

An ASR system for understanding medical symptoms spoken by patients in Bengali. We trained a DeepSpeech model from scratch on audio collected from consenting users through our audio data collection portal, and fine-tuned it for noisy environments using the 13 augmentations provided by DeepSpeech. However, this model performed poorly whenever users spoke out-of-vocabulary words. We therefore fine-tuned a Whisper (tiny) model, specifically BanglaASR, which had been trained on the Bangla portion of the Mozilla Common Voice dataset. The fine-tuned model achieves a word error rate (WER) of 8%, a result helped by the limited vocabulary of symptom descriptions.
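
WER is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the minimum numbers of word substitutions, deletions, and insertions that align the hypothesis with the reference, and N is the number of reference words. A minimal pure-Python sketch follows; the scoring tool actually used for the 8% figure is not stated, and the Bengali example sentence in the usage note is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("আমার মাথা ব্যথা করছে", "আমার মাথা বেথা করছে")` counts one substitution over four reference words, giving 0.25.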

SynthCases Creator and Disease Classifier [PPT]

A disease recommendation system based on ensemble classifiers that takes patients' symptoms as input. The classifier is trained on synthetic data generated to reflect real-world demographics; the generator takes into account patients' risk factors, family history, and medical history. Predictions are made through a multi-stage pipeline: first, the classifier predicts the probability of each disease from the symptoms; next, a prevalence lookup table filters the most probable diseases by ethnicity; finally, a prediction is made from the filtered diseases and the patient's risk factors.
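
The three prediction stages can be sketched as below. Everything here is hypothetical: the disease and group labels are placeholders, and the prevalence values, risk-factor weights, and 0.05 cut-off are made-up numbers, not real epidemiological data. The stage-one probabilities would come from the ensemble classifier.

```python
from typing import Dict, List

# Hypothetical prevalence lookup table (placeholder numbers, not real data).
PREVALENCE: Dict[str, Dict[str, float]] = {
    "group_1": {"disease_a": 0.30, "disease_b": 0.02, "disease_c": 0.10},
    "group_2": {"disease_a": 0.05, "disease_b": 0.25, "disease_c": 0.08},
}

# Hypothetical multiplicative weights linking a risk factor to a disease.
RISK_WEIGHTS: Dict[tuple, float] = {
    ("disease_c", "risk_factor_x"): 2.0,
    ("disease_b", "risk_factor_y"): 2.0,
}


def filter_by_prevalence(probs: Dict[str, float], group: str,
                         min_prevalence: float = 0.05) -> Dict[str, float]:
    """Stage 2: keep only diseases that are sufficiently prevalent in the group."""
    table = PREVALENCE[group]
    return {d: p for d, p in probs.items() if table.get(d, 0.0) >= min_prevalence}


def rank_with_risk_factors(candidates: Dict[str, float],
                           risk_factors: List[str]) -> str:
    """Stage 3: re-weight the remaining diseases by the patient's risk factors."""
    scores = {}
    for disease, p in candidates.items():
        score = p
        for factor in risk_factors:
            score *= RISK_WEIGHTS.get((disease, factor), 1.0)
        scores[disease] = score
    return max(scores, key=scores.get)


def predict(symptom_probs: Dict[str, float], group: str,
            risk_factors: List[str]) -> str:
    """Run stages 2 and 3 on stage-1 probabilities from the symptom classifier."""
    candidates = filter_by_prevalence(symptom_probs, group)
    if not candidates:  # fall back rather than return nothing
        candidates = symptom_probs
    return rank_with_risk_factors(candidates, risk_factors)
```

Running the filter before the final step means a risk factor cannot promote a disease that is rare in the patient's group: with `risk_factor_y` and `group_1`, `disease_b` has already been removed and cannot win.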

Licence Plate Detection in CCTV Frames using YOLOv5 [PPT]

We fine-tuned YOLOv5 to detect the licence plates of vehicles in Bangladesh in CCTV footage. Colab Notebook

Key Information Extraction (KIE) From NID using Donut [PPT]

We used data generated by SynthNID to fine-tune the pretrained Document Understanding Transformer (Donut) for the Key Information Extraction (KIE) task, training on a mix of real and synthetic data. Adding synthetic data yielded a significant improvement in performance, especially on the Bengali fields.

Projects

Agrani Voice Banking Chatbot

Bangladesh's pioneering voice-based AI chatbot for seamless banking, serving hundreds of thousands of real users. Agrani Bank is one of the largest state-owned banks in Bangladesh, with a large customer base, many of whom have very limited access to information. Agrani Voice Banking makes banking services accessible to everyone. It is powered by Bengali ASR and a fine-tuned NLU engine that supports natural-language fund transfers and inquiries, and it adapts its responses dynamically to the user's input.

Realtime Liveness Check

Analyzes facial movements and blinking in real time, and asks the user to perform specific facial actions during eKYC authentication to confirm that a live person is present. Developed for mobile devices such as smartphones.
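
The source does not say how blinks are detected. One common approach, shown here as an assumption, is the eye aspect ratio (EAR) computed from six eye landmarks supplied by a face-landmark model: the ratio drops sharply when the eye closes, and a blink is counted when it stays below a threshold for a few consecutive frames. The threshold of 0.2 and the two-frame minimum are illustrative values, not tuned parameters from this project.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six landmarks p1..p6: p1 and p4 are the eye corners,
    p2/p3 lie on the upper lid and p6/p5 on the lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series: Iterable[float], threshold: float = 0.2,
                 min_frames: int = 2) -> int:
    """Count runs of at least `min_frames` consecutive frames with EAR below threshold."""
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the series
        blinks += 1
    return blinks
```

Requiring a minimum run length filters out single-frame landmark jitter, which would otherwise register as spurious blinks.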

Audio Data Collection Portal

An audio data collection portal for a large user base, built with a React frontend and a Python Flask backend. Metadata is stored in PostgreSQL and audio objects in S3, with user authentication and authorization handled by AWS Cognito. Data collection can be driven by priority or by user attributes, which is useful for collecting medical recordings: symptoms can be filtered by age, gender, or existing audio counts.
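
A minimal sketch of priority- and attribute-based prompt selection follows. The `SymptomPrompt` fields, the `next_prompt` function, and the per-prompt target of 100 recordings are hypothetical; the portal's actual schema and selection logic are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SymptomPrompt:
    """Hypothetical record for one symptom phrase to be recorded."""
    text: str
    priority: int                 # higher means more urgent to collect
    min_age: int = 0
    max_age: int = 120
    genders: frozenset = frozenset({"female", "male", "other"})
    audio_count: int = 0          # recordings collected so far


def next_prompt(prompts: List[SymptomPrompt], age: int, gender: str,
                target_count: int = 100) -> Optional[SymptomPrompt]:
    """Pick the next prompt for a user, or None if nothing applies."""
    eligible = [
        p for p in prompts
        if p.min_age <= age <= p.max_age      # filter by age
        and gender in p.genders               # filter by gender
        and p.audio_count < target_count      # skip prompts with enough audio
    ]
    if not eligible:
        return None
    # Highest priority first; among equal priorities, the fewest recordings.
    return min(eligible, key=lambda p: (-p.priority, p.audio_count))
```

Breaking priority ties by recording count keeps the dataset balanced across symptoms of equal importance.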

AI Service Gateway

A portal for showcasing AI services, where clients can use a demo version of each service. Authentication and authorization are built with Keycloak, using Google as an identity provider. New clients can sign up with their email and receive limited credit for using the services.

Don't Drop The Bomb

A multiplayer game built on a microcontroller. The game features two player-controlled bars on either side of two connected dot-matrix displays. At its core was a single ATmega32 microcontroller, and the controllers were built using MPU-6050 accelerometer and gyroscope sensors.

Curriculum Vitae