PhD Candidate · Purdue ECE

Aradhana
Mohan Parvathy

I design hardware-software systems that make deep learning inference radically more efficient. My work spans the full stack — from RTL accelerator design to model-level compression — unified by the thesis that approximation is the most underutilized lever in efficient AI.

Fifth-year PhD student in the Integrated Systems Lab, advised by Prof. Anand Raghunathan. Active collaborations with Intel Labs and IBM Research.

Intel Labs · IBM Research · Qualcomm · EnCharge AI · SRC / JUMP

Hardware-Aware Approximate Computing
for Efficient DNN Inference

Making large models run on constrained hardware by co-designing across the model–algorithm–architecture stack.

Keywords
quantization · sparsity · MoE · LLM inference · accelerators · RTL design · SystemVerilog · transformers · softmax · edge AI

My dissertation unifies four contributions under one thesis: approximation, applied at the right abstraction level, unlocks efficiency gains that neither pure hardware nor pure algorithm work achieves alone.

SPARQLe is a W4A8 inference accelerator, co-designed with Intel Labs, that exploits sub-precision activation sparsity. Rather than treating sparsity as a binary gate, SPARQLe decomposes activations into MSB/LSB sub-fields and skips computation on sparse sub-fields — a hardware optimization orthogonal to quantization. The design is implemented end-to-end in SystemVerilog with dual-port register files and a two-phase MAC pipeline.
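As a rough illustration of the sub-field idea described above, the sketch below splits an unsigned 8-bit activation into two 4-bit halves and skips the multiply for any half that is zero. This is a software toy under stated assumptions, not the SPARQLe datapath: the 4/4 split, the unsigned activation range, and the function names `split_subfields` and `subfield_dot` are all hypothetical choices made for illustration.

```python
def split_subfields(act):
    """Split an unsigned 8-bit activation into (MSB nibble, LSB nibble).

    Assumption: a 4-bit/4-bit split; the real design may partition differently.
    """
    if not 0 <= act <= 255:
        raise ValueError("activation must fit in 8 unsigned bits")
    return act >> 4, act & 0x0F


def subfield_dot(acts, weights):
    """Dot product accumulated per sub-field, skipping zero sub-fields.

    Returns (result, macs_done, macs_skipped). Each activation contributes
    two potential MACs: one for its MSB nibble and one for its LSB nibble.
    """
    if len(acts) != len(weights):
        raise ValueError("acts and weights must have the same length")
    acc = 0
    done = 0
    skipped = 0
    for act, w in zip(acts, weights):
        msb, lsb = split_subfields(act)
        for field, shift in ((msb, 4), (lsb, 0)):
            if field == 0:
                skipped += 1  # sparse sub-field: no multiply needed
                continue
            acc += (field << shift) * w  # re-weight the nibble by its position
            done += 1
    return acc, done, skipped


# Example with 4-bit signed weights (W4) and 8-bit activations (A8).
acts = [3, 16, 0, 200]
weights = [1, -2, 7, 3]
result, done, skipped = subfield_dot(acts, weights)
assert result == sum(a * w for a, w in zip(acts, weights))
```

In this example only one of the four activations is exactly zero, yet half of the sub-field MACs are skipped, which is the sense in which sub-field sparsity is finer-grained than a zero/non-zero gate on the whole value.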

TokenMix, developed with IBM Research, tackles MoE inference cost at the decode level. It encourages expert reuse across tokens within a decode step, reducing the number of unique expert fetches per step without retraining. It is evaluated on Mixtral-8x7B, Qwen3-30B-A3B, Llama-4-Scout, and DeepSeek-V2-Lite.
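To make the "unique expert fetches per step" metric concrete, here is a minimal sketch of top-k routing over the tokens in one decode step, with an optional bias toward experts that earlier tokens in the same step already selected. The text does not specify how TokenMix encourages reuse, so the additive `reuse_bonus` below is a hypothetical stand-in and should not be read as the actual method; the function names and router scores are likewise made up for illustration.

```python
def top_k(scores, k):
    """Indices of the k largest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def route_step(router_scores, k, reuse_bonus=0.0):
    """Route every token in one decode step.

    router_scores: one list of per-expert scores for each token.
    reuse_bonus:   hypothetical knob, added to the score of any expert that
                   an earlier token in this step has already selected.
    Returns (assignments, unique_experts).
    """
    loaded = set()
    assignments = []
    for scores in router_scores:
        biased = [
            s + (reuse_bonus if expert in loaded else 0.0)
            for expert, s in enumerate(scores)
        ]
        chosen = top_k(biased, k)
        assignments.append(chosen)
        loaded.update(chosen)
    return assignments, loaded


# Three tokens, four experts, top-2 routing (illustrative scores).
scores = [
    [0.90, 0.80, 0.10, 0.00],
    [0.10, 0.50, 0.60, 0.55],
    [0.00, 0.30, 0.35, 0.40],
]
_, plain = route_step(scores, k=2)
_, reused = route_step(scores, k=2, reuse_bonus=0.2)
print(len(plain), len(reused))
```

With these scores the unbiased router touches all four experts, while the biased one touches three. A larger bonus lowers the fetch count further but moves routing away from the router's original choices, so any real scheme has to balance the two.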

Softprox replaces the expensive exponential-and-normalize Softmax with post-training approximations, targeting the attention bottleneck on edge hardware where Softmax dominates latency.

Seprox uses sequence-based approximation for ultra-low-precision (sub-4-bit) model compression, finding structure in weight distributions that standard uniform quantization misses.

Papers

Selected publications spanning accelerator architecture, model compression, and efficient inference.

Under Review
2026

Efficient Quantized LLM Inference via Hardware-Software Co-Design

A. Mohan Parvathy, S. K. Ghosh, S. Kundu, A. Raha, S. Kundu, D. Mathaikutty, A. Raghunathan

Post-training optimization methods enabling efficient execution of quantized LLMs on resource-constrained hardware. Details withheld due to double-blind review.

quantization · LLM inference · Intel

Under Review
2026

Hardware-Software Co-Design for Efficient LLM Inference

A. Mohan Parvathy, S. Krithivasan, S. Venkataramani, A. Raghunathan, V. Srinivasan

Novel approximation techniques and accelerator architecture for efficient inference of large language models. Details withheld due to double-blind review.

accelerator · sparsity · IBM

Under Review
2026

TokenMix: Efficient MoE Inference via Cross-Token Expert Reuse

A. Mohan Parvathy, et al., A. Raghunathan

Encourages expert reuse across tokens in the MoE decode step, reducing unique expert fetches with negligible accuracy loss. Evaluated on Mixtral-8x7B, Qwen3-30B-A3B, DeepSeek-V2-Lite, and Llama-4-Scout.

MoE · LLM inference · IBM Research

DAC '25
2025

3D-CIMlet: A Chiplet Co-Design Framework for Heterogeneous In-Memory Acceleration of Edge LLM Inference

S. Du, L. Zheng, A. Mohan Parvathy, F. Xie, T. Wei, A. Raghunathan, H. Li

Thermal-aware modeling and co-design framework for 2.5D/3D edge-LLM engines using heterogeneous computing-in-memory chiplets, for both inference and continual learning.

chiplet · CIM · 3D integration · edge LLM

TCASAI
2025
Accepted

Softprox: A Systematic Methodology to Mitigate Softmax Bottleneck in Emerging Transformer Workloads

A. Mohan Parvathy, S. Roy, S. K. Ghosh, A. Raha, D. Mathaikutty, A. Raghunathan

Three-step post-finetuning approximation of the Softmax operation, validated across multimodal architectures (text, vision, audio transformers) with up to 40.22% inference improvement.

softmax · transformers · multimodal · edge inference

ICCAD '22
2022

Seprox: Sequence-Based Approximations for Compressing Ultra-Low Precision Deep Neural Networks

A. Mohan Parvathy, S. Krithivasan, S. Sen, A. Raghunathan

Exploits sequential structure in weight distributions to enable ultra-low-precision (sub-4-bit) compression with accuracy recovery beyond standard uniform quantization.

quantization · compression · IBM

Talks

Apr 2026

Purdue Institute of Chips and AI Workshop

Organizing committee member. Workshop on hardware-aware AI systems.

2026

JUMP Community Research Talk

Training-to-inference efficiency narrative: from edge dense LLMs (SPARQLe) to datacenter MoE (TokenMix).

May 2025

TECHCON 2025 — Softprox

Presented Softprox at SRC TECHCON, Atlanta, Georgia.

2025

SRC Annual Sponsor Review

Industry-facing research talk covering the edge-to-datacenter approximate computing arc.

2022

IBM AI Hardware Forum 2022

Presented Seprox: ultra-low-precision DNN compression via sequence-based approximation.

2022

TECHCON 2022 — Seprox

Presented at SRC TECHCON 2022.

Oct 2022

ICCAD 2022 — Seprox

Conference presentation on sequence-based ultra-low-precision DNN compression.

Recognition

Qualcomm Innovation Fellowship

Qualcomm · 2025

Winner — one of 17 teams selected from 266 proposals across U.S. and Canadian universities. Joint work on ML-powered SoC power estimation.

DAC Young Fellow

Design Automation Conference · 2020, 2022

Selected twice for the Young Fellow program at DAC, the premier design automation conference.

DAAD-WISE Scholarship

German Academic Exchange Service · 2019

Research internship at RWTH Aachen, Department of Cardiovascular Engineering, Helmholtz Institute.

AnitaB.org / Grace Hopper Scholarship

AnitaB.org

Selected among a few hundred recipients for contributions to computing and engineering.

Indian Academy of Sciences Fellowship

Indian Academy of Sciences

Summer fellowship to conduct research at a premier Indian research institute.

Graduate School Summer Research Grant

Purdue University · 2021

Competitive summer research funding from the Purdue Graduate School.

Experience

Sep–Dec
2025

IBM Research · Research Scientist Intern

Yorktown Heights, NY. Approximate computing for efficient LLM inference. Large-scale LoRA finetuning with DeepSpeed. Compiler techniques for DNN inference efficiency.

Oct–Dec
2024

Intel Corporation · Multimodal AI Research Intern

Proposed a novel approximation technique and a tensor-core-like accelerator for ultra-low-precision LLMs, with up to 25% performance improvement. Research paper submitted.

May–Aug
2023

EnCharge AI · Summer Intern

Santa Clara, CA. DNN workload compression to reduce off-chip memory accesses. Workload mapping onto EnCharge AI's hardware architecture.

Ph.D. Electrical & Computer Engineering
Purdue University · 2020 – 2026 (expected)
Advisor: Prof. Anand Raghunathan
Integrated Systems Lab
Preliminary exam passed, March 2026
B.Tech. (Hons.) Electrical & Electronics Engineering
NIT Tiruchirappalli, India · 2016 – 2020
Graduated in the top 1% of the EEE Department
Overall Coordinator, Amruthavarshini (Carnatic Music Club)
Joint Treasurer, Spider R&D Club

Writing

Notes on research, engineering, and the things I find interesting.

Coming soon
Why sub-precision sparsity is the next frontier for LLM accelerators
hardware

Coming soon
MoE inference is memory-bound: here's how to think about it
systems

Coming soon
The gap between quantization papers and real hardware
opinion

Coming soon
Notes on navigating the PhD → industry transition
career

Interests

Carnatic Music

Trained vocalist. Former Overall Coordinator of Amruthavarshini, the Carnatic music club at NIT Trichy. Music has been a constant thread through undergrad and grad school.

Teaching & Mentoring

Graduate Teaching Assistant at Purdue. Previously taught mathematics and science to middle school students through Illuminate, an NGO in Tiruchirappalli serving nearby villages.

Cooking & Nutrition

Interested in clean eating and health-conscious cooking — the kind of optimization that doesn't require a GPU.

Community

Coordinated guest lectures for Festember at NIT Trichy. Active in the SRC/JUMP research community and co-organizing the Purdue Institute of Chips and AI workshop.

Latest CV

Full details on publications, experience, and technical skills.

Download Resume (PDF)