Aradhana Mohan Parvathy
I design hardware-software systems that make deep learning inference radically more efficient. My work spans the full stack — from RTL accelerator design to model-level compression — unified by the thesis that approximation is the most underutilized lever in efficient AI.
Fifth-year PhD student in the Integrated Systems Lab, advised by Prof. Anand Raghunathan. Active collaborations with Intel Labs and IBM Research.
Research
Hardware-Aware Approximate Computing for Efficient DNN Inference
Making large models run on constrained hardware by co-designing across the model–algorithm–architecture stack.
My dissertation unifies four contributions under one thesis: approximation, applied at the right abstraction level, unlocks efficiency gains that neither pure hardware nor pure algorithm work achieves alone.
SPARQLe is a W4A8 inference accelerator, co-designed with Intel Labs, that exploits sub-precision activation sparsity. Rather than treating sparsity as a binary gate, SPARQLe decomposes activations into MSB/LSB sub-fields and skips computation on sparse sub-fields — a hardware optimization orthogonal to quantization. The design is implemented end-to-end in SystemVerilog with dual-port register files and a two-phase MAC pipeline.
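The sub-field idea can be illustrated in software. The sketch below is not SPARQLe's datapath; it only shows the arithmetic identity that makes sub-field skipping possible. It assumes unsigned 8-bit activations split into two 4-bit fields, and the function names (`split_fields`, `dot_with_subfield_skipping`) are hypothetical.

```python
def split_fields(a, lsb_bits=4):
    """Split an unsigned activation into (msb, lsb) sub-fields.

    Assumption for illustration: 8-bit activations, 4-bit MSB and 4-bit LSB.
    """
    mask = (1 << lsb_bits) - 1
    return a >> lsb_bits, a & mask


def dot_with_subfield_skipping(weights, activations, lsb_bits=4):
    """Dot product that skips any sub-field equal to zero.

    Uses w * a == ((w * msb) << lsb_bits) + (w * lsb), so each sub-field
    contributes an independent partial product that can be skipped when
    the sub-field is zero. Returns (result, sub-field MACs performed).
    """
    acc = 0
    macs = 0
    for w, a in zip(weights, activations):
        msb, lsb = split_fields(a, lsb_bits)
        if msb != 0:  # skip the MSB partial product when the field is zero
            acc += (w * msb) << lsb_bits
            macs += 1
        if lsb != 0:  # skip the LSB partial product when the field is zero
            acc += w * lsb
            macs += 1
    return acc, macs


# Hypothetical signed 4-bit weights and unsigned 8-bit activations.
weights = [3, -2, 7, -8]
activations = [5, 0, 130, 16]

result, macs = dot_with_subfield_skipping(weights, activations)
reference = sum(w * a for w, a in zip(weights, activations))
print(result, reference, macs)
```

In this example only one activation is entirely zero, yet half of the eight sub-field multiplications are skipped, because small values have a zero MSB field and multiples of 16 have a zero LSB field.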
TokenMix, developed with IBM Research, tackles MoE inference cost at the decode level. It encourages expert reuse across tokens within a decode step, reducing the number of unique expert fetches per step without retraining. It is evaluated on Mixtral-8x7B, Qwen3-30B-A3B, Llama-4-Scout, and DeepSeek-V2-Lite.
Softprox replaces the expensive exponential-and-normalize Softmax with post-training approximations, targeting the attention bottleneck on edge hardware where Softmax dominates latency.
Seprox uses sequence-based approximation for ultra-low-precision (sub-4-bit) model compression, finding structure in weight distributions that standard uniform quantization misses.
Publications
Papers
Selected publications spanning accelerator architecture, model compression, and efficient inference.
Efficient Quantized LLM Inference via Hardware-Software Co-Design
Post-training optimization methods enabling efficient execution of quantized LLMs on resource-constrained hardware. Details withheld due to double-blind review.
Hardware-Software Co-Design for Efficient LLM Inference
Novel approximation techniques and accelerator architecture for efficient inference of large language models. Details withheld due to double-blind review.
TokenMix: Efficient MoE Inference via Cross-Token Expert Reuse
Encourages expert reuse across tokens in the MoE decode step, reducing unique expert fetches with negligible accuracy loss. Evaluated on Mixtral-8x7B, Qwen3-30B-A3B, DeepSeek-V2-Lite, and Llama-4-Scout.
3D-CIMlet: A Chiplet Co-Design Framework for Heterogeneous In-Memory Acceleration of Edge LLM Inference
Thermal-aware modeling and co-design framework for 2.5D/3D edge-LLM engines using heterogeneous computing-in-memory chiplets, for both inference and continual learning.
Softprox: A Systematic Methodology to Mitigate Softmax Bottleneck in Emerging Transformer Workloads
Three-step post-finetuning approximation of the Softmax operation, validated across multimodal architectures (text, vision, audio transformers) with up to 40.22% inference improvement.
Seprox: Sequence-Based Approximations for Compressing Ultra-Low Precision Deep Neural Networks
Exploits sequential structure in weight distributions to enable ultra-low-precision (sub-4-bit) compression with accuracy recovery beyond standard uniform quantization.
Talks & Presentations
Talks
Purdue Institute of Chips and AI Workshop
Organizing committee member. Workshop on hardware-aware AI systems.
JUMP Community Research Talk
Training-to-inference efficiency narrative: from edge dense LLMs (SPARQLe) to datacenter MoE (TokenMix).
TECHCON 2025 — Softprox
Presented Softprox at SRC TECHCON, Atlanta, Georgia.
SRC Annual Sponsor Review
Industry-facing research talk covering the edge-to-datacenter approximate computing arc.
IBM AI Hardware Forum 2022
Presented Seprox: ultra-low-precision DNN compression via sequence-based approximation.
TECHCON 2022 — Seprox
Presented at SRC TECHCON 2022.
ICCAD 2022 — Seprox
Conference presentation on sequence-based ultra-low-precision DNN compression.
Honors & Awards
Recognition
Qualcomm Innovation Fellowship
Winner — one of 17 teams selected from 266 proposals across U.S. and Canadian universities. Joint work on ML-powered SoC power estimation.
DAC Young Fellow
Selected twice for the Young Fellow program at the Design Automation Conference (DAC), the premier EDA conference.
DAAD-WISE Scholarship
Research internship at RWTH Aachen, Department of Cardiovascular Engineering, Helmholtz Institute.
AnitaB.org / Grace Hopper Scholarship
Selected among a few hundred recipients for contributions to computing and engineering.
Indian Academy of Sciences Fellowship
Summer research fellowship to conduct research at a premier Indian research institute.
Graduate School Summer Research Grant
Competitive summer research funding from the Purdue Graduate School.
Industry Research
Experience
2025
IBM Research · Research Scientist Intern
Yorktown Heights, NY. Approximate computing for efficient LLM inference. Large-scale LoRA finetuning with DeepSpeed. Compiler techniques for DNN inference efficiency.
2024
Intel Corporation · Multimodal AI Research Intern
Proposed a novel approximation technique and a tensor-core-like accelerator for ultra-low-precision LLMs. Up to 25% performance improvement. Research paper submitted.
2023
EnCharge AI · Summer Intern
Santa Clara, CA. DNN workload compression to reduce off-chip memory accesses. Workload mapping onto EnCharge AI's hardware architecture.
Education
Integrated Systems Lab
Preliminary exam passed, March 2026
Overall Coordinator, Amruthavarshini (Carnatic Music Club)
Joint Treasurer, Spider R&D Club
Blog
Writing
Notes on research, engineering, and the things I find interesting.
soon
Beyond Research
Interests
Carnatic Music
Trained vocalist. Former Overall Coordinator of Amruthavarshini, the Carnatic music club at NIT Trichy. Music has been a constant thread through undergrad and grad school.
Teaching & Mentoring
Graduate Teaching Assistant at Purdue. Previously taught mathematics and science to middle school students through Illuminate, an NGO in Tiruchirappalli serving nearby villages.
Cooking & Nutrition
Interested in clean eating and health-conscious cooking — the kind of optimization that doesn't require a GPU.
Community
Coordinated guest lectures for Festember at NIT Trichy. Active in the SRC/JUMP research community and a co-organizer of the Purdue Institute of Chips and AI workshop.
Resume
Latest CV
Full details on publications, experience, and technical skills.
Download Resume (PDF)