PhD Candidate · Purdue ECE

Aradhana Mohan Parvathy

I design hardware-software systems that make deep learning inference radically more efficient. My work spans the full stack — from RTL accelerator design to model-level compression — unified by the thesis that approximation is the most underutilized lever in efficient AI.

Fifth-year PhD student in the Integrated Systems Lab, advised by Prof. Anand Raghunathan. Active collaborations with Intel Labs and IBM Research.

Scholar · LinkedIn · Email · Resume

Intel Labs · IBM Research · Qualcomm · EnCharge AI · SRC / JUMP

Research

Hardware-Aware Approximate Computing for Efficient DNN Inference

Making large models run on constrained hardware by co-designing across the model–algorithm–architecture stack.

My dissertation unifies four contributions under one thesis: approximation, applied at the right abstraction level, unlocks efficiency gains that neither pure hardware nor pure algorithm work achieves alone.

SPARQLe is a W4A8 (4-bit weight, 8-bit activation) inference accelerator, co-designed with Intel Labs, that exploits sub-precision activation sparsity. Rather than treating sparsity as a binary gate, SPARQLe decomposes activations into MSB/LSB sub-fields and skips computation on sparse sub-fields, a hardware optimization orthogonal to quantization. The design is implemented end-to-end in SystemVerilog with dual-port register files and a two-phase MAC pipeline.
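
To make the sub-field idea concrete, here is a minimal Python sketch. It is not the SPARQLe RTL: it assumes unsigned 8-bit activations split into two 4-bit halves and signed 4-bit weights, and the function names and the skip-on-zero rule are illustrative assumptions rather than details of the actual design.

```python
def split_subfields(activation: int) -> tuple[int, int]:
    """Split an unsigned 8-bit activation into (MSB, LSB) 4-bit sub-fields."""
    if not 0 <= activation <= 0xFF:
        raise ValueError("activation must fit in 8 unsigned bits")
    return activation >> 4, activation & 0xF


def subfield_dot(weights, activations):
    """Dot product that skips zero sub-fields.

    Returns (result, sub_macs_done, sub_macs_skipped). Each activation
    contributes two partial products, one per 4-bit sub-field.
    """
    acc = 0
    done = skipped = 0
    for w, a in zip(weights, activations):
        msb, lsb = split_subfields(a)
        for field, shift in ((msb, 4), (lsb, 0)):
            if field == 0:
                skipped += 1  # a zero sub-field contributes nothing
                continue
            acc += (w * field) << shift  # partial product, realigned by shift
            done += 1
    return acc, done, skipped


weights = [3, -2, 7, -8]                # signed 4-bit range [-8, 7]
activations = [0x05, 0x30, 0x00, 0xA7]  # only one value is fully zero
print(subfield_dot(weights, activations))  # (-1417, 4, 4)
```

In this toy input only one of the four activations is exactly zero, yet half of the sub-field multiplies are skipped, which is the sense in which sub-field sparsity is finer-grained than an all-or-nothing zero check.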

TokenMix, developed with IBM Research, tackles MoE inference cost at the decode level. It encourages expert reuse across tokens within a decode step, reducing the number of unique expert fetches per step without retraining. It has been evaluated on Mixtral-8x7B, Qwen3-30B-A3B, Llama-4-Scout, and DeepSeek-V2-Lite.
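
The quantity being reduced, unique experts fetched per decode step, can be illustrated with a toy router. This sketch is not TokenMix's algorithm: the additive `reuse_bonus` and the greedy token order are hypothetical stand-ins chosen only to show how biasing toward already-selected experts shrinks the set that must be loaded.

```python
def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def route(batch_scores, k, reuse_bonus=0.0):
    """Pick k experts per token and return (choices, unique_experts).

    reuse_bonus is a hypothetical additive bias toward experts already
    chosen by earlier tokens in the same decode step.
    """
    loaded = set()
    choices = []
    for scores in batch_scores:
        biased = [s + (reuse_bonus if e in loaded else 0.0)
                  for e, s in enumerate(scores)]
        picked = top_k(biased, k)
        choices.append(picked)
        loaded.update(picked)
    return choices, loaded


# Router scores for three tokens over four experts (made-up numbers).
batch = [
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.35, 0.15, 0.40],
    [0.30, 0.10, 0.35, 0.25],
]
_, plain = route(batch, k=2)
_, reused = route(batch, k=2, reuse_bonus=0.2)
print(len(plain), len(reused))  # 4 3
```

With plain top-2 routing the three tokens touch all four experts; with the bias they share three, so one fewer expert has to be fetched in that step.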

Softprox replaces the expensive exponential-and-normalize Softmax with post-training approximations, targeting the attention bottleneck on edge hardware where Softmax dominates latency.
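
As a generic illustration of approximating Softmax, and not the Softprox method itself, the sketch below swaps the exponential for a base-2 form whose fractional part is linearly interpolated, so it reduces to a shift and an add. The names `pow2_approx` and `approx_softmax` are illustrative, and normalization is left exact.

```python
import math

LOG2_E = 1.4426950408889634  # log2(e), so exp(x) == 2 ** (x * LOG2_E)


def softmax(xs):
    """Reference softmax with max subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pow2_approx(t):
    """Approximate 2**t as (1 + frac(t)) * 2**floor(t)."""
    i = math.floor(t)
    f = t - i  # fractional part in [0, 1)
    return math.ldexp(1.0 + f, i)


def approx_softmax(xs):
    """Softmax with the exponential replaced by pow2_approx."""
    m = max(xs)
    exps = [pow2_approx((x - m) * LOG2_E) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1, -1.0]
exact = softmax(logits)
approx = approx_softmax(logits)
worst = max(abs(p - q) for p, q in zip(exact, approx))
print(f"max abs error: {worst:.4f}")
```

The linear interpolation overestimates 2**f by at most about 6%, so the error in the normalized probabilities stays small and the ranking of the logits is preserved.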

Seprox uses sequence-based approximation for ultra-low-precision (sub-4-bit) model compression, finding structure in weight distributions that standard uniform quantization misses.

Publications

Papers

Selected publications spanning accelerator architecture, model compression, and efficient inference.

Under Review

2026

Efficient Quantized LLM Inference via Hardware-Software Co-Design

A. Mohan Parvathy, S. K. Ghosh, S. Kundu, A. Raha, S. Kundu, D. Mathaikutty, A. Raghunathan

Post-training optimization methods enabling efficient execution of quantized LLMs on resource-constrained hardware. Details withheld due to double-blind review.

quantization · LLM inference · Intel

Under Review

2026

Hardware-Software Co-Design for Efficient LLM Inference

A. Mohan Parvathy, S. Krithivasan, S. Venkataramani, A. Raghunathan, V. Srinivasan

Novel approximation techniques and accelerator architecture for efficient inference of large language models. Details withheld due to double-blind review.

accelerator · sparsity · IBM

Under Review

2026

TokenMix: Efficient MoE Inference via Cross-Token Expert Reuse

A. Mohan Parvathy, et al., A. Raghunathan

Encourages expert reuse across tokens in the MoE decode step, reducing unique expert fetches with negligible accuracy loss. Evaluated on Mixtral-8x7B, Qwen3-30B-A3B, DeepSeek-V2-Lite, and Llama-4-Scout.

MoE · LLM inference · IBM Research

DAC '25

2025

3D-CIMlet: A Chiplet Co-Design Framework for Heterogeneous In-Memory Acceleration of Edge LLM Inference

S. Du, L. Zheng, A. Mohan Parvathy, F. Xie, T. Wei, A. Raghunathan, H. Li

Thermal-aware modeling and co-design framework for 2.5D/3D edge-LLM engines using heterogeneous computing-in-memory chiplets, for both inference and continual learning.

chiplet · CIM · 3D integration · edge LLM

TCASAI

2025

Accepted

Softprox: A Systematic Methodology to Mitigate Softmax Bottleneck in Emerging Transformer Workloads

A. Mohan Parvathy, S. Roy, S. K. Ghosh, A. Raha, D. Mathaikutty, A. Raghunathan

Three-step post-finetuning approximation of the Softmax operation, validated across multimodal architectures (text, vision, audio transformers) with up to 40.22% inference improvement.

softmax · transformers · multimodal · edge inference

ICCAD '22

2022

Seprox: Sequence-Based Approximations for Compressing Ultra-Low Precision Deep Neural Networks

A. Mohan Parvathy, S. Krithivasan, S. Sen, A. Raghunathan

Exploits sequential structure in weight distributions to enable ultra-low-precision (sub-4-bit) compression with accuracy recovery beyond standard uniform quantization.

quantization · compression · IBM

Talks & Presentations

Talks

Apr 2026

Purdue Institute of Chips and AI Workshop

Organizing committee member. Workshop on hardware-aware AI systems.

2026

JUMP Community Research Talk

Training-to-inference efficiency narrative: from edge dense LLMs (SPARQLe) to datacenter MoE (TokenMix).

May 2025

TECHCON 2025 — Softprox

Presented Softprox at SRC TECHCON, Atlanta, Georgia.

2025

SRC Annual Sponsor Review

Industry-facing research talk covering the edge-to-datacenter approximate computing arc.

2022

IBM AI Hardware Forum 2022

Presented Seprox: ultra-low-precision DNN compression via sequence-based approximation.

2022

TECHCON 2022 — Seprox

Presented at SRC TECHCON 2022.

Oct 2022

ICCAD 2022 — Seprox

Conference presentation on sequence-based ultra-low-precision DNN compression.

Honors & Awards

Recognition

Qualcomm Innovation Fellowship

Qualcomm · 2025

Winner — one of 17 teams selected from 266 proposals across U.S. and Canadian universities. Joint work on ML-powered SoC power estimation.

DAC Young Fellow

Design Automation Conference · 2020, 2022

Selected for the Young Fellow program at the premier EDA/design automation conference, twice.

DAAD-WISE Scholarship

German Academic Exchange Service · 2019

Research internship at RWTH Aachen, Department of Cardiovascular Engineering, Helmholtz Institute.

AnitaB.org / Grace Hopper Scholarship

AnitaB.org

Selected among a few hundred recipients for contributions to computing and engineering.

Indian Academy of Sciences Fellowship

Indian Academy of Sciences

Summer research fellowship to conduct research at a premier Indian research institute.

Graduate School Summer Research Grant

Purdue University · 2021

Competitive summer research funding from the Purdue Graduate School.

Industry Research

Experience

Sep–Dec 2025

IBM Research · Research Scientist Intern

Yorktown Heights, NY. Approximate computing for efficient LLM inference. Large-scale LoRA finetuning with DeepSpeed. Compiler techniques for DNN inference efficiency.

Oct–Dec 2024

Intel Corporation · Multimodal AI Research Intern

Proposed a novel approximation technique and a tensor-core-like accelerator for ultra-low-precision LLMs, with up to 25% performance improvement. Research paper submitted.

May–Aug 2023

EnCharge AI · Summer Intern

Santa Clara, CA. DNN workload compression to reduce off-chip memory accesses. Workload mapping onto EnCharge AI's hardware architecture.

Education

Ph.D. Electrical & Computer Engineering

Purdue University · 2020 – 2026 (expected)

Advisor: Prof. Anand Raghunathan
Integrated Systems Lab
Preliminary exam passed, March 2026

B.Tech. (Hons.) Electrical & Electronics Engineering

NIT Tiruchirappalli, India · 2016 – 2020

Graduated in the top 1% of the EEE Department
Overall Coordinator, Amruthavarshini (Carnatic Music Club)
Joint Treasurer, Spider R&D Club

Blog

Writing

Notes on research, engineering, and the things I find interesting.

Coming soon

Why sub-precision sparsity is the next frontier for LLM accelerators

hardware

Coming soon

MoE inference is memory-bound — here's how to think about it

systems

Coming soon

The gap between quantization papers and real hardware

opinion

Coming soon

Notes on navigating the PhD → industry transition

career

Beyond Research

Interests

Carnatic Music

Trained vocalist. Former Overall Coordinator of Amruthavarshini, the Carnatic music club at NIT Trichy. Music has been a constant thread through undergrad and grad school.

Teaching & Mentoring

Graduate Teaching Assistant at Purdue. Previously taught mathematics and science to middle school students through Illuminate, an NGO in Tiruchirappalli serving nearby villages.

Cooking & Nutrition

Interested in clean eating and health-conscious cooking — the kind of optimization that doesn't require a GPU.

Community

Coordinated guest lectures for Festember at NIT Trichy. Active in the SRC/JUMP research community and co-organizing the Purdue Institute of Chips and AI workshop.

Resume

Latest CV

Full details on publications, experience, and technical skills.

Download Resume (PDF)
