Amey Agrawal

I am a PhD student at Georgia Tech, where I am advised by Prof. Alexey Tumanov. I have been working as a research intern at Microsoft Research with Dr. Ramachandran Ramjee’s team since the summer of 2023. My primary area of interest is systems for machine learning.

Previously, I was a research engineer at Microsoft Research, where I worked in Dr. Muthian Sivathanu’s team on low-level systems for deep learning infrastructure. Before that, I spent a couple of years at Qubole, a big data platform start-up. I received my bachelor’s in Computer Science from BITS Pilani, India, in 2018. For more details, refer to my resume or drop me an email.

Publications

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee
Preprint: arXiv:2403.02310 (2024) [pdf]

Vidur: A Large-Scale Simulation Framework for LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, and Alexey Tumanov
7th Annual Conference on Machine Learning and Systems (MLSys’24), Santa Clara

Sarathi: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, and Ramachandran Ramjee
Preprint: arXiv:2308.16369 (2023) [pdf]

DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, and Alexey Tumanov
Preprint: arXiv:2306.11800 (2023) [pdf]

Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads
Singularity Team, Microsoft
Preprint: arXiv:2202.07848 (2022) [pdf]

Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
Amey Agrawal and Rohit Karlupia
Proceedings of New in ML Workshop, NeurIPS, 2019, Vancouver
Proceedings of Sparsity in Neural Networks Workshop, 2021, Virtual
[pdf] [code]

Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, and Rajat Gupta
Proceedings of IEEE International Conference on Big Data, 2019, Los Angeles
[pdf]

Logan: A Distributed Online Log Parser
Amey Agrawal, Rajat Gupta, and Rohit Karlupia
Proceedings of IEEE International Conference on Data Engineering (ICDE), 2019, Macau
[pdf] [blog]

Select Projects

Learning an Efficient Job Placement Policy for ETL Jobs on Big Data Platforms
Mentors: Joydeep Sen Sarma, Rohit Karlupia
A learned scheduling algorithm that leverages the recurrent nature of ETL workloads to minimize operational cost through optimal job placement.

Callisto: Bringing Jupyter notebooks to classroom
Advisor: Prof. Surekha Bhanot
A cross-platform desktop application to host and grade assignments designed as Jupyter notebooks. The system strives to lower the barrier to entry into the scientific Python ecosystem for newcomers by providing one-click setup of the development environment and a Google Colab-like interface for hosted assignments. This work was later presented at PyCon India, 2020. [blog] [code] [demo]

Deep Reinforcement Learning for Autonomous Warehouse Robots
Advisor: Prof. Surekha Bhanot
A framework to create Q-learning agents for autonomous navigation tasks in warehouses. The agents are pre-trained in a custom simulation environment built on top of V-REP, a popular robotics simulation package. [code]

Disentanglement Learning for Iris Image Indexing
Advisor: Prof. Kamlesh Tiwari
An autoencoder architecture to learn representations of normalized iris images that are robust to the geometric variations that occur in real-world iris samples. [blog] [code]

Automated news-in-shorts
Advisor: Prof. Poonam Goyal
A news aggregation system that collects the latest posts from the RSS feeds of multiple news agencies to automatically generate abstracts for top stories. Trending topics on Twitter are mapped to news articles, and extractive summaries are generated using a natural language processing pipeline. [code]