About Me
I’m a Senior Principal Engineer at Eli Lilly, where I lead data engineering for scientific platforms that power drug discovery at scale. My approach to data engineering is less about moving data with fancy tools and more about delivering it in formats that are generic enough to last and shaped precisely for how people actually use it — a philosophy I’ve refined working closely with scientists, analysts, and ML teams.
I hold a Master’s in Computer Science from Purdue, with specializations in Data Science and Software Engineering. My interests span data engineering, data science, machine learning, NLP, and their applications across science.
Previously I was at Deloitte Consulting as a Data Engineer, and interned at Galois Inc. as a Research Software Engineer.
Experience
Eli Lilly is a US-based pharmaceutical giant recently known for inventing an antibody therapeutic for COVID-19.
I led the design and delivery of a scientific data platform that ingests and harmonizes large-scale experimental data, powering drug discovery decisions for 3,000+ scientists across 13 departments.
Pipelines & scale: Built and scaled high-volume pipelines processing 12M+ records across 40K+ experiments, hitting sub-15-minute end-to-end refresh SLAs. Reduced full rebuild times from days to hours, and incremental refreshes from over an hour to minutes.
Data modeling: Led multi-parameter result modeling initiatives that eliminated legacy constraints and achieved a 300x reduction in manual effort for scientists. What excited me most was having to pick up large molecule domain knowledge — understanding the science well enough to engineer data that was actually useful, not just technically correct.
Consumer-centric products: Partnered with scientists, analysts, and product stakeholders to shift from ingestion-focused pipelines to consumer-centric data products, improving data usability and trust across the platform.
AI/ML & search: Contributed to a Cortex PubMed ingestion pipeline (1.5M+ records), built metadata APIs for Kernel-Lilly scientific discovery, and developed a federated deep search POC combining NLP + ElasticSearch over 1,100+ clinical trials in collaboration with NVIDIA & Google.
Cloud & reliability: Organized AWS workshops and drove RStudio modernization saving $150K/year. Embedded Great Expectations for data quality and patched a critical production vulnerability in the Marketplace backend.
Galois works in close collaboration with DARPA to secure the USA's cyber-physical infrastructure.
As a Research Intern, I was responsible for jump-starting the MuseML project.
Deloitte is one of the world's largest technical consultancies and professional services networks.
As a technical consultant and data engineer, my responsibilities included analyzing clients’ business requirements, designing data warehouse schemas (Data Vault, Snowflake, etc.), developing ETL frameworks, and generating BI reports for the finance, pricing, and rating sectors. I worked mostly in the Financial Services Industry (FSI) domain, building ETL pipelines for clients like Anthem Inc. and State Auto Inc.
Projects
MuseML
https://muse.dev
Using AI to help programmers write better code
This is a research project I started during my internship at Galois Inc., and continued at Purdue. The project focuses on analyzing the quality of source code in a software project through a combination of techniques from Software Engineering, Machine Learning, and Natural Language Processing. In particular, I developed a novel classification algorithm that combines ML-based classifiers (e.g., Naive Bayes, SVM, & Random Forests) with topic modeling techniques from NLP (e.g., LDA) to automatically triage the bug reports from static analysis tools (e.g., FBInfer) into true and false positives. MuseML therefore helps developers accurately gauge the quality of source code in a software project, while also helping them quickly improve the code quality by prioritizing bug fixes.
Contractual obligations prevent me from disclosing the source code and reports from this project. However, I would be happy to give a presentation and provide references.
A compiler and an interpreter for a C-like programming language.
For a course project at Purdue, I built a fully functional interpreter and compiler for a C-like programming language in C. I implemented all phases of an industry-standard compiler, including lexical analysis and parsing (via Lex and Yacc), type checking, dataflow analysis to detect uninitialized and unused variables, precise error localization, register allocation, and code generation.
Using Fuzz Testing to Improve the Accuracy of Static Analysis on C Programs
Static analysis is the technique of analyzing the source code of programs to find bugs. Industry-standard static analysis tools, such as Facebook’s Infer, generate many false positives, i.e., they report bugs that are not real bugs. In this research project, I explored ways to increase the effectiveness of Infer by combining its static analysis technique with an automated software testing technique called fuzz testing. The idea is to automatically triage the bug reports issued by Infer based on a fuzz tester’s (e.g., AFL’s) ability to reproduce the bug. The results from this project are mixed.
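The triage idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's code: the record shapes (`StaticReport`, `FuzzCrash`), the `triage` helper, the sample data, and the line-window matching rule are all assumptions made for the example, and it does not call Infer or AFL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticReport:
    """One bug report from a static analyzer (hypothetical shape)."""
    file: str
    line: int
    bug_type: str


@dataclass(frozen=True)
class FuzzCrash:
    """One crash location recovered from a fuzzer run (hypothetical shape)."""
    file: str
    line: int


def triage(reports, crashes, line_window=2):
    """Split reports into those corroborated by a fuzzer crash and the rest.

    A report counts as corroborated when some crash occurred in the same
    file within `line_window` lines of it (an assumed matching rule).
    """
    confirmed, unconfirmed = [], []
    for report in reports:
        reproduced = any(
            crash.file == report.file
            and abs(crash.line - report.line) <= line_window
            for crash in crashes
        )
        (confirmed if reproduced else unconfirmed).append(report)
    return confirmed, unconfirmed


# Made-up sample data, for illustration only.
reports = [
    StaticReport("parser.c", 42, "NULL_DEREFERENCE"),
    StaticReport("parser.c", 90, "MEMORY_LEAK"),
    StaticReport("util.c", 17, "BUFFER_OVERRUN"),
]
crashes = [FuzzCrash("parser.c", 43), FuzzCrash("util.c", 300)]

confirmed, unconfirmed = triage(reports, crashes)
print("confirmed:", [(r.file, r.line) for r in confirmed])
print("unconfirmed:", [(r.file, r.line) for r in unconfirmed])
```

A report the fuzzer fails to reproduce is not necessarily a false positive, so the second list is better read as lower priority than as dismissed.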
Machine Learning and Natural Language Processing to Analyze News-making Events
Analyzing news data is crucial both for organizing news articles so they are easier to access and for predicting future events (e.g., an election) from how the discourse is evolving. In this project, we applied NLP to the source data and trained deep learning models (CNN, LSTM) and ensemble models (Random Forests) to accurately categorize news articles. We also performed sentiment analysis on the news dataset to visualize and predict how public sentiment around an event is evolving. A report on this project can be found here.
Using Deep Learning to Classify Audible Sounds in Urban Areas
In this project I worked with a large audio dataset to classify urban sounds into a number of different categories, e.g., the sound of construction work, vehicle horns, street music, gunshots, etc. I designed multiple classification algorithms based on SVM, Logistic Regression, and 1D Convolutional Neural Networks, and compared their performance. On 8732 labeled data points, SVM-based classification achieved 81% accuracy, followed by CNN with 75%. I also demonstrated that carefully engineered audio features, such as Mel-Frequency Cepstral Coefficients (MFCCs), spectrograms, etc., give 8 times better performance than trivially selected baseline features. A report on this project can be found here.
Learning the patterns and predicting the outcomes of speed dating
Dating preferences provide insights into the complex psychology of an individual. In this project I analyzed the data from a speed dating application using a combination of data mining techniques to determine the features that act as best predictors of a date’s success. I then compared several classification algorithms, including Naive Bayes, Decision Trees, SVM, Logistic Regression, and Random Forests, in terms of their accuracy at predicting the outcome of a speed date given the same set of features.
Education
Purdue University
Master of Science
January 2018 - May 2020
Purdue runs one of the world's best graduate programs in Computer Science (http://csrankings.org).
Current GPA is 3.68/4.0. Relevant coursework includes Data Mining (CS573), Statistical Machine Learning (CS578), Natural Language Processing (CS577), Compilers and Programming Systems (CS502), Computer Networks (CS523), and Introduction to Simulation and Modeling (CS543). I was supported by the Department of Computer Science through a Teaching Assistantship (TA). I served as the Head TA for the undergraduate Software Engineering course (CS407). References will be provided on request.
Osmania University, India.
Bachelor of Engineering
August 2009 - May 2013
Osmania University is a public state university in India.
I completed my Bachelor of Engineering (BE) at Chaitanya Bharati Institute of Technology (CBIT) under Osmania University, Hyderabad, India. I majored in Electrical and Electronics Engineering.
A Little More About Me
- I was one of three graduate students awarded a travel scholarship by Purdue CS to attend the Grace Hopper Conference for Women in Computing (2019).
- I have Coursera certifications in Web Development and Python Programming.
- I organized a MuseML Hackathon at Galois.
- I was a founding member of the StateAuto DevOps team at Deloitte.