Shubham Pandey

Backend Developer

3+ years of full-time experience as a backend developer, building and optimizing large-scale, client-facing applications from the ground up!

Brief Summary

  • Experienced in building large-scale, high-performance applications
  • Strong professional and academic background in SQL and NoSQL databases
  • Skilled in Distributed Systems, System Design, and Object-Oriented Programming
  • Extensive experience collaborating with teams and mentoring junior developers, including serving as a teaching assistant for core courses such as Data Structures and Information Systems

For a quick overview, here is my resume.

You can also scroll down to learn a whole lot more about me!

Always happy to connect :)

Education

Purdue University, West Lafayette (USA)

Master of Science, Computer Science

GPA: 4/4

Teaching Assistant for CS 348 (Information Systems)

Teaching Assistant for ECE 368 (Data Structures)

Aug 2021 - May 2023


National Institute of Technology, Allahabad (India)

Bachelor of Technology, Computer Science and Engineering

GPA: 9.07/10

Group leader for Junior and Senior year projects

Machine Learning Club Coordinator

July 2014 - May 2018

Experience

Full Time (3+ years)

Goldman Sachs

Analyst

  • Contributed to the development of a large-scale algorithmic trading application, writing garbage-collection-free Java code to optimize memory management and minimize GC overhead.
  • Handled large volumes of time-series data stored in KDB+, retrieving it efficiently with optimized Q queries to generate analytical insights on both historical and real-time orders.
  • Implemented indexing strategies on key columns of the target tables, improving query performance and enabling queries to execute in parallel.
  • Used Python visualization libraries such as Bokeh, served through the Tornado web server, to present analytical insights as graphs and tables.
  • These optimizations reduced response time from over 60 seconds to under 10 seconds.
  • Deployed the application across regions to serve users in America, Europe, and Asia.

Mar 2020 - Aug 2021

Visa

Software Engineer

  • Developed a high-performance application by writing RESTful APIs in Java with the Spring Framework to support real-time analytics.
  • Built ETL processes for millions of transaction records stored in HDFS, loading the pre-processed data into a MySQL database.
  • Designed the database schema and tuned table normalization to balance computation and storage overheads.
  • Used data structures such as segment trees to compute key business metrics efficiently.
  • These optimizations made it possible to meet an SLA of generating insights from millions of records in under 5 seconds.

July 2019 - Sept. 2019



Internship/Teaching (~1.5 years)

Dgraph Labs Inc.

Intern

  • Analyzed the performance of the underlying key-value store and other LSM-based implementations. Ran YCSB benchmarks that varied value sizes and the frequency of 'Set' and 'Get' operations.
  • Modified the existing implementation to improve query performance and reduce disk and memory usage.

May 2022 - Aug 2022

Purdue University

Graduate Teaching Assistant

Information Systems

  • Organized study sessions and led a group of 40 students through their database projects.
  • Topics Covered: Relational Databases, SQL Queries, Transactions, MongoDB and Neo4j

Data Structures

  • Created and evaluated programming assignments
  • Topics Covered: Sorting Algorithms, Linked Lists, Stacks, Queues, Trees and Graphs

Sept 2021 - May 2023

Visa

Intern

  • Filed a trade secret for a project that processed a large volume of textual data and developed a score-based model for generating suggestions.
  • Applied text mining and processing to identify complex address structures within documents, using a trie for fast lookup of candidate strings.

Apr 2017 - Jul 2017

Projects

Professional

Client Insights


  • Extraction, transformation, and loading (ETL) of transactional data from HDFS into a MySQL database, and generation of insights based on business metrics.
  • Construction of binary indexed trees to support range queries on the data, and normalization of database tables to balance storage and retrieval costs.
RESTful API MySQL Hadoop Java Optimization
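For context on the range-query bullet above: a binary indexed (Fenwick) tree answers prefix and range sums in O(log n) time after O(log n) point updates. The Python sketch below is a minimal, hypothetical illustration of the data structure only; the class name and the "daily transaction counts" data are assumptions, not code or data from the project.

```python
class FenwickTree:
    """Binary indexed tree over positions 1..n: point updates and prefix sums in O(log n)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.tree = [0] * (n + 1)  # index 0 is unused; the tree is 1-based

    def add(self, i: int, delta: int) -> None:
        """Add delta to position i (1-based)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # jump to the next node whose range covers position i

    def prefix_sum(self, i: int) -> int:
        """Return the sum of positions 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit to move to the preceding range
        return total

    def range_sum(self, lo: int, hi: int) -> int:
        """Return the sum of positions lo..hi inclusive (1-based)."""
        return self.prefix_sum(hi) - self.prefix_sum(lo - 1)


# Hypothetical usage: transaction counts for a 7-day window, one position per day.
daily_counts = [120, 95, 130, 80, 150, 110, 90]
bit = FenwickTree(len(daily_counts))
for day, count in enumerate(daily_counts, start=1):
    bit.add(day, count)

print(bit.range_sum(2, 4))  # days 2..4: 95 + 130 + 80 = 305
```

Compared with recomputing a sum per query, the tree keeps both updates and range queries logarithmic, which is the usual reason to pick it when metrics are queried over arbitrary windows.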

Order Analytics


  • Rendering of analytics on real-time and historical trading orders in the form of graphs and tables.
  • Reading large volumes of time-series data from KDB+ using optimized Q queries, indexing the right columns, and executing queries in parallel to bring down response time.
Python Tornado KDB+ Q Analytics Visualization

Distinct Count Approximation


  • Approximating the number of distinct entities in a given set with high precision using a genetic algorithm written in Python.
  • Improved speed and precision by using segment trees for complex aggregation operations across different sets.
Genetic Algorithms Segment Trees Python Maths and Heuristics Optimization

Merchant Location Compliance


  • Creation of a parser that identifies address structures in documents obtained from the web and maps each one to the most relevant country.
  • Used a trie to enable fast lookup and storage of all candidate address strings.
Web Crawling Textual Mining and Processing Python Trie
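To illustrate why a trie suits the candidate-string lookup above: lookups and longest-prefix matches cost time proportional to the query length, independent of how many strings are stored. The Python sketch below is a minimal, hypothetical example; the sample place names and the idea of storing a country code as each entry's payload are assumptions, not details of the original parser.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Optional[str] = None  # payload (hypothetically a country code) if a key ends here


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, value: str) -> None:
        """Store a key (case-insensitive) with its payload."""
        node = self.root
        for ch in key.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def lookup(self, key: str) -> Optional[str]:
        """Return the payload for an exact match, or None."""
        node = self.root
        for ch in key.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value

    def longest_match(self, text: str, start: int = 0) -> Optional[str]:
        """Return the longest stored key that begins at text[start], or None."""
        node, best = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i].lower())
            if node is None:
                break
            if node.value is not None:
                best = text[start:i + 1]
        return best


# Hypothetical candidate strings mapped to country codes.
trie = Trie()
for name, country in [("new york", "US"), ("new delhi", "IN"), ("newcastle", "GB")]:
    trie.insert(name, country)

print(trie.lookup("New Delhi"))                 # IN
print(trie.longest_match("new york city, ny"))  # new york
```

Shared prefixes such as "new" are stored once, which keeps memory modest when many candidates overlap.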

Algorithmic Trading


  • Development of logic for fast execution of orders in accordance with the constraints set by the client.
  • Writing garbage-collection-free code to keep garbage collection overhead close to zero.
Java Garbage Collection Free Code Algorithms

KeyValue Store Optimization


  • Analyzing the performance of LSM-based key-value stores written in Go.
  • Benchmarking with varying value sizes and frequencies of 'Set' and 'Get' operations, and modifying the existing implementation for faster queries and better disk and memory usage.
Go KV Stores Benchmarking

Integer Compression


  • Explored use cases for lossy integer compression, where some loss can be tolerated and the extent of that loss can be parameterized.
  • Used bucketization and a frequency-based approach that exploits homogeneous groups within the data.
Compression Databases Maths
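One possible reading of "bucketization plus a frequency-based approach" is sketched below in Python; it is an assumption for illustration, not the project's actual scheme. Each value is mapped to a bucket of width 2·max_error + 1 (so `max_error` is the tunable loss parameter), and bucket ids are then replaced by dense codes ranked by frequency, so that homogeneous groups collapse onto a few small, repeated codes. The function names and the bucket-centre reconstruction are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compress(values: List[int], max_error: int) -> Tuple[List[int], Dict[int, int], int]:
    """Lossy compression: bucket each value, then re-code buckets by frequency."""
    width = 2 * max_error + 1
    buckets = [v // width for v in values]  # floor division also buckets negatives correctly
    # Most frequent buckets get the smallest codes (useful for a later variable-length encoder).
    ranked = [b for b, _ in Counter(buckets).most_common()]
    code_of = {b: code for code, b in enumerate(ranked)}
    codes = [code_of[b] for b in buckets]
    bucket_of = {code: b for b, code in code_of.items()}
    return codes, bucket_of, width


def decompress(codes: List[int], bucket_of: Dict[int, int], width: int) -> List[int]:
    """Reconstruct each value as the centre of its bucket (error <= max_error)."""
    half = width // 2
    return [bucket_of[c] * width + half for c in codes]


# Hypothetical data: two homogeneous groups around 1000 and 5000.
values = [1000, 1003, 998, 1001, 5000, 5002, 1004]
codes, table, width = compress(values, max_error=2)
restored = decompress(codes, table, width)
print(codes)     # [0, 0, 2, 0, 1, 1, 0]
print(restored)  # [1002, 1002, 997, 1002, 5002, 5002, 1002]
```

Raising `max_error` widens the buckets, so more values share a code and the code stream compresses better at the cost of precision.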

Academic

Parallel Page Rank


  • Implemented and compared PageRank computation methods across a variety of sparse graphs and computing environments.
  • Represented graphs using the compressed sparse row (CSR) and coordinate (COO) sparse-matrix formats, as well as a plain adjacency matrix.
C++ Multi-Threading Page Rank
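To show how the CSR layout above feeds a PageRank computation: CSR stores, for each node, a contiguous slice of its out-neighbours (`row_ptr[u]` to `row_ptr[u + 1]` in `col_idx`), so one power-iteration step is a single linear pass over the edges. The original project was written in C++ with multi-threading; this single-threaded Python sketch only illustrates the data layout and the update rule. The damping factor of 0.85, the tolerance, and the uniform redistribution of rank from dangling nodes are conventional assumptions.

```python
from typing import List, Tuple


def to_csr(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (row_ptr, col_idx) for the out-edges of an n-node graph."""
    counts = [0] * n
    for src, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (n + 1)
    for i in range(n):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    col_idx = [0] * len(edges)
    fill = row_ptr[:-1]  # slice copy: next free slot in each row
    for src, dst in edges:
        col_idx[fill[src]] = dst
        fill[src] += 1
    return row_ptr, col_idx


def pagerank(n: int, row_ptr: List[int], col_idx: List[int],
             damping: float = 0.85, tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Power iteration over a CSR graph; returns ranks that sum to 1."""
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        dangling = 0.0
        for u in range(n):
            out_deg = row_ptr[u + 1] - row_ptr[u]
            if out_deg == 0:
                dangling += rank[u]  # nodes without out-links spread rank uniformly
                continue
            share = damping * rank[u] / out_deg
            for k in range(row_ptr[u], row_ptr[u + 1]):
                new[col_idx[k]] += share
        for v in range(n):
            new[v] += damping * dangling / n
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical 4-node graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
row_ptr, col_idx = to_csr(4, edges)
ranks = pagerank(4, row_ptr, col_idx)
print(row_ptr, col_idx)  # [0, 2, 3, 4, 5] [1, 2, 2, 0, 2]
```

In a multi-threaded version, the outer loop over `u` can be partitioned across threads, which is one reason CSR is a common choice for this workload.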

Economic Sentiment Prediction


  • Identified and implemented efficient BERT-based techniques to summarize relevant news data
  • Applied an LSTM to sequences of summarized news vectors across months, incorporating demographic information, to accurately predict economic sentiment scores
PyTorch NLP BERT LSTM Python

Study of HTAP systems and LSM based storage techniques


  • Studied state-of-the-art HTAP systems such as SAP HANA, TiDB, and SingleStore, identifying the key techniques they use along with their trade-offs
  • Surveyed a range of LSM-based storage techniques, emphasizing their read and write trade-offs and exploring the challenges faced by modern systems that use them
HTAP Databases LSM Trees

Data Serialization


  • Explored serialization formats such as Protobuf and Avro and the use cases where each performs best.
  • Analyzed serialization and packet-transfer latencies in distributed systems and evaluated the trade-offs of applying compression in a geo-distributed setup.
Serialization Protobuf Distributed Systems

Recommender Systems


  • Movie rating prediction using user-based collaborative filtering.
  • Used single- and multi-criteria approaches, with filter selection via bit masking.
Machine Learning Python Linear Regression
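As an illustration of user-based collaborative filtering: a user's rating for an unseen movie is predicted from the mean-centred ratings of the most similar users who did rate it. The project's exact similarity measure and its multi-criteria and bit-masking details are not described here, so this Python sketch assumes Pearson correlation over co-rated movies and the standard weighted-deviation formula for single-criterion ratings; the users, movies, and parameter `k` are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {movie: rating}


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over the movies both users rated (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    num = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    den = (sqrt(sum((a[m] - mean_a) ** 2 for m in common))
           * sqrt(sum((b[m] - mean_b) ** 2 for m in common)))
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, movie: str, k: int = 2) -> Optional[float]:
    """Predict user's rating for movie from the k most similar users who rated it."""
    target = ratings[user]
    user_mean = sum(target.values()) / len(target)
    neighbours: List[Tuple[float, str]] = []
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = pearson(target, theirs)
        if sim > 0:  # keep only positively correlated neighbours
            neighbours.append((sim, other))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return None
    num = den = 0.0
    for sim, other in top:
        theirs = ratings[other]
        their_mean = sum(theirs.values()) / len(theirs)
        num += sim * (theirs[movie] - their_mean)  # weighted deviation from neighbour's mean
        den += sim
    return user_mean + num / den


ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
    "dave": {"m1": 5, "m2": 3, "m4": 5},
}
print(predict(ratings, "alice", "m4"))  # roughly 4.5
```

Centring on each user's mean compensates for users who rate systematically high or low, which is why the prediction adds a weighted deviation to the target user's own mean rather than averaging raw ratings.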

Word Embedding


  • Converted words to vectors using skip-gram with negative sampling and the GloVe model.
  • Explored variations in co-occurrence matrix computation using a TF-IDF approach, and compared the results by plotting them with matplotlib after dimensionality reduction.
NLP Heuristics Python

Skills

Technical Skills


  • Data Engineering
  • System Design
  • Data Structures & Algorithms
  • Object Oriented Programming & Design Patterns
  • Machine Learning
  • Data Science

Programming Languages

Java Python C C++ Golang JavaScript

Database Technologies

MySQL KDB+ Hadoop MongoDB Redis Neo4j KV Stores

Tools

AWS GCP Git SVN JUnit Maven
IntelliJ VS Code Eclipse PyCharm

Libraries & Frameworks

Spring Pandas NumPy PyTorch Flask

Operating Systems

Linux macOS Windows



Leadership Skills


  • ML Club Coordinator: Conducted machine learning classes during undergraduate study, teaching topics ranging from basic regression models to neural networks.
  • Intern Mentor: Guided interns at Goldman Sachs and Visa on their projects and answered their technical questions.
  • Project Group Leader: Led the groups for ML-based projects during junior and senior years.
  • Academics Captain: Oversaw academic activities and represented the school in several competitions.