Gunj Desai
Until 2023, Solutions Architect / Principal Engineer, Doubtnut Inc
About me
I love building data-centric products. I am well versed in using Kafka and Spark to build real-time pipelines that can handle high ingestion rates. Notably, I have built ACID capabilities on S3 using a combination of Debezium, Kafka, and Apache Hudi. This supports the ingestion of 2k messages/sec for 100+ tables in parallel with a maximum latency of 45 seconds. I have also worked with large-scale data, parsing Web3 EVM events using PySpark in near real time with built-in backfill ability. The Spark job can parse all 3.9 TB of Ethereum transaction data in under 5 hours when run in backfill mode. I have also built a streaming platform that can ingest more than a billion events per week, with an average latency of 3 seconds per message, while validating individual packets. I have also created an open-source Kafka Connect monitoring tool that continuously checks Kafka Connect connectors and sends an email notification when it detects an anomaly.
Career
Professional experience of Gunj Desai
To date: 1 year and 4 months, since Feb. 2023
Staff Engineer / Principal Engineer
Backend & Data Engineering ngram
Core developer building pipelines for parsing Ethereum data.
• Built a Spark job that parses events, transactions, and traces data in Ethereum, checks for ABI mismatches, and handles missing rows, with the ability to parse data in real time. The job can run through all events generated on the Ethereum blockchain within 5 hours and process every single record; the data is approximately 3.9 TB in size.
• Designed a system to break down ABIs at the topic and event-signature level, so that duplicates are avoided when storing ABIs.
1 year and 11 months, Apr. 2021 - Feb. 2023
Solutions Architect / Principal Engineer
Doubtnut Inc
Core contributor and maintainer for the Video Recommendation Platform, which aggregates data in near real time; also built ACID capabilities on S3.
• Created smart aggregates across 20 categories, which improved recommendations and increased engagement time significantly
• Built a near-real-time analytics platform on Apache Hudi + Debezium + Apache Kafka that runs multiple pipelines covering 100+ tables with a maximum latency of 45 seconds, simple transformations included
1 year and 5 months, Dec. 2019 - Apr. 2021
Senior Data Engineer
Razorpay
Responsible for creating a near-real-time aggregation platform that returns personalised user payment options in under 100 ms
• Created smart aggregates across 6 categories, which improved conversions significantly
• Revamped the Events Platform, which generates about 1 TB of data per day, and reduced errors by 40%
• Setting up an OLAP store designed to ingest petabytes of data and respond in under a second
• Building an Event Stitching Platform for merging events from different microservices
Responsible for creating BookMyShow’s Big Data and Clickstream Platform and its PWA Ticketing Platform
• Created a clickstream platform for all the platforms, which ingests more than 2 million events every 5 minutes
• Increased mobile web conversions by 80%
• Achieved an initial load time of 3.1 seconds on a 2G network (including smart personalisation calculations) and brought subsequent web load time down to 1.9 s
• Moving PZN APIs from the legacy stack to the new Akka stack
Languages
Hindi
Native language
English
Native language
Spanish
Basic knowledge