Gunj Desai

is looking for freelance projects. 🔎

Until 2023, Solutions Architect / Principal Engineer, Doubtnut Inc

About me

I love building data-centric products. I am well versed in using Kafka and Spark to build real-time pipelines that handle high ingestion rates. Notably, I have built ACID capabilities on S3 using a combination of Debezium, Kafka, and Apache Hudi, which supports ingesting 2k messages/sec for 100+ tables in parallel with a maximum latency of 45 seconds. I have also worked with large-scale data, parsing Web3 EVM events with PySpark in near real time with built-in backfill; in backfill mode, the Spark job can parse all 3.9 TB of Ethereum transaction data in under 5 hours. I have also built a streaming platform that ingests more than a billion events per week with an average latency of 3 seconds per message while validating individual packets, and created an open-source Kafka Connect monitoring tool that continuously checks Kafka Connect connectors and sends email notifications when it detects an anomaly.

Skills and knowledge

Big Data
Data Engineer
Data Engineering
Data Warehouse
Apache Spark
Spark Streaming
Spark
Kafka
GoLang
Scala
Data Pipelines
ETL
Apache Hadoop
HDFS
Python
Engineering
Software Development

Career

Professional experience of Gunj Desai

  • 1 year and 4 months to date, since Feb. 2023

    Staff Engineer / Principal Engineer

    Backend & Data Engineering ngram

    Core developer building pipelines for parsing Ethereum data. • Built a Spark job that parses Ethereum events, transactions, and traces, checks for ABI mismatches, handles missing rows, and can parse data in real time. The job can run through all events generated on the Ethereum blockchain, approximately 3.9 TB, within 5 hours, processing every single record. • Designed a system that breaks down ABIs at the topic and event-signature level so that duplicates are avoided when storing ABIs.

  • 1 year and 11 months, Apr. 2021 - Feb. 2023

    Solutions Architect / Principal Engineer

    Doubtnut Inc

    Core contributor and maintainer of the Video Recommendation Platform, which aggregates data in near real time, and built ACID capabilities on S3. • Created smart aggregates across 20 categories, which improved recommendations and significantly increased engagement time. • Built a near-real-time analytics platform on Apache Hudi + Debezium + Apache Kafka that runs multiple pipelines covering 100+ tables with a maximum latency of 45 seconds, simple transformations included.

  • 1 year and 5 months, Dec. 2019 - Apr. 2021

    Senior Data Engineer

    Razorpay

    Responsible for creating a near-real-time aggregation platform that returns personalised user payment options in under 100 ms. • Created smart aggregates across 6 categories, which improved conversions significantly. • Revamped the events platform, which generates about 1 TB of data per day, and reduced errors by 40%. • Set up an OLAP store designed to ingest petabytes of data and respond in under a second. • Built an event-stitching platform for merging events from different microservices.

  • 3 years and 10 months, Mar. 2016 - Dec. 2019

    Senior Software Engineer

    Bookmyshow.com

    Responsible for creating BookMyShow's Big Data and clickstream platform and its PWA ticketing platform. • Created a clickstream platform for all platforms that ingests more than 2 million events every 5 minutes. • Increased mobile web conversions by 80%. • Achieved an initial load time of 3.1 seconds on a 2G network (including smart personalisation calculations) and brought the subsequent load time of the web app down to 1.9 s. • Moved PZN APIs from the legacy stack to a new Akka stack.

Languages

  • Hindi

    Native language

  • English

    Native language

  • Spanish

    Basic knowledge

Interests

Tennis
Formula 1
Books
Travel
