About

I am passionate about solving complex data problems. As a software and data engineer at Japan’s largest telecommunications company, I built a data platform and learned to work across cultural and language barriers. This experience has equipped me with strong technical and communication skills and the ability to collaborate with many stakeholders. I have worked on data projects involving data migration, anomaly detection, data visualization, data warehouse design, ML model development, MLOps, and ETL.

Work Experience 💼

Slalom Build

    July 2023 ~ Current

    Senior Data Engineer

    • Migration of Forecast Model to Snowflake (2024)

      • Transitioned the forecasting model from AWS to Snowflake using Python and Snowpark, improving integration and performance (see the Snowpark sketch below).
      • Reduced ML pipeline execution time from 2 hours to 30 minutes by optimizing pipeline design and following Snowflake best practices.
      • Improved model accuracy through extensive experiments, including sampling and feature engineering, and enforced data quality checks at each step of the pipeline for more reliable forecasting results.
      • Designed the project architecture and introduced software engineering best practices to the Data Science team, enabling more flexible experimentation, a streamlined development workflow, and increased team productivity.
      • Served as Tech Lead for the project, responsible for setting weekly milestones, steering the technical direction, and leading customer interactions for requirement gathering and alignment.
    • SQL Server to Snowflake Migration (2023)

      • Migrated 13 TB of customer data to Snowflake, using Airflow for streamlined job orchestration and ensuring a seamless data transition.
      • Used efficient export tools such as BCP, and optimized Snowflake stages, for swift data ingestion and storage.
      • Integrated AWS S3 for secure and scalable data transfers, preserving data integrity and accessibility throughout the migration.
      • Implemented a dynamic data pipeline for ongoing synchronization, continuously replicating SQL Server changes into Snowflake through an incremental copy approach (see the MERGE sketch below); this real-time data availability improves analytical capabilities and reporting accuracy.
    • Data Visualization and Anomaly Detection (2023)

      • Deployed the project’s entire infrastructure with Terraform Cloud on Google Cloud Platform, including a serverless data pipeline built on Cloud Functions and Cloud Scheduler (see the Cloud Function sketch below).
      • Designed an end-to-end anomaly detection and email alert system using GCP’s Application Integration.
      • Applied CI/CD and infrastructure deployment best practices.
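
A minimal sketch of the Snowpark pattern behind the forecast migration: read training data where it lives, fit in pandas, and write predictions back as a table. The table names, columns, and the `fit_and_predict` helper are hypothetical stand-ins, not the project’s actual code.

```python
# Sketch of a Snowpark forecast step. All names below are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col


def fit_and_predict(pdf):
    # Hypothetical stand-in for the real model: naive per-region mean.
    out = pdf.groupby("REGION", as_index=False)["UNITS_SOLD"].mean()
    return out.rename(columns={"UNITS_SOLD": "FORECAST_UNITS"})


def run_forecast(session: Session) -> None:
    # Read training data inside Snowflake -- no export step needed.
    history = (
        session.table("SALES_HISTORY")
        .filter(col("UNITS_SOLD").is_not_null())
        .select("REGION", "SALE_DATE", "UNITS_SOLD")
    )

    # Pull only the needed slice into pandas for model fitting.
    predictions = fit_and_predict(history.to_pandas())

    # Persist predictions as a table for downstream consumers.
    session.create_dataframe(predictions).write.mode("overwrite").save_as_table(
        "SALES_FORECAST"
    )


if __name__ == "__main__":
    # Credentials would come from a secrets manager in practice.
    session = Session.builder.configs(
        {
            "account": "<account>",
            "user": "<user>",
            "password": "<password>",
            "warehouse": "<warehouse>",
            "database": "<database>",
            "schema": "<schema>",
        }
    ).create()
    run_forecast(session)
```

Avoiding round trips out of the warehouse is one of the optimizations such a migration enables.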
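The incremental copy approach can be sketched as a staged COPY followed by a MERGE keyed on a change timestamp. The stage, table, and column names are hypothetical; in practice an Airflow task would run a step like this on a schedule.

```python
# Sketch of an incremental SQL Server -> Snowflake sync step.
# Stage, table, and column names are hypothetical.
from snowflake.snowpark import Session

MERGE_SQL = """
MERGE INTO CUSTOMERS AS tgt
USING CUSTOMERS_STAGE AS src
  ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
WHEN MATCHED AND src.UPDATED_AT > tgt.UPDATED_AT THEN UPDATE SET
  tgt.NAME = src.NAME,
  tgt.EMAIL = src.EMAIL,
  tgt.UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (CUSTOMER_ID, NAME, EMAIL, UPDATED_AT)
  VALUES (src.CUSTOMER_ID, src.NAME, src.EMAIL, src.UPDATED_AT)
"""


def sync_increment(session: Session) -> None:
    # Load the latest BCP export from the S3-backed external stage...
    session.sql("COPY INTO CUSTOMERS_STAGE FROM @customer_stage").collect()
    # ...then fold new and changed rows into the target table.
    session.sql(MERGE_SQL).collect()
```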
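For the serverless GCP pipeline, the entry point might look like the HTTP-triggered Cloud Function below, invoked by Cloud Scheduler. The metric source (BigQuery here) and the anomaly rule are assumptions; the write-up only names the building blocks.

```python
# Sketch of a Cloud Scheduler -> Cloud Function anomaly check.
# The BigQuery table and the z-score-style rule are hypothetical.
import functions_framework
from google.cloud import bigquery

ANOMALY_QUERY = """
SELECT metric_name, value
FROM `my-project.monitoring.latest_metrics`
WHERE ABS(value - expected_value) > 3 * stddev_value
"""


@functions_framework.http
def detect_anomalies(request):
    client = bigquery.Client()
    anomalies = list(client.query(ANOMALY_QUERY).result())
    if anomalies:
        # The real system sent email alerts via GCP Application Integration;
        # here we just report the count.
        return f"{len(anomalies)} anomalies detected", 200
    return "no anomalies", 200
```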

    Tech Stack

NTT Communications & Docomo Business

    April 2017 ~ June 2023

    Software Engineer (Data)

    • Data Engineering

      • Contributed to the development and maintenance of a data analysis platform that collected and analyzed data from various internal departments. Designed the system architecture of core platform components: the Trino SQL engine for compute, HDFS for storage, the authentication workflow, and the surrounding Hadoop ecosystem.
      • Collaborated with the data analysis team to improve BI dashboard performance by tuning data retrieval queries and improving the underlying physical schema.
      • Developed a data pipeline ingesting network traffic flow data from Apache Kafka into HDFS using PySpark (see the streaming sketch below). The pipeline processed over a million records per second and powered real-time anomaly detection to mitigate DDoS attacks.
      • Integrated the Trino SQL engine with other internal data platform components such as BI tools, authentication systems, and data warehouse tools. Developed custom Trino connectors in Java and contributed them to the open source community.
      • Developed a YouTube playback statistics collector using Java, Docker, and JavaScript, which gathered runtime playback statistics from over 1,000 nodes to compare end-user Quality of Experience of the Internet over QUIC versus HTTP. Analyzed and visualized the collected data in Python with Plotly, and presented the research at ACM CoNEXT 2018 in Greece.
    • Software Engineering

      • Developed a data catalog metadata platform using ReactJS to increase team productivity, with features such as advanced search for Japanese text, table search, schema information, and metadata aggregated from over 10 internal systems and tools.
      • Developed a JP-EN / EN-JP translation feature and Furigana support (Hiragana readings for Japanese Kanji) in the open source markdown documentation tool CodiMD, using the DeepL API, JavaScript, and Node.js (see the DeepL sketch below). The feature improved communication and collaboration between non-Japanese-speaking engineers and Japanese engineers.
      • Presented NTT Communications’ Trino usage and architecture design at Trino Japan Virtual Meetup 2021.
      • Trained new team members on the overall architecture of the data platform, creating tutorials and guides that improved team productivity.
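
A sketch of the Kafka-to-HDFS ingestion pattern with Spark Structured Streaming, the shape the DDoS-mitigation pipeline above would take. Broker addresses, the topic, the flow schema, and paths are hypothetical.

```python
# Sketch of a Kafka -> HDFS ingestion job with Spark Structured Streaming.
# Brokers, topic, schema, and paths below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("netflow-ingest").getOrCreate()

flow_schema = StructType([
    StructField("src_ip", StringType()),
    StructField("dst_ip", StringType()),
    StructField("bytes", LongType()),
    StructField("ts", LongType()),
])

flows = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "netflow")
    .load()
    # Kafka delivers raw bytes; parse the JSON payload into typed columns.
    .select(from_json(col("value").cast("string"), flow_schema).alias("f"))
    .select("f.*")
)

# Append parsed records to HDFS as Parquet; the checkpoint makes the
# stream restartable without duplicating data.
query = (
    flows.writeStream.format("parquet")
    .option("path", "hdfs:///data/netflow")
    .option("checkpointLocation", "hdfs:///checkpoints/netflow")
    .start()
)
query.awaitTermination()
```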
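The translation feature itself lives in CodiMD’s Node.js codebase; this Python sketch only illustrates the shape of the DeepL API call it relies on. The API key placeholder is not real.

```python
# Sketch of the DeepL translate call behind the JP<->EN feature.
# The real feature is JavaScript/Node.js; this only shows the API shape.
import requests


def translate(text: str, target_lang: str = "EN") -> str:
    resp = requests.post(
        "https://api-free.deepl.com/v2/translate",
        headers={"Authorization": "DeepL-Auth-Key <your-key>"},
        data={"text": text, "target_lang": target_lang},
    )
    resp.raise_for_status()
    return resp.json()["translations"][0]["text"]


print(translate("こんにちは、世界"))  # -> roughly "Hello, world"
```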

    Tech Stack

Persistent Systems

    June 2016 ~ January 2017

    Software Developer (UI)

    • Developed SQL queries and dashboard widgets for IBM’s Tivoli Netcool Performance Manager product.
    • Designed database relations for a full-stack application monitoring live water quality via sensors and an internet gateway.

    Tech Stack

Certifications 🏆

Education 🎓

    University of Texas at Austin

    Remote

    Master of Artificial Intelligence

    Vishwakarma Institute of Technology

    Pune, India

    Bachelor of Technology in Computer Engineering
