
👨‍💻 Mahaboob Sheik | Data Engineer

Welcome to my GitHub

I'm Mahaboob Sheik, a passionate Data Engineer with over 3 years of experience transforming complex datasets into actionable insights. My expertise spans big data technologies, cloud platforms, and advanced data processing tools, with a focus on efficient data solutions that help businesses thrive.

🛠️ Technical Skills

🖥️ Languages

  • Python 🐍
  • PySpark ⚡
  • SQL 🗃️
  • Scala

💾 Databases

  • SQL Server 🛢️
  • Postgres 🐘
  • MongoDB 🍃

☁️ Big Data & Cloud Technologies

  • Apache Spark ✨
  • Hadoop 🐘
  • Databricks 🚀
  • Azure Data Factory 🏭
  • ADLS Gen 2 💾
  • Google Cloud Platform (GCP) ☁️

🛠️ DevOps & CI/CD

  • Azure DevOps 🚀
  • Docker 🐳
  • CI/CD Pipelines 🔄
  • Apache Airflow 🌬️

🧰 Other Tools & Technologies

  • Kafka 🔗
  • Snowflake ❄️
  • StreamSets 🌐
  • Linux 🐧
  • Data Modeling 📊

🚀 Key Projects & Achievements

1. Scalable Big Data Pipelines

  • Role: Lead Data Engineer
  • Technologies: Spark, Hadoop, Azure Databricks
  • Description: Designed and managed scalable big data pipelines handling over 50 terabytes monthly. Improved query performance by 20% using distributed computing technologies.

2. CI/CD Pipeline Automation

  • Role: DevOps Lead
  • Technologies: Azure DevOps, StreamSets, Docker
  • Description: Developed fully automated CI/CD pipelines, reducing deployment time by 50%. Facilitated seamless migration of 40+ pipelines from Development to QA with minimal downtime.

3. Resource Optimization on GCP

  • Role: Data Engineer
  • Technologies: Google Cloud Platform (GCP), Apache Spark
  • Description: Optimized GCP resources by implementing Storage Lifecycle Management, reducing costs by over 10% annually and boosting operational efficiency.

👥 Leadership & Team Management

  • Successfully led a cross-functional team of 10 members, achieving a 20% increase in on-time project completions.
  • Focus on collaboration, continuous learning, and collective success.

🎓 Education & Certifications

  • Bachelor of Technology in Electronics and Communication Engineering
    • SRKR Engineering College, 2017-2021
  • Certifications
    • Microsoft Certified Azure Data Engineer Associate (DP-203) 🎓
    • Microsoft Certified Azure Fundamentals (AZ-900) 🎓
    • StreamSets White Belt Certification 🥋

🌱 Current Learning & Interests

  • Current Projects: Working on a new Data Engineering project using Google Cloud Platform (GCP).
  • Learning: Kafka 🔗 and Snowflake ❄️.

🤝 Let's Connect!

🌐 Website

Visit my live portfolio website at https://skmahaboob.github.io to learn more about me, my skills, and my work.


Feel free to explore the repository and contact me if you have any questions or would like to collaborate on a project!

Contact: MahaboobSheik26@gmail.com 📧

Pinned

  1. AzureDataFactory (Public)

    This repository contains resources for mastering Azure Data Factory from scratch, specifically designed for data engineers. The repository includes pipelines, datasets, linked services, data flows,…

  2. Azure-Serverless-Data-Processing-Pipeline (Public)

    This repository contains a serverless data processing pipeline project built using Azure Functions and Azure SQL Database. The pipeline is designed to ingest JSON data from an HTTP endpoint, proces…

    Python