
🔥 MLOps at Scale 🦅

🌟 An end-to-end, full-stack Data Science and AI/ML project covering ML modelling, MLOps practices, scalable machine learning, and data storytelling. ✨

📚 🛠️ Experiment (Design + Develop) --> 🚀 Production (Deploy + Iterate) ⚙️: Full-Stack Data Science and Production-Grade Machine Learning at Scale are among the fastest-growing fields in technology. This repository aims to build the professional, advanced analytics skills needed to compete in the age of digital and AI transformation. 🏁


🎯 End-to-end, full-stack machine learning, from experimentation (design + development) to production (deployment + iteration), for iteratively building reliable, production-grade AI/ML applications.

  • 💡 Agile CRISP-DM for Data Science and Machine Learning
    • Cookiecutter Data Science (CCDS) V2: data science tooling and MLOps
    • Agile Implementation of CRISP-DM for Data Science and Machine Learning
  • ⚙️ MLOps
    • 💻 DevOps best practices for developing and deploying machine learning models.
    • ⚙️ Build an end-to-end machine learning system by connecting MLOps components such as tracking, testing, serving, and orchestration (a minimal tracking sketch follows this list).
  • 🚀 Dev to Prod:
    • 🐙 Develop robust CI/CD workflows to continuously train and deploy better models in a modular way that integrates with any stack.
    • 📈 Scale: ML workloads (data, training, tuning, and serving) scale easily, enabling a quick and reliable transition from development to production without code or infrastructure changes.
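
To make the tracking component above concrete, here is a minimal, hedged sketch of logging one experiment run. It assumes MLflow and scikit-learn purely for illustration; this repository's actual tracking stack, run name, parameters, and dataset are placeholders, not the project's real configuration.

```python
# Minimal experiment-tracking sketch -- illustrative only.
# Assumes `mlflow` and `scikit-learn` are installed; the run name, parameters,
# and dataset below are placeholders rather than this repository's actual setup.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run(run_name="baseline-rf"):
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log the knobs, the headline metric, and the serialized model so runs are
    # comparable in the tracking UI and the artifact can later be served.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

The same pattern extends to the other components: the logged artifact is what a serving layer loads, and an orchestrator simply schedules scripts like this one on a cadence.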

Deliverables 💎

| 📆 ⏰ | Deliverables / Tasks | Done | 🔗 Reference Links |
| --- | --- | --- | --- |
| 01 | 🎓 AWS Certified Data Analytics - Specialty (DAS) (Collecting Streaming Data, Data Collection and Getting Data, Amazon Elastic Map Reduce (EMR), Using Redshift & Redshift Maintenance & Operations, AWS Glue, Athena, and QuickSight, ElasticSearch, AWS Security Services) | ✅ | A Cloud Guru - DAS & ACG Practice Exam & Udemy Practice Exam |
| 02 | 🎓 AWS Certified Machine Learning - Specialty (MLS-C01) (Data Preparation, Data Analysis and Visualization, Modeling, Algorithms, Evaluation and Optimization, Implementation and Operations) | ☑️ | A Cloud Guru - MLS-C01 & ACG Practice Exam & Udemy Practice Exam |
| 03 | 🛠 Reproducible Local Development for Data Science and Machine Learning projects | | Data Science |
| 04 | 👨‍💻 Analytics-Experience Project: Time Series Forecasting & Machine Learning Prediction | | Analytics-Experience Project |
| 05 | 📚 MLOps | | MLOps |
| 06 | 💹 Analytics Dashboard: Data Insights & Visual Analytics | | Visual Analytics |
| 07 | 🚀 Scalable MLOps: MLOps at Production-grade Scale | | Scalable MLOps |

Project Organization

🛠 Production-grade project structure for successful data-science or machine-learning projects 🚀

├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          🤝 Explain your project and its structure for better collaboration.
├── config/
│   └── logging.config.ini
├── data               🔍 Where all your raw and processed data files are stored.
│   ├── external       <- Data from third-party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, unprocessed, immutable data dump.
│
├── docs               📓 A default docusaurus | mkdocs project; see docusaurus.io | mkdocs.org for details
│
├── models             🧠 Store your trained and serialized models for easy access and versioning.
│
├── notebooks          💻 Jupyter notebooks for exploration and visualization.
│   ├── data_exploration.ipynb
│   ├── data_preprocessing.ipynb
│   ├── model_training.ipynb
│   └── model_evaluation.ipynb
│
├── pyproject.toml     <- Project configuration file with package metadata for analytics
│                         and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            📊 Generated analysis (reports, charts, and plots) as HTML, PDF, LaTeX.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   🛠 The requirements file for reproducing the analysis environment, for easy environment setup.
│
├── setup.cfg          <- Configuration file for flake8
│
├── src                💾 Source code for data processing, feature engineering, and model training.
│   ├── data/
│   │   └── data_preprocessing.py
│   ├── features/
│   │   └── feature_engineering.py
│   ├── models/
│   │   └── model.py
│   └── utils/
│       └── helper_functions.py
├── tests/
│   ├── test_data_preprocessing.py
│   ├── test_feature_engineering.py
│   └── test_model.py
├── setup.py           🛠 A Python script to make the project installable.
├── Dockerfile
├── docker-compose.yml
├── .gitignore
└── analytics          🧩 Source code for use in this project.
    │
    ├── __init__.py    <- Makes analytics a Python module
    │
    ├── data           <- Scripts to download, preprocess, or generate data
    │   └── make_dataset.py
    │
    ├── features       <- Scripts to turn raw data into features for modeling
    │   └── build_features.py
    │
    ├── models         <- Scripts to train models and then use trained models to make predictions.
    │   ├── predict_model.py
    │   └── train_model.py
    │
    └── visualization  <- Scripts to create exploratory and results-oriented visualizations
        └── visualize.py
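
As a rough illustration of how the `analytics/models` scripts in the tree above could fit together, here is a minimal sketch in which `train_model.py` writes a serialized model under `models/` and a `predict` helper (standing in for `predict_model.py`) reads it back. The file names, column names, and model choice are illustrative assumptions, not this repository's actual code.

```python
# analytics/models/train_model.py -- illustrative sketch only; the real scripts may differ.
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression

PROCESSED_DIR = Path("data/processed")   # canonical data sets for modeling (see tree above)
MODELS_DIR = Path("models")              # trained, serialized models live here


def train(features_file: str = "train.csv", target_column: str = "target") -> Path:
    """Fit a simple model on a processed data set and persist it under models/."""
    df = pd.read_csv(PROCESSED_DIR / features_file)
    X, y = df.drop(columns=[target_column]), df[target_column]

    model = LogisticRegression(max_iter=1000).fit(X, y)

    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    model_path = MODELS_DIR / "model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path


def predict(model_path: Path, features: pd.DataFrame) -> pd.Series:
    """Load a persisted model (the predict_model.py counterpart) and score new rows."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return pd.Series(model.predict(features), index=features.index)
```

Keeping training and scoring as separate, importable entry points mirrors the layout above and lets Makefile targets such as `make train` stay thin wrappers over these functions.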