
Ethiopian Medical Data Warehouse and Analytics Pipeline

The Ethiopian Medical Business Data Warehouse & Analytics Platform aims to enhance the efficiency of Ethiopia's healthcare sector by creating a robust data warehouse. The project will extract data and images from public Telegram channels related to Ethiopian medical businesses, perform object detection on the images, and clean, transform, and store the extracted data in the warehouse. The main goal is to provide a unified solution for data analysis, supporting informed decision-making and driving strategic advancements in healthcare.

Technologies/Tools Used:

Python, DBT, SQL, ETL, PostgreSQL, FastAPI, Pandas, Pytest, SQLAlchemy, YOLOv5, Postman, CI/CD, Jupyter Notebook, Git, PDF & Google Drive (for the project report).

Key Accomplishments

  • ETL Process: Successfully managed the end-to-end ETL process, including data extraction, cleaning, transformation, and loading.

  • DBT for Data Modeling: Implemented data modeling and transformation using SQL with DBT.

  • Image Extraction and Object Detection: Extracted images from Telegram channels, performed object detection, and stored the results back into the data warehouse.

  • Database Management: Loaded cleaned data into a PostgreSQL database.

  • API Development: Exposed cleaned data for analysis through APIs using FastAPI, facilitating easy access from the database/data warehouse.

  • Project Documentation: Prepared comprehensive documentation for each step of the project to ensure clarity and understanding for the client.

Table of Contents

  1. Data Scraping and Collection Pipeline
  2. Data Cleaning and Transformation
  3. Object Detection Using YOLO
  4. Exposing the Collected Data Using FastAPI
  5. Postman Collection
  6. Installation
  7. Usage
  8. Contributing
  9. License

Data Scraping and Collection Pipeline

Telegram Scraping

Utilize the Telegram API or custom scripts to extract data from public Telegram channels related to Ethiopian medical businesses.
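As a minimal sketch, a scraper built on Telethon might look like the following (the API credentials, session name, and channel handle are placeholders, not values from the project):

```python
# Sketch of a Telegram channel scraper using Telethon.
# API_ID, API_HASH, and the channel handle are placeholders.
import asyncio
import csv

def message_to_row(msg_id, date, text):
    """Normalise one Telegram message into a flat dict for CSV/DB loading."""
    return {
        "id": msg_id,
        "date": date.isoformat(),
        "text": (text or "").strip(),
    }

async def scrape_channel(channel, limit=100, out_path="scraped_messages.csv"):
    from telethon import TelegramClient  # pip install telethon
    async with TelegramClient("session", API_ID, API_HASH) as client:
        rows = [
            message_to_row(m.id, m.date, m.message)
            async for m in client.iter_messages(channel, limit=limit)
        ]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "date", "text"])
        writer.writeheader()
        writer.writerows(rows)

# asyncio.run(scrape_channel("@some_medical_channel"))
```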

Image Scraping

Collect images from specified Telegram channels for object detection:
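A hedged sketch of the image collector, again using Telethon (credentials, channel handle, and the output directory are placeholders):

```python
# Sketch: download photos posted in a Telegram channel for later object detection.
# API_ID, API_HASH, and the channel handle are placeholders.
import os

def image_path(out_dir, channel, msg_id):
    """Deterministic filename for a scraped photo, so re-runs overwrite cleanly."""
    safe_channel = channel.lstrip("@")
    return os.path.join(out_dir, f"{safe_channel}_{msg_id}.jpg")

async def scrape_images(channel, out_dir="data/telegram_images", limit=200):
    from telethon import TelegramClient  # pip install telethon
    os.makedirs(out_dir, exist_ok=True)
    async with TelegramClient("session", API_ID, API_HASH) as client:
        async for m in client.iter_messages(channel, limit=limit):
            if m.photo:  # only messages that carry an image
                await m.download_media(file=image_path(out_dir, channel, m.id))
```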

For more details, see the data_scraping_and_cleaning.ipynb notebook.

Data Cleaning and Transformation

Data Cleaning

  • Remove duplicates
  • Handle missing values
  • Standardize formats
  • Validate data
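The cleaning steps above can be sketched with pandas (the column names `id`, `text`, and `date` are assumptions about the scraped schema):

```python
# Sketch of the cleaning steps: dedupe, handle missing values,
# standardise formats, and validate dates. Column names are assumptions.
import pandas as pd

def clean_messages(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates(subset="id")          # remove duplicates
    df = df.dropna(subset=["text"])               # handle missing message text
    df = df.copy()
    df["text"] = df["text"].str.strip()           # standardise formats
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df[df["date"].notna()]                   # validate: keep parseable dates only
    return df.reset_index(drop=True)
```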

Data Cleaning Models (DBT)

Set up DBT for data transformation and create models (SQL files) for data transformation:

pip install dbt-postgres
dbt init dbt_med
dbt run
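A DBT model is just a SELECT statement in a `.sql` file under `models/`; as an illustrative sketch (the source and column names here are assumptions, not the project's actual schema):

```sql
-- models/stg_telegram_messages.sql (hypothetical staging model)
select
    id,
    channel,
    trim(text) as message_text,
    cast(message_date as timestamp) as message_date
from {{ source('raw', 'telegram_messages') }}
where text is not null
```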


Storing Cleaned Data

Store the cleaned data in the PostgreSQL data warehouse.

Fact Table in PostgreSQL Database


For more details, see the data_scraping_and_cleaning.ipynb notebook.

Object Detection Using YOLO

Setting Up the Environment

Ensure necessary dependencies are installed:

pip install opencv-python
pip install torch torchvision

Downloading the YOLO Model

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

Preparing the Data

  • Collect images from the specified Telegram channels.
  • Use the pre-trained YOLO model to detect objects in the images.
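One way to run detection, assuming the PyTorch Hub interface of YOLOv5 (the image paths and the confidence threshold are illustrative):

```python
# Sketch: run a pre-trained YOLOv5 model over the scraped images via torch.hub.

def keep_confident(detections, min_conf=0.5):
    """Pure helper: keep detections at or above a confidence threshold."""
    return [d for d in detections if d["confidence"] >= min_conf]

def detect(image_paths):
    import torch  # pip install torch torchvision
    model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # downloads pretrained weights
    results = model(image_paths)
    # results.pandas().xyxy is one DataFrame per image with columns:
    # xmin, ymin, xmax, ymax, confidence, class, name
    return [keep_confident(df.to_dict("records")) for df in results.pandas().xyxy]
```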

Processing the Detection Results

  • Extract data such as bounding box coordinates, confidence scores, and class labels.
  • Store detection data in a database table.
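The detection results could then be flattened into rows and inserted into the warehouse; a sketch using SQLAlchemy (the table name, columns, and connection string are assumptions):

```python
# Sketch: flatten YOLO detections into rows and insert them into PostgreSQL.
# The table name, columns, and DSN are assumptions, not the project's actual schema.
from sqlalchemy import create_engine, text

def detection_rows(image_name, detections):
    """Pure helper: one flat row per detected object."""
    return [
        {
            "image": image_name,
            "label": d["name"],
            "confidence": float(d["confidence"]),
            "xmin": d["xmin"], "ymin": d["ymin"],
            "xmax": d["xmax"], "ymax": d["ymax"],
        }
        for d in detections
    ]

def store_detections(rows, dsn="postgresql://user:password@localhost:5432/medical_dw"):
    engine = create_engine(dsn)
    with engine.begin() as conn:
        conn.execute(
            text(
                "insert into yolo_detections "
                "(image, label, confidence, xmin, ymin, xmax, ymax) "
                "values (:image, :label, :confidence, :xmin, :ymin, :xmax, :ymax)"
            ),
            rows,
        )
```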


For more details, see the yolo.ipynb notebook.

Exposing the Collected Data Using FastAPI

Setting Up the Environment

Install FastAPI and Uvicorn:

pip install fastapi uvicorn

Create a FastAPI Application

Set up a basic project structure:

my_project/
├── main.py
├── database.py
├── models.py
├── schemas.py
└── crud.py

Database Configuration

  • In database.py, configure the database connection using SQLAlchemy.

Creating Data Models

  • In models.py, define SQLAlchemy models for the database tables.

Creating Pydantic Schemas

  • In schemas.py, define Pydantic schemas for data validation and serialization.

CRUD Operations

  • In crud.py, implement CRUD (Create, Read, Update, Delete) operations for the database.

Creating API Endpoints

  • In main.py, define the API endpoints using FastAPI.


Get All Telegram Data


Get All YOLO Detection Results


Postman Collection

You can use the Postman collection for the API via the link below:

Postman collection link

Installation

To get started, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Daniel-Andarge/AiML-ethiopian-medical-biz-datawarehouse.git
    cd AiML-ethiopian-medical-biz-datawarehouse
  2. Create a virtual environment and activate it:

    # Using virtualenv
    virtualenv venv
    source venv/bin/activate
    
    # Using conda
    conda create -n your-env python=3.x
    conda activate your-env
  3. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. Run Data Scraping Scripts:

    python extract_load_pipeline.py
  2. Run DBT Models:

    dbt run
  3. Run Object Detection:

    python detect.py --source data/telegram_images --save-txt --save-conf --project results --name run1
  4. Start FastAPI Application:

    uvicorn main:app --reload

Contributing

Contributions are welcome. Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push your branch to your forked repository.
  5. Create a pull request to the main repository.

License

This project is licensed under the MIT License.

Acknowledgments

Special thanks to the contributors and the open-source community for their support and resources.