AutoEDA - Automated Data Preprocessing Toolkit

Overview

AutoEDA is an open-source project designed to automate the data preprocessing workflow, making it easier for data scientists and analysts to prepare their datasets for exploratory data analysis (EDA) and machine learning model building. This toolkit aims to eliminate null values, prepare clean datasets, perform feature engineering, and optimize preprocessing strategies for seamless integration with various machine learning models.
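As a rough illustration of the kind of pipeline AutoEDA aims to automate, the sketch below imputes missing values, one-hot encodes categorical features, and standardizes numeric ones. It assumes pandas and scikit-learn and uses made-up column names; it is not the toolkit's actual implementation.

```python
# Minimal sketch (not AutoEDA's actual code) of automated preprocessing:
# impute missing values, encode categorical features, scale numeric ones.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy dataset with missing values; column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": [52000, 61000, 75000, np.nan],
    "city": ["Delhi", "Mumbai", np.nan, "Delhi"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: fill gaps with the most frequent value, then encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

print(preprocess.fit_transform(df))
```

Keeping numeric and categorical handling in separate branches makes it easy to swap strategies (for example, mean instead of median imputation) without touching the rest of the workflow.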

Contribution Guidelines

We welcome contributions from the community! Below are the areas where you can contribute:

Frontend

  1. Enhancing the Frontend: Improve the user interface and user experience of the application using modern frontend frameworks and libraries.
  2. Adding Necessary Pages: Implement additional pages like a feedback form or documentation section to enhance user interaction and gather input.

Model Building: The Heart of AutoEDA

Contributors can assist in the following areas (a short sketch of these steps follows the list):

  1. Data Loading: Implement efficient methods to read and store CSV files.
  2. Data Cleaning: Develop algorithms for handling null values, removing duplicates, and correcting data types.
  3. Feature Engineering: Introduce techniques to create new features from existing data for improved model performance.
  4. Model Training: Experiment with various machine learning algorithms to optimize preprocessing strategies.
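
A hedged sketch of steps 1-3 above using pandas; the CSV path, column names, and thresholds are placeholders for illustration, not part of the AutoEDA repository.

```python
# Sketch of data loading, cleaning, and simple feature engineering with pandas.
import pandas as pd

def load_and_clean(path: str) -> pd.DataFrame:
    """Load a CSV, drop duplicates, coerce numeric-looking columns, fill nulls."""
    df = pd.read_csv(path)             # 1. Data loading
    df = df.drop_duplicates()          # 2. Remove duplicate rows
    # Correct data types: convert object columns that are mostly numeric.
    for col in df.select_dtypes(include="object").columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().mean() > 0.9:
            df[col] = converted
    # Handle nulls: median for numeric columns, mode (or a sentinel) otherwise.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            modes = df[col].mode()
            df[col] = df[col].fillna(modes.iloc[0] if not modes.empty else "unknown")
    return df

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """3. Feature engineering: derive new columns from existing ones."""
    out = df.copy()
    if {"income", "age"}.issubset(out.columns):   # hypothetical columns
        # Avoid division by zero by masking zero ages to NaN.
        out["income_per_age"] = out["income"] / out["age"].where(out["age"] != 0)
    return out

# Usage (the file name is a placeholder):
# features = add_features(load_and_clean("dataset.csv"))
```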

Backend

  1. Building Functions for Data Processing: Develop different functions that support data cleaning, filtering, and preprocessing procedures.
  2. Creating APIs: Build APIs for the machine learning model to handle data uploads and downloads efficiently (see the sketch after this list).
  3. Integration: Ensure seamless integration between the frontend and backend components.
  4. Dockerization: Dockerize the application to streamline deployment and ensure consistent environments.
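
The example below is a hedged sketch of items 1 and 2: a simple data-processing function exposed through an upload/download endpoint. It assumes FastAPI, uvicorn, python-multipart, and pandas; the real AutoEDA backend may use a different framework, routes, and cleaning logic.

```python
# Sketch of a backend upload/download API wrapping a cleaning function.
import io

import pandas as pd
from fastapi import FastAPI, Response, UploadFile

app = FastAPI()

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Item 1: a simple processing function (drop duplicates, fill numeric nulls)."""
    df = df.drop_duplicates()
    return df.fillna(df.median(numeric_only=True))

@app.post("/preprocess")
async def preprocess(file: UploadFile) -> Response:
    """Item 2: accept a CSV upload, clean it, and return the cleaned CSV."""
    raw = await file.read()
    df = pd.read_csv(io.BytesIO(raw))
    cleaned = basic_clean(df)
    return Response(
        content=cleaned.to_csv(index=False),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=cleaned.csv"},
    )

# Run locally with: uvicorn app:app --reload   (assumes this file is app.py)
```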

Workflow

[Workflow diagram]

Requirements

  • React.js + Vite (for frontend)
  • Python 3.x (for backend + model building)
  • Docker (for containerization)

Make sure the .gitignore file is in place so that environment files and generated artifacts are not committed.

Getting Started

To clone the repository, run the following command:

git clone https://github.com/Nidhi-Satyapriya/AutoEDA-Automated-Data-Preprocessing-Toolkit

Once cloned, follow the instructions in the respective frontend and backend directories for setup and running the application.

About

The Automated Data Preprocessing Toolkit streamlines the data preprocessing stage in machine learning by automating tasks like handling missing values, encoding categorical features, and normalizing data. With a user-friendly interface for easy dataset uploads, it enhances data quality and improves model performance efficiently.
