This Python script implements a genetic algorithm for clustering data. The algorithm evolves the cluster assignments of the data points, aiming to maximize the silhouette score. The silhouette score ranges from -1 to 1 and measures how well each point fits its own cluster compared with the nearest other cluster; higher values indicate better-defined clusters.
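To make the metric concrete, here is a minimal sketch that computes the silhouette score with scikit-learn's `silhouette_score` on the Iris dataset. It is not taken from this repository; the variable names are illustrative, and the comparison between random labels and the true species labels is only meant to show what a low and a higher score look like.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

iris = load_iris()
X = iris.data

# A random assignment of points to 3 clusters (illustrative baseline).
rng = np.random.default_rng(0)
random_labels = rng.integers(0, 3, size=len(X))
random_score = silhouette_score(X, random_labels)

# The true species labels, used here as a reference clustering.
species_score = silhouette_score(X, iris.target)

print(f"random labels:  {random_score:.3f}")
print(f"species labels: {species_score:.3f}")
```

A random assignment typically scores close to zero, while a clustering that follows the structure of the data scores noticeably higher. That gap is what the genetic algorithm tries to climb.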
- Python 3
- Required libraries: numpy, pandas, scikit-learn, matplotlib
- Clone the repository:
git clone https://github.com/parvvaresh/clustering-with-genetic
cd clustering-with-genetic
- Install the required dependencies:
pip install -r requirements.txt
Run the genetic_clustering.py script to execute the genetic clustering algorithm on the provided dataset. To use your own data, update the script with your dataset; otherwise it runs on the default Iris dataset.
python3 test_iris.py
The genetic clustering algorithm consists of the following components:
- Genetic class: defines the genetic operations such as mutation, generation, and fitness calculation.
- Clustering class: manages the clustering process, including the initialization of populations, evolution, and convergence.
- Driver script: uses the genetic and clustering classes to run the algorithm on a given dataset.
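The genetic operations listed above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the function names (`generate_individual`, `mutate`, `fitness`), the `rate` parameter, and the encoding of an individual as one cluster label per data point are all hypothetical choices.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def generate_individual(n_points, n_clusters, rng):
    """Create a random individual: one cluster label per data point."""
    return rng.integers(0, n_clusters, size=n_points)


def mutate(individual, n_clusters, rng, rate=0.05):
    """Return a copy in which each label is redrawn with probability `rate`."""
    mutant = individual.copy()
    mask = rng.random(len(mutant)) < rate
    mutant[mask] = rng.integers(0, n_clusters, size=int(mask.sum()))
    return mutant


def fitness(X, individual):
    """Silhouette score of the assignment; -1.0 if only one cluster is used."""
    if len(np.unique(individual)) < 2:
        # silhouette_score is undefined for a single cluster.
        return -1.0
    return silhouette_score(X, individual)
```

Encoding an individual as a label vector keeps mutation simple (reassigning a few points), at the cost of a large search space for bigger datasets.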
- size_population: Number of individuals in the population.
- goal: The desired fitness score to achieve.
- repeat: Number of generations to run the algorithm.
- is_mutation: Boolean flag to enable or disable mutation.
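As a rough illustration of how these four parameters could drive the search, the sketch below runs a simple evolution loop on the Iris dataset. It is an assumption-laden example rather than the repository's implementation: the `evolve` function, the `n_clusters`, `mutation_rate`, and `seed` arguments, and the selection scheme (keep the better half, refill with mutated copies, no crossover) are all hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score


def evolve(X, n_clusters=3, size_population=20, goal=0.5, repeat=30,
           is_mutation=True, mutation_rate=0.05, seed=0):
    """Mutation-only genetic search over cluster label vectors (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)

    def fitness(ind):
        # Silhouette is undefined when only one cluster is used.
        if len(np.unique(ind)) < 2:
            return -1.0
        return silhouette_score(X, ind)

    population = [rng.integers(0, n_clusters, size=n)
                  for _ in range(size_population)]
    history = []
    for generation in range(repeat):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]          # best individual first
        population = [population[i] for i in order]
        best = float(scores[order[0]])
        history.append(best)
        print(f"generation {generation}: best fitness {best:.4f}")
        if best >= goal:                          # stop once the goal is met
            break
        # Keep the better half, then refill with (optionally mutated) copies.
        survivors = population[: max(1, size_population // 2)]
        children = []
        for i in range(size_population - len(survivors)):
            child = survivors[i % len(survivors)].copy()
            if is_mutation:
                mask = rng.random(n) < mutation_rate
                child[mask] = rng.integers(0, n_clusters, size=int(mask.sum()))
            children.append(child)
        population = survivors + children
    return population[0], history


best_labels, history = evolve(load_iris().data, repeat=10)
```

Because the best individuals are carried over unchanged, the best fitness in `history` never decreases from one generation to the next. With `is_mutation=False`, the children are exact copies, so the search cannot improve beyond the initial population.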
The script outputs the progress of the algorithm, including the generation number and the fitness score achieved. Additionally, a plot of the fitness scores over generations is displayed at the end of the execution.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- This implementation is inspired by genetic algorithms and clustering techniques.
- Special thanks to the scikit-learn library for providing the silhouette score metric.