K-Means Clustering with Multiple Distance Metrics

This Python script performs K-Means clustering on a given dataset using Euclidean, Cosine, and Jaccard distance metrics. The code includes data preprocessing, clustering implementation, SSE (Sum of Squared Errors) calculation, and accuracy assessment.

Requirements

- Python 3
- pandas
- NumPy
- scikit-learn

Usage

Data Preparation:
- Place the dataset (data.csv) and the labels (label.csv) in the same directory as the script.
Run the Script:
- Execute the Python script in a Python environment that has the necessary libraries installed.
```
python kmeans_clustering.py
```

Code Overview

Data Loading:
- The script reads the dataset from data.csv using Pandas.
Data Scaling:
- The script scales the data to the range [0, 1].
Random Centroids Initialization:
- Random centroids are initialized for the K-Means algorithm.
Clustering:
- K-Means clustering is performed with Euclidean, Cosine, and Jaccard distance metrics.
SSE Calculation:
- SSE is calculated for each distance metric.
Accuracy Assessment:
- Majority voting is applied to assign labels, and accuracy is calculated for each distance metric.
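
The scaling, initialization, and clustering steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the names `min_max_scale`, `pairwise_distance`, and `kmeans` are hypothetical, Jaccard is taken to be the generalized form 1 - sum(min) / sum(max) for non-negative data, centroids are updated as cluster means for every metric, and iteration stops when the centroids no longer move. The broadcasting used here builds an (n_samples, k, n_features) array, so it is only suitable for modest data sizes.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (X - lo) / span


def pairwise_distance(X, C, metric="euclidean"):
    """Return an (n_samples, n_centroids) matrix of distances."""
    eps = 1e-12  # guards against division by zero for all-zero rows
    if metric == "euclidean":
        diff = X[:, None, :] - C[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "cosine":
        x_norm = np.linalg.norm(X, axis=1, keepdims=True)
        c_norm = np.linalg.norm(C, axis=1, keepdims=True)
        return 1.0 - (X @ C.T) / (x_norm * c_norm.T + eps)
    if metric == "jaccard":
        # Assumed definition: generalized Jaccard for non-negative data
        mins = np.minimum(X[:, None, :], C[None, :, :]).sum(axis=2)
        maxs = np.maximum(X[:, None, :], C[None, :, :]).sum(axis=2)
        return 1.0 - mins / (maxs + eps)
    raise ValueError(f"unknown metric: {metric}")


def kmeans(X, k, metric="euclidean", max_iter=100, seed=0):
    """Cluster X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = pairwise_distance(X, centroids, metric).argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid for empty clusters
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so labels always match the returned centroids
    labels = pairwise_distance(X, centroids, metric).argmin(axis=1)
    return labels, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    demo = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
    demo = min_max_scale(demo)
    for metric in ("euclidean", "cosine", "jaccard"):
        labels, _ = kmeans(demo, k=2, metric=metric)
        print(metric, np.bincount(labels, minlength=2))
```

Keeping the metric as a parameter of one distance function means the same clustering loop can be reused for all three metrics, which makes their results directly comparable.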

Results

The script provides the SSE for each distance metric and the accuracy of K-Means clustering based on majority voting.
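
For readers who want to see how these two numbers can be computed, here is a small sketch. It is an illustration under assumptions rather than the script's own code: the helper names `sse` and `majority_vote_accuracy` are hypothetical, and SSE is assumed to be the sum of squared Euclidean distances to the assigned centroid regardless of which metric produced the assignment.

```python
import numpy as np


def sse(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    diffs = X - centroids[np.asarray(labels)]
    return float((diffs ** 2).sum())


def majority_vote_accuracy(cluster_ids, true_labels):
    """Label each cluster with its most common true label, then score it."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    predicted = np.empty_like(true_labels)
    for c in np.unique(cluster_ids):
        mask = cluster_ids == c
        values, counts = np.unique(true_labels[mask], return_counts=True)
        predicted[mask] = values[counts.argmax()]  # majority label wins
    return float((predicted == true_labels).mean())


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
    print(sse(X, [0, 0, 1], [[0.0, 1.0], [10.0, 10.0]]))
    print(majority_vote_accuracy([0, 0, 0, 1, 1], [7, 7, 3, 3, 3]))
```

Majority voting is needed because cluster indices are arbitrary: cluster 0 has no inherent relation to class 0, so each cluster is first mapped to the class that dominates it.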

Notes

This script assumes the availability of the required dataset and labels.
Ensure that you have the necessary Python libraries installed.

Author

Vladislav Kuznetsov

License

This project is licensed under the MIT License.
