robtomlinson/kmeans
K-Means Clustering Algorithm

Implementation in C++ of the K-Means clustering algorithm.

Example datasets are in the datasets folder.

References are in the references folder.

This tutorial was particularly useful: http://mnemstudio.org/clustering-k-means-example-1.htm

The implementation is in the file kmeans.cpp.

Each dataset is in the format expected by the program.

Explanation of the format:

First line: A B C D E

"A" is the number of data points.
"B" is the number of attributes per data point.
"C" is the number of clusters.
"D" is the maximum number of iterations.
"E" indicates whether each data point has a name: 0 (no names) or 1 (names present).
Each of the next "A" lines contains "B" attribute values, followed by the data point name if "E" is 1.

Example dataset:

7 2 2 100 0
1.0 1.0
1.5 2.0
3.0 4.0
5.0 7.0
3.5 5.0
4.5 5.0
3.5 4.5

7 is the number of data points.
2 is the number of attributes.
2 is the number of clusters.
100 is the maximum number of iterations.
0 indicates that the data points have no names.
Each of the next 7 lines contains 2 attribute values.

Example with data point names (iris-data):

150 4 3 100 1
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3.0 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
(...)

Euclidean distance is used to measure the distance from each data point to the centroid of each cluster.
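
For two points with the same number of attributes, the Euclidean distance is the square root of the sum of the squared differences between corresponding attributes. A minimal sketch of such a helper follows; the name euclidean_distance and its signature are assumptions for illustration and are not taken from kmeans.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Euclidean distance between two points
// that have the same number of attributes.
double euclidean_distance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size());  // both points need the same dimension
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;  // accumulate squared differences
    }
    return std::sqrt(sum);
}
```

For example, the points (1.0, 1.0) and (4.0, 5.0) differ by 3 and 4 in their two attributes, so their distance is 5.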

The algorithm stops when the maximum number of iterations is reached or when no data point changes cluster during an iteration.

Questions? [email protected]
