k-Means Parallelization

k-Means Parallelization#

➡️ Objectives :

Program the k-means algorithm from scratch.
Apply it to 1 Million data.
Calculate the optimal value of k (computationally heaviest part as it requires to run the algorithm repeatedly).
Parallelize point 3) with multiprocessing and multithreading.
Analyze computation times and speedups with respect to the serial (non-parallelized) version.

➡️ Conclusions:

🔸 Computational intensity: the selection of the optimal value of k is the most computationally expensive of the program, consuming most (97%) of the execution time.

🔸 Advantages of parallelisation: to cope with the computational load, parallelisation techniques that improve efficiency were employed.

🔸 Starmap method: Among the various multiprocessing methods used, starmap() was the most effective in delivering superior results.

🔸 Performance improvement: Parallelism significantly improves performance. The serial approach takes almost 4 minutes for 1 Million data, while the multiprocessing and threading approaches complete the task in less than 1 minute, achieving a speedup of more than 4 times.