Chapter 33: Machine-Learning Algorithms: Clustering, Gradient Descent, and Weights


ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.


K-means clustering addresses the unsupervised problem of partitioning n d-dimensional points into at most k disjoint groups so as to minimize the total intra-cluster squared Euclidean distance. Lloyd's algorithm solves it heuristically by alternating between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster, until the assignments stabilize. It runs in O(T × d × k × n) time over T iterations but can converge to a local minimum, which motivates multiple restarts with different random initializations.

The multiplicative-weights framework models online decision-making with n experts, each making a binary prediction per round. The weighted-majority algorithm starts with equal weights and, after each round, scales down each incorrect expert's weight by a factor of (1 − ε); it makes at most 2(1 + ε) × m* + (2 ln n)/ε mistakes in total when the best expert makes m* mistakes. A randomized variant reduces this to (1 + ε) × m* + (ln n)/ε expected mistakes by sampling an expert with probability proportional to its weight rather than taking a hard majority vote.

Gradient descent solves unconstrained optimization by iteratively updating a solution as x^(t+1) = x^(t) − η∇f(x^(t)), where η is the step size. Theorem 33.8 guarantees that, for convex functions and step size η = R/(L√T), the error after T steps is at most RL/√T, with the output taken as the average of the iterates for improved convergence.

For constrained optimization, each gradient step is followed by a projection back onto the feasible region. Stochastic gradient descent scales the approach to large datasets by computing gradients on individual examples or mini-batches rather than the full dataset, which makes it the workhorse of modern neural-network training and large-scale machine learning.
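The assign-then-recompute loop of Lloyd's algorithm can be sketched in pure Python. This is a minimal illustration, not the book's pseudocode; the function name, the sample-based random initialization, and the keep-old-centroid rule for empty clusters are my own choices.

```python
import random

def lloyd(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means (illustrative sketch): alternate
    between assigning each point to its nearest centroid and moving
    each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization (one of many schemes)
    assignment = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda j: sum((p[d] - centroids[j][d]) ** 2
                                            for d in range(len(p))))
            for p in points
        ]
        if new_assignment == assignment:  # assignments stabilized: done
            break
        assignment = new_assignment
        # Update step: recompute each centroid as the mean of its cluster.
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:  # keep the old centroid if a cluster went empty
                centroids[j] = tuple(sum(c) / len(cluster)
                                     for c in zip(*cluster))
    return centroids, assignment
```

Because the result depends on the random initialization, one would normally run this with several seeds and keep the clustering with the smallest total intra-cluster distance.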
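The deterministic weighted-majority algorithm is a few lines of code. A minimal sketch, assuming binary predictions encoded as 0/1; the function name and the tie-breaking rule (ties go to option 1) are my own.

```python
def weighted_majority(expert_preds, outcomes, eps=0.5):
    """Weighted-majority sketch: start with weight 1 per expert, predict
    the weighted majority vote each round, then multiply each wrong
    expert's weight by (1 - eps)."""
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_preds, outcomes):
        # Weighted vote over the two binary options.
        vote = {0: 0.0, 1: 0.0}
        for w, p in zip(weights, preds):
            vote[p] += w
        guess = 1 if vote[1] >= vote[0] else 0  # ties broken toward 1
        if guess != outcome:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        weights = [w * (1 - eps) if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights
```

The randomized variant would replace the hard majority vote with sampling one expert with probability proportional to its weight, which is what improves the mistake bound to (1 + ε) × m* + (ln n)/ε in expectation.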
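The update rule x^(t+1) = x^(t) − η∇f(x^(t)), with the averaged iterate as output, can be sketched as follows. The caller supplies the gradient function; the function name and interface here are my own, not from the text.

```python
def gradient_descent(grad, x0, eta, steps):
    """Basic gradient descent sketch: repeatedly step against the
    gradient, x_{t+1} = x_t - eta * grad(x_t), and return the average
    of all iterates (the form used in the convergence guarantee for
    convex functions)."""
    xs = [list(x0)]
    x = list(x0)
    for _ in range(steps):
        x = [xi - eta * gi for xi, gi in zip(x, grad(x))]
        xs.append(x)
    # Output the coordinate-wise average of the iterates.
    return [sum(col) / len(xs) for col in zip(*xs)]
```

For example, minimizing f(x) = x² with gradient 2x and a fixed step size drives the averaged iterate toward the optimum at 0; the theorem's step-size choice η = R/(L√T) makes the error bound RL/√T explicit.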
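Projection and stochastic gradients combine naturally. Below is a sketch of projected stochastic gradient descent for a one-dimensional least-squares fit, where the feasible region is an interval so projection is just clamping; the problem setup, function name, and hyperparameters are illustrative assumptions, not from the text.

```python
import random

def projected_sgd(data, lo, hi, eta=0.05, epochs=100, seed=0):
    """Projected SGD sketch for 1-d least squares f(w) = mean((w*x - y)^2),
    constrained to w in [lo, hi]: each step uses the gradient at a single
    randomly chosen example, then projects back onto the interval."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # reshuffle each epoch
            grad = 2 * (w * x - y) * x            # gradient on ONE example
            w = min(hi, max(lo, w - eta * grad))  # project onto [lo, hi]
    return w
```

Replacing the single example with a small random batch gives mini-batch SGD, the variant that dominates neural-network training in practice.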