-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdistance-metrics.html
37 lines (37 loc) · 11.9 KB
/
distance-metrics.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
<!doctype html><html lang=en-uk><head><script data-goatcounter=https://ruivieira-dev.goatcounter.com/count async src=//gc.zgo.at/count.js></script><script src=https://unpkg.com/@alpinejs/[email protected]/dist/cdn.min.js></script><script src=https://unpkg.com/[email protected]/dist/cdn.min.js></script><script type=module src=https://ruivieira.dev/js/deeplinks/deeplinks.js></script><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-brands-400.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-regular-400.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/lib/fonts/fa-solid-900.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/fonts/firacode/FiraCode-Regular.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=preload href=https://ruivieira.dev/fonts/vollkorn/Vollkorn-Regular.woff2 as=font type=font/woff2 crossorigin=anonymous><link rel=stylesheet href=https://ruivieira.dev/css/kbd.css type=text/css><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><title>Distance metrics · Rui Vieira</title>
<link rel=canonical href=https://ruivieira.dev/distance-metrics.html><meta name=viewport content="width=device-width,initial-scale=1"><meta name=robots content="all,follow"><meta name=googlebot content="index,follow,snippet,archive"><meta property="og:title" content="Distance metrics"><meta property="og:description" content="L-p metricsManhattan distance (L1)Given two vectors $p$ and $q$, such that
$$ \begin{aligned} p &= \left(p_1, p_2, \dots,p_n\right) \ q &= \left(q_1, q_2, \dots,q_n\right) \end{aligned} $$
we define the Manhattan distance as:
$$ d_1(p, q) = |p - q|1 = \sum{i=1}^n |p_i-q_i| $$
Euclidean distance (L2)In general, for points given by Cartesian coordinates in $n$-dimensional Euclidean space, the distance is
$$ d(p,q)=\sqrt {(p_{1}-q_{1})^{2}+(p_{2}-q_{2})^{2}+\dots +(p_{i}-q_{i})^{2}+\dots +(p_{n}-q_{n})^{2}} $$
Cluster distancesWithin-cluster sum of squares (WCSS)Given a set of observations ($x_1, x_2,\dots,x_n$), where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $k$ ($\leq n$) sets $S=\lbrace S_1, S_2, \dots, S_k\rbrace$ so as to minimize the within-cluster sum of squares (WCSS) (i."><meta property="og:type" content="article"><meta property="og:url" content="https://ruivieira.dev/distance-metrics.html"><meta property="article:section" content="posts"><meta property="article:modified_time" content="2023-10-01T20:46:29+01:00"><meta name=twitter:card content="summary"><meta name=twitter:title content="Distance metrics"><meta name=twitter:description content="L-p metricsManhattan distance (L1)Given two vectors $p$ and $q$, such that
$$ \begin{aligned} p &= \left(p_1, p_2, \dots,p_n\right) \ q &= \left(q_1, q_2, \dots,q_n\right) \end{aligned} $$
we define the Manhattan distance as:
$$ d_1(p, q) = |p - q|1 = \sum{i=1}^n |p_i-q_i| $$
Euclidean distance (L2)In general, for points given by Cartesian coordinates in $n$-dimensional Euclidean space, the distance is
$$ d(p,q)=\sqrt {(p_{1}-q_{1})^{2}+(p_{2}-q_{2})^{2}+\dots +(p_{i}-q_{i})^{2}+\dots +(p_{n}-q_{n})^{2}} $$
Cluster distancesWithin-cluster sum of squares (WCSS)Given a set of observations ($x_1, x_2,\dots,x_n$), where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $k$ ($\leq n$) sets $S=\lbrace S_1, S_2, \dots, S_k\rbrace$ so as to minimize the within-cluster sum of squares (WCSS) (i."><link rel=stylesheet href=https://ruivieira.dev/css/styles.css><!--[if lt IE 9]><script src=https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js></script><script src=https://oss.maxcdn.com/respond/1.4.2/respond.min.js></script><![endif]--><link rel=icon type=image/png href=https://ruivieira.dev/images/favicon.ico></head><body class="max-width mx-auto px3 ltr" x-data="{currentHeading: undefined}"><div class="content index py4"><div id=header-post><a id=menu-icon href=#><i class="fas fa-eye fa-lg"></i></a>
<a id=menu-icon-tablet href=#><i class="fas fa-eye fa-lg"></i></a>
<a id=top-icon-tablet href=# onclick='$("html, body").animate({scrollTop:0},"fast")' style=display:none aria-label="Top of Page"><i class="fas fa-chevron-up fa-lg"></i></a>
<span id=menu><span id=nav><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></span><br><div id=share style=display:none></div><div id=toc><h4>Contents</h4><nav id=TableOfContents><ul><li><a href=#l-p-metrics :class="{'toc-h2':true, 'toc-highlight': currentHeading == '#l-p-metrics' }">L-p metrics</a></li><li><a href=#manhattan-distance-l1 :class="{'toc-h3':true, 'toc-highlight': currentHeading == '#manhattan-distance-l1' }">Manhattan distance (L1)</a></li><li><a href=#euclidean-distance-l2 :class="{'toc-h3':true, 'toc-highlight': currentHeading == '#euclidean-distance-l2' }">Euclidean distance (L2)</a></li><li><a href=#cluster-distances :class="{'toc-h2':true, 'toc-highlight': currentHeading == '#cluster-distances' }">Cluster distances</a></li><li><a href=#within-cluster-sum-of-squares-wcss :class="{'toc-h3':true, 'toc-highlight': currentHeading == '#within-cluster-sum-of-squares-wcss' }">Within-cluster sum of squares (WCSS)</a></li><li><a href=#dunn-index :class="{'toc-h3':true, 'toc-highlight': currentHeading == '#dunn-index' }">Dunn index</a></li><li><a href=#gower-distance :class="{'toc-h3':true, 'toc-highlight': currentHeading == '#gower-distance' }">Gower distance</a></li></ul></nav><h4>Related</h4><nav><ul><li class="header-post toc"><span class=backlink-count>1</span>
<a href=https://ruivieira.dev/machine-learning.html>Machine Learning</a></li><li class="header-post toc"><span class=backlink-count>1</span>
<a href=https://ruivieira.dev/counterfactuals.html>Counterfactuals</a></li></ul></nav></div></span></div><article class=post itemscope itemtype=http://schema.org/BlogPosting><header><h1 class=posttitle itemprop="name headline">Distance metrics</h1><div class=meta><div class=postdate>Updated <time datetime="2023-10-01 20:46:29 +0100 BST" itemprop=datePublished>2023-10-01</time>
<span class=commit-hash>(<a href=https://ruivieira.dev/log/index.html#e23fe25>e23fe25</a>)</span></div></div></header><div class=content itemprop=articleBody><h2 id=l-p-metrics x-intersect="currentHeading = '#l-p-metrics'">L-p metrics</h2><h3 id=manhattan-distance-l1 x-intersect="currentHeading = '#manhattan-distance-l1'">Manhattan distance (L1)</h3><p>Given two vectors $p$ and $q$, such that</p><p>$$
\begin{aligned}
p &= \left(p_1, p_2, \dots,p_n\right) \
q &= \left(q_1, q_2, \dots,q_n\right)
\end{aligned}
$$</p><p>we define the Manhattan distance as:</p><p>$$
d_1(p, q) = |p - q|<em>1 = \sum</em>{i=1}^n |p_i-q_i|
$$</p><h3 id=euclidean-distance-l2 x-intersect="currentHeading = '#euclidean-distance-l2'">Euclidean distance (L2)</h3><p>In general, for points given by Cartesian coordinates in $n$-dimensional Euclidean space, the distance is</p><p>$$
d(p,q)=\sqrt {(p_{1}-q_{1})^{2}+(p_{2}-q_{2})^{2}+\dots +(p_{i}-q_{i})^{2}+\dots +(p_{n}-q_{n})^{2}}
$$</p><h2 id=cluster-distances x-intersect="currentHeading = '#cluster-distances'">Cluster distances</h2><h3 id=within-cluster-sum-of-squares-wcss x-intersect="currentHeading = '#within-cluster-sum-of-squares-wcss'">Within-cluster sum of squares (WCSS)</h3><p>Given a set of observations ($x_1, x_2,\dots,x_n$), where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $k$ ($\leq n$) sets $S=\lbrace S_1, S_2, \dots, S_k\rbrace$ so as to minimize the within-cluster sum of squares (WCSS) (<em>i.e.</em> variance). Formally, the objective is to find:</p><p>$$
{\underset {\mathbf {S} }{\operatorname {arg,min} }}\sum <em>{i=1}^{k}\sum <em>{\mathbf {x} \in S</em>{i}}\left|\mathbf {x} -{\boldsymbol {\mu }}</em>{i}\right|^{2}={\underset {\mathbf {S} }{\operatorname {arg,min} }}\sum <em>{i=1}^{k}|S</em>{i}|\operatorname {Var} S_{i}
$$</p><p>where $\mu_i$ is the mean of points in $S_i$. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:</p><p>$$
{\displaystyle {\underset {\mathbf {S} }{\operatorname {arg,min} }}\sum <em>{i=1}^{k},{\frac {1}{2|S</em>{i}|}},\sum <em>{\mathbf {x} ,\mathbf {y} \in S</em>{i}}\left|\mathbf {x} -\mathbf {y} \right|^{2}}
$$</p><p>The equivalence can be deduced from identity</p><p>$${\displaystyle \sum <em>{\mathbf {x} \in S</em>{i}}\left|\mathbf {x} -{\boldsymbol {\mu }}<em>{i}\right|^{2}=\sum <em>{\mathbf {x} \neq \mathbf {y} \in S</em>{i}}(\mathbf {x} -{\boldsymbol {\mu }}</em>{i})({\boldsymbol {\mu }}_{i}-\mathbf {y} )}.
$$</p><p>Because the total variance is constant, this is equivalent to maximizing the sum of squared deviations between points in <em>different</em> clusters (between-cluster sum of squares, BCSS) which follows from the law of total variance.</p><h3 id=dunn-index x-intersect="currentHeading = '#dunn-index'">Dunn index</h3><p>A full explanation is available at <a href=https://ruivieira.dev/dunn-index.html>Dunn index</a>.</p><h3 id=gower-distance x-intersect="currentHeading = '#gower-distance'">Gower distance</h3><p>A full explanation with examples is available at <a href>site/Machine learning/Gower distance</a>.</p></div></article><div id=footer-post-container><div id=footer-post><div id=nav-footer style=display:none><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></div><div id=toc-footer style=display:none><nav id=TableOfContents><ul><li><a href=#l-p-metrics>L-p metrics</a><ul><li><a href=#manhattan-distance-l1>Manhattan distance (L1)</a></li><li><a href=#euclidean-distance-l2>Euclidean distance (L2)</a></li></ul></li><li><a href=#cluster-distances>Cluster distances</a><ul><li><a href=#within-cluster-sum-of-squares-wcss>Within-cluster sum of squares (WCSS)</a></li><li><a href=#dunn-index>Dunn index</a></li><li><a href=#gower-distance>Gower distance</a></li></ul></li></ul></nav></div><div id=share-footer style=display:none></div><div id=actions-footer><a id=menu-toggle class=icon href=# onclick='return $("#nav-footer").toggle(),!1' aria-label=Menu><i class="fas fa-bars fa-lg" aria-hidden=true></i> Menu</a>
<a id=toc-toggle class=icon href=# onclick='return $("#toc-footer").toggle(),!1' aria-label=TOC><i class="fas fa-list fa-lg" aria-hidden=true></i> TOC</a>
<a id=share-toggle class=icon href=# onclick='return $("#share-footer").toggle(),!1' aria-label=Share><i class="fas fa-share-alt fa-lg" aria-hidden=true></i> share</a>
<a id=top style=display:none class=icon href=# onclick='$("html, body").animate({scrollTop:0},"fast")' aria-label="Top of Page"><i class="fas fa-chevron-up fa-lg" aria-hidden=true></i> Top</a></div></div></div><footer id=footer><div class=footer-left>Copyright © 2024 Rui Vieira</div><div class=footer-right><nav><ul><li><a href=https://ruivieira.dev/>Home</a></li><li><a href=https://ruivieira.dev/blog/>Blog</a></li><li><a href=https://ruivieira.dev/draw/>Drawings</a></li><li><a href=https://ruivieira.dev/map/>All pages</a></li><li><a href=https://ruivieira.dev/search.html>Search</a></li></ul></nav></div></footer></div></body><link rel=stylesheet href=https://ruivieira.dev/css/fa.min.css><script src=https://ruivieira.dev/js/jquery-3.6.0.min.js></script><script src=https://ruivieira.dev/js/mark.min.js></script><script src=https://ruivieira.dev/js/main.js></script><script>MathJax={tex:{inlineMath:[["$","$"],["\\(","\\)"]]},svg:{fontCache:"global"}}</script><script type=text/javascript id=MathJax-script async src=https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js></script></html>