Start using PMVector and cleaning #37

Merged
merged 3 commits into from
Apr 18, 2023
Changes from 2 commits
68 changes: 40 additions & 28 deletions src/AI-EditDistances-Tests/AIHellingerDistanceTest.class.st
@@ -27,39 +27,51 @@ AIHellingerDistanceTest >> setUp [
]

{ #category : #tests }
AIHellingerDistanceTest >> testDifferentDistributions [
AIHellingerDistanceTest >> testDifferentDistribution1 [

| p q |

p := #(0.36 0.48 0.16).
q := #(0.33 0.33 0.33).
self
assert: (metric distanceBetween: p and: q)
| p q |
p := #( 0.36 0.48 0.16 ).
q := #( 0.33 0.33 0.33 ).
self
assert: (metric distanceBetween: p and: q)
closeTo: 0.15049826726881443
epsilon: 0.001.

p := #(0.25 0.25 0.25 0.25).
q := #(0.1 0.2 0.3 0.4).
self
assert: (metric distanceBetween: p and: q)
closeTo: 0.167
epsilon: 0.001.

p := #(0 0.5 0.5).
q := #(1 0 0).
self
assert: (metric distanceBetween: p and: q)
closeTo: 1.0
epsilon: 0.0001.
epsilon: 0.001
]

p := #(0.2 0.3 0.1 0.4).
q := #(0.1 0.4 0.3 0.2).
self
assert: (metric distanceBetween: p and: q)
closeTo: 0.2368980623511251
epsilon: 0.0001.
{ #category : #tests }
AIHellingerDistanceTest >> testDifferentDistribution2 [

| p q |
p := #( 0.25 0.25 0.25 0.25 ).
q := #( 0.1 0.2 0.3 0.4 ).
self
assert: (metric distanceBetween: p and: q)
closeTo: 0.167
epsilon: 0.001
]

{ #category : #tests }
AIHellingerDistanceTest >> testDifferentDistribution3 [

| p q |
p := #( 0 0.5 0.5 ).
q := #( 1 0 0 ).
self
assert: (metric distanceBetween: p and: q)
closeTo: 1.0
epsilon: 0.0001
]

{ #category : #tests }
AIHellingerDistanceTest >> testDifferentDistribution4 [

| p q |
p := #( 0.2 0.3 0.1 0.4 ).
q := #( 0.1 0.4 0.3 0.2 ).
self
assert: (metric distanceBetween: p and: q)
closeTo: 0.2368980623511251
epsilon: 0.0001
]

{ #category : #tests }
13 changes: 2 additions & 11 deletions src/AI-EditDistances/AIEuclideanDistance.class.st
@@ -11,16 +11,7 @@ Class {

{ #category : #api }
AIEuclideanDistance >> distanceBetween: firstPoint and: secondPoint [
"It follows the Euclidean distance between two points formula"

"It follows the Euclidean distance between two points formula.
The code is not idiomatic because of performance. We see that writing this instead of
(firstPoint - secondPoint raisedTo: 2) sum sqrt"

| sum |
sum := 0.0.
1 to: firstPoint size do: [ :i |
| diff |
diff := (firstPoint at: i) asFloat - (secondPoint at: i) asFloat.
sum := sum + (diff * diff) ].
^ sum sqrt
^ (firstPoint asPMVector - secondPoint asPMVector) norm
]
38 changes: 26 additions & 12 deletions src/AI-EditDistances/AIHellingerDistance.class.st
@@ -1,18 +1,39 @@
"
## Description

In probability and statistics, the Hellinger distance is used to quantify the similarity between two probability distributions.
In probability and statistics, the Hellinger distance is used to quantify the similarity between two probability distributions. The Hellinger distance is defined in terms of the Hellinger integral ([more info on the Wikipedia page](https://en.wikipedia.org/wiki/Hellinger_distance)).

The squared Hellinger distance is defined as:

$H^2(P,Q) = 1/2 \int (\sqrt{P(x)} - \sqrt{Q(x)})^{2} dx$, where P(x) and Q(x) are two probability density functions.

We can approximate its value by evaluating the probability distributions at n points and summing the results. In the end, the integral is just an infinite sum.

So the above formula can be implemented as follows:

$H(P,Q) = \sqrt {1/2 \sum_{x=0}^{n} (\sqrt{P(x)} - \sqrt{Q(x)})^2}$

Observe that this is the same formula as the integral, except that we are approximating the value with the sum of n values of the evaluated probability functions.
We can take the $1/2$ out of the `sqrt` as $1/\sqrt {2}$ to obtain the following formula:

$H(P,Q) = 1/\sqrt {2} \sqrt {\sum_{x=0}^{n} (\sqrt{P(x)} - \sqrt{Q(x)})^2}$

We can see that $\sqrt {\sum_{x=0}^{n} (\sqrt{P(x)} - \sqrt{Q(x)})^2}$ is nothing more than the Euclidean norm of the vector $\sqrt {P} - \sqrt {Q}$.

So in the end we have:

$H(P,Q) = 1/\sqrt {2} \lVert \sqrt{P} - \sqrt{Q} \rVert$
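
As a worked check of this last formula, take the distributions from the Usage section, p = (0.36, 0.48, 0.16) and q = (0.33, 0.33, 0.33). The intermediate values are rounded by hand, so treat them as approximate:

$\sqrt{p} - \sqrt{q} \approx (0.0255, 0.1184, -0.1745)$

$\lVert \sqrt{p} - \sqrt{q} \rVert \approx \sqrt{0.00065 + 0.01401 + 0.03043} \approx 0.2124$

$H(p,q) \approx 0.2124 / \sqrt{2} \approx 0.150$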

## Usage

The following example takes two probability distributions, p and q, as input and returns the Hellinger distance between them. The method iterates over the probability values, calculates the squared difference of the square roots, sums them up, and finally multiplies the square root of the sum by `(1 / (2 sqrt))`.
The following example takes two probability distributions, p and q, as input and returns the Hellinger distance between them.

```language=Pharo
| p q |

p := #(0.36 0.48 0.16).
q := #(0.33 0.33 0.33).
AIHellingerDistance distanceBetween: p and: q.
AIHellingerDistance new distanceBetween: p and: q.
```
"
Class {
@@ -24,13 +45,6 @@ Class {
{ #category : #api }
AIHellingerDistance >> distanceBetween: firstCollection and: secondCollection [

| numberOfElements |

(numberOfElements := firstCollection size) = secondCollection size
ifFalse: [ self error: 'Distributions must have the same size' ].

^ (1 / 2 sqrt) * ((1 to: numberOfElements)
inject: 0
into: [ :sum :i | sum + ((firstCollection at: i) sqrt - (secondCollection at: i) sqrt) squared.
]) sqrt
^ (firstCollection asPMVector sqrt - secondCollection asPMVector sqrt)
norm / 2 sqrt
]
4 changes: 2 additions & 2 deletions src/AI-EditDistances/AIManhattanDistance.class.st
@@ -1,5 +1,5 @@
"
The Manhattan distance is the sum of all the absolute distances between coordinates.
The Manhattan distance is defined as the first norm of the difference of two vectors. This is the sum of the absolute differences of each coordinate.

More info: [Manhattan distance](https://computervision.fandom.com/wiki/Manhattan_distance)
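
For instance, with two illustrative vectors a = (1, 2, 3) and b = (4, 0, 3) (values chosen here only for the example, not taken from the tests):

$\lVert a - b \rVert_1 = |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0 = 5$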
"
@@ -12,5 +12,5 @@ Class {
{ #category : #api }
AIManhattanDistance >> distanceBetween: anArray and: anotherArray [

^ (anArray - anotherArray ) abs sum
^ (anArray asPMVector - anotherArray asPMVector) firstNorm
]
8 changes: 3 additions & 5 deletions src/AI-EditDistances/AIMinkowskiDistance.class.st
@@ -38,12 +38,10 @@ AIMinkowskiDistance class >> p: anInteger [
{ #category : #api }
AIMinkowskiDistance >> distanceBetween: firstPoint and: secondPoint [

| sum |
sum := 0.
firstPoint with: secondPoint do: [ :x :y |
sum := sum + ((x - y) abs raisedTo: p) ].
| vector |
vector := (firstPoint asPMVector - secondPoint asPMVector) abs raisedTo: p.

^ sum raisedTo: 1 / p
^ vector sum raisedTo: 1 / p
]

{ #category : #accessing }
10 changes: 8 additions & 2 deletions src/BaselineOfAIEditDistances/BaselineOfAIEditDistances.class.st
@@ -11,10 +11,16 @@ Class {
BaselineOfAIEditDistances >> baseline: spec [

<baseline>
spec for: #common do: [ "Packages"
spec for: #common do: [
"External dependencies"
spec
package: 'AI-EditDistances';
baseline: 'AIExternalVectorMatrix'
with: [ spec repository: 'github://pharo-ai/external-dependencies' ].
"Packages"
spec
package: 'AI-EditDistances' with: [ spec requires: #( 'AIExternalVectorMatrix' ) ];
package: 'AI-EditDistances-Tests' with: [ spec requires: #( 'AI-EditDistances' ) ].
"Groups"
spec
group: 'Core' with: #( 'AI-EditDistances' );
group: 'Tests' with: #( 'AI-EditDistances-Tests' );