# The Mighty Distance Function – Part 2

In the previous post, we went over some possible approaches to defining a distance function between a point and a set. At the end, we saw that calculating the distance between the point and the average (mean) point of the set seemed to be the best solution. I am afraid it is not always that easy.

To see why, consider the figure below.

Both points a and a’ have equal distances from the mean point of the set B, shown by the filled circle.

Here, we have the usual set B, but this time there are two test points, a and a’. Both have equal distances from the mean point of the set, so d(a,B) = d(a’,B); yet point a seems closer to the set than a’. In other words, a is more similar to B, or a has a higher probability of being a member of B. Therefore, a should have a smaller distance to B than a’, but that is not the case. Why?

The reason can be explained with a measure called the variance, which represents the spread of the data along a given axis.

Dataset B has a higher variance along the X axis than the Y axis.

In the example given, the data has a larger variance along the X-axis than along the Y-axis. To solve our problem, we can divide the squared distance along each axis by the variance of the set along that axis:

$d(a,B) = \sqrt{\frac{d_X(a,\mu(B))^2}{Var_X(B)} + \frac{d_Y(a,\mu(B))^2}{Var_Y(B)}}$,

where $d_X(a,\mu(B))$ is the distance between a and $\mu(B)$ along the X-axis, and $d_Y(a,\mu(B))$ is the distance along the Y-axis. Division by the variance moves our points into a new space in which the set has equal variances along the X and Y axes.

By normalizing the variance along both axes, it becomes clear that point a is closer to the set than a’.

As can be seen, in this new space we have $d(a,\mu(B)) < d(a',\mu(B))$. We can imagine this new space in two ways, either by squeezing the points along the X-axis, or by stretching them along the Y-axis, until the points in the set have equal variances along both axes.
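As a quick sketch of this idea (assuming NumPy; the synthetic set B and the function name are mine, chosen so that B is wide along X and narrow along Y):

```python
import numpy as np

def variance_normalized_distance(a, B):
    """Distance from point a to set B, with each squared axis
    offset scaled by the set's variance along that axis."""
    B = np.asarray(B, dtype=float)
    mu = B.mean(axis=0)    # mean point of the set
    var = B.var(axis=0)    # variance along each axis
    return np.sqrt(np.sum((a - mu) ** 2 / var))

# A set spread widely along X and narrowly along Y.
rng = np.random.default_rng(0)
B = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
mu = B.mean(axis=0)

a       = mu + np.array([4.0, 0.0])   # offset along the wide axis
a_prime = mu + np.array([0.0, 4.0])   # same Euclidean offset, narrow axis

# Equal Euclidean distances, but a ends up closer after normalization.
print(variance_normalized_distance(a, B) <
      variance_normalized_distance(a_prime, B))
```

Dividing by the variance is equivalent to rescaling each axis by the set's standard deviation, which is the "squeezing/stretching" view described above.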

Now, let's move on to our next problem, shown below:

Point a’ has a smaller distance to $\mu(B)$, and accordingly to set B, while point a seems more “similar” to set B and therefore should have the smaller distance.

Here, a’ is closer to the mean of B than a is, but a better follows the pattern of B and therefore should have a smaller distance. The problem cannot be related to the variance, as B has equal variances along the X and Y axes. So what is the problem?

This problem is similar to the previous one, with only a subtle difference: if we rotate the X and Y axes, we observe the same phenomenon as before. As shown below, after rotating the axes we can use the same equation in the new coordinates.

We need to rotate the axes and then calculate the variance.

$d(a,B) = \sqrt{\frac{d_{X'}(a,\mu(B))^2}{Var_{X'}(B)} + \frac{d_{Y'}(a,\mu(B))^2}{Var_{Y'}(B)}}$,
where $d_{X'}(a,\mu(B))$ is the distance between a and $\mu(B)$ along the new X-axis, and $d_{Y'}(a,\mu(B))$ is the distance along the new Y-axis.
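One way to find the rotated axes X' and Y' in code is through the eigendecomposition of the set's covariance matrix, whose eigenvectors point along the directions of largest and smallest spread; that particular choice is my assumption, since the post only asks for some rotation. A sketch, assuming NumPy (the example set, stretched along the diagonal y = x, is mine):

```python
import numpy as np

def rotated_variance_distance(a, B):
    """Distance from a to B: rotate to the set's principal axes,
    then divide each squared coordinate by the variance there."""
    B = np.asarray(B, dtype=float)
    mu = B.mean(axis=0)
    cov = np.cov(B, rowvar=False)
    # Eigenvectors of the covariance give the rotated axes X', Y';
    # the matching eigenvalues are the variances of B along them.
    var, axes = np.linalg.eigh(cov)
    d = (a - mu) @ axes            # coordinates in the rotated frame
    return np.sqrt(np.sum(d ** 2 / var))

# A set stretched along the diagonal y = x (equal variance on X and Y).
rng = np.random.default_rng(1)
t = rng.normal(size=400)
B = np.column_stack([t, t]) * 3 + rng.normal(size=(400, 2)) * 0.5
mu = B.mean(axis=0)

a       = mu + np.array([3.0, 3.0])    # follows the diagonal pattern
a_prime = mu + np.array([2.0, -2.0])   # closer to the mean, off the pattern

# a_prime is closer in plain Euclidean terms, yet a wins here.
print(rotated_variance_distance(a, B) <
      rotated_variance_distance(a_prime, B))
```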

For the next problem, consider the example given below:

The distance between the two test points a and a’ and the set B is calculated. Although point a’ has a smaller distance to $\mu(B)$ (and therefore to B), point a seems more similar to the dataset and therefore should have the smaller distance.

In this figure, too, point a’ is closer to the mean, while point a better follows the pattern of set B; therefore a should have the smaller distance, but it does not (using the formulas discussed so far). Here, the variance along the two axes is equal, and no matter how we rotate the axes the problem remains unchanged.

To solve this problem, we change the axes in a different way. This time, the value of each point along the new X-axis equals $(X-\mu_X(B))^2$ and the new Y value equals $(Y-\mu_Y(B))^2$.

This time we change the axes in a different way.

In this new space, the problem reduces to the one presented before, so we can define the distance function by rotating the axes and normalizing by the variance in the rotated space.
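A sketch of this last step, assuming NumPy (the example set, lying on the two diagonals y = x and y = -x so that neither variance scaling nor rotation alone helps, and the helper names are mine): map every point to its squared, mean-centered coordinates, then reuse the rotated, variance-normalized distance in that space.

```python
import numpy as np

def rotated_variance_distance(p, P):
    """Rotate to the set's principal axes, then variance-normalize."""
    P = np.asarray(P, dtype=float)
    mu = P.mean(axis=0)
    var, axes = np.linalg.eigh(np.cov(P, rowvar=False))
    d = (p - mu) @ axes
    return np.sqrt(np.sum(d ** 2 / var))

def squared_axes(points, mu):
    """Map (x, y) to ((x - mu_x)^2, (y - mu_y)^2)."""
    return (np.asarray(points, dtype=float) - mu) ** 2

# A set on the two diagonals y = x and y = -x: equal variance along
# X and Y, and no rotation of the original axes fixes the problem.
rng = np.random.default_rng(2)
t = rng.normal(size=400) * 3
sign = rng.choice([-1, 1], size=400)
B = np.column_stack([t, sign * t]) + rng.normal(size=(400, 2)) * 0.3
mu = B.mean(axis=0)

B2 = squared_axes(B, mu)              # the set in the squared space
a       = mu + np.array([3.0, 3.0])   # follows the pattern
a_prime = mu + np.array([3.0, 0.0])   # same |x| offset, off the pattern

d_a  = rotated_variance_distance(squared_axes(a, mu), B2)
d_ap = rotated_variance_distance(squared_axes(a_prime, mu), B2)
print(d_a < d_ap)
```

In the squared space, points of B land near the line X' = Y', so point a (which follows the pattern) falls close to the set there, while a’ does not.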

That's enough for this post. We are not done yet, though, so stay tuned for the next post on the mighty distance function.