In the previous step, we went over some possible approaches to defining a distance function between a point and a set. In the end, we saw that calculating the distance between the point and the average (mean) point of the set seemed to be the best solution. I am afraid that it is not always that easy.

To see why, consider the figure below.

Here, we have the usual set *B*, but this time we have two test points, *a* and *a’*. Both test points are at the same distance from the mean point of the set, so we have *d(a,B) = d(a’,B)*, yet point *a* seems closer to the set than the other point. In other words, *a* is more similar to *B*, or *a* has a higher probability of being a member of *B*. Therefore, *a* should have a smaller distance to *B* than *a’*, but that is not the case. Why is that?
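To make the problem concrete, here is a minimal sketch in plain Python. The set `B` and the points `a` and `a_prime` are made-up values chosen to mimic the situation described above (they are not taken from the figure), and the helper names are hypothetical.

```python
import math

def mean_point(points):
    """Return the average (mean) point of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def distance_to_mean(a, points):
    """Euclidean distance between point a and the mean of the set."""
    mx, my = mean_point(points)
    return math.hypot(a[0] - mx, a[1] - my)

# Hypothetical set B: spread widely along X, narrowly along Y.
B = [(-4.0, 0.0), (-2.0, 0.5), (0.0, -0.5), (2.0, 0.5), (4.0, -0.5)]
a = (3.0, 0.0)        # lies along the direction in which B is spread out
a_prime = (0.0, 3.0)  # lies far outside B's narrow vertical extent

d_a = distance_to_mean(a, B)
d_a_prime = distance_to_mean(a_prime, B)
```

Both distances come out equal, even though `a` sits inside the horizontal extent of `B` while `a_prime` is far above every point in the set.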

The reason can be explained with a measurement called the variance, which represents the spread of the data along a given axis.

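In the example given, the data has a larger variance along the *X*-axis than along the *Y*-axis. To solve our problem, we can normalize the squared distance along each axis by the variance of the set along that axis:

$$d(a,B) = \sqrt{\frac{d_x^2}{\sigma_x^2} + \frac{d_y^2}{\sigma_y^2}},$$

where $d_x$ is the distance between *a* and the mean of *B* along the *X*-axis, $d_y$ is the distance along the *Y*-axis, and $\sigma_x^2$ and $\sigma_y^2$ are the variances of *B* along the two axes. This is the same as dividing each coordinate by the standard deviation (the square root of the variance) along its axis, which moves our points into a new space in which the set has equal variances along both the *X* and *Y* axes.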

As can be seen, in this new space we have *d(a,B) < d(a’,B)*. We can imagine this new space in two ways: either by squeezing the points along the *X*-axis, or by stretching them along the *Y*-axis, until the points in the set have equal variances along both axes.
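The per-axis scaling can be sketched as follows. This is only an illustration under a few assumptions: the data is made up, the function name is hypothetical, and the population variance (`statistics.pvariance`) is used, although the sample variance would serve equally well here. The sketch also assumes that the variance along each axis is non-zero.

```python
import math
from statistics import mean, pvariance

def normalized_distance(a, points):
    """Distance from a to the mean of the set, with the squared offset
    along each axis divided by the set's variance along that axis."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dx = a[0] - mean(xs)
    dy = a[1] - mean(ys)
    # Assumes pvariance(xs) and pvariance(ys) are both non-zero.
    return math.sqrt(dx * dx / pvariance(xs) + dy * dy / pvariance(ys))

# Hypothetical set B: larger variance along X than along Y.
B = [(-4.0, 0.0), (-2.0, 0.5), (0.0, -0.5), (2.0, 0.5), (4.0, -0.5)]
a = (3.0, 0.0)
a_prime = (0.0, 3.0)

d_a = normalized_distance(a, B)
d_a_prime = normalized_distance(a_prime, B)
```

With this scaling, `d_a` is smaller than `d_a_prime`, even though both points are equally far from the mean in the original space.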

Now, let's move on to our next problem, shown below:

Here, *a’* is closer to the mean of *B* than *a* is, but it seems that *a* better follows the pattern of *B* and therefore should have a smaller distance. The problem cannot be blamed on the variance, as *B* has equal variance along the *X* and *Y* axes. So what is the problem?

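This problem is similar to the one we saw before, with only a subtle difference. If we rotate the *X* and *Y* axes, we observe the same phenomenon as in the previous problem. As shown below, once we rotate the axes, we can use the same equation in the new coordinates:

$$d(a,B) = \sqrt{\frac{d_{x'}^2}{\sigma_{x'}^2} + \frac{d_{y'}^2}{\sigma_{y'}^2}},$$

where $d_{x'}$ is the distance between *a* and the mean of *B* along the new *X*-axis, $d_{y'}$ is the distance along the new *Y*-axis, and $\sigma_{x'}^2$ and $\sigma_{y'}^2$ are the variances of *B* along the new axes.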
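The rotate-then-scale idea can be sketched as below. The text does not say how to pick the rotation, so this sketch makes an assumption: it chooses the angle at which the covariance between the two new axes becomes zero, computed as `0.5 * atan2(2 * sxy, sxx - syy)`. The data and the function name are hypothetical, and the variance along each rotated axis is assumed to be non-zero.

```python
import math

def rotated_normalized_distance(a, points):
    """Rotate the axes so the set's spread is axis-aligned, then apply
    the per-axis variance scaling in the rotated coordinates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]

    # Population variances and covariance in the original axes.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Assumed choice of rotation: the angle that removes the covariance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    def rotate(x, y):
        return (c * x + s * y, -s * x + c * y)

    rotated = [rotate(x, y) for x, y in centered]
    var_u = sum(u * u for u, _ in rotated) / n  # variance along new X
    var_v = sum(v * v for _, v in rotated) / n  # variance along new Y

    du, dv = rotate(a[0] - mx, a[1] - my)
    return math.sqrt(du * du / var_u + dv * dv / var_v)

# Hypothetical set B: equal variance along X and Y, stretched diagonally.
B = [(-3.0, -3.0), (-1.5, -1.0), (-1.0, -1.5), (0.0, 0.0),
     (1.0, 1.5), (1.5, 1.0), (3.0, 3.0)]
a = (2.0, 2.0)         # follows the diagonal pattern of B
a_prime = (1.0, -1.0)  # closer to the mean, but off the pattern

d_a = rotated_normalized_distance(a, B)
d_a_prime = rotated_normalized_distance(a_prime, B)
```

Here `a_prime` is closer to the mean in plain Euclidean terms, yet `d_a` comes out smaller than `d_a_prime` once the axes are rotated and scaled.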

For the next problem, consider the example given below:

In this figure too, point *a’* is closer to the mean, while point *a* better follows the pattern of the set *B*. Therefore, *a* should have a smaller distance, but that is not the case with the formulas discussed so far. Here, the variance along the two axes is equal, and no matter how we rotate the axes, the problem remains unchanged.

To solve this problem we change the axes in a different way. This time, the value of the points along the new *X* axis equals and the new *Y* values equal ^{1}.

The new space is the same as the one presented before, and we can define the distance function in the new space by rotating the axes and calculating the variance in the rotated space.

That’s enough for this post. I just wanted to remind you that we are not done yet, so stay tuned for the next post on the mighty distance function.