## Perpendicular Regression Of A Line

```
When we perform a regression fit of a straight line to a set of (x,y)
data points we typically minimize the sum of squares of the "vertical"
distance between the data points and the line.  In other words, taking
x as the independent variable, we minimize the sum of squares of the
errors in the dependent variable y.  However, this isn't the only
possible approach.  For example, we might choose to minimize the
"horizontal" distances from the points to the line (i.e., the errors
in the x variable), or the "perpendicular" distances to the line.

If we regard each data point (x,y) as a sample, and if we assume the
sample is taken at the precise value of the independent variable x,
then it is sensible to regard each data point as being at the exactly
correct x coordinate, and all the error is in the sampled value of the
dependent coordinate y.  On the other hand, if there is some uncertainty
in the value of x for each sample, then conceptually it could make
sense to take this into account when performing the regression to get
the "best" fit.  If the errors in both x and y are random (e.g.,
normally distributed) then one might think we could simply fold the
error in x into the measured error in y as one more contribution, so
the fitted line would be the same.  However, this is
not generally the case, as can be seen by considering the simple example
of three (x,y) data points (0,0), (10,4), (10,8).  To minimize the sum
of squares of the errors in the y variable, the line must clearly pass
through (0,0) and (10,6), whereas to minimize the sum of squares of the
errors in the x variable the optimum line must be tilted more steeply
and not pass through (0,0).  Similarly, if we minimize the sum of squares
of the "perpendicular" distances to the line, we will get yet another
line.

However, the meaning of "perpendicular" is ambiguous because in general
the units of x and y may be different, and so the "angles" of lines in
the abstract "xy plane" do not have any absolute significance.  For
example, if x is time, and y is intensity, we can plot the data points
with different scalings, so there is no unique notion of "perpendicular"
in the time-intensity plane.  In order to make the best fit, we need to
scale the plot axes (conceptually) such that the variances of the errors
in the x and y variables are numerically equal.  Once we have done this,
it makes sense to treat the results as geometrical points and find the
line that minimizes the sum of squares of the perpendicular distances
between the points and the line.  Of course, this requires us to know the
variances of the error distributions.  If we don't, then the "best" line
will be ambiguous.  This is presumably why it is common practice to
simply fit the dependent variable, since we don't have sufficient
information to know, a priori, how the variances of the x and y errors
are related.

If we are given a set of (x,y) data points, and we somehow have sufficient
information to scale them so the distributions of errors in the x and y
variables have equal variances, then we can proceed to fit a line using
"perpendicular" regression.  One way of approaching this is to find the
"principal directions" of the data points.  Let's say we have the suitably
scaled (x,y) coordinates of n data points.  To make it simple, let's first
compute the average of the x values, and the average of the y values,
calling them X and Y respectively.  The point (X,Y) is the centroid of
the set of points.  Then we can subtract X from each of the x values, and
Y from each of the y values, so now we have a list of n data points whose
centroid is (0,0).

To find the principal directions, imagine rotating the entire set of
points about the origin through an angle q.  This sends the point
(x,y) to the point (x',y') where

x'  =   x cos(q) + y sin(q)
y'  =  -x sin(q) + y cos(q)

Now, for any fixed angle q, the sum of the squares of the vertical
heights of the n transformed data points is S = SUM [y']^2, and we
want to find the angle q that minimizes this.  (We can look at this
as rotating the regression line so the perpendicular corresponds to
the vertical.)  To do this, we take the derivative with respect to q
and set it equal to zero.  The derivative of [y']^2 is  2y'(dy'/dq),
so we have

dS/dq = 2 SUM [-x sin(q)+y cos(q)][-x cos(q)-y sin(q)]

We set this to zero, so we can immediately divide out the factor
of 2.  Then, expanding out the product and collecting terms into
separate summations gives

[SUM xy] sin(q)^2 + [SUM (x^2 - y^2)] sin(q)cos(q) - [SUM xy] cos(q)^2 = 0

Dividing through by cos(q)^2, we get a quadratic equation in tan(q):

{xy}tan(q)^2 + {x^2 - y^2}tan(q) - {xy} = 0

where the "curly braces" indicate that we take the sum of the contents
over all n data points (x,y).  Dividing through by the sum {xy} gives

tan(q)^2  +  A tan(q) - 1  =  0

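where A = {x^2-y^2}/{xy}.  (This assumes {xy} is not zero; if it is
zero, the principal directions are simply the coordinate axes.)  Solving
this quadratic for tan(q) gives two solutions, which correspond to the
"principal directions", i.e., the directions in which the "scatter" is
maximum and minimum.  The product of the two roots is -1, so the two
directions are perpendicular to each other.  We want the minimum, which
is given by the root having the same sign as {xy}.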

Just to illustrate on a trivial example, suppose we have three data
points (5,5), (6,6), and (7,7).  First compute the centroid, which
is (6,6), and then subtract this from each point to give the new set
of points (-1,-1), (0,0), and (1,1).  Then we can tabulate the sums:

  x    y    x^2 - y^2    xy
 ---  ---   ---------   ----
 -1   -1        0         1
  0    0        0         0
  1    1        0         1
            ---------   ----
 sums:          0         2

In this simple example we have {x^2-y^2} = 0 and {xy} = 2, which
means that A = 0, so our equation for the principal directions is
simply

tan(q)^2 - 1 = 0

Thus the two roots are tan(q)=1 and tan(q)=-1, which correspond
to the angles +45 degrees and -45 degrees.  This makes sense,
because our original data points make a 45 degree line, so if
we rotate them 45 degrees clockwise they are flat, whereas if we
rotate them 45 degrees the other way they are vertically arranged.
These are the two principal directions of this set of 3 points.
The "best" fit through the original three points is a 45 degree
line through the centroid - which is obvious in this trivial
example, but the method works in general with arbitrary sets
of points.

For another example, suppose we have four data points (2,6), (4,2),
(16,8), and (14,12).  The centroid of these points is (9,7), so we
can subtract this from each point to give the new set of points
(-7,-1), (-5,-5), (7,1), and (5,5).  Then we can tabulate the sums:

  x    y     xy    x^2 - y^2
 ---  ---   ----   ---------
 -7   -1      7       48
 -5   -5     25        0
  7    1      7       48
  5    5     25        0
            ----   ---------
 sums:       64       96

In this case we have {xy} = 64 and {x^2-y^2} = 96, which gives A = 3/2,
so our equation for the principal directions is

tan(q)^2 + (3/2)tan(q) - 1 = 0

The two roots are tan(q) = 1/2 and -2, which correspond to the angles
+26.565 degrees and -63.435 degrees.  This is consistent with the fact
that our original four data points are the vertices of a rectangle
whose edges have the slopes 1/2 and -2.  The "best" fit through these
four points is a line through the centroid with a slope of 1/2, which
is the direction of the rectangle's longer edges.

(It's interesting that the two quantities which characterize the
points, namely xy and x^2 - y^2, are both hyperbolic conic forms,
and they constitute the invariants of Minkowski spacetime when
expressed in terms of null coordinates and spatio-temporal
coordinates, respectively.)
```
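The procedure described above (center the points on their centroid, form the sums {xy} and {x^2 - y^2}, solve the quadratic in tan(q), and keep the root with the smaller scatter) can be sketched in Python as follows. This is only an illustrative sketch, not a definitive implementation: the function name `perpendicular_fit`, the handling of the case {xy} = 0, and the choice to select the root by comparing the two scatter values directly are assumptions added here. It also assumes the points have already been scaled so that the x and y errors have equal variances.

```python
import math


def perpendicular_fit(points):
    """Return (centroid, slope) of the line through the centroid that
    minimizes the sum of squared perpendicular distances.

    Hypothetical helper for illustration. The slope is math.inf when
    the best line is vertical.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Sums over the centered points: {xy} and {x^2 - y^2}.
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    sdd = sum((x - cx) ** 2 - (y - cy) ** 2 for x, y in points)

    if sxy == 0:
        # Assumption: with {xy} = 0 the principal directions are the
        # coordinate axes, so pick the axis with the larger spread.
        # If sdd is also 0, every direction is equally good; return 0.
        return (cx, cy), (0.0 if sdd >= 0 else math.inf)

    # Roots of tan(q)^2 + A tan(q) - 1 = 0 with A = {x^2 - y^2}/{xy}.
    a = sdd / sxy
    disc = math.sqrt(a * a + 4.0)
    roots = ((-a + disc) / 2.0, (-a - disc) / 2.0)

    def scatter(t):
        # Sum of squared perpendicular distances to the line of slope t
        # through the centroid: SUM (y - t x)^2 / (1 + t^2).
        total = sum(((y - cy) - t * (x - cx)) ** 2 for x, y in points)
        return total / (1.0 + t * t)

    # Keep the principal direction with the minimum scatter.
    return (cx, cy), min(roots, key=scatter)


if __name__ == "__main__":
    # The four-point rectangle example: centroid (9, 7), slope 1/2.
    print(perpendicular_fit([(2, 6), (4, 2), (16, 8), (14, 12)]))
    # The three-point example from the second paragraph.
    print(perpendicular_fit([(0, 0), (10, 4), (10, 8)]))
```

For the three points (0,0), (10,4), (10,8), the slope returned by this sketch lies between the slope 0.6 obtained by minimizing the errors in y and the slope 0.8 obtained by minimizing the errors in x, which illustrates that the three criteria give three different lines.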