Perpendicular Regression Of A Line

When we perform a regression fit of a straight line to a set of (x,y) 
data points we typically minimize the sum of squares of the "vertical" 
distance between the data points and the line.  In other words, taking
x as the independent variable, we minimize the sum of squares of the
errors in the dependent variable y.  However, this isn't the only 
possible approach.  For example, we might choose to optimize the 
"horizontal" distances from the points to the line (i.e., the errors
in the x variable), or the "perpendicular" distances to the line.

If we regard each data point (x,y) as a sample, and if we assume the
sample is taken at the precise value of the independent variable x,
then it is sensible to regard each data point as being at the exactly
correct x coordinate, and all the error is in the sampled value of the
dependent coordinate y.  On the other hand, if there is some uncertainty
in the value of x for each sample, then conceptually it could make
sense to take this into account when performing the regression to get
the "best" fit.  If the distribution of errors in both x and y are
random (e.g., normally distributed) then one might think we could just
sweep up the error in x as just one more contribution to the measured
error in y, so the fitted line should be the same.  However, this is
not generally the case, as can be seen by considering the simple example
of three (x,y) data points (0,0), (10,4), (10,8).  To minimize the sum
of squares of the errors in the y variable, the line must clearly pass
through (0,0) and (10,6), whereas to minimize the sum of squares of the
errors in the x variable the optimum line must be tilted more steeply
and not pass through (0,0).  Similarly, if we minimize the sum of squares
of the "perpendicular" distances to the line, we get yet another 
distinct line.
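
The divergence of these fits can be checked with a few lines of code 
(a quick pure-Python sketch; the variable names are mine, not from 
the text):

```python
# Three sample points from the text
pts = [(0, 0), (10, 4), (10, 8)]
n = len(pts)
X = sum(x for x, _ in pts) / n    # mean of x
Y = sum(y for _, y in pts) / n    # mean of y

Sxx = sum((x - X)**2 for x, _ in pts)
Syy = sum((y - Y)**2 for _, y in pts)
Sxy = sum((x - X)*(y - Y) for x, y in pts)

# Minimizing errors in y gives slope Sxy/Sxx, which is 3/5 in exact
# arithmetic: the line through (0,0) and (10,6) mentioned above.
slope_y = Sxy / Sxx
intercept_y = Y - slope_y * X     # zero: passes through the origin

# Minimizing errors in x gives slope Syy/Sxy, which is 4/5: a
# steeper line whose intercept is no longer zero.
slope_x = Syy / Sxy
intercept_x = Y - slope_x * X     # -4/3 in exact arithmetic

print(slope_y, slope_x)
```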

However, the meaning of "perpendicular" is ambiguous because in general
the units of x and y may be different, and so the "angles" of lines in 
the abstract "xy plane" do not have any absolute significance.  For 
example, if x is time, and y is intensity, we can plot the data points 
with different scalings, so there is no unique notion of "perpendicular"
in the time-intensity plane.  In order to make the best fit, we need to 
scale the plot axes (conceptually) such that the variances of the errors
in the x and y variables are numerically equal.  Once we have done this,
it makes sense to treat the results as geometrical points and find the 
line that minimizes the sum of squares of the perpendicular distances 
between the points and the line.  Of course, this requires us to know the
variances of the error distributions.  If we don't, then the "best" line 
will be ambiguous.  This is presumably why is it common practice to 
simply fit the dependent variable, since we don't have sufficient 
information to know, a priori, how the variances of the x and y errors 
are related.

If we are given a set of (x,y) data points, and we somehow have sufficient 
information to scale them so the distributions of errors in the x and y 
variables have equal variances, then we can proceed to fit a line using
"perpendicular" regression.  One way of approaching this is to find the 
"principle directions" of the data points.  Let's say we have the suitably
scaled (x,y) coordinates of n data points.  To make it simple, let's first 
compute the average of the x values, and the average of the y values, 
calling them X and Y respectively.  The point (X,Y) is the centroid of 
the set of points.  Then we can subtract X from each of the x values, and
Y from each of the y values, so now we have a list of n data points whose 
centroid is (0,0).
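
This centering step might be sketched as follows (a minimal 
illustration; the name "center" is my own, not from the text):

```python
def center(points):
    """Return the centroid (X, Y) and the list of points translated
    so that their centroid is (0, 0)."""
    n = len(points)
    X = sum(x for x, _ in points) / n
    Y = sum(y for _, y in points) / n
    return (X, Y), [(x - X, y - Y) for x, y in points]

centroid, centered = center([(5, 5), (6, 6), (7, 7)])
print(centroid)   # (6.0, 6.0)
print(centered)   # [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
```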

To find the principal directions, imagine rotating the entire set of 
points about the origin through an angle q.  This sends the point 
(x,y) to the point (x',y') where

              x'  =   x cos(q) + y sin(q)
              y'  =  -x sin(q) + y cos(q)

Now, for any fixed angle q, the sum of the squares of the vertical 
heights of the n transformed data points is S = SUM [y']^2, and we 
want to find the angle q that minimizes this.  (We can look at this 
as rotating the data so that the perpendicular distances to the 
fitted line become vertical distances.)  To do this, we take the 
derivative with respect to q and set it equal to zero.  The 
derivative of [y']^2 is 2y'(dy'/dq), so we have

  dS/dq = 2 SUM [-x sin(q)+y cos(q)][-x cos(q)-y sin(q)]

We set this to zero, so we can immediately divide out the factor 
of 2.  Then, expanding out the product and collecting terms into 
separate summations gives

  [SUM xy] sin(q)^2  + [SUM (x^2 - y^2)] sin(q)cos(q)

         - [SUM xy] cos(q)^2  =  0

Dividing through by cos(q)^2, we get a quadratic equation in tan(q):

    {xy}tan(q)^2 + {x^2 - y^2}tan(q) - {xy} = 0

where the "curly braces" indicate that we take the sum of the contents 
over all n data points (x,y).  Dividing through by the sum {xy} gives

          tan(q)^2  +  A tan(q) - 1  =  0

where A = {x^2-y^2}/{xy}.  Solving this quadratic for tan(q) gives 
two solutions, which correspond to the "principle directions", i.e., 
the directions in which the "scatter" is maximum and minimum.  We 
want the minimum.
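
The whole procedure can be sketched in code.  The following is a 
minimal pure-Python sketch (all names are illustrative, not from the 
text): it centers the points, forms the sums {xy} and {x^2 - y^2}, 
solves the quadratic for tan(q), and keeps the root that minimizes 
the rotated vertical scatter SUM [y']^2.  It assumes the data have 
already been scaled so the x and y error variances are equal.

```python
import math

def perpendicular_fit(points):
    """Fit a line by perpendicular regression, assuming the points
    are already scaled so the x and y error variances are equal.
    Returns (slope, intercept) of the best-fit line."""
    n = len(points)
    X = sum(x for x, _ in points) / n          # centroid
    Y = sum(y for _, y in points) / n
    xy = sum((x - X) * (y - Y) for x, y in points)            # {xy}
    xx_yy = sum((x - X)**2 - (y - Y)**2 for x, y in points)   # {x^2 - y^2}

    if xy == 0:
        if xx_yy < 0:
            raise ValueError("best fit is the vertical line x = X")
        slope = 0.0   # principal directions are horizontal/vertical
    else:
        # Roots of tan(q)^2 + A tan(q) - 1 = 0 give the two
        # principal directions; keep the one with minimum scatter.
        A = xx_yy / xy
        roots = [(-A + math.sqrt(A*A + 4)) / 2,
                 (-A - math.sqrt(A*A + 4)) / 2]

        def scatter(t):
            # SUM [y']^2 after rotating the centered points by q = atan(t)
            q = math.atan(t)
            s, c = math.sin(q), math.cos(q)
            return sum((-(x - X)*s + (y - Y)*c)**2 for x, y in points)

        slope = min(roots, key=scatter)

    return slope, Y - slope * X

slope, intercept = perpendicular_fit([(2, 6), (4, 2), (16, 8), (14, 12)])
print(slope, intercept)   # 0.5 2.5
```

Applied to the four-point example worked out below, this reproduces 
the slope-1/2 line through the centroid (9,7).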

Just to illustrate on a trivial example, suppose we have three data 
points (5,5), (6,6), and (7,7).  First compute the centroid, which 
is (6,6), and then subtract this from each point to give the new set 
of points (-1,-1), (0,0), and (1,1).  Then we can tabulate the sums:

            x   y   x^2 - y^2   xy
           --- ---  ---------  ----
           -1  -1       0        1
            0   0       0        0
            1   1       0        1
                      -----    -----
                        0        2

In this simple example we have {x^2-y^2} = 0 and {xy} = 2, which
means that A = 0, so our equation for the principal directions is 

                   tan(q)^2 - 1 = 0

Thus the two roots are tan(q)=1 and tan(q)=-1, which correspond 
to the angles +45 degrees and -45 degrees.  This makes sense,
because our original data points make a 45 degree line, so if 
we rotate them 45 degrees clockwise they are flat, whereas if we 
rotate them 45 degrees the other way they are vertically arranged.
These are the two principal directions of this set of 3 points.  
The "best" fit through the original three points is a 45 degree 
line through the centroid - which is obvious in this trivial
example, but the method works in general with arbitrary sets 
of points.
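
The tabulation above is easy to check numerically (a small pure-
Python sketch; names are illustrative):

```python
import math

# Centered points from the trivial example above
pts = [(-1, -1), (0, 0), (1, 1)]
xy = sum(x * y for x, y in pts)           # {xy} = 2
xx_yy = sum(x*x - y*y for x, y in pts)    # {x^2 - y^2} = 0
A = xx_yy / xy                            # A = 0

# Roots of tan(q)^2 + A tan(q) - 1 = 0
roots = [(-A + math.sqrt(A*A + 4)) / 2,
         (-A - math.sqrt(A*A + 4)) / 2]
angles = [math.degrees(math.atan(r)) for r in roots]
print(roots)                              # [1.0, -1.0]
print([round(a, 6) for a in angles])      # [45.0, -45.0]
```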

For another example, suppose we have four data points (2,6), (4,2), 
(16,8), and (14,12).  The centroid of these points is (9,7), so we 
can subtract this from each point to give the new set of points 
(-7,-1), (-5,-5), (7,1), and (5,5).  Then we can tabulate the sums:

                x    y     xy    x^2 - y^2
               ---  ---   ----   ---------
               -7   -1      7       48
               -5   -5     25        0
                7    1      7       48
                5    5     25        0
                          ----     ----
                   sums:   64       96

In this case we have {xy} = 64 and {x^2-y^2} = 96, which gives A = 3/2, 
so our equation for the principal directions is

              tan(q)^2 + (3/2)tan(q) - 1 = 0

The two roots are tan(q) = 1/2 and -2, which correspond to the angles
+26.565 degrees and -63.435 degrees.  This is consistent with the fact
that our original four data points are the vertices of a rectangle
whose edges have the slopes 1/2 and -2.  The "best" fit through these 
four points is a line through the centroid  with a slope of 1/2.
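
As a sanity check that the slope-1/2 line really is the perpendicular 
best fit, we can compare its sum of squared perpendicular distances 
against that of the ordinary vertical-error regression line (a quick 
sketch, using the standard point-to-line distance |y - mx - b|/sqrt(1+m^2)):

```python
pts = [(2, 6), (4, 2), (16, 8), (14, 12)]
X = sum(x for x, _ in pts) / 4    # centroid x = 9
Y = sum(y for _, y in pts) / 4    # centroid y = 7

def perp_sse(m):
    # Sum of squared perpendicular distances from the points to the
    # line of slope m through the centroid (X, Y).
    b = Y - m * X
    return sum((y - m*x - b)**2 for x, y in pts) / (1 + m*m)

m_perp = 0.5        # perpendicular-regression slope from the text
m_ols = 64 / 148    # ordinary vertical-error slope, {xy}/{x^2}
print(perp_sse(m_perp))                     # 20.0
print(perp_sse(m_perp) < perp_sse(m_ols))   # True
```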

(It's interesting that the two quantities which characterize the 
points, namely xy and x^2 - y^2, are both hyperbolic conic forms, 
and they constitute the invariants of Minkowski spacetime when 
expressed in terms of null coordinates and spatio-temporal 
coordinates, respectively.)
