LINFIT

Purpose:

Performs general least squares fitting (multiple linear regression) of arbitrary basis functions.

Syntax:

LINFIT(A, y, "method")

A

-

An input array, the basis function design matrix.

y

-

An input series, the Y output.

"method"

-

Optional. A string, the solution method.

 

 

"qr"

:

QR decomposition (default)

"norm"

:

solve normal equations directly

"svd"

:

singular value decomposition

Returns:

A series, the basis function coefficients.

Example:

x = {-1, 1, 1.5, 3};

y = {6, 2, 2.25, 6};

 

c1 = polyfit(y, x, 2);

c2 = linfit(ravel(ones(length(x),1), x, x^2), y);

 

c1 == c2 == {3, -2, 1};

 

The first method uses POLYFIT to fit a quadratic polynomial to the data points. As shown, LINFIT produces the same result. The fitted equation becomes:

 

image\svddiv27.gif

Example:

W1: grand(100,.01)

W2: 5*W1 + 3*sin(W1) - 2*exp(W1*W1)

W3: ravel(W1, sin(W1), exp(W1*W1))

W4: linfit(W3, W2)

 

W4 == {5, 3, -2}, the coefficients of W2.

 

In this example, we wish to find the coefficients of the relation:

 

image\svddiv01.gif

 

Notice that although each term is not necessarily linear, the equation as a whole is a linear combination of terms. LINFIT solves the set of simultaneous equations to yield:

 

image\svddiv02.gif

Example:

x1 = gnorm(100, 1);

x2 = gsin(100, 1, 1/10);

x3 = 1..100;

x4 = exp((1..100)/100);

y = 3 * x1 + 10 * x2 - x3 + 30 * x4;

 

d1 = ravel(x1, x2, x3);

d2 = ravel(x1, x2, x3, x4);

 

W1: linfit(d1, y);label("Coef 1")

W2: linfit(d2, y);label("Coef 2")

W3: d1 *^ W1;label("Fit 1")

W4: d2 *^ W2;label("Fit 2")

W5: {norm(y W3)^2};label("Chi^2 1")

W6: {norm(y W4)^2};label("Chi^2 2")

 

In this example, the data is modeled as a function of three unrelated input series and also as a function of the same three input series plus a fourth series. We wish to find the coefficients of the relations:

 

image\svddiv34.gif

 

W1 and W2 compute the least squares coefficients while W3 and W3 compute the actual fit. The fitting error for each result is displayed in W5 and W6. Because the 2nd fit includes all the components of the input data, the fit error is negligible. This form of least squares fitting is often referred to as multiple linear regression.

Remarks:

Consider the problem of fitting a set of data points to a model that is a linear combination of functions, called basis functions:

 

image\svddiv03.gif

 

The basis functions themselves are not required to be linear, only the dependence on the coefficients ck is linear.

 

By employing the method of least squares, we wish to find the coefficients that minimize the sum of the squares of the differences between the data points and the fitted results:

 

image\svddiv04.gif

 

Minimization is achieved by taking the partial derivatives with respect to each coefficient and setting the result to 0. For real basis functions this yields:

 

image\svddiv05.gif

 

Thus, for each basis function Xk , we have:

 

image\svddiv06.gif

 

where N is the number of data points and M is the number of basis functions. We define the NxM design matrix:

 

image\svddiv07.gif

 

where the design matrix has the form:

 

image\svddiv08.gif

 

Each column of the design matrix is a basis function evaluated at the known x values.

 

We also define the output vector y of length N as:

 

image\svddiv09.gif

 

With these matrix definitions, we have:

 

image\svddiv10.gif

 

and in matrix form, the least squares problem becomes:

 

image\svddiv11.gif

 

where c is the vector of basis function coefficients. This formulation of the least squares problem is known as the normal equations.

 

LINFIT(A, y, "norm") solves for the normal equations directly using CROSSPROD. However, the normal equations are often ill-conditioned and solving for the coefficients directly can be susceptible to roundoff errors.

 

To investigate an alternative solution to the least squares problem, notice that the original basis function relation:

 

image\svddiv12.gif

 

can be written in matrix form as:

 

image\svddiv14.gif

 

With our matrix definitions, we restate the least squares criteria as finding the coefficient vector c that minimizes:

 

image\svddiv24.gif

 

The quantity is minimized to 0 when:

 

image\svddiv13.gif

 

i.e. the basis function relation. This system of equations is over determined since N, the number of data points, must be greater than M, the number of basis functions.

 

Using QR decomposition, matrix A is expressed as the product of an NxN orthogonal matrix Q and a NxM matrix R where R is an upper triangular matrix such that the last N-M rows contain zeros.

 

image\svddiv15.gif

 

where I is the identity matrix. Substituting into the basis function relation and multiplying by QT:

 

image\svddiv17.gif

 

Recall that R is an upper triangular matrix of the form:

 

image\svddiv18.gif

 

where Θ is a (N-M)xM matrix of zeros. By making the following partition:

 

image\svddiv20.gif

 

such that q1 is Mx1 and q2 is (N-M)x1, the least squares problem can be solved by:

 

image\svddiv23.gif

 

This upper triangular system can be solved for c using simple back substitution.

 

LINFIT(A, y, "qr") or LINFIT(A, y) uses QR decomposition to solve the over-determined set of simultaneous equations as specified by A and y to calculate the basis function coefficients c that minimize the least squares error between the basis functions and the data values.

 

Singular value decomposition, SVD, decomposes the design matrix A into three matrix components, U, W and V such that:

 

image\svddiv28.gif

 

where U is an orthonormal matrix that contains the left singular values, W is a diagonal matrix that contains the weighting factor or singular values and V is a orthonormal matrix that contains the right singular values.

 

image\svddiv29.gif

 

With this decomposition, the least squares problem can be solved with:

 

image\svddiv31.gif

 

where wj (the singular values) are the diagonal values of matrix W.

 

LINFIT(A, y, "svd") uses SVD, singular value decomposition to solve the over-determined set of simultaneous equations as specified by A and y. The method is usually slower than QR decomposition, but can be more stable for severely ill-conditioned systems.

 

See [1] for further details.

 

See DADiSP/MatrixXL to significantly optimize LINFIT.

See Also:

*^ (Matrix Multiply)

\^ (Matrix Solve)

CROSSPROD

DADiSP/MatrixXL

POLYFIT

QR

SVD

SVDDIV

References:

[1] Press, Flannery, Teukolsky, Vetterling

      Numerical Recipes in C

      2nd Edition

      Cambridge Press, 1988, 1992

      pp. 671-675