Nonparametric Regression: Unveiling Hidden Insights in Financial Modeling
The Power of Nonparametric Regression: A Tool for Testing Parametric Models
The typical parametric regression model is something like Y = f(X;θ) + ε, where f is some function which is completely specified except for the adjustable parameters θ, and ε, as usual, is uncorrelated noise. Usually, but not necessarily, people use a function f that is linear in the variables in X, or perhaps includes some interactions between them. However, how can we tell if the specification is right? If, for example, it's a linear model, how can we check whether there might not be some nonlinearity?
One common approach is to modify the specification by adding in specific departures from the modeling assumptions — say, adding a quadratic term — and seeing whether the coefficients that go with those terms are significantly non-zero, or whether the improvement in fit is significant. For example, one might compare the model Y = θ₁x₁ + θ₂x₂ + ε to the model Y = θ₁x₁ + θ₂x₂ + θ₃x₂² + ε by checking whether the estimated θ₃ is significantly different from 0, or whether the residuals from the second model are significantly smaller than the residuals from the first.
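As a toy illustration of this check (a Python sketch with NumPy; the coefficients and design are invented, and for brevity both models are fit without an intercept), we can fit the restricted and expanded specifications by least squares and inspect the estimated quadratic coefficient and the drop in the residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
# Invented truth with a genuine quadratic term in x2.
y = 1.0 * x1 + 2.0 * x2 + 3.0 * x2**2 + rng.normal(0, 0.1, n)

# Restricted model y ~ x1 + x2 versus expanded model y ~ x1 + x2 + x2^2.
X_small = np.column_stack([x1, x2])
X_big = np.column_stack([x1, x2, x2**2])
beta_small, rss_small, *_ = np.linalg.lstsq(X_small, y, rcond=None)
beta_big, rss_big, *_ = np.linalg.lstsq(X_big, y, rcond=None)

# If the quadratic term matters, its estimated coefficient sits far from 0
# and the expanded model's residual sum of squares is clearly smaller.
f_stat = (rss_small[0] - rss_big[0]) / (rss_big[0] / (n - 3))
```

A formal version of this comparison refers f_stat to an F(1, n−3) distribution; but the point of the passage stands, since this only detects the particular departure (here, a quadratic in x₂) that we thought to include.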
This can work, but only if you have chosen the right nonlinearity to test. It has the power to detect certain mis-specifications, if they exist, but not others. If you have good reasons to think that when the model is wrong, it can only be wrong in certain ways, fine; if not, though, why only check for those errors? Nonparametric regression effectively lets you check for all kinds of systematic errors, rather than singling out a particular one.
The Limitations of Parametric Models
If the parametric model is right, it should predict as well as, or even better than, the nonparametric one, and we can check whether MSE_p(θ̂) − MSE_np(r̂) is sufficiently small. If the parametric model is right, the nonparametric estimated regression curve should be very close to the parametric one, so we can check whether f(x; θ̂) − r̂(x) is approximately zero everywhere. And if the parametric model is right, its residuals should be patternless and independent of the input features, because E[Y − f(x;θ) | X] = E[f(x;θ) + ε − f(x;θ) | X] = E[ε | X] = 0.
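To make the last of these checks concrete, here is a minimal sketch (Python with NumPy; the linear truth, its coefficients, and the design are invented for illustration): when the fitted specification matches the truth, the average residual in every slice of x is statistically indistinguishable from zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
y = 0.2 + 0.5 * x + rng.normal(0, 0.15, n)  # a linear truth (invented coefficients)

# Fit the correctly specified linear model by least squares.
X = np.column_stack([np.ones(n), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta

# If the specification is right, E[residual | X] = 0: the average residual
# within each bin of x should hover around zero, with no trend.
bins = np.digitize(x, np.linspace(0, 10, 11))
bin_means = np.array([resid[bins == b].mean() for b in range(1, 11)])
```

Under mis-specification, by contrast, some of these bin means would sit systematically away from zero, which is exactly the kind of pattern a nonparametric smoother picks up.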
However, the parametric model can be mis-specified. Then its predictions are systematically wrong, even with unlimited amounts of data — there's some bias which never goes away, no matter how big the sample. Since the non-parametric smoother does eventually come arbitrarily close to the true regression function, the smoother will end up predicting better than the parametric model. Smaller errors for the smoother, then, suggest that the parametric model is wrong.
Testing Parametric Models with Nonparametric Regression
One approach to testing parametric models is to use nonparametric regression. The basic procedure is as follows:

1. Get data (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ).
2. Fit the parametric model, getting an estimate θ̂ and in-sample mean-squared error MSE_p(θ̂).
3. Fit your favorite nonparametric regression (using cross-validation to pick control settings as necessary), getting curve r̂ and in-sample mean-squared error MSE_np(r̂).
4. Calculate t̂ = MSE_p(θ̂) − MSE_np(r̂).
5. Simulate from the parametric model at θ̂ to get faked data (x′₁, y′₁), ..., (x′ₙ, y′ₙ).
6. Fit the parametric model to the simulated data, getting estimate θ̃ and MSE_p(θ̃).
7. Fit the nonparametric model to the simulated data, getting estimate r̃ and MSE_np(r̃).
8. Calculate T̃ = MSE_p(θ̃) − MSE_np(r̃).
9. Repeat steps 5–8 many times to get an estimate of the distribution of T̃ under the null hypothesis.
10. The p-value is (1 + #{T̃ > t̂}) / (1 + #{T̃}).
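The ten steps above can be sketched end to end in code. What follows is a simplified stand-in, not the text's actual implementation: it is Python with NumPy, it replaces cross-validated smoothing with a hand-rolled Nadaraya-Watson smoother at a fixed bandwidth, and the sample size, design range, and nonlinear truth are all invented for illustration.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def mse_linear(theta, x, y):
    """In-sample MSE of the fitted straight line."""
    return np.mean((y - (theta[0] + theta[1] * x)) ** 2)

def kernel_smooth(x, y, h):
    """In-sample Nadaraya-Watson estimate with a Gaussian kernel, bandwidth h."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def calc_T(x, y, h=0.3):
    """Test statistic: parametric minus nonparametric in-sample MSE (steps 2-4)."""
    theta = fit_linear(x, y)
    return mse_linear(theta, x, y) - np.mean((y - kernel_smooth(x, y, h)) ** 2)

rng = np.random.default_rng(42)
n, sigma = 300, 0.15
x = rng.uniform(0, 3, n)                     # invented design
y = np.log(x + 1) + rng.normal(0, sigma, n)  # an invented nonlinear truth

t_hat = calc_T(x, y)

# Steps 5-9: simulate from the fitted parametric model, recompute T each time.
theta_hat = fit_linear(x, y)
sigma_hat = np.sqrt(mse_linear(theta_hat, x, y))
null_T = np.array([
    calc_T(x, theta_hat[0] + theta_hat[1] * x + rng.normal(0, sigma_hat, n))
    for _ in range(200)
])

# Step 10: the bootstrap p-value.
p_value = (1 + np.sum(null_T > t_hat)) / (1 + len(null_T))
```

Because the truth here is curved while the fitted model is straight, t̂ lands in the upper tail of the simulated null distribution and the p-value comes out small.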
A Practical Example: Testing a Linear Model
Let's see this in action. First, let's detect a reasonably subtle nonlinearity. Take the non-linear function g(x) = log(x + 1), and say that Y = g(x) + ε, with ε being IID Gaussian noise with mean 0 and standard deviation 0.15. The nonlinearity is clear with the curve to "guide the eye", but fairly subtle. A simple linear regression looks pretty good, with a high R²: the regression line preserves 85% of the variance in the data.
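A minimal sketch of this setup (Python with NumPy; the sample size and the uniform design on [0, 3] are assumptions, not details from the text):

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma = 300, 0.15
x = rng.uniform(0, 3, n)                     # assumed design range
y = np.log(x + 1) + rng.normal(0, sigma, n)  # a subtly nonlinear (logarithmic) truth

# Fit the (mis-specified) straight line by ordinary least squares.
X = np.column_stack([np.ones(n), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - np.mean((y - X @ theta) ** 2) / np.var(y)
```

The straight line ignores the gentle curvature of the logarithm yet still explains most of the variance, which is exactly why this kind of mis-specification is easy to miss by eye.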
However, the nonparametric regression has a somewhat smaller MSE, so t̂ = 0.0045:

> t.hat = mean(glinfit$residual^2) - gnpr$MSE
> t.hat
[1] 0.004542232

Now we just repeat this a lot to get a good approximation to the sampling distribution of T̃ under the null hypothesis:

null.samples.T <- replicate(200, calc.T(sim.lm(glinfit, x)))

This takes some time, because each replication involves not just generating a new simulation sample, but also cross-validation to pick a bandwidth.
A Second Example: Testing a Correctly Specified Linear Model
Let's suppose that the linear model is right; then the test should give us a high p-value. So let us stipulate that in reality Y = 0.2 + 0.5x + ε, with ε ∼ N(0, 0.15²). Figure 10.4 shows data from this, of the same size as before. Repeating the same exercise as before, we get t̂ = 6.8 × 10⁻⁴, together with a slightly different null distribution. Now the p-value is 32%, so it would be quite rash to reject the linear specification.
Remarks
There is nothing inherently wrong with using parametric models. They are often simpler and more interpretable than non-parametric models. However, they can also be mis-specified, and non-parametric regression provides a way to check for this. By comparing the in-sample mean-squared error of a parametric model to that of a non-parametric model, we can get a sense of whether the parametric model is well-specified.
Implementation
How should investors actually apply this knowledge? One approach is to use non-parametric regression as a diagnostic tool for parametric models: fit both to the same data, compare their in-sample mean-squared errors, and use the simulation procedure above to judge whether the gap is larger than chance alone would produce.
Conclusion
Non-parametric regression provides a powerful tool for testing parametric models. If a parametric specification is correct, a nonparametric smoother should beat it only by an amount attributable to sampling noise; a persistently larger gap in mean-squared error is evidence of mis-specification. This can be particularly useful in finance, where parametric models are often used to make predictions about future returns.