Are greedy methods such as orthogonal matching pursuit considered obsolete for finding sparse solutions?
If by "obsolete" you mean "researchers are not working on them anymore", the answer is no - or, at least, I should hope not! Here are a few recent papers on iterative / greedy approaches that analyze various such methods:
- Some new results on orthogonal matching pursuit (2015)
- Some results on the efficacy of Graded Hard Thresholding Pursuit (yes, this is shameless advertisement) (2016)
- IHT for subexponential measurements (2017)
- Some results on hard thresholding (2016)
This list is clearly not exhaustive, but it represents four important papers (in my opinion) dealing with iterative and greedy approaches for sparse approximation. To it one should add all the work related to the co-sparse model, dictionary-sparse approaches, one-bit approaches, and so on.
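Since the question is about orthogonal matching pursuit specifically, here is a minimal sketch of the basic OMP iteration in plain R, just to make the greedy "select, then refit orthogonally" structure concrete. The function and variable names are my own, and it assumes a centred response y and a matrix X with standardised columns; it is an illustration, not a reference implementation.

omp <- function(X, y, k) {
  residual <- y
  active <- integer(0)
  for (step in seq_len(k)) {
    # greedy step: pick the column most correlated with the current residual
    scores <- abs(crossprod(X, residual))
    scores[active] <- -Inf                 # never re-select an active column
    active <- c(active, which.max(scores))
    # orthogonal step: refit least squares on the whole active set
    fit <- lm.fit(X[, active, drop = FALSE], y)
    residual <- y - X[, active, drop = FALSE] %*% fit$coefficients
  }
  list(active = active, coefficients = fit$coefficients)
}

Dropping the full least-squares refit and only updating the coefficient of the newly selected column would give plain (non-orthogonal) matching pursuit.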
If by "obsolete" you mean "has no practical applications", well, allow me to retort! I have personally used such greedy approaches in the following contexts:
- Approximation of numerical solutions to high-dimensional PDEs via sparse polynomial chaos expansions
- Analysis of metagenomics survey data using greedy methods and information theory (this is quite old, needs to be refreshed, and is still very much a work in progress)
Bottom line: yes, these methods are alive and well, and they are being researched from both a theoretical and a practical point of view!
Below is an example that illustrates a difference between matching pursuit and the lasso: the two methods select variables into the active set in a different order.
I believe this is relevant in fields where one wishes to minimize the size of the active set. For instance:
- When developing a model that relates skin-fold measurements to body-fat composition, one may wish to reduce the number of skin-fold measurements that need to be taken.
- Or when trying to find a mixture that approximates some original sample, where there is a limit on the number of ingredients in the mixture.
I realize that these cases differ somewhat from the one in the article cited in the original question, which is about finding a sparse combination of functions (the question was also initially posted on stats.stackexchange.com, and the cases I give are more "statistical"). Nevertheless, they are the kind of cases that place the most pressure on keeping the active set small, and they relate to many other practical examples in which matching pursuit or its variants (in the example below, stepAIC) work well, and possibly better, if not in most cases then at least in some.
The example:
Requirements are R with the 'lars' package (and the 'MASS' package, which provides stepAIC). The data set used for the example consists of biomedical data related to diabetes.
setup:
> # setting up libraries and data
> library(lars)
> library(MASS)   # provides stepAIC, used below
> data(diabetes)
> lmdata <- as.data.frame(cbind(diabetes$y, diabetes$x))
matching pursuit:
which will activate variables in the order bmi, ltg, map, tc, sex, ldl
> # matching pursuit
> # (or actually stepAIC but in this case it does do the stepwise addition of the most correlating variable)
> base_model <- lm(V1 ~ 1, data = lmdata)
> stepAIC(base_model, scope = paste0(c("~ 1", colnames(diabetes$x)), collapse=" + "), trace = 0)[13]
$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
V1 ~ 1
Final Model:
V1 ~ bmi + ltg + map + tc + sex + ldl
Step Df Deviance Resid. Df Resid. Dev AIC
1 441 2621009 3841.990
2 + bmi 1 901427.31 440 1719582 3657.697
3 + ltg 1 302887.70 439 1416694 3574.057
4 + map 1 53986.43 438 1362708 3558.884
5 + tc 1 31277.49 437 1331430 3550.621
6 + sex 1 20561.32 436 1310869 3545.742
7 + ldl 1 39377.57 435 1271491 3534.261
lasso:
which will activate variables in the order bmi, ltg, map, hdl, sex, glu, ... (in the lars output below, a negative entry in the Var row means that the corresponding variable is dropped from the active set at that step)
> # lasso
> lars(diabetes$x, diabetes$y, type="lasso")
Call:
lars(x = diabetes$x, y = diabetes$y, type = "lasso")
R-squared: 0.518
Sequence of LASSO moves:
bmi ltg map hdl sex glu tc tch ldl age hdl hdl
Var 3 9 4 7 2 10 5 8 6 1 -7 7
Step 1 2 3 4 5 6 7 8 9 10 11 12
comparison:
> #comparison of R^2 for a model with 6 variables
> #
> #matching pursuit
> summary(lm(V1 ~ bmi + ltg + map + tc + sex + ldl, data = lmdata))$r.squared
[1] 0.5148848
> #lasso
> summary(lm(V1 ~ bmi + ltg + map + hdl + sex + glu, data = lmdata))$r.squared
[1] 0.5094151
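To see where the two orders first diverge, it may also help to look at the greedy criterion directly: after bmi, ltg and map, a matching-pursuit step would add the remaining variable whose absolute correlation with the current residual is largest (a small sketch reusing the objects defined above):

> # absolute correlation of each predictor with the residual of the
> # three-variable model bmi + ltg + map
> resid3 <- residuals(lm(V1 ~ bmi + ltg + map, data = lmdata))
> sort(abs(cor(lmdata[, -1], resid3))[, 1], decreasing = TRUE)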
The example clearly shows that the lasso adds the variables in a different order.
This difference is most likely related to the methods' different objectives: the lasso penalizes the sum of the absolute values of the coefficients, so it only brings in a variable when the improvement in fit outweighs the increase in that penalty.
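For reference, and in the usual Lagrangian form, the lasso minimizes

$$\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1 ,$$

so (with standardised predictors) a variable only enters the active set once its absolute correlation with the current residual reaches the penalty level $\lambda$, whereas a matching-pursuit step simply adds the most correlated variable regardless of any penalty.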
In certain practical cases this behavior of the lasso is not needed, and it may even be detrimental.