This function finds spatially adaptive knot locations for one or more continuous one-dimensional covariates.
Usage
runSALSA1D(
initialModel,
salsa1dlist,
varlist,
factorlist = NULL,
predictionData = NULL,
varlist_cyclicSplines = NULL,
splineParams = NULL,
datain,
removal = FALSE,
panelid = NULL,
suppress.printout = FALSE,
logfile = FALSE
)
Arguments
- initialModel
The best fitting CReSS model with no continuous covariates specified. This must be a model of class glm.
- salsa1dlist
List of objects required for runSALSA1D: fitnessMeasure, minKnots_1d, maxKnots_1d, startKnots_1d, degree, maxIterations, gaps.
- varlist
Vector of variable names for the covariates required for knot selection.
- factorlist
Vector of factor variables specified in initialModel. Specified so that a check can be made that there are non-zero counts in all levels of each factor. Uses the function checkfactorlevelcounts. Default setting is NULL.
- predictionData
The data for which predictions are to be made. Column names must correspond to the data in initialModel. If predictionData is not specified (NULL), then the range of the data is used to create the smooth terms.
- splineParams
List object containing information for fitting splines to the covariates in varlist. If not specified (NULL), this object is created and returned. See makesplineParams for details.
- datain
Data used to fit the initial model.
- removal
(Default: FALSE). Logical stating whether a selection procedure should be done to choose between smooth, linear or removal of covariates. If FALSE, all covariates are returned and smooth. If TRUE, then cross-validation is used to make model selection choices. The folds are specified by a column in the dataset called foldid.
- panelid
Vector denoting the panel identifier for each data point (if robust standard errors are to be calculated). Defaults to data order index if not given.
- suppress.printout
(Default: FALSE). Logical stating whether to suppress the analysis printout.
- logfile
(Default: FALSE). Logical stating whether to store a log file of the analysis printout.
Value
A list object is returned containing 4 elements:
- bestModel
A model object of class gam.MRSea from the best model fitted.
- modelFits1D
A list object with an element for each new term fitted to the model. The first element is a model fitted with a knot at the mean for each of the covariates in varlist (the start model). Within the first element are the current fit and formula of the start model. The second element is the result of SALSA on the first term in varlist. Within this element:
term: term of interest
kept: statement of whether the term is kept in the model (yes - initial knots, yes - new knots, yes - linear or no)
basemodelformula: the resulting model formula. If kept=yes or kept=linear then the term of interest is included in the model, otherwise it is removed.
knotSelected: the knots chosen for the term of interest (NA if term removed or linear)
baseModelFits: fit statistics for the resulting formula
modelfits: fit statistics for the model with the term included (same as resulting formula if kept=yes)
This continues until all covariates in varlist have been through SALSA.
- fitstat
The final fit statistic of bestModel. The type of statistic was specified in salsa1dlist.
- keptvarlist
The covariates from varlist that have been retained in the model.
Details
The data used to fit the initial model must contain columns called response (the response variable) and foldid (used for the cross-validation calculation). If the data are proportions, there should be two columns called successes and failures.
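As an illustration of the column requirements above, the following is a minimal sketch in base R. It assumes a toy data frame (the column names birds and area mirror the example data, but the values are made up) and a simple random assignment of rows to folds with K = 10; this is not necessarily how the package itself assigns folds.

```r
# Toy data standing in for a real survey data set (values are illustrative only)
set.seed(357)
dat <- data.frame(birds = rpois(50, lambda = 2),
                  area  = runif(50, 0.5, 1.5))

# runSALSA1D expects the response in a column called 'response'
dat$response <- dat$birds

# Assign each row to one of K folds at random (K = 10 is an assumption here)
K <- 10
dat$foldid <- sample(rep(seq_len(K), length.out = nrow(dat)))

# Check that every fold is represented and that fold sizes are balanced
table(dat$foldid)
```

For data with a panel structure, it may be preferable to keep all rows of a panel in the same fold rather than assigning rows independently as done here.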
The object salsa1dlist contains parameters for the runSALSA1D function.
fitnessMeasure. The criterion for selecting the 'best' model. Available options: AIC, AIC_c, BIC, QIC_b, cv.gamMRSea (use cv.opts in salsa1dlist to specify the seed, folds and cost function. Defaults: cv.opts=list(cv.gamMRSea.seed=357, K=10, cost=function(y, yhat) mean((y - yhat)^2))).
minKnots_1d. Minimum number of knots to be tried.
maxKnots_1d. Maximum number of knots to be tried.
startKnots_1d. Starting number of knots (spaced at quantiles of the data).
degree. The degree of the B-spline. Does not need to be specified if splineParams is supplied to runSALSA1D.
maxIterations. The exchange/improve steps will terminate after maxIterations if still running.
gaps. The minimum gap between knots (in the units of the explanatory variable), usually set to zero.
splines. Specify the spline basis for each term. Choose one of "bs" (B-spline), "cc" (cyclic-cubic) or "ns" (natural spline).
minKnots_1d, maxKnots_1d, startKnots_1d and gaps are vectors the same length as varlist. This enables differing values of these parameters for each covariate.
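The per-covariate vectors described above can be sketched as follows for two covariates. This is an illustration only: 'depth' is a hypothetical second covariate, the knot numbers are arbitrary, the cv.opts values are the stated defaults, and giving degree and splines one value per covariate is an assumption based on the single-covariate example.

```r
# Two covariates: 'DayOfMonth' is from the example data; 'depth' is hypothetical
varlist <- c("DayOfMonth", "depth")

salsa1dlist <- list(
  fitnessMeasure = "cv.gamMRSea",
  # Cross-validation options, set here to the documented defaults
  cv.opts = list(cv.gamMRSea.seed = 357, K = 10,
                 cost = function(y, yhat) mean((y - yhat)^2)),
  minKnots_1d   = c(1, 1),   # one value per covariate in varlist
  maxKnots_1d   = c(3, 5),
  startKnots_1d = c(1, 2),
  degree        = c(2, 2),   # assumed to be one value per covariate
  gaps          = c(0, 0),
  splines       = c("bs", "bs")
)

# The per-covariate entries should each have one value per element of varlist
per_covariate <- c("minKnots_1d", "maxKnots_1d", "startKnots_1d", "gaps")
stopifnot(all(lengths(salsa1dlist[per_covariate]) == length(varlist)))
```

Checking the lengths before calling runSALSA1D is a cheap way to catch a mismatch between varlist and the knot settings.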
The initial model contains all the factor level covariates and any covariates of interest that are not specified in the varlist argument of runSALSA1D.
Note: The algorithm may remove variables in varlist but not the variables in factorlist. If there is no better model than one with a knot at the mean, the output will include that covariate with a knot at the mean. The best model with a given smooth term is tested against both a model with the term as linear and a model with the term removed. Cross-validation is used in the selection process.
References
Walker, C., M. Mackenzie, C. Donovan and M. O'Sullivan (2010). SALSA - a Spatially Adaptive Local Smoothing Algorithm. Journal of Statistical Computation and Simulation, 81(2):179-191.
Author
Lindesay Scott-Hayward, University of St Andrews; Cameron Walker, Department of Engineering Science, University of Auckland.
Examples
# load data
data(ns.data.re)
# load prediction data
data(ns.predict.data.re)
varlist=c('DayOfMonth')
# set initial model without the spline terms in there
# (so all other non-spline terms)
ns.data.re$response<- ns.data.re$birds
initialModel<- glm(response ~ 1 + offset(log(area)),
family='quasipoisson',data=ns.data.re)
#set some input info for SALSA
salsa1dlist<-list(fitnessMeasure = 'QBIC',
minKnots_1d=c(1),
maxKnots_1d = c(3),
startKnots_1d = c(1),
degree=c(2),
gaps=c(0))
# run SALSA
salsa1dOutput<-runSALSA1D(initialModel = initialModel,
salsa1dlist = salsa1dlist,
varlist = varlist,
predictionData = ns.predict.data.re,
datain = ns.data.re)
#> Loading required package: MuMIn
#> [1] "initialDispersion 20.4447417001531"
#> [1] "Initialising..."
#> Initial fit = 9591.768 13
#> [1] "initialisation complete..."
#> [1] "^^^^^^^^^^^Initial^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
#> [1] 9591.768
#> [1] "Exchanging..."
#> [1] "Locating maximum residual......"
#> [1] "Maximum residual found..."
#> [1] "Moving knot..."
#> [1] "Knot moved..."
#> [1] "Exchanging done..."
#> [1] "Improving..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting down..."
#> [1] "Down done..."
#> [1] "Improving complete..."
#> [1] "^^^^^^^^^^^Improve^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
#> [1] "Exchanging..."
#> [1] "Locating maximum residual......"
#> [1] "Maximum residual found..."
#> [1] "Moving knot..."
#> [1] "Knot moved..."
#> [1] "Exchanging done..."
#> [1] "Improving..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting down..."
#> [1] "Down done..."
#> [1] "Improving complete..."
#> [1] "And we're done..."