Running SALSA for continuous one-dimensional covariates.

This function finds spatially adaptive knot locations for one or more continuous one-dimensional covariates.

Usage

runSALSA1D(
  initialModel,
  salsa1dlist,
  varlist,
  factorlist = NULL,
  predictionData = NULL,
  varlist_cyclicSplines = NULL,
  splineParams = NULL,
  datain,
  removal = FALSE,
  panelid = NULL,
  suppress.printout = FALSE,
  logfile = FALSE
)

Arguments

initialModel: The best fitting CReSS model with no continuous covariates specified. This must be a model of class glm.
salsa1dlist: List of objects required for runSALSA1D: fitnessMeasure, minKnots_1d, maxKnots_1d, startKnots_1d, degree, maxIterations, gaps.
varlist: Vector of variable names for the covariates required for knot selection.
factorlist: Vector of factor variables specified in initialModel. Specified so that a check can be made that there are non-zero counts in all levels of each factor. Uses the function checkfactorlevelcounts. Default setting is NULL.
predictionData: The data for which predictions are to be made. Column names must correspond to the data in initialModel. If predictionData is not specified (NULL), then the range of the data is used to create the smooth terms.
splineParams: List object containing information for fitting splines to the covariates in varlist. If not specified (NULL) this object is created and returned. See makesplineParams for details.
datain: Data used to fit the initial Model.
removal: (Default: FALSE). Logical stating whether a selection procedure should be used to choose between keeping each covariate as a smooth term, keeping it as a linear term, or removing it. If FALSE, all covariates are retained as smooth terms. If TRUE, cross-validation is used to make the model selection choices. The folds are specified by a column in the dataset called foldid.
panelid: Vector denoting the panel identifier for each data point (if robust standard errors are to be calculated). Defaults to data order index if not given.
suppress.printout: (Default: FALSE). Logical stating whether to show the analysis printout.
logfile: (Default: FALSE). Logical stating whether to store a log file of the analysis printout.

Value

A list object is returned containing 4 elements:

bestModel

A model object of class gam.MRSea from the best model fitted

modelFits1D

A list object with an element for each new term fitted to the model. The first element is the start model (startmodel), fitted with a knot at the mean for each of the covariates in varlist. This first element contains the current fit and formula of the start model.

The second element is the result of SALSA on the first term in varlist. Within this element:

term: term of interest
kept: Statement of whether the term is kept in the model (yes - initial knots, yes - new knots, yes - linear, or no)
basemodelformula: the resulting model formula. If the term is kept (with knots or as linear), the term of interest is included in the model; otherwise it is removed.
knotSelected: the knots chosen for the term of interest (NA if term removed or linear)
baseModelFits: fit statistics for the resulting formula
modelfits: fit statistics for the model with the term included (same as resulting formula if kept=yes)

This continues until all covariates in varlist have been through SALSA.

fitstat

The final fit statistic of bestModel. The type of statistic was specified in salsa1dlist.

keptvarlist

The covariates from varlist that have been retained in the model
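
As a rough illustration of the modelFits1D layout described above, the sketch below tabulates term and kept for each covariate, skipping the first element (the start model). The helper summariseSalsaTerms is hypothetical and not part of the package. The mock object only mimics the documented element names, its values are made up, and it assumes term and kept are single character values.

```r
# Hypothetical helper (not part of the package): tabulate 'term' and 'kept'
# for every element of modelFits1D except the first (the start model).
summariseSalsaTerms <- function(modelFits1D) {
  termFits <- modelFits1D[-1]
  data.frame(
    term = vapply(termFits, function(x) as.character(x$term), character(1)),
    kept = vapply(termFits, function(x) as.character(x$kept), character(1)),
    stringsAsFactors = FALSE
  )
}

# Mock object using the documented element names; all values are invented.
mockFits <- list(
  list(term = "startmodel"),
  list(term = "DayOfMonth", kept = "yes - new knots", knotSelected = c(8, 20)),
  list(term = "depth", kept = "no", knotSelected = NA)
)

res <- summariseSalsaTerms(mockFits)
print(res)
```

With real output, the same helper would be called on the modelFits1D element of the returned list, provided the elements follow the structure assumed here.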

Details

The data used to fit the initial model must contain columns called response (the response variable) and foldid (for the cross-validation calculation). If the response is a proportion, there should be two columns called successes and failures.
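
A minimal sketch of preparing these columns in base R is shown below. The toy data frame and the choice of five folds are assumptions for illustration only. Rows are assigned to folds at random here; if observations are correlated within panels, assigning folds by panel may be more appropriate, which is not shown.

```r
# Toy data standing in for real survey data (hypothetical values).
set.seed(1)
dat <- data.frame(birds = rpois(25, lambda = 3),
                  DayOfMonth = sample(1:31, 25, replace = TRUE))

# The response must live in a column called 'response'.
dat$response <- dat$birds

# Assign each row to one of K folds, as evenly as possible, in random order.
K <- 5
dat$foldid <- sample(rep(seq_len(K), length.out = nrow(dat)))

# Number of rows per fold.
print(table(dat$foldid))
```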

The object salsa1dlist contains parameters for the runSALSA1D function.

fitnessMeasure. The criterion for selecting the 'best' model. Available options: AIC, AIC_c, BIC, QIC_b, cv.gamMRSea. For cv.gamMRSea, use cv.opts in salsa1dlist to specify the seed, the number of folds and the cost function (defaults: cv.opts=list(cv.gamMRSea.seed=357, K=10, cost=function(y, yhat) mean((y - yhat)^2))).
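
The sketch below shows one way a salsa1dlist with cross-validation options might be put together. The default cost is the mean squared error given above; the mean absolute error alternative (mae) and the object name salsa1dlist_cv are assumptions for illustration, not recommendations from the package.

```r
# Default cost from the text: mean squared error of observed vs fitted.
mse <- function(y, yhat) mean((y - yhat)^2)

# Alternative cost, chosen only for illustration: mean absolute error.
mae <- function(y, yhat) mean(abs(y - yhat))

salsa1dlist_cv <- list(
  fitnessMeasure = "cv.gamMRSea",
  minKnots_1d = 1, maxKnots_1d = 3, startKnots_1d = 1,
  degree = 2, gaps = 0,
  cv.opts = list(cv.gamMRSea.seed = 357, K = 10, cost = mae)
)

# Both cost functions evaluated on a small made-up example.
y    <- c(1, 2, 4)
yhat <- c(1, 3, 2)
mse(y, yhat)  # (0 + 1 + 4) / 3
mae(y, yhat)  # (0 + 1 + 2) / 3
```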

minKnots_1d. Minimum number of knots to be tried.

maxKnots_1d. Maximum number of knots to be tried.

startKnots_1d. Starting number of knots (spaced at quantiles of the data).

degree. The degree of the B-spline. Does not need to be specified if splineParams is a parameter in runSALSA1D.

maxIterations. The exchange/improve steps will terminate after maxIterations if still running.

gaps. The minimum gap between knots (in the units of the explanatory variable), usually set to zero.

splines. Specify the spline basis for each term. Choose one of "bs" (B-spline), "cc" (cyclic-cubic) or "ns" (natural spline).

minKnots_1d, maxKnots_1d, startKnots_1d and gaps are vectors the same length as varlist. This enables differing values of these parameters for each covariate.
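
For two covariates, the per-covariate vectors might be set up as sketched below. The second covariate, depth, is hypothetical, and giving degree one value per covariate simply mirrors the vector form used in the example on this page. The ordering check (minimum, start, maximum) is a sanity check that seems reasonable rather than a documented requirement.

```r
varlist <- c("DayOfMonth", "depth")  # 'depth' is a hypothetical covariate

salsa1dlist <- list(
  fitnessMeasure = "BIC",
  minKnots_1d    = c(1, 1),
  maxKnots_1d    = c(3, 5),   # allow more flexibility for the second term
  startKnots_1d  = c(1, 2),
  degree         = c(2, 2),   # assumption: one value per covariate
  gaps           = c(0, 0)
)

# Each per-covariate entry should have one value per element of varlist.
perTerm <- c("minKnots_1d", "maxKnots_1d", "startKnots_1d", "gaps")
lens <- vapply(salsa1dlist[perTerm], length, integer(1))
stopifnot(all(lens == length(varlist)))

# Sanity check (an assumption, not a documented rule): min <= start <= max.
stopifnot(all(salsa1dlist$minKnots_1d <= salsa1dlist$startKnots_1d),
          all(salsa1dlist$startKnots_1d <= salsa1dlist$maxKnots_1d))
```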

The initial model contains all the factor-level covariates and any covariates of interest that are not specified in the varlist argument of runSALSA1D.

Note: The algorithm may remove variables in varlist but not the variables in factorlist. If there is no better model than the one with a knot at the mean, the output will include that covariate with a knot at the mean. The best model with a given smooth term is tested against both a model with the term as linear and a model with the term removed. Cross-validation is used in the selection process.

References

Walker, C., M. Mackenzie, C. Donovan and M. O'Sullivan. SALSA - a Spatially Adaptive Local Smoothing Algorithm. Journal of Statistical Computation and Simulation, 81(2):179-191, 2010.

Author

Lindesay Scott-Hayward, University of St Andrews; Cameron Walker, Department of Engineering Science, University of Auckland.

Examples

# load data
data(ns.data.re)
# load prediction data
data(ns.predict.data.re)

varlist=c('DayOfMonth')


# set initial model without the spline terms in there 
# (so all other non-spline terms)
ns.data.re$response<- ns.data.re$birds
initialModel<- glm(response ~ 1 + offset(log(area)), 
                    family='quasipoisson',data=ns.data.re)

#set some input info for SALSA
salsa1dlist<-list(fitnessMeasure = 'QBIC', 
                  minKnots_1d=c(1), 
                  maxKnots_1d = c(3), 
                  startKnots_1d = c(1), 
                  degree=c(2),
                  gaps=c(0))

# run SALSA
salsa1dOutput<-runSALSA1D(initialModel = initialModel,
                         salsa1dlist = salsa1dlist,
                         varlist = varlist,
                         predictionData = ns.predict.data.re,
                         datain = ns.data.re)
#> Loading required package: MuMIn
#> [1] "initialDispersion 20.4447417001531"
#> [1] "Initialising..."
#> Initial fit =  9591.768 13 
#> [1] "initialisation complete..."
#> [1] "^^^^^^^^^^^Initial^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
#> [1] 9591.768
#> [1] "Exchanging..."
#> [1] "Locating maximum residual......"
#> [1] "Maximum residual found..."
#> [1] "Moving knot..."
#> [1] "Knot moved..."
#> [1] "Exchanging done..."
#> [1] "Improving..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting down..."
#> [1] "Down done..."
#> [1] "Improving complete..."
#> [1] "^^^^^^^^^^^Improve^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
#> [1] "Exchanging..."
#> [1] "Locating maximum residual......"
#> [1] "Maximum residual found..."
#> [1] "Moving knot..."
#> [1] "Knot moved..."
#> [1] "Exchanging done..."
#> [1] "Improving..."
#> [1] "Shifting up..."
#> [1] "Up done..."
#> [1] "Shifting down..."
#> [1] "Down done..."
#> [1] "Improving complete..."
#> [1] "And we're done..."