Bootstrapping function without model selection using CReSS/SALSA for fitting the second stage count model
Source:R/do.bootstrap.cress.R
do.bootstrap.cress.Rd
This fuction performs a specified number of bootstrapping iterations using CReSS/SALSA for fitting the second stage count model. See below for details.
Usage
do.bootstrap.cress(
orig.data,
predict.data,
ddf.obj = NULL,
model.obj,
splineParams,
g2k,
resample = "transect.id",
rename = "segment.id",
stratum = NULL,
B,
name = NULL,
save.data = FALSE,
nhats = FALSE,
seed = 12345,
nCores = 1
)
Arguments
- orig.data
The original data. In case
ddf.obj
is specified, this should be the original distance data. In caseddf.obj
isNULL
, it should have the format equivalent tocount.data
where each record represents the summed up counts at the segments.- predict.data
The prediction grid data
- ddf.obj
The ddf object created for the best fitting detection model. Defaults to
NULL
for when nodetection function object available.- model.obj
The best fitting
CReSS
model for the original count data. Should be geeglm or a Poisson/Binomial GLM (not quasi).- splineParams
The object describing the parameters for fitting the one and two dimensional splines
- g2k
(N x k) matrix of distances between all prediction points (N) and all knot points (k)
- resample
Specifies the resampling unit for bootstrapping, default is
transect.id
. Must match a column name indis.data
exactly- rename
A vector of column names for which a new column needs to be created for the bootstrapped data. This defaults to
segment.id
for line transects (which is required forcreate.bootcount.data
), others might be added. A new column with new ids will automatically be created for the column listed inresample
. In case of nearshore data, this argument is ignored.- stratum
The column name in
orig.data
that identifies the different strata. The defaultNULL
returns un-stratified bootstrap data. In case of nearshore data, this argument is ignored.- B
Number of bootstrap iterations
- name
Analysis name. Required to avoid overwriting previous bootstrap results. This name is added at the beginning of "predictionboot.RData" when saving bootstrap predictions.
- save.data
If TRUE, all created bootstrap data will be saved as an RData object in the working directory at each iteration, defaults to FALSE
- nhats
(default = FALSE). If you have calculated bootstrap NHATS because there is no simple ddf object then a matrix of these may be fed into the function. The number of columns of data should >= B. The rows must be equal to those in
orig.data
andd2k
and must be in matching order.- seed
Set the seed for the bootstrap sampling process.
- nCores
Set the number of computer cores for the bootstrap process to use (default = 1). The more cores the faster the proces but be wary of over using the cores on your computer. If
nCores
> (number of computer cores - 2), the function defaults tonCores
= (number of computer cores - 2). Note: On a Mac computer the parallel code does not compute so use nCores=1.
Value
The function returns a matrix of bootstrap predictions. The number of rows is equal to the number of rows in predict.data. The number of columns is equal to B
. The matrix may be very large and so is stored directly into the working directory as a workspace object: '"name"predictionboot.RObj'. The object inside is called bootPreds
.
Details
In case of distance sampling data, the following steps are performed for each iteration:
the original data is bootstrapped
a detection function is fitted to the bootstrapped data
a count model is fitted to the bootstrapped data
coefficients are resampled from a multivariate normal distribution defined by MLE and COV from count model
predictions are made to the prediction data using the resampled coefficients
In case of count data, the following steps are performed for each iteration:
coefficients are resampled from a multivariate normal distribution defined by MLE and COV from the best fitting count model
predictions are made to the prediction data using the resampled coefficients
Examples
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# offshore redistribution data
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
data(dis.data.re)
data(predict.data.re)
data(knotgrid.off)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# distance sampling
dis.data.re$survey.id<-paste(dis.data.re$season,dis.data.re$impact,sep="")
result<-ddf(dsmodel=~mcds(key="hn", formula=~1), data=dis.data.re, method="ds",
meta.data=list(width=250))
#> Error in ddf(dsmodel = ~mcds(key = "hn", formula = ~1), data = dis.data.re, method = "ds", meta.data = list(width = 250)): could not find function "ddf"
dis.data.re<-create.NHAT(dis.data.re,result)
#> Error in which(data$distance <= ddf.obj$meta.data$width): object 'result' not found
count.data<-create.count.data(dis.data.re)
#> Error in `$<-.data.frame`(`*tmp*`, "NHAT", value = numeric(0)): replacement has 0 rows, data has 10951
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# spatial modelling
splineParams<-makesplineParams(data=count.data, varlist=c('depth'))
#> Error in makesplineParams(data = count.data, varlist = c("depth")): object 'count.data' not found
#set some input info for SALSA
count.data$response<- count.data$NHAT
#> Error in eval(expr, envir, enclos): object 'count.data' not found
# make distance matrices for datatoknots and knottoknots
distMats<-makeDists(cbind(count.data$x.pos, count.data$y.pos), na.omit(knotgrid.off))
#> Error in cbind(count.data$x.pos, count.data$y.pos): object 'count.data' not found
# choose sequence of radii
r_seq<-getRadiiChoices(8,distMats$dataDist)
#> Error in getRadiiChoices(8, distMats$dataDist): object 'distMats' not found
# set initial model without the spatial term
initialModel<- glm(response ~ as.factor(season) + as.factor(impact) + offset(log(area)),
family='quasipoisson', data=count.data)
#> Error in is.data.frame(data): object 'count.data' not found
# make parameter set for running salsa2d
salsa2dlist<-list(fitnessMeasure = 'QICb', knotgrid = knotgrid.off,
knotdim=c(26,14), startKnots=4, minKnots=4,
maxKnots=20, r_seq=r_seq, gap=4000, interactionTerm="as.factor(impact)")
#> Error in eval(expr, envir, enclos): object 'r_seq' not found
salsa2dOutput_k6<-runSALSA2D(initialModel, salsa2dlist, d2k=distMats$dataDist,
k2k=distMats$knotDist, splineParams=splineParams)
#> Error in runSALSA2D(initialModel, salsa2dlist, d2k = distMats$dataDist, k2k = distMats$knotDist, splineParams = splineParams): object 'initialModel' not found
splineParams<-salsa2dOutput_k6$splineParams
#> Error in eval(expr, envir, enclos): object 'salsa2dOutput_k6' not found
# specify parameters for local radial function:
radiusIndices <- splineParams[[1]]$radiusIndices
#> Error in eval(expr, envir, enclos): object 'splineParams' not found
dists <- splineParams[[1]]$dist
#> Error in eval(expr, envir, enclos): object 'splineParams' not found
radii <- splineParams[[1]]$radii
#> Error in eval(expr, envir, enclos): object 'splineParams' not found
aR <- splineParams[[1]]$invInd[splineParams[[1]]$knotPos]
#> Error in eval(expr, envir, enclos): object 'splineParams' not found
count.data$blockid<-paste(count.data$transect.id, count.data$season, count.data$impact, sep='')
#> Error in paste(count.data$transect.id, count.data$season, count.data$impact, sep = ""): object 'count.data' not found
# Re-fit the chosen model as a GEE (based on SALSA knot placement) and GEE p-values
geeModel<- geeglm(formula(salsa2dOutput_k6$bestModel), data=count.data, family=poisson, id=blockid)
#> Error in geeglm(formula(salsa2dOutput_k6$bestModel), data = count.data, family = poisson, id = blockid): could not find function "geeglm"
dists<-makeDists(cbind(predict.data.re$x.pos, predict.data.re$y.pos), na.omit(knotgrid.off),
knotmat=FALSE)$dataDist
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# bootstrapping
do.bootstrap.cress(dis.data.re, predict.data.re, ddf.obj=result, geeModel, splineParams,
g2k=dists, resample='transect.id', rename='segment.id', stratum='survey.id',
B=4, name="cress", save.data=FALSE, nhats=FALSE, nCores=1)
#> Loading required package: Matrix
#> Loading required package: mvtnorm
#> Error in do.bootstrap.cress(dis.data.re, predict.data.re, ddf.obj = result, geeModel, splineParams, g2k = dists, resample = "transect.id", rename = "segment.id", stratum = "survey.id", B = 4, name = "cress", save.data = FALSE, nhats = FALSE, nCores = 1): object 'result' not found
load("cresspredictionboot.RData") # loading the bootstrap predictions into the workspace
#> Warning: cannot open compressed file 'cresspredictionboot.RData', probable reason 'No such file or directory'
#> Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection
# look at the first 6 lines of the bootstrap predictions (on the scale of the response)
head(bootPreds)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'bootPreds' not found
if (FALSE) {
# In parallel (Note: windows machines only)
require(parallel)
do.bootstrap.cress(dis.data.re, predict.data.re, ddf.obj=result, geeModel, splineParams,
g2k=dists, resample='transect.id', rename='segment.id', stratum='survey.id',
B=4, name="cress", save.data=FALSE, nhats=FALSE, nCores=4)
load("cresspredictionboot.RData") # loading the bootstrap predictions into the workspace
# look at the first 6 lines of the bootstrap predictions (on the scale of the response)
head(bootPreds)
}
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# nearshore redistribution data
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (FALSE) {
do.bootstrap.cress(ns.data.re, ns.predict.data.re, ddf.obj=NULL, geeModel, splineParams,
g2k=dists, resample='transect.id', rename='segment.id', stratum=NULL,
B=2, name="cress", save.data=FALSE, nhats=FALSE)
load("cresspredictionboot.RData") # loading the predictions into the workspace
# look at the first 6 lines of the bootstrap predictions (on the scale of the response)
head(bootPreds)}