mcmcensemble 3.0.0: new way of specifying inits values

In previous versions of mcmcensemble, the starting values of the various chains were drawn from a uniform distribution whose bounds were set by the values of lower.inits and upper.inits.

In this new version, the user is in charge of providing a matrix (or a data.frame) containing the starting values of every parameter for every chain. Although this somewhat increases the complexity of the package and the workload of the user, it has substantial benefits that we will present now.

library(mcmcensemble)
packageVersion("mcmcensemble")

## [1] '3.2.0.9000'
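
As a rough illustration of the expected layout (inferred from the examples in this post, not a full statement of the package's requirements), `inits` has one row per walker and one column per parameter. The names `n_params`, `inits_mat` and `inits_df` below are made up for this sketch.

```r
set.seed(42)      # only to make the illustration reproducible
n_walkers <- 10
n_params  <- 2    # hypothetical: two parameters, "a" and "b"

# one row per walker, one column per parameter
inits_mat <- matrix(
  rnorm(n_walkers * n_params),
  nrow = n_walkers, ncol = n_params,
  dimnames = list(NULL, c("a", "b"))
)

# the same starting values stored as a data.frame
inits_df <- as.data.frame(inits_mat)

dim(inits_mat)
names(inits_df)
```

Either object has the layout used for `inits` in the rest of this post.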

Sample from a non-uniform distribution

The main drawback of the previous behaviour was that the initial values could only be sampled from a uniform distribution. This restriction is now lifted. If we take the ‘banana’ example from the README and want to start with values sampled from normal distributions centered on 0 (with a standard deviation of 2 for a and 1 for b), we do:

## a log-pdf to sample from
p.log <- function(x) {
  B <- 0.03 # controls 'bananacity'
  -x[1]^2 / 200 - 1/2 * (x[2] + B * x[1]^2 - 100 * B)^2
}

## set options and starting point
n_walkers <- 10
## starting values drawn from normal (not uniform) distributions
normal_inits <- data.frame(
  a = rnorm(n_walkers, mean = 0, sd = 2),
  b = rnorm(n_walkers, mean = 0, sd = 1)
)

res <- MCMCEnsemble(p.log, inits = normal_inits,
                     max.iter = 5000, n.walkers = n_walkers,
                     method = "stretch", coda = TRUE)

## Using stretch move with 10 walkers.

summary(res$samples)

## 
## Iterations = 1:500
## Thinning interval = 1 
## Number of chains = 10 
## Sample size per chain = 500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##      Mean    SD Naive SE Time-series SE
## a -1.1480 9.514  0.13455         1.1151
## b  0.1388 3.746  0.05298         0.4102
## 
## 2. Quantiles for each variable:
## 
##     2.5%    25%     50%   75%  97.5%
## a -21.46 -7.431 -0.5148 5.785 16.278
## b -11.07 -1.045  1.3233 2.534  4.225

plot(res$samples)

This is also a good opportunity to use quasi-random starting values that maximise the exploration of the available space. One example of such a quasi-random sequence is the Owen-scrambled Sobol sequence available in the spacefillr package.

n_walkers <- 10
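## note: setNames() does not set the column names of a matrix, hence the
## para_1 / para_2 labels in the summary below
sobol_inits <- setNames(
  spacefillr::generate_sobol_owen_set(n_walkers, dim = 2),
  c("a", "b")
)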
res <- MCMCEnsemble(p.log, inits = sobol_inits,
                     max.iter = 5000, n.walkers = n_walkers,
                     method = "stretch", coda = TRUE)

## Using stretch move with 10 walkers.

summary(res$samples)

## 
## Iterations = 1:500
## Thinning interval = 1 
## Number of chains = 10 
## Sample size per chain = 500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean    SD Naive SE Time-series SE
## para_1 0.3774 8.990  0.12714         1.2219
## para_2 0.5357 3.238  0.04579         0.4396
## 
## 2. Quantiles for each variable:
## 
##           2.5%     25%   50%   75%  97.5%
## para_1 -18.928 -5.3167 0.774 7.048 16.266
## para_2  -9.294 -0.4218 1.476 2.695  4.325

plot(res$samples)
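
The summary above labels the parameters `para_1` and `para_2` rather than `a` and `b`. A likely explanation, assuming `generate_sobol_owen_set()` returns a plain matrix, is that `setNames()` sets the `names` attribute and leaves the column names of a matrix untouched. The sketch below uses a stand-in matrix `m` (hypothetical, in place of the Sobol set) to show the difference with assigning `colnames()` directly:

```r
# stand-in for the Sobol set: 10 walkers, 2 parameters
m <- matrix(runif(20), nrow = 10, ncol = 2)

# setNames() attaches element names, not column names
via_setnames <- setNames(m, c("a", "b"))
colnames(via_setnames)   # NULL

# assigning colnames() labels the columns
via_colnames <- m
colnames(via_colnames) <- c("a", "b")
colnames(via_colnames)   # "a" "b"
```

If you want the parameters reported as `a` and `b`, the second form is probably the one to use on the Sobol matrix.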

Re-start a chain from where it ended

Another possibility opened up by this new behaviour in mcmcensemble 3.0.0 is restarting a chain from where it ended. Let’s reuse the ‘banana’ example from the README, but cut it short:

## a log-pdf to sample from
p.log <- function(x) {
  B <- 0.03 # controls 'bananacity'
  -x[1]^2 / 200 - 1/2 * (x[2] + B * x[1]^2 - 100 * B)^2
}

## set options and starting point
n_walkers <- 10
unif_inits <- data.frame(
  a = runif(n_walkers, 0, 1),
  b = runif(n_walkers, 0, 1)
)

## use stretch move
short <- MCMCEnsemble(p.log, inits = unif_inits,
                      max.iter = 50, n.walkers = n_walkers,
                      method = "stretch", coda = TRUE)

## Using stretch move with 10 walkers.

summary(short$samples)

## 
## Iterations = 1:5
## Thinning interval = 1 
## Number of chains = 10 
## Sample size per chain = 5 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##     Mean     SD Naive SE Time-series SE
## a 0.5925 0.8864  0.12536        0.07851
## b 0.5825 0.5392  0.07626        0.04790
## 
## 2. Quantiles for each variable:
## 
##      2.5%    25%    50%    75% 97.5%
## a -1.5280 0.2495 0.6015 0.9862 2.252
## b -0.6924 0.4502 0.6511 0.8021 1.661

plot(short$samples)

You may notice that this example has a very low number of iterations. We may want to let it run a little longer. We can restart the chain from where it ended with:

## last iteration of each chain: one row per walker, one column per parameter
last_values <- do.call(rbind, lapply(short$samples, function(chain) chain[nrow(chain), ]))

longer <- MCMCEnsemble(p.log, inits = last_values,
                       max.iter = 4950, n.walkers = n_walkers,
                       method = "stretch", coda = TRUE)

## Using stretch move with 10 walkers.

# `final` is the concatenation of `short` and `longer`
# However, we need to remove the first element of `longer` since it's already
# present in `short`
final <- list(
  samples = coda::as.mcmc.list(lapply(seq_along(longer$samples), function(i) {
    coda::as.mcmc(rbind(short$samples[[i]], longer$samples[[i]][-1, ]))
  })),
  log.p = cbind(short$log.p, longer$log.p[, -1])
)

plot(final$samples)
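
To make the shape of `last_values` concrete, here is a toy version of the extraction step, using plain matrices in place of the `mcmc` objects. The helper name `last_row_per_chain()` and the toy chains are made up for this illustration. Each chain contributes its final row, so the result has one row per chain and one column per parameter, which is the layout `inits` had in the earlier examples.

```r
# hypothetical helper: take the final iteration of every chain
last_row_per_chain <- function(chains) {
  do.call(rbind, lapply(chains, function(ch) ch[nrow(ch), ]))
}

# three toy chains, 5 iterations each, parameters "a" and "b"
toy_chains <- lapply(1:3, function(k) {
  matrix(k * 100 + 1:10, nrow = 5, ncol = 2,
         dimnames = list(NULL, c("a", "b")))
})

toy_last <- last_row_per_chain(toy_chains)
dim(toy_last)
toy_last
```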

For non-coda outputs, here is the equivalent snippet:

short <- MCMCEnsemble(p.log, inits = unif_inits,
                      max.iter = 50, n.walkers = n_walkers,
                      method = "stretch")

## Using stretch move with 10 walkers.

last_values <- short$samples[, dim(short$samples)[2], ]

longer <- MCMCEnsemble(p.log, inits = last_values,
                       max.iter = 4950, n.walkers = n_walkers,
                       method = "stretch")

## Using stretch move with 10 walkers.

# `final` is the concatenation of `short` and `longer`
# However, we need to remove the first element of `longer` since it's already
# present in `short`
final <- list(
  samples = array(unlist(lapply(seq_len(dim(longer$samples)[3]), function(i) {
    # `short` comes first; drop the first iteration of `longer`
    cbind(short$samples[, , i], longer$samples[, -1, i])
  })), dim = dim(short$samples) + c(0, dim(longer$samples)[2] - 1, 0)),
  log.p = cbind(short$log.p, longer$log.p[, -1])
)
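
As a sanity check of the concatenation logic, here is a small self-contained sketch on toy arrays laid out as walkers × iterations × parameters (the layout implied by the indexing above). The arrays `toy_short` and `toy_longer` and the helper `bind_iterations()` are made up for the illustration. The helper drops the first iteration of the second run, which duplicates the last iteration of the first one.

```r
# hypothetical helper: join two runs along the iteration dimension,
# dropping the duplicated first iteration of the second run
bind_iterations <- function(first, second) {
  n_par <- dim(first)[3]
  joined <- lapply(seq_len(n_par), function(i) {
    cbind(first[, , i], second[, -1, i])
  })
  array(unlist(joined),
        dim = dim(first) + c(0, dim(second)[2] - 1, 0))
}

# toy runs: 3 walkers, 2 parameters, 4 and 5 iterations
toy_short  <- array(1:24, dim = c(3, 4, 2))
toy_longer <- array(101:130, dim = c(3, 5, 2))
toy_longer[, 1, ] <- toy_short[, 4, ]  # restart from the last values

toy_final <- bind_iterations(toy_short, toy_longer)
dim(toy_final)
```

The first four iterations of `toy_final` come from `toy_short`, and the remaining four are iterations 2 to 5 of `toy_longer`.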

Migrating from mcmcensemble 2.X to mcmcensemble 3.X

As mentioned in the introduction of this blog post, the initial values in previous versions of mcmcensemble were always drawn from a uniform distribution between lower.inits and upper.inits. This means that previous code snippets such as:

MCMCEnsemble(
  p.log,
  lower.inits = c(-5, -15), upper.inits = c(5, 15),
  max.iter = 500, n.walkers = 10,
  method = "stretch", coda = TRUE
)

must be updated to:

MCMCEnsemble(
  p.log,
  ## one row per walker, one column per parameter
  inits = cbind(
    runif(10, min = -5, max = 5),
    runif(10, min = -15, max = 15)
  ),
  max.iter = 500, n.walkers = 10,
  method = "stretch", coda = TRUE
)
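
If you have many calls to migrate, the old behaviour can be approximated with a small helper that draws each parameter uniformly between its own bounds. This is a sketch under assumptions: the function name `uniform_inits()` is made up for this post and is not part of mcmcensemble, and the result is an unnamed matrix with one row per walker and one column per parameter.

```r
# hypothetical helper, not part of mcmcensemble
uniform_inits <- function(lower, upper, n_walkers) {
  stopifnot(length(lower) == length(upper), all(lower < upper))
  n_par <- length(lower)
  # repeat each bound n_walkers times so that column j uses lower[j], upper[j]
  draws <- runif(
    n_walkers * n_par,
    min = rep(lower, each = n_walkers),
    max = rep(upper, each = n_walkers)
  )
  matrix(draws, nrow = n_walkers, ncol = n_par)
}

inits <- uniform_inits(lower = c(-5, -15), upper = c(5, 15), n_walkers = 10)
dim(inits)
apply(inits, 2, range)  # each column stays within its own bounds
```

The resulting matrix can then be passed as the `inits` argument of `MCMCEnsemble()`.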