Title: | Bayesian Estimation of Change-Points in the Slope of Multivariate Time-Series |
---|---|
Description: | Assume that a temporal process is composed of contiguous segments with differing slopes and replicated noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. The package infers the joint posterior distribution of the number and position of change-points as well as the unknown mean parameters per time-series by MCMC sampling. A-priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence, but also available is the Poisson distribution. See Papastamoulis et al (2017) <arXiv:1709.06111> for a detailed presentation of the method. |
Authors: | Panagiotis Papastamoulis |
Maintainer: | Panagiotis Papastamoulis <[email protected]> |
License: | GPL-2 |
Version: | 1.1 |
Built: | 2025-01-19 05:30:49 UTC |
Source: | https://github.com/cran/beast |
Assume that a temporal process is composed of contiguous segments with differing slopes and replicated noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. The package infers the joint posterior distribution of the number and position of change-points as well as the unknown mean parameters per time-series by MCMC sampling. A-priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence, but also available is the Poisson distribution. See Papastamoulis et al (2017) <arXiv:1709.06111> for a detailed presentation of the method.
The beast
package deals with Bayesian estimation of change-points in the slope of multivariate time-series, introduced by Papastamoulis et al (2017). For a given period we observe multiple time series which are assumed independent, each one consisting of multiple measurements (replicates). Each time series is assumed to have its own segmentation, which is common among its replicates. Thus, different time series have distinct mean parameters in the underlying normal distribution. The variance, which is assumed known, can be either shared between different time series or not and in practice it is estimated at a pre-processing stage.
Our model infers the joint posterior distribution of the number and location of change-points by MCMC sampling. For this purpose a Metropolis-Hastings MCMC sampler is used. The main function of the package is beast
.
Assume that the observed data consists of time-series, each one consisting of
variables (or replicates) measured at
consecutive time-points. The input of the main function
beast
should be given in the form of a list
myDataList
with the following attributes:
length(myDataList)
should be equal to , that is, the number of variables (or replicates)
dim(myDataList)[[1]]
, ,
dim(myDataList)[[R]]
should be all , that is, rows and columns should correspond to time-points and different series, respectively.
Then, a basic usage of the package consists of the following commands:
beastRun <- beast( myDataList = myDataList )
print(beastRun)
plot(beastRun)
which correspond to running the MCMC sampler, printing and plotting output summaries, respectively.
Panagiotis Papastamoulis
Maintainer: Panagiotis Papastamoulis <[email protected]>
Papastamoulis P., Furukawa T., van Rhijn N., Bromley M., Bignell E. and Rattray M. (2017). Bayesian detection of piecewise linear trends in replicated time-series with application to growth data modelling. arXiv:1709.06111 [stat.AP]
beast, print.beast.object, plot.beast.object
# toy-example (MCMC iterations not enough) library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # Run MCMC sampler with very small number of iterations (nIter): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE, nIter = 40, burn = 20) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot_toy.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # Run the following commands to obtain convergence: ## Not run: # This example illustrates the package using a subset of four # time-series of the fungal dataset. library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # optional # Run MCMC sampler with the default number of iterations (nIter =70000): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # NOTE 1: for a complete analysis remove the `subsetIndex = myIndex` argument. # NOTE 2: `zeroNormalization = TRUE` is an optional argument that forces all # time-series to start from zero. It is not supposed to be used # for other applications. ## End(Not run)
# toy-example (MCMC iterations not enough) library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # Run MCMC sampler with very small number of iterations (nIter): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE, nIter = 40, burn = 20) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot_toy.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # Run the following commands to obtain convergence: ## Not run: # This example illustrates the package using a subset of four # time-series of the fungal dataset. library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # optional # Run MCMC sampler with the default number of iterations (nIter =70000): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # NOTE 1: for a complete analysis remove the `subsetIndex = myIndex` argument. # NOTE 2: `zeroNormalization = TRUE` is an optional argument that forces all # time-series to start from zero. It is not supposed to be used # for other applications. ## End(Not run)
This is the main function of the package, implementing the MCMC sampler described in Papastamoulis et al (2017).
beast(myDataList, burn, nIter, mhPropRange, mhSinglePropRange, startPoint, timeScale, savePlots, zeroNormalization, LRange, tau, gammaParameter, nu0, alpha0, beta0, subsetIndex, saveTheta, sameVariance, Prior )
beast(myDataList, burn, nIter, mhPropRange, mhSinglePropRange, startPoint, timeScale, savePlots, zeroNormalization, LRange, tau, gammaParameter, nu0, alpha0, beta0, subsetIndex, saveTheta, sameVariance, Prior )
myDataList |
Observed data in the form of a |
burn |
Number of iterations that will be discarder as burn-in period. Default value: |
nIter |
Number of MCMC iterations. Default value: |
mhPropRange |
Positive integer corresponding to the parameter |
mhSinglePropRange |
Positive integer denoting the parameter |
startPoint |
An (optional) positive integer pointing at the minimum time-point where changes are allowed to occur. Default value: |
timeScale |
Null. |
savePlots |
Character denoting the name of the folder where various plots will be saved to. |
zeroNormalization |
Logical value denoting whether to normalize to zero all time time-series for |
LRange |
Range of possible values for the number of change-points. Default value: |
tau |
Positive real number corresponding to parameter |
gammaParameter |
Positive real number corresponding to parameter |
nu0 |
Positive real number corresponding to prior parameter |
alpha0 |
Positive real number corresponding to prior parameter |
beta0 |
Positive real number corresponding to prior parameter |
subsetIndex |
Optional subset of integers corresponding to time-series indexes. If not null, the sampler will be applied only to the specified subset. |
saveTheta |
Logical value indicating whether to save the generated values of the mean per time-point across the MCMC trace. Default: FALSE. |
sameVariance |
Logical value indicating whether to assume the same variance per time-point across time-series. Default value: |
Prior |
Character string specifying the prior distribution of the number of change-points. Allowed values: |
The output of the sampler is returned as a list, with the following features:
Cutpoint_posterior_median |
The estimated medians per change-point, conditionally on the most probable number of change-points per time-series. |
Cutpoint_posterior_variance |
The estimated variances per change-points, conditionally on the most probable number of change-points per time-series. |
NumberOfCutPoints_posterior_distribution |
Posterior distributions of number of change-points per time-series. |
NumberOfCutPoints_MAP |
The most probable number of change-points per time-series. |
Metropolis-Hastings_acceptance_rate |
Acceptance of the MCMC move-types. |
subject_ID |
the identifier of individual time-series. |
Cutpoint_mcmc_trace_map |
The sampled values of each change-point per time series, conditionally on the MAP values. |
theta |
The sampled values of the means per time-series, conditionally on the MAP values. |
nCutPointsTrace |
The sampled values of the number of change-points, per time-series. |
The complexity prior distribution with parameter gammaParameter = 2
is the default prior assumption imposed on the number of change-points. Smaller (larger) values of gammaParameter
will a-priori support larger (respectively: smaller) number of change-points.
For completeness purposes, the Poisson distribution is also allowed in the Prior
. In this latter case, the gammaParameter
denotes the rate parameter of the Poisson distribution. Note that in this case the interpretation of gammaParameter
is reversed: Smaller (larger) values of gammaParameter
will a-priori support smaller (respectively: larger) number of change-points.
Panagiotis Papastamoulis
Papastamoulis P., Furukawa T., van Rhijn N., Bromley M., Bignell E. and Rattray M. (2017). Bayesian detection of piecewise linear trends in replicated time-series with application to growth data modelling. arXiv:1709.06111 [stat.AP]
# toy-example (MCMC iterations not enough) library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # Run MCMC sampler with very small number of iterations (nIter): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE, nIter = 40, burn = 20) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot_toy.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # Run the following commands to obtain convergence: ## Not run: # This example illustrates the package using a subset of four # time-series of the fungal dataset. library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # optional # Run MCMC sampler with the default number of iterations (nIter =70000): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # NOTE 1: for a complete analysis remove the `subsetIndex = myIndex` argument. # NOTE 2: `zeroNormalization = TRUE` is an optional argument that forces all # time-series to start from zero. It is not supposed to be used # for other applications. ## End(Not run)
# toy-example (MCMC iterations not enough) library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # Run MCMC sampler with very small number of iterations (nIter): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE, nIter = 40, burn = 20) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot_toy.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # Run the following commands to obtain convergence: ## Not run: # This example illustrates the package using a subset of four # time-series of the fungal dataset. library('beast') # load package data("FungalGrowthDataset") # load dataset myIndex <- c(392, 62, 3, 117) # run the sampler only for the # specific subset of time-series set.seed(1) # optional # Run MCMC sampler with the default number of iterations (nIter =70000): run_mcmc <- beast(myDataList = FungalGrowthDataset, subsetIndex = myIndex, zeroNormalization = TRUE) # Print output: print(run_mcmc) # Plot output to file: "beast_plot.pdf" plot(run_mcmc, fileName = "beast_plot.pdf", timeScale=1/6, xlab = "hours", ylab = "growth") # NOTE 1: for a complete analysis remove the `subsetIndex = myIndex` argument. # NOTE 2: `zeroNormalization = TRUE` is an optional argument that forces all # time-series to start from zero. It is not supposed to be used # for other applications. ## End(Not run)
This function defines the probability of proposing an addition of a change-point.
birthProbs(LRange)
birthProbs(LRange)
LRange |
The range of possible values for the number of change-points. |
probs |
Vector of probabilities |
Panagiotis Papastamoulis
This function computes the complexity prior distribution on the number of change-points, defined as . Note that this distribution has exponential decrease (Castillo and van der Vaart, 2012) when
, so we set
.
complexityPrior(Lmax = 20, gammaParameter, nTime)
complexityPrior(Lmax = 20, gammaParameter, nTime)
Lmax |
maximum number of change-points (default = 20). |
gammaParameter |
positive real number, corresponding to |
nTime |
positive integer denoting the total number of time-points. |
logPrior |
Prior distribution values in the log-scale. |
Panagiotis Papastamoulis
Castillo I. and van der Vaart A (2012). Needles and Straw in a Haystack: Posterior concentration for possibly sparse sequences. The Annals of Statistics, 40(4), 2069–2101.
This function computes the empirical mean of the time-series.
computeEmpiricalPriorParameters(myDataList, nu0 = 1, alpha0 = 1, beta0 = 1)
computeEmpiricalPriorParameters(myDataList, nu0 = 1, alpha0 = 1, beta0 = 1)
myDataList |
Observed multivariate time-series. |
nu0 |
positive real number. |
alpha0 |
positive real number. |
beta0 |
positive real number. |
mu0 |
Empirical mean |
Panagiotis Papastamoulis
Compute empirical posterior parameters
computePosteriorParameters(myDataList, priorParameters)
computePosteriorParameters(myDataList, priorParameters)
myDataList |
Observed data. |
priorParameters |
Prior parameters. |
list of posterior parameters
Panagiotis Papastamoulis
Posterior parameters
computePosteriorParametersFree(myDataList, priorParameters)
computePosteriorParametersFree(myDataList, priorParameters)
myDataList |
observed data. |
priorParameters |
list of prior parameters. |
list of posterior parameters.
Panagiotis Papastamoulis
Time-series dataset with growth levels for
replicates of
objects (mutants) measured every 10 minutes for
time-points. See Papastamoulis et al (2017) for a detailed description.
FungalGrowthDataset
FungalGrowthDataset
Time-series data.
Papastamoulis P., Furukawa T., van Rhijn N., Bromley M., Bignell E. and Rattray M. (2017). Bayesian detection of piecewise linear trends in replicated time-series with application to growth data modelling. arXiv:1709.06111 [stat.AP]
Implements Move 3.b of the Metropolis-Hastings MCMC sampler.
localProposal(cutPoints, nTime, mhPropRange, startPoint)
localProposal(cutPoints, nTime, mhPropRange, startPoint)
cutPoints |
Current configuration of change-points. |
nTime |
Total number of time-points. |
mhPropRange |
Parameter |
startPoint |
Integer, at least equal to 2. |
newState |
Candidate state of the chain. |
propRatio |
Proposal ratio. |
Panagiotis Papastamoulis
Log-likelihood function.
logLikelihoodFullModel(myData, cutPoints, theta, startPoint)
logLikelihoodFullModel(myData, cutPoints, theta, startPoint)
myData |
data |
cutPoints |
change-points. |
theta |
means. |
startPoint |
optional integer at least equal to 2. |
log-likelihood value.
Panagiotis Papastamoulis
Logarithm of the prior distribution on the number of change-points.
logPrior(cutPoints, nTime, startPoint)
logPrior(cutPoints, nTime, startPoint)
cutPoints |
change-points. |
nTime |
number of time-points. |
startPoint |
optional integer, at least equal to 2. |
logarithm of the prior distribution.
Panagiotis Papastamoulis
This function implements the Metropolis-Hastings MCMC sampler for individual time-series.
mcmcSampler(myData, nIter, finalIterationPdf, modelVariance, mhPropRange, mhSinglePropRange, movesRange, startPoint, postPar, dName, timeScale, burn, iterPerPlotPrefix, priorParameters, L = 3, LRange, tau, gammaParameter, saveTheta, Prior = "complexity")
mcmcSampler(myData, nIter, finalIterationPdf, modelVariance, mhPropRange, mhSinglePropRange, movesRange, startPoint, postPar, dName, timeScale, burn, iterPerPlotPrefix, priorParameters, L = 3, LRange, tau, gammaParameter, saveTheta, Prior = "complexity")
myData |
observed data. |
nIter |
number of mcmc iterations |
finalIterationPdf |
output folder |
modelVariance |
null |
mhPropRange |
positive integer |
mhSinglePropRange |
positive integer |
movesRange |
null |
startPoint |
positive integer |
postPar |
list of emprirically estimated parameters |
dName |
subject ID |
timeScale |
null |
burn |
burn-in period. |
iterPerPlotPrefix |
null |
priorParameters |
prior parameters. |
L |
null |
LRange |
range of possible values of the number of change-points. |
tau |
real. |
gammaParameter |
real. |
saveTheta |
TRUE. |
Prior |
character. |
MCMC output.
Panagiotis Papastamoulis
Printing various unicode symbols.
myUnicodeCharacters()
myUnicodeCharacters()
printed symbol
Zero normalization at 1st time-point
normalizeTime0(myDataList)
normalizeTime0(myDataList)
myDataList |
data |
null
Panagiotis Papastamoulis
This function plots objects returned by the beast
function. All output is diverted to a pdf file provided in the fileName
argument.
## S3 method for class 'beast.object' plot(x, fileName, width, height, pointsize, ylab, xlab, timeScale, myPal, boxwex, ...)
## S3 method for class 'beast.object' plot(x, fileName, width, height, pointsize, ylab, xlab, timeScale, myPal, boxwex, ...)
x |
An object of class |
fileName |
Name of the output pdf file. |
width |
Width of pdf file. Default: 9 |
height |
Height pdf file. Default: 6 |
pointsize |
Pointsize. Default: 12 |
ylab |
|
xlab |
|
timeScale |
A multiplicative-factor which will be used to scale the |
myPal |
Vector of colors that will be used to produce the plot with all time-series overlayed. If the distinct values of the inferred numbers of change-points is less than 10, the |
boxwex |
A scale factor to be applied to all boxes. The appearance of the plot can be improved by making the boxes narrower or wider. Default: 0.2. |
... |
ignored. |
The function will produce a plot with all time-series coloured according to the corresponding number of change-points. In addition, it will generate individual plots per time-series displaying the observed data with boxplots which summarize the posterior distribution of change-points locations, conditionally on the most probable number of change-points.
Panagiotis Papastamoulis
This function prints a summary of objects returned by the beast
function.
## S3 method for class 'beast.object' print(x, ...)
## S3 method for class 'beast.object' print(x, ...)
x |
An object of class |
... |
ignored. |
The function prints a summary of the most probable number (MAP) of change-points per time-series in the form of a table, as well as a list containing the MAP number of change-points and the corresponding locations (posterior medians) per time-series.
Panagiotis Papastamoulis
Proposes an update of according to Metropolis-Hastings move 2.
proposeTheta(thetaOld, tau, alpha, beta)
proposeTheta(thetaOld, tau, alpha, beta)
thetaOld |
Current values |
tau |
Parameter |
alpha |
null |
beta |
null |
mean |
proposed values. |
Panagiotis Papastamoulis
Generation of mean values according to the prior
simMultiIndNormInvGamma(mu, nu, alpha, beta)
simMultiIndNormInvGamma(mu, nu, alpha, beta)
mu |
means |
nu |
precision parameter |
alpha |
prior parameter |
beta |
prior parameter |
null
Panagiotis Papastamoulis
Generate change-points according to the prior distribution conditionally on a given number of change-points.
simulateFromPrior(nTime, startPoint, L = 3)
simulateFromPrior(nTime, startPoint, L = 3)
nTime |
Number of time-points |
startPoint |
At least equal to 2. |
L |
null |
cutPoints |
Change-point locations |
Panagiotis Papastamoulis
Implement Metropolis-Hastings Move 3.b.
singleLocalProposal(cutPoints, nTime, mhSinglePropRange, startPoint)
singleLocalProposal(cutPoints, nTime, mhSinglePropRange, startPoint)
cutPoints |
Current state |
nTime |
Number of time-points |
mhSinglePropRange |
Prior parameter. |
startPoint |
Optional. |
newState |
Candidate state |
propRatio |
Proposal ratio |
Panagiotis Papastamoulis
Probability density function of the truncated Poisson distribution.
truncatedPoisson(Lmax = 20, gammaParameter = 1)
truncatedPoisson(Lmax = 20, gammaParameter = 1)
Lmax |
Max number |
gammaParameter |
Location parameter. |
logPrior |
Log-pdf values |
Panagiotis Papastamoulis
Update the number of change-points according to Metropolis-Hastings move 1.
updateNumberOfCutpoints(cutPoints, nTime, startPoint, LRange, birthProbs)
updateNumberOfCutpoints(cutPoints, nTime, startPoint, LRange, birthProbs)
cutPoints |
Current configuration |
nTime |
Number of time-points |
startPoint |
Optional integer |
LRange |
Range of possible values |
birthProbs |
Birth probabilities |
newState |
Candidate state |
propRatio |
Proposal ratio |
Panagiotis Papastamoulis