Title: | Post-Processing MCMC Outputs of Bayesian Factor Analytic Models |
---|---|
Description: | A well known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens the inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. The package applies a series of rotation, sign and permutation transformations (Papastamoulis and Ntzoufras (2022) <DOI:10.1007/s11222-022-10084-4>) into raw MCMC samples of factor loadings, which are provided by the user. The post-processed output is identifiable and can be used for MCMC inference on any parametric function of factor loadings. Comparison of multiple MCMC chains is also possible. |
Authors: | Panagiotis Papastamoulis [aut, cre] |
Maintainer: | Panagiotis Papastamoulis <[email protected]> |
License: | GPL-2 |
Version: | 1.4 |
Built: | 2024-11-16 05:05:07 UTC |
Source: | https://github.com/cran/factor.switching |
A well known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens the inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. The package applies a series of rotation, sign and permutation transformations (Papastamoulis and Ntzoufras (2022) <DOI:10.1007/s11222-022-10084-4>) into raw MCMC samples of factor loadings, which are provided by the user. The post-processed output is identifiable and can be used for MCMC inference on any parametric function of factor loadings. Comparison of multiple MCMC chains is also possible.
There are three alternative schemes for minimizing the objective function.
Exact rsp_exact
Partial Simulated Annealing rsp_partial_sa
Full simulated annealing rsp_full_sa
The exact algorithm solves assignment problems per MCMC iteration, where
denotes the number of factors of the fitted model. For typical values of the number of factors (e.g.
) the exact scheme should be preferred. Otherwise, the two approximate algorithms based on simulated annealing may be considered. The Partial simulated annealing is more efficient than the full simulated annealing scheme.
In cases of parallel MCMC chains, applying the RSP algorithm for each chain separately will identify the factor loadings within each chain. However, the results will not be comparable between chains. The comparison of multiple MCMC chains is doable via the compareMultipleChains
function.
The DESCRIPTION file: Index of help topics:
compareMultipleChains Compare multiple chains credible.region Compute a simultaneous credible region (rectangle) from a sample for a vector valued parameter. factor.switching-package Post-Processing MCMC Outputs of Bayesian Factor Analytic Models plot.rsp Plot posterior means and credible regions procrustes_switching Orthogonal Procrustes rotations rsp_exact Rotation-Sign-Permutation (RSP) algorithm (Exact scheme) rsp_full_sa Rotation-Sign-Permutation (RSP) algorithm (Full Simulated Annealing) rsp_partial_sa Rotation-Sign-Permutation (RSP) algorithm (Partial Simulated Annealing) small_posterior_2chains Example data switch_and_permute Apply sign switchings and column permutations weighted_procrustes_switching Weighted Orthogonal Procrustes rotations
Panagiotis Papastamoulis
Maintainer: Panagiotis Papastamoulis
Papastamoulis, P. and Ntzoufras, I. (2022). On the identifiability of Bayesian Factor Analytic models. Statistics and Computing, 32, 23 (2022) https://doi.org/10.1007/s11222-022-10084-4.
rsp_exact
, plot.rsp
, compareMultipleChains
# load 2 chains each one consisting of a # small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) Nchains <- length(small_posterior_2chains) reorderedPosterior <- vector('list',length=Nchains) # post-process the 2 chains for(i in 1:Nchains){ reorderedPosterior[[i]] <- rsp_exact( lambda_mcmc = small_posterior_2chains[[i]], maxIter = 100, threshold = 1e-6, verbose=TRUE ) } # plot posterior summary for chain 1: plot(reorderedPosterior[[1]]) # plot posterior summary for chain 2: plot(reorderedPosterior[[2]]) # make them comparable makeThemSimilar <- compareMultipleChains(rspObjectList=reorderedPosterior) # plot the traces of both chains oldpar <- par(no.readonly =TRUE) par(mfcol=c(2,6),mar=c(4,4,2,1)) plot(makeThemSimilar,auto.layout=FALSE,density=FALSE, ylim=c(-1.1,1.1),smooth=FALSE,col=c('red','blue')) legend('topright',c('post-processed chain 1', 'post-processed chain 2'),lty=1:2,col=c('red','blue')) par(oldpar) # you can also use the summary of mcmc.list summary(makeThemSimilar)
# load 2 chains each one consisting of a # small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) Nchains <- length(small_posterior_2chains) reorderedPosterior <- vector('list',length=Nchains) # post-process the 2 chains for(i in 1:Nchains){ reorderedPosterior[[i]] <- rsp_exact( lambda_mcmc = small_posterior_2chains[[i]], maxIter = 100, threshold = 1e-6, verbose=TRUE ) } # plot posterior summary for chain 1: plot(reorderedPosterior[[1]]) # plot posterior summary for chain 2: plot(reorderedPosterior[[2]]) # make them comparable makeThemSimilar <- compareMultipleChains(rspObjectList=reorderedPosterior) # plot the traces of both chains oldpar <- par(no.readonly =TRUE) par(mfcol=c(2,6),mar=c(4,4,2,1)) plot(makeThemSimilar,auto.layout=FALSE,density=FALSE, ylim=c(-1.1,1.1),smooth=FALSE,col=c('red','blue')) legend('topright',c('post-processed chain 1', 'post-processed chain 2'),lty=1:2,col=c('red','blue')) par(oldpar) # you can also use the summary of mcmc.list summary(makeThemSimilar)
Compares multiples chains after each one of them has been post-processed by the RSP algorithm, so that all of them are switched into a similar labelling.
compareMultipleChains(rspObjectList, scheme, sa_loops, maxIter, threshold)
compareMultipleChains(rspObjectList, scheme, sa_loops, maxIter, threshold)
rspObjectList |
A list consisting of |
scheme |
Character argument with possible values: "exact" (default), "partial" or "full". |
sa_loops |
Number of simulated annealing loops (only applicable when "exact" is disabled). |
maxIter |
Max number of iterations. |
threshold |
Threshold for convergence. |
reorderedChains
: an object of class mcmc.list
containing all simultaneously processed chains.
Panagiotis Papastamoulis
# load 2 chains each one consisting of a # small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) Nchains <- length(small_posterior_2chains) reorderedPosterior <- vector('list',length=Nchains) for(i in 1:Nchains){ reorderedPosterior[[i]] <- rsp_exact( lambda_mcmc = small_posterior_2chains[[i]], maxIter = 100, threshold = 1e-6, verbose=TRUE ) } # make them comparable makeThemSimilar <- compareMultipleChains(rspObjectList=reorderedPosterior)
# load 2 chains each one consisting of a # small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) Nchains <- length(small_posterior_2chains) reorderedPosterior <- vector('list',length=Nchains) for(i in 1:Nchains){ reorderedPosterior[[i]] <- rsp_exact( lambda_mcmc = small_posterior_2chains[[i]], maxIter = 100, threshold = 1e-6, verbose=TRUE ) } # make them comparable makeThemSimilar <- compareMultipleChains(rspObjectList=reorderedPosterior)
See references below for more details. The function has been originally written for the archived bayesSurv
package.
credible.region(sample, probs=c(0.90, 0.975))
credible.region(sample, probs=c(0.90, 0.975))
sample |
a data frame or matrix with sampled values (one column = one parameter) |
probs |
probabilities for which the credible regions are to be computed |
A list (one component for each confidence region) of length equal to
length(probs)
. Each component of the list is a matrix with two
rows (lower and upper limit) and as many columns as the number of
parameters giving the confidence region.
Arnost Komarek
Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with Discussion). Statistical Science, 10, 3 - 66, page 30
Held, L. (2004). Simultaneous inference in risk assessment; a Bayesian perspective In: COMPSTAT 2004, Proceedings in Computational Statistics (J. Antoch, Ed.), 213 - 222, page 214
Held, L. (2004b). Simultaneous posterior probability statements from Monte Carlo output. Journal of Computational and Graphical Statistics, 13, 20 - 35.
m <- 10000 sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m)) probs <- c(0.70, 0.90, 0.95) CR <- credible.region(sample, probs=probs) for (kk in 1:length(CR)){ suma <- sum(sample$x1 >= CR[[kk]]["Lower", "x1"] & sample$x1 <= CR[[kk]]["Upper", "x1"] & sample$x2 >= CR[[kk]]["Lower", "x2"] & sample$x2 <= CR[[kk]]["Upper", "x2"] & sample$x3 >= CR[[kk]]["Lower", "x3"] & sample$x3 <= CR[[kk]]["Upper", "x3"]) show <- c(suma/m, probs[kk]) names(show) <- c("Empirical", "Desired") print(show) }
m <- 10000 sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m)) probs <- c(0.70, 0.90, 0.95) CR <- credible.region(sample, probs=probs) for (kk in 1:length(CR)){ suma <- sum(sample$x1 >= CR[[kk]]["Lower", "x1"] & sample$x1 <= CR[[kk]]["Upper", "x1"] & sample$x2 >= CR[[kk]]["Lower", "x2"] & sample$x2 <= CR[[kk]]["Upper", "x2"] & sample$x3 >= CR[[kk]]["Lower", "x3"] & sample$x3 <= CR[[kk]]["Upper", "x3"]) show <- c(suma/m, probs[kk]) names(show) <- c("Empirical", "Desired") print(show) }
This function plot posterior mean estimates per factor along with Highest Density Intervals, as well as simultaneous credible regions.
## S3 method for class 'rsp' plot(x, prob, myCol, mfrow, subSet, simCR, HDI, ...)
## S3 method for class 'rsp' plot(x, prob, myCol, mfrow, subSet, simCR, HDI, ...)
x |
An object of class |
prob |
Coverage probability of credible regions. |
myCol |
Vector of colours. |
mfrow |
Number of rows and columns in the resulting graphic. |
subSet |
Enable to plot a subset of factors. |
simCR |
Logical value for plotting simultaneous credible regions. Default: True. |
HDI |
Logical value for plotting Highest Density Intervals per factor loading. Default: True. |
... |
Ignored |
A plot.
Panagiotis Papastasmoulis
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_exact( lambda_mcmc = small_posterior_2chains[[1]]) # plot it plot(reorderedPosterior, mfrow = c(1,2), prob=0.95)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_exact( lambda_mcmc = small_posterior_2chains[[1]]) # plot it plot(reorderedPosterior, mfrow = c(1,2), prob=0.95)
Orthogonal Procrustes (OP) post-processing (Assmann et al. 2016) augmented with a final varimax rotation as implemented in Papastamoulis and Ntzoufras (2020). The algorithm uses the procrustes
function of the MCMCpack package.
procrustes_switching(lambda_mcmc, maxIter, threshold, verbose, rotate, printIter)
procrustes_switching(lambda_mcmc, maxIter, threshold, verbose, rotate, printIter)
lambda_mcmc |
Input matrix containing a MCMC sample of factor loadings. The column names should read as 'LambdaV1_1',..., 'LambdaV1_q', ..., 'LambdaVp_1',..., 'LambdaVp_q', where |
maxIter |
Maximum number of iterations of the RSP algorithm. Default: 100. |
threshold |
Positive threshold for declaring convergence. The actual convergence criterion is |
verbose |
Logical value indicating whether to print intermediate output or not. |
rotate |
This is argument is always set to FALSE. |
printIter |
Print the progress of the algorithm when processing |
lambda_reordered_mcmc |
Post-processed MCMC sample of factor loadings. |
lambda_hat |
The resulting average of the post-processed MCMC sample of factor loadings. |
objective_function |
A two-column matrix containing the time-to-reach and the value of the objective function for each iteration. |
Panagiotis Papastamoulis
Assmann, C., Boysen-Hogrefem J. and Pape M. (2016). Bayesian analysis of static and dynamic factor models: An ex-post approach towards the rotation problem. Journal of Econometrics: 192(1): Pages 190-206.
Martin AD, Quinn KM, Park JH (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software: 42(9), 22.
Papastamoulis, P. and Ntzoufras, I. (2020). On the identifiability of Bayesian Factor Analytic models. arXiv:2004.05105 [stat.ME].
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- procrustes_switching( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- procrustes_switching( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
Rotation-Sign-Permutation (RSP) algorithm (exact).
rsp_exact(lambda_mcmc, maxIter, threshold, verbose, rotate, printIter)
rsp_exact(lambda_mcmc, maxIter, threshold, verbose, rotate, printIter)
lambda_mcmc |
Input matrix containing a MCMC sample of factor loadings. The column names should read as 'LambdaV1_1',..., 'LambdaV1_q', ..., 'LambdaVp_1',..., 'LambdaVp_q', where |
maxIter |
Maximum number of iterations of the RSP algorithm. Default: 100. |
threshold |
Positive threshold for declaring convergence. The actual convergence criterion is |
verbose |
Logical value indicating whether to print intermediate output or not. |
rotate |
Logical. Default: TRUE. |
printIter |
Print the progress of the algorithm when processing |
lambda_reordered_mcmc |
Post-processed MCMC sample of factor loadings. |
sign_vectors |
The final sign-vectors. |
permute_vectors |
The final permutations. |
lambda_hat |
The resulting average of the post-processed MCMC sample of factor loadings. |
objective_function |
A two-column matrix containing the time-to-reach and the value of the objective function for each iteration. |
Panagiotis Papastamoulis
Papastamoulis, P. and Ntzoufras, I. (2020). On the identifiability of Bayesian Factor Analytic models. arXiv:2004.05105 [stat.ME].
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_exact( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_exact( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
Rotation-Sign-Permutation (RSP) algorithm (Full Simulated Annealing).
rsp_full_sa(lambda_mcmc, maxIter = 1000, threshold = 1e-06, verbose = TRUE, sa_loops, rotate = TRUE, increaseIter = FALSE, temperatureSchedule = NULL, printIter = 1000)
rsp_full_sa(lambda_mcmc, maxIter = 1000, threshold = 1e-06, verbose = TRUE, sa_loops, rotate = TRUE, increaseIter = FALSE, temperatureSchedule = NULL, printIter = 1000)
lambda_mcmc |
Input matrix containing a MCMC sample of factor loadings. The column names should read as 'LambdaV1_1',..., 'LambdaV1_q', ..., 'LambdaVp_1',..., 'LambdaVp_q', where |
maxIter |
Maximum number of iterations of the RSP algorithm. Default: 1000. |
threshold |
Positive threshold for declaring convergence. The actual convergence criterion is |
verbose |
Logical value indicating whether to print intermediate output or not. |
sa_loops |
Number of simulated annealing loops per MCMC draw. |
rotate |
Logical. Default: TRUE. |
increaseIter |
Logical. |
temperatureSchedule |
Single valued function describing the temperature cooling schedule for the simulated annealing loops. |
printIter |
Print the progress of the algorithm when processing |
lambda_reordered_mcmc |
Post-processed MCMC sample of factor loadings. |
sign_vectors |
The final sign-vectors. |
permute_vectors |
The final permutations. |
lambda_hat |
The resulting average of the post-processed MCMC sample of factor loadings. |
objective_function |
A two-column matrix containing the time-to-reach and the value of the objective function for each iteration. |
Panagiotis Papastamoulis
Papastamoulis, P. and Ntzoufras, I. (2020). On the identifiability of Bayesian Factor Analytic models. arXiv:2004.05105 [stat.ME].
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_partial_sa( lambda_mcmc = small_posterior_2chains[[1]], sa_loops=5) # sa_loops should be larger in general # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_partial_sa( lambda_mcmc = small_posterior_2chains[[1]], sa_loops=5) # sa_loops should be larger in general # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
Rotation-Sign-Permutation (RSP) algorithm (Partial Simulated Annealing).
rsp_partial_sa(lambda_mcmc, maxIter = 1000, threshold = 1e-06, verbose = TRUE, sa_loops, rotate = TRUE, increaseIter = FALSE, temperatureSchedule = NULL, printIter = 1000)
rsp_partial_sa(lambda_mcmc, maxIter = 1000, threshold = 1e-06, verbose = TRUE, sa_loops, rotate = TRUE, increaseIter = FALSE, temperatureSchedule = NULL, printIter = 1000)
lambda_mcmc |
Input matrix containing a MCMC sample of factor loadings. The column names should read as 'LambdaV1_1',..., 'LambdaV1_q', ..., 'LambdaVp_1',..., 'LambdaVp_q', where |
maxIter |
Maximum number of iterations of the RSP algorithm. Default: 1000. |
threshold |
Positive threshold for declaring convergence. The actual convergence criterion is |
verbose |
Logical value indicating whether to print intermediate output or not. |
sa_loops |
Number of simulated annealing loops per MCMC draw. |
rotate |
Logical. Default: TRUE. |
increaseIter |
Logical. |
temperatureSchedule |
Single valued function describing the temperature cooling schedule for the simulated annealing loops. |
printIter |
Print the progress of the algorithm when processing |
lambda_reordered_mcmc |
Post-processed MCMC sample of factor loadings. |
sign_vectors |
The final sign-vectors. |
permute_vectors |
The final permutations. |
lambda_hat |
The resulting average of the post-processed MCMC sample of factor loadings. |
objective_function |
A two-column matrix containing the time-to-reach and the value of the objective function for each iteration. |
Panagiotis Papastamoulis
Papastamoulis, P. and Ntzoufras, I. (2020). On the identifiability of Bayesian Factor Analytic models. arXiv:2004.05105 [stat.ME].
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_partial_sa( lambda_mcmc = small_posterior_2chains[[1]], sa_loops=5) # sa_loops should be larger in general # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- rsp_partial_sa( lambda_mcmc = small_posterior_2chains[[1]], sa_loops=5) # sa_loops should be larger in general # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
A list consisting of two small MCMC chains.
data(small_posterior_2chains)
data(small_posterior_2chains)
List of length 2. Each entry contains a matrix of 20 MCMC draws.
Help function, not really meant to be used by the average user.
switch_and_permute(lambda_mcmc, switch_vectors, permute_vectors)
switch_and_permute(lambda_mcmc, switch_vectors, permute_vectors)
lambda_mcmc |
MCMC input. |
switch_vectors |
Sign vectors. |
permute_vectors |
Permutation vectors. |
reordered lambda_mcmc
according to sign and permutations provided.
Panagiotis Papastamoulis
Weighted Orthogonal Procrustes (WOP) post-processing (Assmann et al. 2016) augmented with a final varimax rotation as implemented in Papastamoulis and Ntzoufras (2020). The algorithm uses the procrustes
function of the MCMCpack package.
weighted_procrustes_switching(lambda_mcmc, maxIter, threshold, verbose, weight, printIter)
weighted_procrustes_switching(lambda_mcmc, maxIter, threshold, verbose, weight, printIter)
lambda_mcmc |
Input matrix containing a MCMC sample of factor loadings. The column names should read as 'LambdaV1_1',..., 'LambdaV1_q', ..., 'LambdaVp_1',..., 'LambdaVp_q', where |
maxIter |
Maximum number of iterations of the RSP algorithm. Default: 100. |
threshold |
Positive threshold for declaring convergence. The actual convergence criterion is |
verbose |
Logical value indicating whether to print intermediate output or not. |
weight |
This is argument is always set to TRUE. |
printIter |
Print the progress of the algorithm when processing |
lambda_reordered_mcmc |
Post-processed MCMC sample of factor loadings. |
lambda_hat |
The resulting average of the post-processed MCMC sample of factor loadings. |
objective_function |
A two-column matrix containing the time-to-reach and the value of the objective function for each iteration. |
Panagiotis Papastamoulis
Assmann, C., Boysen-Hogrefem J. and Pape M. (2016). Bayesian analysis of static and dynamic factor models: An ex-post approach towards the rotation problem. Journal of Econometrics: 192 (1): Pages 190-206.
Martin AD, Quinn KM, Park JH (2011). MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software: 42(9), 22.
Papastamoulis, P. and Ntzoufras, I. (2020). On the identifiability of Bayesian Factor Analytic models. arXiv:2004.05105 [stat.ME].
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- weighted_procrustes_switching( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)
# load small mcmc sample of 100 iterations # with p=6 variables and q=2 factors. data(small_posterior_2chains) # post-process it reorderedPosterior <- weighted_procrustes_switching( lambda_mcmc = small_posterior_2chains[[1]]) # summarize the post-processed MCMC sample with coda summary(reorderedPosterior$lambda_reordered_mcmc)