| Title: | Fit High Dimensional Mixtures of Poisson GLMs |
|---|---|
| Description: | Mixtures of Poisson Generalized Linear Models for high dimensional count data clustering. The (multivariate) responses can be partitioned into set of blocks. Three different parameterizations of the linear predictor are considered. The models are estimated according to the EM algorithm with an efficient initialization scheme <doi:10.1016/j.csda.2014.07.005>. |
| Authors: | Panagiotis Papastamoulis [aut, cre]
|
| Maintainer: | Panagiotis Papastamoulis <[email protected]> |
| License: | GPL-2 |
| Version: | 1.4 |
| Built: | 2026-05-15 06:27:13 UTC |
| Source: | https://github.com/cran/poisson.glm.mix |
(m=1) Poisson GLM mixture.
This function applies EM algorithm for estimating a -component mixture of Poisson GLM's, using parameterization , that is the model. Initialization can be done using two different intialization schemes. The first one is a two-step small EM procedure. The second one is a random splitting small EM procedure based on results of a mixture with less components. Output of the function is the updates of the parameters at each iteration of the EM algorithm, the estimate of , the estimated clusters and conditional probabilities of the observations, as well as the values of the BIC, ICL and loglikelihood of the model.
bjkmodel(reference, response, L, m, K, nr, maxnr, m1, m2, t1, t2, msplit, tsplit, prev.z, prev.clust, start.type, prev.alpha, prev.beta)bjkmodel(reference, response, L, m, K, nr, maxnr, m1, m2, t1, t2, msplit, tsplit, prev.z, prev.clust, start.type, prev.alpha, prev.beta)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
m |
positive integer denoting the maximum number of EM iterations. |
K |
positive integer denoting the number of mixture components. |
nr |
negative number denoting the tolerance for the convergence of the Newton Raphson iterations. |
maxnr |
positive integer denoting the maximum number of Newton Raphson iterations. |
m1 |
positive integer denoting the number of iterations for each call of the 1st small EM iterations used by Initialization 1 ( |
m2 |
positive integer denoting the number of iterations for each call of the 2nd small EM iterations used by Initialization 1 ( |
t1 |
positive integer denoting the number of different runs of the 1st small EM used by Initialization 1 ( |
t2 |
positive integer denoting the number of different runs of the 2nd small EM used by Initialization 1 ( |
msplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
tsplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
prev.z |
numeric array of dimension |
prev.clust |
numeric vector of length |
start.type |
binary variable (1 or 2) indicating the type of initialization (1 for initialization 1 and 2 for initialization 2). |
prev.alpha |
numeric array of dimension |
prev.beta |
numeric array of dimension |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
gamma |
numeric array of dimension |
psim |
numeric array of dimension |
clust |
numeric vector of length |
z |
numeric array of dimension |
bic |
numeric, the value of the BIC. |
icl |
numeric, the value of the ICL. |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
init1.1.jk.j, init1.2.jk.j, init2.jk.j
############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different 1st small runs equals t1=3, ## each one consisting of m1 = 5 iterations ## the number of different 2nd small runs equals t2=3, ## each one consisting of m2 = 5 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bjkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, m1=5, m2=5, t1=3, t2=3, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 3-component mixture using ~ # Initialization 2. The number of different runs is set to $tsplit=2$ with ~ # each one of them using $msplit=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=2 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , , ], dim = c(q, nc - 1, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bjkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=3, nr=-10*log(10), maxnr=10, m1, m2, t1, t2, msplit=5, tsplit=2, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,3 run$alpha[tem,,] # estimates of the regression coefficients beta_{jk\tau}, j = 1,2,3, k=1,..,3, \tau=1 run$beta[tem,,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2, msplit and # tsplit for a complete analysis.############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different 1st small runs equals t1=3, ## each one consisting of m1 = 5 iterations ## the number of different 2nd small runs equals t2=3, ## each one consisting of m2 = 5 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bjkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, m1=5, m2=5, t1=3, t2=3, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 3-component mixture using ~ # Initialization 2. The number of different runs is set to $tsplit=2$ with ~ # each one of them using $msplit=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=2 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , , ], dim = c(q, nc - 1, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bjkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=3, nr=-10*log(10), maxnr=10, m1, m2, t1, t2, msplit=5, tsplit=2, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,3 run$alpha[tem,,] # estimates of the regression coefficients beta_{jk\tau}, j = 1,2,3, k=1,..,3, \tau=1 run$beta[tem,,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2, msplit and # tsplit for a complete analysis.
(m=2) Poisson GLM mixture.
This function applies EM algorithm for estimating a -component mixture of Poisson GLM's, using parameterization , that is the model. Initialization can be done using two different intialization schemes. The first one is a two-step small EM procedure. The second one is a random splitting small EM procedure based on results of a mixture with less components. Output of the function is the updates of the parameters at each iteration of the EM algorithm, the estimate of , the estimated clusters and conditional probabilities of the observations, as well as the values of the BIC, ICL and loglikelihood of the model.
bjmodel(reference, response, L, m, K, nr, maxnr, m1, m2, t1, t2, msplit, tsplit, prev.z, prev.clust, start.type, prev.alpha, prev.beta)bjmodel(reference, response, L, m, K, nr, maxnr, m1, m2, t1, t2, msplit, tsplit, prev.z, prev.clust, start.type, prev.alpha, prev.beta)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
m |
positive integer denoting the maximum number of EM iterations. |
K |
positive integer denoting the number of mixture components. |
nr |
negative number denoting the tolerance for the convergence of the Newton Raphson iterations. |
maxnr |
positive integer denoting the maximum number of Newton Raphson iterations. |
m1 |
positive integer denoting the number of iterations for each call of the 1st small EM iterations used by Initialization 1 ( |
m2 |
positive integer denoting the number of iterations for each call of the 2nd small EM iterations used by Initialization 1 ( |
t1 |
positive integer denoting the number of different runs of the 1st small EM used by Initialization 1 ( |
t2 |
positive integer denoting the number of different runs of the 2nd small EM used by Initialization 1 ( |
msplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
tsplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
prev.z |
numeric array of dimension |
prev.clust |
numeric vector of length |
start.type |
binary variable (1 or 2) indicating the type of initialization (1 for initialization 1 and 2 for initialization 2). |
prev.alpha |
numeric array of dimension |
prev.beta |
numeric array of dimension |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
gamma |
numeric array of dimension |
psim |
numeric array of dimension |
clust |
numeric vector of length |
z |
numeric array of length |
bic |
numeric, the value of the BIC. |
icl |
numeric, the value of the ICL. |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
init1.1.jk.j, init1.2.jk.j, init2.jk.j
############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different 1st small runs equals t1=2, ## each one consisting of m1 = 5 iterations ## the number of different 2nd small runs equals t2=5, ## each one consisting of m2 = 10 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bjmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, m1=5, m2=10, t1=2, t2=5, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 11-component mixture using ~ # Initialization 2. The number of different runs is set to $tsplit=2$ with ~ # each one of them using $msplit=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=10 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , ], dim = c(q, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bjmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=3, nr=-10*log(10), maxnr=10, m1, m2, t1, t2, msplit=5, tsplit=2, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,3 run$alpha[tem,,] # estimates of the regression coefficients beta_{j\tau}, j = 1,2,3, \tau=1 run$beta[tem,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2, msplit # and tsplit for a complete analysis.############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different 1st small runs equals t1=2, ## each one consisting of m1 = 5 iterations ## the number of different 2nd small runs equals t2=5, ## each one consisting of m2 = 10 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bjmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, m1=5, m2=10, t1=2, t2=5, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 11-component mixture using ~ # Initialization 2. The number of different runs is set to $tsplit=2$ with ~ # each one of them using $msplit=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=10 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , ], dim = c(q, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bjmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=3, nr=-10*log(10), maxnr=10, m1, m2, t1, t2, msplit=5, tsplit=2, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,3 run$alpha[tem,,] # estimates of the regression coefficients beta_{j\tau}, j = 1,2,3, \tau=1 run$beta[tem,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2, msplit # and tsplit for a complete analysis.
(m=3) Poisson GLM mixture.
This function applies EM algorithm for estimating a -component mixture of Poisson GLM's, using parameterization , that is the model. Initialization can be done using two different intialization schemes. The first one is a one-step small EM procedure. The second one is a random splitting small EM procedure based on results of a mixture with less components. Output of the function is the updates of the parameters at each iteration of the EM algorithm, the estimate of , the estimated clusters and conditional probabilities of the observations, as well as the values of the BIC, ICL and loglikelihood of the model.
bkmodel(reference, response, L, m, K, nr, maxnr, t2, m2, prev.z, prev.clust, start.type, prev.alpha, prev.beta)bkmodel(reference, response, L, m, K, nr, maxnr, t2, m2, prev.z, prev.clust, start.type, prev.alpha, prev.beta)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
m |
positive integer denoting the maximum number of EM iterations. |
K |
positive integer denoting the number of mixture components. |
nr |
negative number denoting the tolerance for the convergence of the Newton Raphson iterations. |
maxnr |
positive integer denoting the maximum number of Newton Raphson iterations. |
t2 |
positive integer denoting the number of different runs of the small EM used by Initialization 1 ( |
m2 |
positive integer denoting the number of iterations for each call of the small EM iterations used by Initialization 1 ( |
prev.z |
numeric array of dimension |
prev.clust |
numeric vector of length |
start.type |
binary variable (1 or 2) indicating the type of initialization (1 for initialization 1 and 2 for initialization 2). |
prev.alpha |
numeric array of dimension |
prev.beta |
numeric array of dimension |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
gamma |
numeric array of dimension |
psim |
numeric array of dimension |
clust |
numeric vector of length |
z |
numeric array of length |
bic |
numeric, the value of the BIC. |
icl |
numeric, the value of the ICL. |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different small runs equals t2=5, ## each one consisting of m1 = 5 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, t2=5, m2=5, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 3-component mixture using ~ # Initialization 2. The number of different runs is set to $t2=2$ with ~ # each one of them using $m2=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=10 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , ], dim = c(nc - 1, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, t2=2, m2=5, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,11 run$alpha[tem,,] # estimates of the regression coefficients beta_{k\tau}, k = 1,..,11, \tau=1 run$beta[tem,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2 # for a complete analysis.############################################################ #1. Example with Initialization 1 # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## use Initialization 1 with 2 components ## the number of different small runs equals t2=5, ## each one consisting of m1 = 5 iterations ## the maximum number of EM iterations is set to m = 1000. nc <- 2 run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, t2=5, m2=5, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## retrieve the iteration that the small em converged: tem <- length(run$psim)/nc ## print the estimate of regression constants alpha. run$alpha[tem,,] ## print the estimate of regression coefficients beta. beta <- run$beta[tem,,] ## print the estimate of gamma. run$gamma ## print the estimate of mixture weights. run$psim[tem,] ## frequency table of the resulting clustering of the ## 500 observations among the 2 components. table(run$clust) ## print the value of the ICL criterion run$icl ## print the value of the BIC run$bic ## print the value of the loglikelihood run$ll ############################################################ #2. Example with Initialization 2 # ############################################################ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Given the estimates of Example 1, estimate a 3-component mixture using ~ # Initialization 2. The number of different runs is set to $t2=2$ with ~ # each one of them using $m2=5$ em iterations. ~ #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ run.previous<-run ## number of conditions q <- 3 ## number of covariates tau <- 1 ## number of components nc <- 3 ## estimated conditional probabilities for K=10 z <- run.previous$z ## number of iteration that the previous EM converged ml <- length(run.previous$psim)/(nc - 1) ## estimates of alpha when K=2 alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) ## estimates of beta when K=2 beta <- array(run.previous$beta[ml, , ], dim = c(nc - 1, tau)) clust <- run.previous$clust ##(estimated clusters when K=2) run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), maxnr=10, t2=2, m2=5, prev.z=z, prev.clust=clust, start.type=2, prev.alpha=alpha, prev.beta=beta) # retrieve the iteration that EM converged tem <- length(run$psim)/nc # estimates of the mixture weights run$psim[tem,] # estimates of the regression constants alpha_{jk}, j = 1,2,3, k=1,..,11 run$alpha[tem,,] # estimates of the regression coefficients beta_{k\tau}, k = 1,..,11, \tau=1 run$beta[tem,,] # note: useR should specify larger values for Kmax, m1, m2, t1, t2 # for a complete analysis.
() or () parameterization.
This function is the first step of the two-step small initialization procedure (Initialization 1), used for the parameterizations () or (). For each condition , a small EM is run in order to find some good starting values for the -component mixtures: , independently for each . These values are used in order to initialize the second step (init1.2.jk.j) of the small EM algorithm for fitting the overall mixture .
init1.1.jk.j(reference, response, L, K, t1, model, m1,mnr)init1.1.jk.j(reference, response, L, K, t1, model, m1,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
K |
positive integer denoting the number of mixture components. |
t1 |
positive integer denoting the number of different runs. |
model |
binary variable denoting the parameterization of the model: 1 for |
m1 |
positive integer denoting the number of iterations for each run. |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
psim |
numeric vector of length |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
init1.2.jk.j, bjkmodel, bjmodel
############################################################ #1. Example with beta_jk (m=1) model # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the component specific parameters ## for a 2 component mixture start1 <- init1.1.jk.j(reference=x, response=y, L=c(3,2,1), K=2, t1=3, model=1, m1=5,mnr = 5) summary(start1) ############################################################ #2. Example with beta_j (m=2) model # ############################################################ start1 <- init1.1.jk.j(reference=x, response=y, L=c(3,2,1), K=2, t1=3, model=2, m1=5,mnr = 5) summary(start1)############################################################ #1. Example with beta_jk (m=1) model # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the component specific parameters ## for a 2 component mixture start1 <- init1.1.jk.j(reference=x, response=y, L=c(3,2,1), K=2, t1=3, model=1, m1=5,mnr = 5) summary(start1) ############################################################ #2. Example with beta_j (m=2) model # ############################################################ start1 <- init1.1.jk.j(reference=x, response=y, L=c(3,2,1), K=2, t1=3, model=2, m1=5,mnr = 5) summary(start1)
() or () parameterization.
This function is the second step of the two-step small initialization procedure (Initialization 1), used for parameterizations or . At first, init1.1.jk.j is called for each condition . The values obtained from the first step are used for initializing the second step of the small EM algorithm for fitting the overall mixture . The selected values from the second step are the ones that initialize the EM algorithm (bjkmodel or bjmodel), when .
init1.2.jk.j(reference, response, L, K, m1, m2, t1, t2, model,mnr)init1.2.jk.j(reference, response, L, K, m1, m2, t1, t2, model,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
K |
positive integer denoting the number of mixture components. |
m1 |
positive integer denoting the number of iterations for each run of |
m2 |
positive integer denoting the number of iterations for each run of |
t1 |
positive integer denoting the number of different runs of |
t2 |
positive integer denoting the number of different runs of |
model |
binary variable denoting the parameterization of the model: 1 for |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
psim |
numeric vector of length |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
init1.1.jk.j, bjkmodel, bjmodel
############################################################ #1. Example with beta_jk (m=1) model # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the parameters for a 2 component mixture ## the number of the overall small runs are t2 = 2 ## each one consisting of m2 = 2 iterations of the EM. ## the number of the small runs for the first step small EM ## is t1 = 2, each one consisting of m1 = 2 iterations. start2 <- init1.2.jk.j(reference=x, response=y, L=c(3,2,1), K=2, m1=2, m2=2, t1=2, t2=2, model=1,mnr = 3) summary(start2) ############################################################ #2. Example with beta_j (m=2) model # ############################################################ ## initialize the parameters for a 2 component mixture ## the number of the overall small runs are t2 = 3 ## each one consisting of m2 = 2 iterations of the EM. ## the number of the small runs for the first step small EM ## is t1 = 2, each one consisting of m1 = 2 iterations. start2 <- init1.2.jk.j(reference=x, response=y, L=c(3,2,1), K=2, m1=2, m2=2, t1=2, t2=3, model=2,mnr = 5) summary(start2)############################################################ #1. Example with beta_jk (m=1) model # ############################################################ ## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the parameters for a 2 component mixture ## the number of the overall small runs are t2 = 2 ## each one consisting of m2 = 2 iterations of the EM. ## the number of the small runs for the first step small EM ## is t1 = 2, each one consisting of m1 = 2 iterations. start2 <- init1.2.jk.j(reference=x, response=y, L=c(3,2,1), K=2, m1=2, m2=2, t1=2, t2=2, model=1,mnr = 3) summary(start2) ############################################################ #2. Example with beta_j (m=2) model # ############################################################ ## initialize the parameters for a 2 component mixture ## the number of the overall small runs are t2 = 3 ## each one consisting of m2 = 2 iterations of the EM. ## the number of the small runs for the first step small EM ## is t1 = 2, each one consisting of m1 = 2 iterations. start2 <- init1.2.jk.j(reference=x, response=y, L=c(3,2,1), K=2, m1=2, m2=2, t1=2, t2=3, model=2,mnr = 5) summary(start2)
parameterization ().
This function is the small initialization procedure (Initialization 1) for parameterization . The selected values are the ones that initialize the EM algorithm bkmodel.
init1.k(reference, response, L, K, t2, m2,mnr)init1.k(reference, response, L, K, t2, m2,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
K |
positive integer denoting the number of mixture components. |
t2 |
positive integer denoting the number of different runs. |
m2 |
positive integer denoting the number of iterations for each run. |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
psim |
numeric vector of length |
ll |
numeric, the value of the loglikelihood, computed according to the |
Panagiotis Papastamoulis
## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the parameters for a 2 component mixture ## the number of the small runs are t2 = 3 ## each one consisting of m2 = 5 iterations of the EM. start1 <- init1.k(reference=x, response=y, L=c(3,2,1), K=2, m2=5, t2=3,mnr = 5) summary(start1)## load a simulated dataset according to the b_jk model ## number of observations: 500 ## design: L=(3,2,1) data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] ## initialize the parameters for a 2 component mixture ## the number of the small runs are t2 = 3 ## each one consisting of m2 = 5 iterations of the EM. start1 <- init1.k(reference=x, response=y, L=c(3,2,1), K=2, m2=5, t2=3,mnr = 5) summary(start1)
() or () parameterization.
This function applies a random splitting small EM initialization scheme (Initialization 2), for parameterizations or 2. It can be implemented only in case where a previous run of the EM algorithm is available (with respect to the same parameterization). The initialization scheme proposes random splits of the existing clusters, increasing the number of mixture components by one. Then an EM is ran for (msplit) iterations and the procedure is repeated for tsplit times. The best values in terms of observed loglikelihood are chosen to initialize the main EM algorithm (bjkmodel or bjmodel).
init2.jk.j(reference, response, L, K, tsplit, model, msplit, previousz, previousclust, previous.alpha, previous.beta,mnr)init2.jk.j(reference, response, L, K, tsplit, model, msplit, previousz, previousclust, previous.alpha, previous.beta,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
K |
positive integer denoting the number of mixture components. |
tsplit |
positive integer denoting the number of different runs. |
model |
binary variable denoting the parameterization of the model: 1 for |
msplit |
positive integer denoting the number of iterations for each run. |
previousz |
numeric array of dimension |
previousclust |
numeric vector of length $n$ containing the estimated clusters according to the MAP rule obtained by the previous run of EM. |
previous.alpha |
numeric array of dimension |
previous.beta |
numeric array of dimension |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
psim |
numeric vector of length |
ll |
numeric, the value of the loglikelihood, computed according to the |
In case that an exhaustive search is desired instead of a random selection of the splitted components, use tsplit = -1.
Panagiotis Papastamoulis
init1.1.jk.j, init1.2.jk.j, bjkmodel, bjmodel
data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] # At first a 2 component mixture is fitted using parameterization $m=1$. run.previous<-bjkmodel(reference=x, response=y, L=c(3,2,1), m=100, K=2, nr=-10*log(10), maxnr=5, m1=2, m2=2, t1=1, t2=2, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## Then the estimated clusters and parameters are used to initialize a ## 3 component mixture using Initialization 2. The number of different ## runs is set to $tsplit=3$ with each one of them using msplit = 2 ## em iterations. q <- 3 tau <- 1 nc <- 3 z <- run.previous$z ml <- length(run.previous$psim)/(nc - 1) alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) beta <- array(run.previous$beta[ml, , , ], dim = c(q, nc - 1, tau)) clust <- run.previous$clust run<-init2.jk.j(reference=x, response=y, L=c(3,2,1), K=nc, tsplit=2, model=1, msplit=2, previousz=z, previousclust=clust, previous.alpha=alpha, previous.beta=beta,mnr = 5) # note: useR should specify larger values for msplit and tsplit for a complete analysis.data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] # At first a 2 component mixture is fitted using parameterization $m=1$. run.previous<-bjkmodel(reference=x, response=y, L=c(3,2,1), m=100, K=2, nr=-10*log(10), maxnr=5, m1=2, m2=2, t1=1, t2=2, msplit, tsplit, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## Then the estimated clusters and parameters are used to initialize a ## 3 component mixture using Initialization 2. The number of different ## runs is set to $tsplit=3$ with each one of them using msplit = 2 ## em iterations. q <- 3 tau <- 1 nc <- 3 z <- run.previous$z ml <- length(run.previous$psim)/(nc - 1) alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) beta <- array(run.previous$beta[ml, , , ], dim = c(q, nc - 1, tau)) clust <- run.previous$clust run<-init2.jk.j(reference=x, response=y, L=c(3,2,1), K=nc, tsplit=2, model=1, msplit=2, previousz=z, previousclust=clust, previous.alpha=alpha, previous.beta=beta,mnr = 5) # note: useR should specify larger values for msplit and tsplit for a complete analysis.
parameterization ().
This function applies a random splitting small EM initialization scheme (Initialization 2), for parameterization . It can be implemented only in case where a previous run of the EM algorithm is available (with respect to the same parameterization). The initialization scheme proposes random splits of the existing clusters, increasing the number of mixture components by one. Then EM is ran for (m2) iterations, and the procedure is repeated for t2 times. The best values in terms of observed loglikelihood are chosen in order to initialize the main EM algorithm (bkmodel), when .
init2.k(reference, response, L, K, t2, m2, previousz, previousclust, previous.alpha, previous.beta,mnr)init2.k(reference, response, L, K, t2, m2, previousz, previousclust, previous.alpha, previous.beta,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
K |
positive integer denoting the number of mixture components. |
t2 |
positive integer denoting the number of different runs. |
m2 |
positive integer denoting the number of iterations for each run. |
previousz |
numeric array of dimension |
previousclust |
numeric vector of length |
previous.alpha |
numeric array of dimension |
previous.beta |
numeric array of dimension |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
alpha |
numeric array of dimension |
beta |
numeric array of dimension |
psim |
numeric vector of length |
ll |
numeric, the value of the loglikelihood, computed according to the |
In case that an exhaustive search is desired instead of a random selection of the splitted components, uset2 = -1.
Panagiotis Papastamoulis
# this is to be used as an example with the simulated data data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] # At first a 2 component mixture is fitted using parameterization $m=1$. run.previous<-bkmodel(reference=x, response=y, L=c(3,2,1), m=100, K=2, nr=-10*log(10), maxnr=5, m2=3, t2=3, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## Then the estimated clusters and parameters are used to initialize a ## 3 component mixture using Initialization 2. The number of different ## runs is set to tsplit=3 with each one of them using msplit = 5 ## em iterations. q <- 3 tau <- 1 nc <- 3 z <- run.previous$z ml <- length(run.previous$psim)/(nc - 1) alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) beta <- array(run.previous$beta[ml, , ], dim = c(nc - 1, tau)) clust <- run.previous$clust run<-init2.k(reference=x, response=y, L=c(3,2,1), K=nc, t2=3, m2=5, previousz=z, previousclust=clust, previous.alpha=alpha, previous.beta=beta,mnr = 5) summary(run) # note: useR should specify larger values for m2, t2 for a complete analysis.# this is to be used as an example with the simulated data data("simulated_data_15_components_bjk") x <- sim.data[,1] x <- array(x,dim=c(length(x),1)) y <- sim.data[,-1] # At first a 2 component mixture is fitted using parameterization $m=1$. run.previous<-bkmodel(reference=x, response=y, L=c(3,2,1), m=100, K=2, nr=-10*log(10), maxnr=5, m2=3, t2=3, prev.z, prev.clust, start.type=1, prev.alpha, prev.beta) ## Then the estimated clusters and parameters are used to initialize a ## 3 component mixture using Initialization 2. The number of different ## runs is set to tsplit=3 with each one of them using msplit = 5 ## em iterations. q <- 3 tau <- 1 nc <- 3 z <- run.previous$z ml <- length(run.previous$psim)/(nc - 1) alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) beta <- array(run.previous$beta[ml, , ], dim = c(nc - 1, tau)) clust <- run.previous$clust run<-init2.k(reference=x, response=y, L=c(3,2,1), K=nc, t2=3, m2=5, previousz=z, previousclust=clust, previous.alpha=alpha, previous.beta=beta,mnr = 5) summary(run) # note: useR should specify larger values for m2, t2 for a complete analysis.
This function computes the observed loglikelihood given the means and the mixing proportions of each component. Instead of computing , , where , are computed for all . Let , then .
mylogLikePoisMix(y, mean, pi)mylogLikePoisMix(y, mean, pi)
y |
a numeric array of count data with dimension |
mean |
a list of length K (the number of mixture components) of positive data. Each list element is a matrix with dimension |
pi |
a numeric vector of length K (the number of mixture components) containing the mixture weights. |
ll |
the value of the loglikelihood. |
Panagiotis Papastamoulis
## This example computes the loglikelihood of a K = 10 component ## Poisson GLM mixture. The number of response variables is ## d = 6, while the sample size equals to n = 5000. They are ## stored in the array sim.data[,-1]. The number of covariates ## equals 1 (corresponding to sim.data[,1]). We will use a ## random generation of the regression coefficients alpha and ## beta, in order to show that the loglikelihood can be computed ## without computational errors even in cases where the parameters ## are quite ''bad'' for the data. data("simulated_data_15_components_bjk_full") K <- 10 d <- 6 n <- dim(sim.data)[1] condmean=vector("list",length=K) weights<-rep(1,K)/K ar<-array(data=NA,dim=c(n,d)) for (k in 1:K){ for (i in 1:d){ ar[,i]<-runif(n)+(1+0.1*(runif(n)-1))*sim.data[,1]} condmean[[k]]<-ar} mylogLikePoisMix(sim.data[,-1],condmean,weights)## This example computes the loglikelihood of a K = 10 component ## Poisson GLM mixture. The number of response variables is ## d = 6, while the sample size equals to n = 5000. They are ## stored in the array sim.data[,-1]. The number of covariates ## equals 1 (corresponding to sim.data[,1]). We will use a ## random generation of the regression coefficients alpha and ## beta, in order to show that the loglikelihood can be computed ## without computational errors even in cases where the parameters ## are quite ''bad'' for the data. data("simulated_data_15_components_bjk_full") K <- 10 d <- 6 n <- dim(sim.data)[1] condmean=vector("list",length=K) weights<-rep(1,K)/K ar<-array(data=NA,dim=c(n,d)) for (k in 1:K){ for (i in 1:d){ ar[,i]<-runif(n)+(1+0.1*(runif(n)-1))*sim.data[,1]} condmean[[k]]<-ar} mylogLikePoisMix(sim.data[,-1],condmean,weights)
This function is the main function of the package. User has only to call it by specifying the data ( and ), the vector , the parameterization (), the desirable range for the number of components, the type of initialization and the number of EM runs and iterations for the small-EM strategy. When , EM algorithm is initialized according to Initialization scheme 1 (the functions init1.1.jk.j, init1.2.jk.j, init1.k). For consecutive run (), EM algorithm is initialized using Initialization 2 (the functions init2.jk.j or init2.k).
pois.glm.mix(reference, response, L, m, max.iter, Kmin, Kmax, m1, m2, t1, t2, msplit, tsplit,mnr)pois.glm.mix(reference, response, L, m, max.iter, Kmin, Kmax, m1, m2, t1, t2, msplit, tsplit,mnr)
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
m |
variable denoting the parameterization of the model: 1 for |
max.iter |
positive integer denoting the maximum number of EM iterations. |
Kmin |
the minimum number of mixture components. |
Kmax |
the maximum number of mixture components. |
m1 |
positive integer denoting the number of iterations for each call of the 1st small EM iterations used by Initialization 1 ( |
m2 |
positive integer denoting the number of iterations for each call of the overall small EM iterations used by Initialization 1 ( |
t1 |
positive integer denoting the number of different runs of the 1st small EM used by Initialization 1 ( |
t2 |
positive integer denoting the number of different runs of the overall small EM used by Initialization 1 ( |
msplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
tsplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
The output of the function is a list of lists. During the run of the function pois.glm.mix two R graphic devices are opened: The first one contains the graph of the information criteria (BIC and ICL). In the second graphe, the resulting fitted clusters per condition are plotted until the ICL criterion no longer selects a better model. Notice that in this graph the replicates of condition are summed.
The EM algorithm is run until the increase to the loglikelihood of the mixture model is less than . The Newton - Raphson iterations at the Maximization step of EM algorithm are repeated until the square Euclidean norm of the gradient vector of the component specific parameters is less than .
information.criteria |
numeric array of dimension |
runs |
A list containing the output for the estimated mixture for each |
sel.mod.icl |
The selected number of mixture components according to the ICL criterion. |
sel.mod.bic |
The selected number of mixture components according to the BIC. |
est.sel.mod.icl |
The final estimates for the selected number of mixture components according to the ICL criterion. It is a list containing |
est.sel.mod.bic |
The final estimates for the selected number of mixture components according to the BIC. It is a list containing |
In case that an exhaustive search is desired instead of a random selection of the splitted components, use tsplit = -1.
Panagiotis Papastamoulis
bjkmodel, bjmodel, bkmodel, init1.1.jk.j, init1.2.jk.j, init1.k, init2.jk.j, init2.k, mylogLikePoisMix
## load a small dataset of 500 observations data("simulated_data_15_components_bjk") ## in this example there is V = 1 covariates (x) ## and d = 6 response variables (y). The design is ## L = (3,2,1). V <- 1 x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V)) y <- sim.data[,-1] ## We will run the algorithm using parameterization ## m = 1 and the number of components in the set ## {2,3,4}. rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1, max.iter=1000, Kmin=2, Kmax= 4, m1=3, m2=3, t1=3, t2=3, msplit=3, tsplit=3,mnr = 5) # note: useR should specify larger values for Kmax, m1, m2, # t1, t2, msplit and tsplit for a complete analysis. # retrieve the selected models according to BIC or ICL rr$sel.mod.icl rr$sel.mod.bic # retrieve the estimates according to ICL # alpha rr$est.sel.mod.icl$alpha # beta rr$est.sel.mod.icl$beta # gamma rr$est.sel.mod.icl$gamma # pi rr$est.sel.mod.icl$pi # frequency table with estimated clusters table(rr$est.sel.mod.icl$clust) # histogram of the maximum conditional probabilities hist(apply(rr$est.sel.mod.icl$tau,1,max),30) ##(the full data of 5000 observations can be loaded using ## data("simulated_data_15_components_bjk_full")## load a small dataset of 500 observations data("simulated_data_15_components_bjk") ## in this example there is V = 1 covariates (x) ## and d = 6 response variables (y). The design is ## L = (3,2,1). V <- 1 x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V)) y <- sim.data[,-1] ## We will run the algorithm using parameterization ## m = 1 and the number of components in the set ## {2,3,4}. rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1, max.iter=1000, Kmin=2, Kmax= 4, m1=3, m2=3, t1=3, t2=3, msplit=3, tsplit=3,mnr = 5) # note: useR should specify larger values for Kmax, m1, m2, # t1, t2, msplit and tsplit for a complete analysis. # retrieve the selected models according to BIC or ICL rr$sel.mod.icl rr$sel.mod.bic # retrieve the estimates according to ICL # alpha rr$est.sel.mod.icl$alpha # beta rr$est.sel.mod.icl$beta # gamma rr$est.sel.mod.icl$gamma # pi rr$est.sel.mod.icl$pi # frequency table with estimated clusters table(rr$est.sel.mod.icl$clust) # histogram of the maximum conditional probabilities hist(apply(rr$est.sel.mod.icl$tau,1,max),30) ##(the full data of 5000 observations can be loaded using ## data("simulated_data_15_components_bjk_full")
This package can be used to cluster high dimensional count data under the presence of covariates. A mixture of Poisson Generalized Linear models (GLM's) is proposed. Conditionally to the covariates, Poisson multivariate distribution describing each cluster is a product of independent Poisson distributions. Different parameterizations for the slopes are proposed. Case of partioning the response variables into a set of replicates is considered. Poisson GLM mixture is estimated via Expectation Maximization (EM) algorithm with Newton-Raphson steps. An efficient initialization of EM algorithm is proposed to improve parameter estimation. It is a splitting scheme which is combined with a Small EM strategy. The user is referred to the function pois.glm.mix for an automatic evaluation of the proposed methodology.
| Package: | poisson.glm.mix |
| Type: | Package |
| Version: | 1.4 |
| Date: | 2023-08-19 |
Assume that the observed data can be written as where , , , with and , . Index denotes the observation, while the vector defines a partition of the variables into blocks: the first block consists of the first variables, the second block consists of the next variables and so on. We will refer to and using the terms “condition” and “replicate”, respectively. In addition to , consider that a vector of covariates is observed, denoted by , for all . Assume now that conditional to , a model indicator taking values in the discrete set and a positive integer , the response , is a realization of the corresponding random vector
where denotes the Poisson distribution. The following parameterizations for the Poisson means are considered: If (the “” parameterization), then
If (the “” parameterization), then
If (the “” parameterization), then
For identifiability purposes assume that , .
Papastamoulis Panagiotis Maintainer: Papastamoulis Panagiotis <[email protected]>
Papastamoulis, P., Martin-Magniette, M. L., & Maugis-Rabusseau, C. (2016). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics & Data Analysis, 93, 97-106.
## load a small dataset of 500 observations data("simulated_data_15_components_bjk") ## in this example there is V = 1 covariates (x) ## and d = 6 response variables (y). The design is ## L = (3,2,1). V <- 1 x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V)) y <- sim.data[,-1] ## We will run the algorithm using parameterization ## m = 1 and the number of components in the set ## {2,3,4}. rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1, max.iter=1000, Kmin=2, Kmax= 4, m1=3, m2=3, t1=3, t2=3, msplit=4, tsplit=3,mnr = 5) # note: useR should specify larger values for Kmax, m1, m2, t1, # t2, msplit and tsplit for a complete analysis. # retrieve the selected models according to BIC or ICL rr$sel.mod.icl rr$sel.mod.bic # retrieve the estimates according to ICL # alpha rr$est.sel.mod.icl$alpha # beta rr$est.sel.mod.icl$beta # gamma rr$est.sel.mod.icl$gamma # pi rr$est.sel.mod.icl$pi # frequency table with estimated clusters table(rr$est.sel.mod.icl$clust) # histogram of the maximum conditional probabilities hist(apply(rr$est.sel.mod.icl$tau,1,max),30) ##(the full data of 5000 observations can be loaded using ## data("simulated_data_15_components_bjk_full")## load a small dataset of 500 observations data("simulated_data_15_components_bjk") ## in this example there is V = 1 covariates (x) ## and d = 6 response variables (y). The design is ## L = (3,2,1). V <- 1 x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V)) y <- sim.data[,-1] ## We will run the algorithm using parameterization ## m = 1 and the number of components in the set ## {2,3,4}. rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1, max.iter=1000, Kmin=2, Kmax= 4, m1=3, m2=3, t1=3, t2=3, msplit=4, tsplit=3,mnr = 5) # note: useR should specify larger values for Kmax, m1, m2, t1, # t2, msplit and tsplit for a complete analysis. # retrieve the selected models according to BIC or ICL rr$sel.mod.icl rr$sel.mod.bic # retrieve the estimates according to ICL # alpha rr$est.sel.mod.icl$alpha # beta rr$est.sel.mod.icl$beta # gamma rr$est.sel.mod.icl$gamma # pi rr$est.sel.mod.icl$pi # frequency table with estimated clusters table(rr$est.sel.mod.icl$clust) # histogram of the maximum conditional probabilities hist(apply(rr$est.sel.mod.icl$tau,1,max),30) ##(the full data of 5000 observations can be loaded using ## data("simulated_data_15_components_bjk_full")
This is a small dataset of 500 observations according to the parameterization. The number of reference variables is 1 and correspond to the values in the first column of the array sim.data. The number of response variables is 6 and correspond to the values in the last 6 column of the array sim.data. There are conditions, while the number of replicates per condition is .
data(simulated_data_15_components_bjk)data(simulated_data_15_components_bjk)
A numeric array (sim.data) containing observations.