This tutorial assumes that a reader is familiar with core **semopy** concepts, such as syntax and models. If that's not the case, read Introduction and Syntax sections of the **semopy** tutorial. **semba** uses exactly the same syntax as **semopy** for specifying SEM models, however there are new operations that are used to impose custom priors on parameters.

There are default priors for loadings and (co)variances that are assumed automatically. Namely,

Regression coefficients

\( \mathcal{N} (x_0 ,0.75^2)\) if \(x_0 \neq 0\), else \( \mathcal{N} (0 , 1.5^2)\)

Variance of observed variables

\(\mathcal{HN} (x_0^2 {\frac{\pi}{2}})\)

Covariance of observed variables

\(\mathcal{N} (x_0, 1 )\)

Variance of latent variables

\(\mathcal{HN} ({\frac{\pi}{2}})\)

Covariance if one variable is latent

\(\mathcal{N} (0, 1 )\)

Random effects variance

\(\mathcal{HN} ({\frac{\pi}{2^3}})\)

Random effects covariance

\(\mathcal{N} (0, 0.5^2 )\)

To override default priors and to impose a custom priot on a specific parameter, a following set of operations can be used:

Imposes a custom prior on parameters that are listed after operation. For example,

y ~ a*x1 + b*x2 + c*x3 PRIOR(Normal, loc=0, scale=2) a b PRIOR(Exponential, rate=0.5) c

Overrides default prior for regression coefficients. An example:

PRIOR_LOADING(Normal, loc=0, scale=1)

Overrides default prior for variances of observed variables. An example:

PRIOR_VARIANCE(HalfCauchy, scale=1)

Overrides default prior for covariances of observed variables.

Overrides default prior for variances of latent variables.

Overrides default prior for covariances of latent variables.

Overrides default prior for variances of random effects.

Overrides default prior for covariances of random effects.

Any **Numpyro** distribution can be passed to the operations above. For a detailed list with possible parameters, see official **Numpyro** documentation here.

Model

Conventional SEM model.

ModelMeans

SEM model with a fixed effects part that can be used to model intercepts.

ModelEffects

SEM model with random effects distributed with a static covariance matrix.

(soon) ModelGeneralizedEffects

SEM model with random effects distributed with a parametrised covariance matrix, aka. a Gaussian Process SEM.

from semopy.examples import political_democracy as ex import semba desc, data = ex.get_model(), ex.get_data() model = semba.Model(desc) model.fit(data) ins = model.inspect() print(ins)This produces an output:

lval op rval Estimate 5.0% 95.0% std n_eff r_hat _b12 dem60 ~ ind60 -1.57 -2.37 -0.88 0.47 1,225.98 1.00 _b13 dem65 ~ ind60 -0.65 -1.15 -0.15 0.32 1,035.10 1.00 _b14 dem65 ~ dem60 0.91 0.71 1.08 0.11 612.31 1.00 _b1 x1 ~ ind60 1.00 - - - - - _b2 x2 ~ ind60 -2.48 -2.99 -1.94 0.32 867.74 1.00 _b3 x3 ~ ind60 -2.28 -2.77 -1.79 0.30 696.56 1.00 _b4 y1 ~ dem60 1.00 - - - - - _b5 y2 ~ dem60 1.41 1.07 1.75 0.20 818.40 1.00 _b6 y3 ~ dem60 1.15 0.88 1.41 0.16 915.11 1.00 _b7 y4 ~ dem60 1.42 1.14 1.67 0.17 514.43 1.00 _b8 y5 ~ dem65 1.00 - - - - - _b9 y6 ~ dem65 1.22 0.96 1.48 0.16 856.59 1.00 _b10 y7 ~ dem65 1.30 1.07 1.57 0.15 869.51 1.00 _b11 y8 ~ dem65 1.29 1.03 1.54 0.16 800.41 1.00 _c15 dem60 ~~ dem60 2.99 2.11 3.99 0.58 765.03 1.00 _c17 dem65 ~~ dem65 0.33 0.01 0.64 0.22 725.92 1.00 _c20 ind60 ~~ ind60 0.30 0.17 0.42 0.08 731.05 1.00 _c1 y1 ~~ y5 0.70 0.13 1.23 0.34 872.95 1.00 _c2 y1 ~~ y1 2.23 1.43 2.92 0.47 874.95 1.00 _c3 y2 ~~ y4 0.79 -0.10 1.57 0.52 685.30 1.00 _c4 y2 ~~ y6 1.42 0.53 2.14 0.49 1,343.32 1.00 _c5 y2 ~~ y2 6.47 4.72 8.12 1.02 1,227.05 1.00 _c6 y3 ~~ y7 0.72 -0.12 1.50 0.51 1,285.40 1.00 _c7 y3 ~~ y3 5.10 3.59 6.38 0.86 1,342.31 1.00 _c8 y4 ~~ y8 0.35 -0.25 1.09 0.40 1,043.56 1.00 _c9 y4 ~~ y4 3.03 1.87 4.17 0.70 675.44 1.00 _c10 y6 ~~ y8 1.04 0.37 1.84 0.47 976.38 1.00 _c11 y6 ~~ y6 4.52 3.24 5.73 0.77 914.27 1.00 _c12 x2 ~~ x2 0.38 0.07 0.65 0.18 430.96 1.00 _c13 y7 ~~ y7 3.44 2.38 4.45 0.66 1,143.13 1.00 _c14 x1 ~~ x1 1.35 0.99 1.67 0.21 1,372.02 1.00 _c16 x3 ~~ x3 0.39 0.15 0.66 0.16 481.36 1.00 _c18 y5 ~~ y5 2.56 1.71 3.36 0.51 1,020.10 1.00 _c19 y8 ~~ y8 3.21 2.32 4.31 0.62 792.91 1.00

Here, "Estimate" is a mean across chain values after burn-in samples, "5.0%" and "95.0%" are confidence intervals, "std" is standard deviation of estimates, "n_eff" is an effective number of samples and "r_hat" is a Rubin-Gelman statistics that quantifies a divergence between different chains. Values of "r_hat" that are strongly different from 1.00 may indicate a presense of local minima and/or that a model is overidentified. In the example above, we've used the default number of chains, which is 1, and hence it is not informative.

Hyperparameters of interest that can be passed to fit are:

num_samples

Number of MCMC samples. The default value is number of parameters times 30. Note that this is a very generous default value, usually it is suffice a much smaller value, so if a performance of concern, consider setting a custom smaller value.

num_warmup

Number of warmup/burn-in samples. The default values is num_samples/5.

num_chains

Number of MCMC chains. The default is 1.

solver

Name of solver. At the moment, it is reserved for MCMC kernels. All kernels that are available in **Numpyro** can be utilised. The default is "NUTS".

seed

Seed for pseudo random value generator. The default is 0.

Actually, there is an extra argument to the fit method, chains_mode, that is set to "best". This makes **semba** to use only a single best (in terms of likelihood) chain for parameter estimation. Such behavior handles a problem of local minima to a much greater extent than averaging over all chains. If you still want to join all chains, pass "mean" instead.

For example, the article_example with default priors and seed value exhibits local minimum that is far off from "true" parameter estimates. This is effectively solved by introducing extra chains:

from semopy.examples import example_article as ex from semopy.utils import compare_results from semba import Model import numpy as np desc, data = ex.get_model(), ex.get_data() true = ex.get_params() m = Model(desc) r = m.fit(data, num_chains=5) print('{:.2f}%'.format(np.mean(compare_results(m, true)) * 100))Mean absolute percentage error is:

18.79%

We aim to make all features of **semopy** available to **semba**, however some things are offl-limits due to the bayesian nature of the package, namely:

- Different estimators such as ULW, WLS, GLS, etc.
- Although conventional SEM fit indices can be computed via calc_stats function, but they have no theoretical grounding.
- Bounds and constraints.