Original link: tecdat.cn/?p=19664
Original Source: Topend Data Tribe
MCMC is a general technique for sampling from complex probability models.
- Monte Carlo
- Markov chain
- Metropolis-Hastings algorithm
Problem
Suppose we need to calculate the average or expected value of a function f(θ) of a random variable with a complex posterior pdf p(θ|y).

We may also need to find the maximum of the posterior probability distribution p(θ|y).

One way to obtain the expected value is to draw N random samples from p(θ|y). When N is large enough, we can approximate the expected value by

E[f(θ)] ≈ (1/N) Σ_{i=1}^{N} f(θ_i)

The same strategy applies to finding argmax p(θ|y): sample from p(θ|y) and take the maximum over the sampled set.
Solution
1.1 Direct simulation
1.2 Inverse CDF
1.3 Reject/accept sampling
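As a baseline before turning to MCMC, method 1.3 can be sketched in a few lines of R. This is a minimal illustration under choices of my own (not from the article): the target is Beta(2, 2), the proposal is Uniform(0, 1), and the envelope constant M = 1.5 bounds the target density.

```r
# Reject/accept sampling: draw candidates from a simple proposal q and
# keep each one with probability p(x) / (M * q(x)), where M bounds p/q.
set.seed(1)
M <- 1.5                                    # max of dbeta(x, 2, 2) is 1.5 at x = 0.5
cand <- runif(20000)                        # candidates from the Uniform(0, 1) proposal
u <- runif(20000)
samples <- cand[u < dbeta(cand, 2, 2) / M]  # accept with probability p(x)/M
mean(samples)                               # close to the Beta(2, 2) mean of 0.5
```

The accepted points are exact draws from the target, but the method needs an evaluable density and a usable envelope constant, which is exactly what fails for the complex posteriors discussed next.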
If we do not know the exact/normalized pdf, or it is very complex, MCMC comes in handy.
Markov chain
To simulate a Markov chain, we must specify a transition kernel T(xi, xj): the probability of transitioning from state xi to state xj.

Convergence of a Markov chain means that it has a stationary distribution π. The statistical distribution of the chain's state is stable, meaning that the distribution no longer changes over time.
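A tiny sketch of what a stationary distribution looks like in practice. The two-state transition matrix here is my own toy example, not from the article:

```r
# A two-state Markov chain: trans[i, j] is the probability of moving from
# state i to state j. Powers of the transition matrix converge to a matrix
# whose rows all equal the stationary distribution pi, which solves
# pi %*% trans = pi.
trans <- matrix(c(0.9, 0.1,
                  0.5, 0.5), nrow = 2, byrow = TRUE)
P <- diag(2)
for (k in 1:100) P <- P %*% trans   # trans raised to the 100th power
P[1, ]                              # approaches pi = (5/6, 1/6)
```

Whatever state the chain starts in, after many steps the probability of finding it in each state settles to π; MCMC methods engineer the kernel so that π is the target distribution.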
Metropolis algorithm
For a Markov chain to be stationary with respect to the target P, it must satisfy detailed balance. Basically, this means that the probability of being in state x and transitioning to state x' must equal the probability of being in state x' and transitioning to state x:

P(x) T(x'|x) = P(x') T(x|x')
The method is to split the transition into two sub-steps: a candidate (proposal) step and an accept/reject step.
Let q(x'|x) denote the candidate density; we can correct q with an acceptance probability α(x'|x). The candidate distribution q(x'|x) is the conditional probability of proposing state x' given the current state x, and α(x'|x) is the conditional probability of accepting the candidate x'. We design the acceptance probability function so that detailed balance is satisfied.
The transition probability can then be written as:

T(x'|x) = q(x'|x) α(x'|x)

Inserting this into the detailed-balance equation, we have

α(x'|x) / α(x|x') = [P(x') q(x|x')] / [P(x) q(x'|x)]
Metropolis-Hastings algorithm
The choice of α follows this logic. Suppose that under q the transition from x to x' happens too frequently relative to the reverse direction. Then we should choose α(x|x') = 1, and to satisfy detailed balance we get

α(x'|x) = [P(x') q(x|x')] / [P(x) q(x'|x)]

The next step is to choose an acceptance probability that satisfies the condition above. The Metropolis-Hastings choice is:

α(x'|x) = min( 1, [P(x') q(x|x')] / [P(x) q(x'|x)] )

That is, when the ratio is greater than 1 we always accept, and when it is less than 1 we reject with the corresponding probability. The Metropolis-Hastings algorithm therefore consists of the following steps:
1. Initialization: randomly choose an initial state x;
2. Randomly pick a new state x' according to q(x'|x);
3. Accept the state with probability α(x'|x). If it is not accepted, no transition occurs, so nothing needs to be updated; otherwise transition to x';
4. Go to step 2 until T states have been generated;
5. Save the state x and go to step 2.
In principle, the saved states are draws from the distribution P(x), because step 4 guarantees that they are decorrelated. The value of T must be chosen according to factors such as the candidate distribution. Importantly, it is not obvious which distribution q(x'|x) should be used; it must be tuned to the specific problem at hand.
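The steps above can be sketched as a generic sampler. This is an illustrative implementation under the assumption of a symmetric Gaussian random-walk candidate density (so the q-ratio in the acceptance probability cancels); the function and argument names are mine:

```r
# Generic Metropolis sampler with a symmetric random-walk proposal.
# `target` only needs to be proportional to the density we want to sample.
metropolis <- function(target, x0, n, sd_prop = 1) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) {
    cand <- rnorm(1, x[i - 1], sd_prop)            # step 2: propose x'
    a <- min(1, target(cand) / target(x[i - 1]))   # step 3: acceptance probability
    x[i] <- if (runif(1) < a) cand else x[i - 1]   # accept, or stay put
  }
  x
}

# Usage: sample a standard normal known only up to its normalizing constant
set.seed(42)
draws <- metropolis(function(x) exp(-x^2 / 2), x0 = 0, n = 20000)
```

Note that the unnormalized density exp(-x²/2) is enough: the normalizing constant cancels in the ratio, which is the whole point of the method.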
Properties
An interesting feature of the Metropolis-Hastings algorithm is that it depends only on the ratio

α = a1 · a2

where

a1 = P(x') / P(x_t)

is the probability ratio between the candidate sample x' and the previous sample x_t, and

a2 = q(x_t|x') / q(x'|x_t)

is the ratio of the candidate densities in the two directions (from x_t to x' and vice versa). If the candidate density is symmetric, a2 equals 1.
The Markov chain is started from an arbitrary initial value x0, and the algorithm is run for many iterations until this "initial state" is "forgotten". These discarded samples are called burn-in. The remaining set of accepted values of x represents a sample from the distribution P(x).
Metropolis sampling
A simple Metropolis-Hastings sampler
Let us simulate from a gamma distribution with arbitrary shape and scale parameters, using the Metropolis-Hastings sampling algorithm.
The function for the Metropolis-Hastings sampler is given below. The chain is initialized at zero, and at each stage an N(a/b, a/b²) candidate is proposed.
This is Metropolis-Hastings independence sampling: the candidate is drawn from a normal distribution with the same mean and variance as the gamma target.

1. Start x_t from some state (x in the code).
2. Propose a new state x' (can in the code).
3. Calculate the "acceptance probability" α = min( 1, [p(x')/p(x_t)] / [q(x')/q(x_t)] ), where p is the gamma target density and q is the normal candidate density.
4. Draw a uniformly distributed random number u from [0, 1]; if u < α, accept the point and set x_{t+1} = x'. Otherwise reject it and set x_{t+1} = x_t.
MH visualization
```r
set.seed(123)
for (i in 2:n) {
  can <- rnorm(1, mu, sig)                       # normal candidate
  aprob <- min(1, (dgamma(can, a, b)/dgamma(x, a, b)) /
                  (dnorm(can, mu, sig)/dnorm(x, mu, sig)))
  u <- runif(1)
  if (u < aprob)
    x <- can
  vec[i] <- x
}
```
Drawing
Setting parameters.
```r
nrep   <- 54000
burnin <- 4000
shape  <- 2.5
rate   <- 2.6
```
Modify the plot to include only the chain after the burn-in period:
```r
vec <- vec[-(1:burnin)]
# vec <- vec[burnin:length(vec)]
```
```r
par(mfrow = c(2, 1))   # how many plots share one frame
plot(ts(vec), xlab = "Chain", ylab = "Draws")
abline(h = mean(vec), lwd = 2, col = "red")
```
```
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.007013 0.435600 0.724800 0.843300 1.133000 3.149000 
```
```r
var(vec[-(1:burnin)])
```
```
[1] 0.2976507
```
Initial value
First sample
```r
x <- 3 * a / b
vec[1] <- x
```
Choice of candidate distribution
The algorithm works best if the candidate density matches the shape of the target distribution P(x), that is, q(x'|x_t) ≈ P(x'). If a normal candidate density q is used, its variance parameter σ² must be tuned during the burn-in period.

Usually this is done by calculating the acceptance rate: the proportion of candidate samples accepted within a window of the last N samples.

If σ² is too large, the acceptance rate will be very low, because the candidates are likely to land in regions of much lower probability density, so a1 will be very small and the chain will converge very slowly.
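One way to compute that acceptance rate from a stored chain. The helper name and the repeated-draw trick are mine: a rejected Metropolis-Hastings step repeats the previous value exactly, so for a continuous proposal the fraction of non-repeats is the acceptance rate.

```r
# Acceptance rate over a window of the last N draws: the fraction of
# consecutive pairs in which the chain actually moved to a new value.
acc_rate <- function(chain, N = length(chain)) {
  w <- tail(chain, N)
  mean(w[-1] != w[-length(w)])
}

acc_rate(c(1, 1, 2, 2, 2, 3))   # 2 moves out of 5 transitions -> 0.4
```

During burn-in, σ² can then be adjusted until this rate falls in a reasonable range before the final run.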
Example 2: Bayesian estimation of regression
Metropolis-Hastings sampling is used to estimate a Bayesian regression model.
Setting parameters
DGP and graphs
```r
# Create independent x values, centered approximately on zero
x <- (-(Size - 1) / 2):((Size - 1) / 2)
# Create dependent values according to a*x + b + N(0, sd)
y <- trueA * x + trueB + rnorm(n = Size, mean = 0, sd = trueSd)
```
Normal distribution
```r
pred <- a * x + b
singlelikelihoods <- dnorm(y, mean = pred, sd = sd, log = TRUE)
sumll <- sum(singlelikelihoods)
```
Why use logarithms?

The likelihood function works with the logarithm of the probabilities, which is why I sum the log-probabilities over all data points (the logarithm of a product equals the sum of the logarithms).

Why do we do this? It is strongly recommended, because the product of many small probabilities becomes extremely small; at some point the program runs into numerical rounding or underflow problems.

So when you program with likelihoods, always work with logarithms.
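A quick sketch of the underflow problem described above (the data here are simulated purely for illustration):

```r
# The product of 2000 standard-normal densities underflows double precision:
# the log of the product is about -2800, far below log of the smallest
# positive double (about -708). The sum of log densities, by contrast,
# is a perfectly ordinary number.
set.seed(1)
y <- rnorm(2000)
prod(dnorm(y))              # 0 -- underflow
sum(dnorm(y, log = TRUE))   # finite log-likelihood
```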
Example: Draw a likelihood curve with slope a
```r
# Example: plot the likelihood curve for the slope a
# (slopelikelihoods is assumed to hold the likelihood evaluated over
# this grid of slope values, with the other parameters held fixed)
plot(seq(3, 7, by = .05), slopelikelihoods, type = "l")
```
Prior distribution
We use uniform and normal distributions as priors for the three parameters.
```r
# Prior distributions; log = TRUE, so these are all log densities
aprior  <- dunif(a, min = 0, max = 10, log = TRUE)
bprior  <- dnorm(b, sd = 2, log = TRUE)
sdprior <- dunif(sd, min = 0, max = 30, log = TRUE)
```
Posterior
The product of the prior and the likelihood is the actual quantity that MCMC works with. This function is called the posterior. Again, we use a sum here because we work with logarithms.
```r
posterior <- function(param){
  return(likelihood(param) + prior(param))
}
```
Metropolis algorithm
This algorithm is one of the most common in Bayesian statistics for sampling from a posterior density.

We use the posterior defined above.
- Start with random parameter values.
- Select new parameter values close to the old ones according to a candidate probability density.
- Jump to the new point with probability p(new)/p(old), where p is the target function; a ratio greater than 1 also means jump.
- Note that we use a symmetric jump/candidate distribution q(x'|x), whose standard deviation is fixed.
So, working in logs, the acceptance probability is equal to

min( 1, exp( log p(new) − log p(old) ) )
```r
######## Metropolis algorithm ################
for (i in 1:iterations){
  # propose a candidate near the current state (the definition of
  # proposalfunction is not shown in this excerpt)
  proposal <- proposalfunction(chain[i, ])
  # acceptance ratio: exp of the log-posterior difference
  probab <- exp(posterior(proposal) - posterior(chain[i, ]))
  if (runif(1) < probab){
    chain[i + 1, ] <- proposal
  } else {
    chain[i + 1, ] <- chain[i, ]
  }
}
```
Implement
Output the accepted values and interpret them.
```r
chain  <- metrMCMC(startvalue, 5500)
burnIn <- 5000
accep  <- 1 - mean(duplicated(chain[-(1:burnIn), ]))
```
The first steps of the algorithm may be biased by the initial values, so they are usually discarded for further analysis (the burn-in period). An interesting output is the acceptance rate: how often was a candidate accepted or rejected by the algorithm? The candidate function affects the acceptance rate: generally, the closer the candidates are to the current value, the higher the acceptance rate. However, a very high acceptance rate is usually unhelpful: it means the algorithm "stays" near the same point, which leads to suboptimal exploration of the parameter space (poor mixing).
We can also change the initial values to see whether the result changes and the chain still converges.
```r
startvalue <- c(4, 0, 10)
```
Summary
```
       V1              V2                V3        
 Min.   :4.068   Min.   :-6.7072   Min.   : 6.787  
 1st Qu.:4.913   1st Qu.:-2.6973   1st Qu.: 9.323  
 Median :5.052   Median :-1.7551   Median :10.178  
 Mean   :5.052   Mean   :-1.7377   Mean   :10.385  
 3rd Qu.:5.193   3rd Qu.:-0.8134   3rd Qu.:11.166  
 Max.   :5.989   Max.   : 4.8425   Max.   :19.223  
```
```r
# Compare:
summary(lm(y ~ x))
```
```
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-22.259  -6.032   1.718   6.955  19.892 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.1756     1.7566  -1.808    0.081 .  
x             5.0469     0.1964  25.697   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.78 on 29 degrees of freedom
Multiple R-squared: 0.9579,  Adjusted R-squared: 0.9565 
F-statistic: 660.4 on 1 and 29 DF,  p-value: < 2.2e-16
```
```r
summary(lm(y ~ x))$sigma
```
```
[1] 9.780494
```
```r
coefficients(lm(y ~ x))[1]
```
```
(Intercept) 
  -3.175555 
```
```r
coefficients(lm(y ~ x))[2]
```
```
       x 
5.046873 
```
Summary:

```r
### Summary: #######################
par(mfrow = c(2, 3))
hist(chain[-(1:burnIn), 1], prob = TRUE, nclass = 30)
abline(v = mean(chain[-(1:burnIn), 1]), lwd = 2)
```