Bayesian Data Analysis

kirkvanacore edited this page Sep 29, 2021 · 3 revisions

https://bookdown.org/marklhc/notes_bookdown/ by Mark Lai

1 Introduction

Why Use Bayesian Stats?

  • Incorporate prior knowledge
  • Flexibility
    • more fitting options than frequentist stats
  • Handles missing data well
  • ease of model comparisons
  • people are sick of p-hacking

Probability

Classical probability is equal among all equally possible cases (e.g., each face of a fair die has probability 1/6).

Frequentist probability is the long-term relative frequency of an outcome.

The problem with the frequentist approach is that some events never repeat (presidential elections, sports competitions, etc.), at least not in the way a coin toss does.

Subjectivist probability incorporates your belief into the probability, but the belief must follow the rules of probability and be rational.

Basics of probability

  • Probability has to be between 0 and 1 (it can never be less than zero)
  • sum of all p() over the possible mutually exclusive events = 1
  • p() that one of two mutually exclusive events occurs is the sum of their probabilities

Conditional Probability

The probability of an event given another event

P(A|B) = P(A∩B) / P(B)

  • P(A|B): the p() of A given B
  • P(A∩B): the probability that both A and B will occur (pronounced "A cap B")
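As a small worked example, assume one roll of a fair six-sided die. The die and the events A and B below are illustrative choices, not part of the source notes.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # event A: the roll is even
B = {n for n in outcomes if n > 3}        # event B: the roll is greater than 3

def prob(event):
    """Classical probability: favourable cases over the 6 possible cases."""
    return Fraction(len(event), 6)

# P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3
```

Knowing that the roll is above 3 raises the probability of an even roll from 1/2 to 2/3, because two of the three remaining outcomes (4 and 6) are even.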

Independence

If P(A|B) = P(A), then the events are independent; equivalently, P(A∩B) = P(A)P(B)
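A quick check of the product rule, again assuming a fair six-sided die (the events A, B, and C are hypothetical examples):

```python
from fractions import Fraction

def prob(event):
    """Probability of an event on one roll of a fair six-sided die."""
    return Fraction(len(event), 6)

A = {2, 4, 6}      # the roll is even
B = {4, 5, 6}      # the roll is greater than 3
C = {1, 2, 3, 4}   # the roll is at most 4

def independent(x, y):
    """Two events are independent when P(x and y) equals P(x) * P(y)."""
    return prob(x & y) == prob(x) * prob(y)

print(independent(A, B))  # False: 1/3 is not 1/2 * 1/2
print(independent(A, C))  # True:  1/3 equals 1/2 * 2/3
```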

Law of Total Probability

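P(A) is a marginal probability. If the events B1, B2, ..., Bk are mutually exclusive and together cover every possibility (so A never happens without exactly one of the Bs), then P(A) is the sum of the joint probabilities: P(A) = P(A∩B1) + P(A∩B2) + ... + P(A∩Bk)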
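A sketch with made-up numbers: choose one of two urns at random, then draw a ball. The urns and their probabilities are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical setup: B_i = "urn i was chosen", A = "the ball drawn is red".
p_urn = {"urn1": Fraction(1, 2), "urn2": Fraction(1, 2)}            # P(B_i)
p_red_given_urn = {"urn1": Fraction(3, 4), "urn2": Fraction(1, 4)}  # P(A|B_i)

# Joint probabilities P(A and B_i) = P(A|B_i) * P(B_i).
p_joint = {urn: p_red_given_urn[urn] * p_urn[urn] for urn in p_urn}

# Summing the joints over every urn gives the marginal P(A).
p_red = sum(p_joint.values())
print(p_red)  # 1/2
```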

Bayes Theorem

P(B|A) = P(A|B)P(B) / P(A)
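A common illustration is a diagnostic test. The prevalence, sensitivity, and false-positive rate below are invented numbers, with B = "has the condition" and A = "tests positive".

```python
from fractions import Fraction

# Assumed inputs (not from the source notes).
p_b = Fraction(1, 100)               # P(B): prior probability of the condition
p_a_given_b = Fraction(95, 100)      # P(A|B): positive test if condition present
p_a_given_not_b = Fraction(5, 100)   # P(A|not B): false-positive rate

# Denominator P(A) from the law of total probability.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes theorem: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a, float(p_b_given_a))  # 19/118, about 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior probability near 16%.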

Bayesian Statistics

Posterior Probability ∝ Prior Probability × Likelihood

P(θ=t|y)∝P(θ=t)P(y|θ=t)

  • Posterior Probability P(θ=t|y)
  • Likelihood P(y|θ=t)
  • Prior Probability P(θ=t)

The probability that some parameter (θ) is equal to some value (t) given some data (y) is proportional to the prior probability that the parameter equals the value, P(θ=t), times the probability of the data (y) given that the parameter equals the value, P(y|θ=t)

The frequentist approach is to leave off the prior and work with the likelihood alone

2 Bayesian Inference

"turning the Bayesian crank"

1. Collect/observe data

2. Choose a model that fits the data and the question

  • Assumption of Exchangeability: subpopulations don't have different probabilities
  • Probability Distribution: must match the data (discrete/binomial, normal, etc.)
  • Likelihood: calculate the likelihood of the parameter(s) given the data

3. Specify Prior distributions

  • Pick an informed prior distribution for each parameter
  • This means selecting a probability distribution for the parameter, not just picking a single value

4. Calculate posterior distributions:

Use Bayes' rule to calculate the posterior

Methods

Grid Approximation

  • Pick a finite grid of parameter values
  • Evaluate the prior × likelihood at each parameter value
  • Normalizing these values so they sum to 1 provides the (approximate) posterior distribution
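The steps above can be sketched for a single proportion θ. The data (6 successes in 9 trials), the Beta(2, 2)-shaped prior, and the 101-point grid are all assumptions chosen for illustration.

```python
from math import comb

# Assumed data: y successes out of n Bernoulli trials.
y, n = 6, 9

# 1. Pick a grid of candidate parameter values between 0 and 1.
grid = [i / 100 for i in range(101)]

def prior(theta):
    """Unnormalized Beta(2, 2) prior density (an illustrative choice)."""
    return theta * (1 - theta)

def likelihood(theta):
    """Binomial probability of the observed data given theta."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

# 2. Evaluate prior x likelihood at each grid value.
unnormalized = [prior(t) * likelihood(t) for t in grid]

# 3. Normalize so the grid probabilities sum to 1.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

post_mean = sum(t * p for t, p in zip(grid, posterior))
post_mode = grid[posterior.index(max(posterior))]
print(round(post_mean, 3), post_mode)
```

A finer grid gives a closer approximation, but the cost grows quickly once there is more than one parameter.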

Conjugate Priors

  • With a Beta prior and binomial data, the posterior of the parameter given y is also a Beta distribution, whose two parameters are the sums of the prior and observed numbers of successes and failures

θ|y ∼ Beta(a+y, b+n−y)

  • a -> prior number of successes
  • b -> prior number of failures
  • y -> observed number of successes
  • n−y -> observed number of failures
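Because the update is just addition, it takes only a few lines. The prior counts and data below are assumed values for illustration.

```python
# Assumed prior: Beta(a, b) with a = 2 prior successes and b = 2 prior failures.
a, b = 2, 2

# Assumed data: y successes in n trials.
y, n = 6, 9

# Conjugate update: posterior is Beta(a + y, b + n - y).
post_a = a + y
post_b = b + (n - y)

# Mean of a Beta(post_a, post_b) distribution.
post_mean = post_a / (post_a + post_b)
print(post_a, post_b, round(post_mean, 3))  # 8 5 0.615
```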

how does this work for a nonbinary distribution?

Laplace Approximation with Maximum A Posteriori Estimation

  • Find the maximum point of the posterior distribution, called the maximum a posteriori (MAP) estimate

What about the other components of the distribution? Variance?
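The MAP by itself is only a point. One standard way to complete the approximation (a general technique, not something spelled out in these notes) is to fit a normal distribution centered at the MAP, with variance equal to the negative inverse of the second derivative of the log posterior at that point. The sketch below assumes a Beta(8, 5) posterior and uses a crude grid search plus a finite-difference derivative.

```python
from math import log, sqrt

# Assumed posterior: Beta(8, 5), known only up to a constant.
a, b = 8, 5

def log_post(theta):
    """Unnormalized log posterior density."""
    return (a - 1) * log(theta) + (b - 1) * log(1 - theta)

# Find the MAP with a simple grid search over (0, 1).
candidates = [i / 10000 for i in range(1, 10000)]
theta_map = max(candidates, key=log_post)

# Curvature at the MAP via a central finite difference.
h = 1e-4
curvature = (
    log_post(theta_map + h) - 2 * log_post(theta_map) + log_post(theta_map - h)
) / h**2

# Normal approximation: mean = MAP, sd = sqrt(-1 / curvature).
laplace_sd = sqrt(-1 / curvature)
print(round(theta_map, 4), round(laplace_sd, 3))
```

A sharply peaked posterior has large negative curvature and therefore a small approximate standard deviation.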

Markov Chain Monte Carlo (MCMC)

  • Draws samples from the posterior distribution
  • These samples are correlated, which requires drawing more samples

Is this the same as drawing samples from the data? Is this similar to bootstrapping?
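A minimal sketch of one MCMC algorithm, random-walk Metropolis, chosen here only for illustration. It assumes a Beta(8, 5) posterior known up to a constant; the proposal width, chain length, and burn-in are arbitrary choices. Note that the chain produces values of the parameter θ, not resampled data points.

```python
import random
from math import exp, log

random.seed(1)

# Assumed target: unnormalized Beta(8, 5) posterior for theta.
a, b = 8, 5

def log_target(theta):
    """Log of the unnormalized posterior; -inf outside (0, 1)."""
    if not 0 < theta < 1:
        return float("-inf")
    return (a - 1) * log(theta) + (b - 1) * log(1 - theta)

theta = 0.5   # starting value
chain = []
for _ in range(20000):
    # Propose a small random step from the current value.
    proposal = theta + random.gauss(0, 0.2)
    log_ratio = log_target(proposal) - log_target(theta)
    # Accept with probability min(1, target(proposal) / target(current)).
    if log_ratio >= 0 or random.random() < exp(log_ratio):
        theta = proposal
    # Each draw depends on the previous one, so draws are correlated.
    chain.append(theta)

draws = chain[2000:]   # discard early draws as burn-in
print(round(sum(draws) / len(draws), 3))
```

Because each draw starts from the previous one, neighbouring draws are similar, which is why more of them are needed than with independent samples.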

Summarizing the Posterior distribution

Mean, Median, & Mode

  • Posterior Mean -> point generally used as the estimate
  • Posterior Median -> worth considering as it is more robust to outliers
  • Posterior Mode -> the maximum a posteriori (MAP) estimate, the point with the highest posterior probability

Uncertainty Estimates

  • standard deviation of the posterior distribution
  • median absolute deviation from the median (MAD) of the posterior distribution (more robust when the distribution is skewed)
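With posterior draws in hand, these summaries are simple sample statistics. The sketch assumes draws from a Beta(8, 5) posterior, generated directly here, and takes MAD to be the median absolute deviation from the median.

```python
import random
import statistics

random.seed(0)

# Assumed posterior draws; in practice these would come from MCMC.
draws = [random.betavariate(8, 5) for _ in range(20000)]

post_mean = statistics.mean(draws)
post_median = statistics.median(draws)
post_sd = statistics.stdev(draws)

# MAD: median of the absolute deviations from the posterior median.
post_mad = statistics.median([abs(d - post_median) for d in draws])

print(round(post_mean, 3), round(post_median, 3))
print(round(post_sd, 3), round(post_mad, 3))
```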

Credible Intervals

  • Similar to confidence intervals

A 90% credible interval is an interval that has a 90% probability of containing the true value of the parameter, given the data and the model

(This is different from confidence intervals, where 90% of the intervals constructed under repeated sampling will contain the true parameter)

  • Credible intervals can be defined however you want (e.g., 50% or 80%; a 50% interval could even run from the 1st to the 51st percentile)
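A percentile-based sketch, assuming sorted draws from a Beta(8, 5) posterior. The helper name `interval` is hypothetical, and the index rule is a simple approximation rather than an interpolated quantile.

```python
import random

random.seed(0)

# Assumed posterior draws, sorted so percentiles can be read off by index.
draws = sorted(random.betavariate(8, 5) for _ in range(20000))

def interval(lower_pct, upper_pct):
    """Return the draws at the given percentiles (0 to 100) of the sample."""
    n = len(draws)
    lo = draws[int(n * lower_pct / 100)]
    hi = draws[min(int(n * upper_pct / 100), n - 1)]
    return lo, hi

equal_tailed_90 = interval(5, 95)   # 90% interval, 5% in each tail
middle_50 = interval(25, 75)        # 50% interval around the median
shifted_50 = interval(1, 51)        # also a 50% interval, shifted left

print(equal_tailed_90, middle_50, shifted_50)
```

Both 50% intervals hold half of the posterior probability; they just place it differently.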

5. Posterior predictive check

Does the model fit the data?

Posterior Predictive Distribution: predictions of new data averaged over the parameter values, with each prediction weighted by the posterior probability of that parameter value.

The Posterior Predictive Distribution is used to check the model: samples simulated from it are compared against the actual data.

Like a residual?
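A simulation sketch of this check, assuming a Beta(8, 5) posterior and observed data of 6 successes in 9 trials (illustrative numbers). Each replicated dataset uses a different θ drawn from the posterior, which is what does the weighting.

```python
import random

random.seed(0)

# Assumed observed data and posterior.
n, y_obs = 9, 6
post_a, post_b = 8, 5

def simulate_y():
    """Draw theta from the posterior, then simulate n Bernoulli trials."""
    theta = random.betavariate(post_a, post_b)
    return sum(random.random() < theta for _ in range(n))

# Replicated datasets from the posterior predictive distribution.
y_rep = [simulate_y() for _ in range(10000)]

# Share of replications at least as large as the observed count.
p_at_least_obs = sum(y >= y_obs for y in y_rep) / len(y_rep)
print(round(sum(y_rep) / len(y_rep), 2), round(p_at_least_obs, 2))
```

If the observed count sat far in the tail of the replicated counts, that would suggest the model does not reproduce the data well.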

6. Interpret/visualize the results

Mentioned as step six but not discussed in this chapter.