My Office Hours:

My office hours are 16:30 - 18:00 on Tuesday and 13:30 - 15:00 on Friday. You may check the campus map to find my office.

Calculus Review:

The calculus review will be held in USB 2165 (the study room), 14:00 - 15:30 on Sunday. The topics include:

\(\bullet\) U-substitution and integration by parts for single-variable integration

\(\bullet\) Double integration and assigning bounds for iterated integrals

\(\bullet\) A student suggested a great resource for calculus self-study. The website is: https://instruct.math.lsa.umich.edu/lecturedemos/ma215/docs/

\(\bullet\) I also uploaded a probability book to Piazza. In particular, I recommend reading (page numbers refer to the PDF): (1) Improper integrals: pages 203 - 210 (2) Double integrals: pages 214 - 218 (3) Cumulative distribution function: pages 224 - 229 (4) Expectation and variance: pages 235 - 244 (5) Joint distributions: pages 299 - 304

Reminders for Assignment 3:

\(\bullet\) Please correctly identify the CDF \(F(x)\), the cumulative distribution function. Let \(X\) be a random variable with support \([a,b]\); then the integral (or summation over discrete points) is only valid for \(x\) inside the support, and the cumulative distribution function satisfies \(F(x) = 0\) for \(x<a\) and \(F(x) = 1\) for \(x\ge b\).

\(\bullet\) To derive the standard deviation of a sum of independent random variables, say \(X_1 + X_2 + ...... + X_n\), first compute the variance by the formula \(Var(X_1 + X_2 + ...... + X_n) = Var(X_1) + ...... + Var(X_n)\). Then take the square root to obtain the standard deviation. Do not directly add the standard deviations (see the simulation sketch after this list).

\(\bullet\) The notion that we repeat the same trial or experiment until obtaining the first success/failure/(something) indicates that the random variable \(X\) follows a geometric distribution. It is a discrete probability distribution with probability mass function:

\[\begin{equation} \boxed{P(X = k) = \begin{cases} (1-p)^{k-1} \cdot p, k = 1,2,3,......\\ 0, \ otherwise \end{cases}} \end{equation}\]

Here splitting into cases is important; it distinguishes the points at which the probability is different from 0.

\(\bullet\) We can extend that notion to repeating the same trial or experiment until the \(r\)-th success/failure occurs. In this case, the random variable \(X\) follows a negative binomial distribution. It is also a discrete probability distribution, with probability mass function (a quick numerical check of both pmfs appears after this list):

\[\begin{equation} \boxed{P(X = n) = \begin{cases}{{n-1}\choose{r-1}}(1-p)^{n-r}p^{r-1} \cdot p = {{n-1}\choose{r-1}}(1-p)^{n-r}p^{r}, n = r, r+1, r+2,......\\ 0, \ otherwise \end{cases}} \end{equation}\]

\(\bullet\) If \(X_1, X_2,......,X_n\) is a simple random sample from a population with mean \(\mu\) and variance \(\sigma^2\), then the sample mean \(\bar X = \frac{\sum_{i=1}^n X_i}{n}\) is a random variable with

\[\begin{equation} \boxed{\mu_{\bar X} = E(\bar X) = E\Big(\frac{\sum_{i=1}^n X_i}{n}\Big) = \frac{E(X_1) + E(X_2) + ...... + E(X_n)}{n} = \frac{n \cdot \mu}{n} = \mu} \end{equation}\] \[\begin{equation} \boxed{\sigma^2_{\bar X} = Var(\bar X) = Var\Big(\frac{\sum_{i=1}^n X_i}{n}\Big) = \frac{Var(X_1) + Var(X_2) + ...... + Var(X_n)}{n^2} = \frac{n \cdot \sigma^2}{n^2} = \frac{\sigma^2}{n}} \end{equation}\] \[\begin{equation} \boxed{\sigma_{\bar X} = SD(\bar X) = \frac{\sigma}{\sqrt{n}}} \end{equation}\]
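To make two of these reminders concrete, here is a minimal NumPy simulation sketch. The population (an exponential with mean 2, so \(\mu = 2\), \(\sigma = 2\)), the sample size \(n = 25\), and the seed are my own illustrative choices, not values from the assignment. It checks that variances of independent variables add (so standard deviations do not), and that the sample mean has mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).

```python
import numpy as np

# Illustrative choices (not from the assignment): exponential population with
# mean mu = 2, standard deviation sigma = 2; sample size n = 25.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 2.0, 25, 200_000

# reps independent samples of size n from the population
samples = rng.exponential(scale=mu, size=(reps, n))

# Sum of n independent X_i: variances add, so SD(sum) = sqrt(n) * sigma.
# Naively adding standard deviations would give n * sigma instead.
sums = samples.sum(axis=1)
print(sums.std(), np.sqrt(n) * sigma, n * sigma)   # ~10.0, 10.0, 50.0

# Sample mean: E(Xbar) = mu and SD(Xbar) = sigma / sqrt(n)
xbar = samples.mean(axis=1)
print(xbar.mean(), mu)                   # ~2.0 vs 2.0
print(xbar.std(), sigma / np.sqrt(n))    # ~0.4 vs 0.4
```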
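And here is the quick numerical check of the geometric and negative binomial pmfs written above; \(p = 0.3\) and \(r = 4\) are illustrative values of my own, and the infinite sums are truncated where the tails are negligible.

```python
from math import comb

p, r = 0.3, 4   # illustrative values, not from the assignment

def geom_pmf(k, p):
    """P(X = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def nbinom_pmf(n, r, p):
    """P(X = n) = C(n-1, r-1) * (1 - p)^(n - r) * p^r for n = r, r+1, ..."""
    return comb(n - 1, r - 1) * (1 - p) ** (n - r) * p ** r if n >= r else 0.0

# Both pmfs sum to (numerically) 1 over their supports.
print(sum(geom_pmf(k, p) for k in range(1, 2000)))        # ~1.0
print(sum(nbinom_pmf(m, r, p) for m in range(r, 2000)))   # ~1.0

# With r = 1, the negative binomial reduces to the geometric distribution.
print(geom_pmf(5, p), nbinom_pmf(5, 1, p))                # equal
```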

Key Points during Lecture 6:

Conditional Probability Clarification:

First of all, a conditional probability distribution should be a proper probability distribution. That is, for discrete random variables \(X, Y\), the conditional probabilities should sum up to 1.

Then we should be aware of the notational equivalences:

\(\bullet P(y|2) = P(y|x=2) = \frac{p(x=2,y)}{p_X(2)}\)

\(\bullet P(y \le 2|x=2) = P(y \le 2 |2) = P(y = 2|2)+P(y=1|2) = P(2|2) + P(1|2)\)

You are encouraged to know these notational equivalences. However, when writing your homework, it is better to make the notation as clear as possible.
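As a small illustration of both points, here is a sketch with a made-up joint pmf (the numbers are mine, not from class): the conditional pmf \(P(y|x=2) = p(x=2,y)/p_X(2)\) sums to 1 over \(y\), and \(P(Y \le 2|X=2)\) is simply \(P(1|2) + P(2|2)\).

```python
# Made-up joint pmf p(x, y) on x in {1, 2}, y in {1, 2, 3}; entries sum to 1.
joint = {(1, 1): 0.10, (1, 2): 0.20, (1, 3): 0.10,
         (2, 1): 0.15, (2, 2): 0.25, (2, 3): 0.20}

# Marginal p_X(2) = sum over y of p(2, y)
p_x2 = sum(prob for (x, y), prob in joint.items() if x == 2)

# Conditional pmf P(y | x = 2) = p(x = 2, y) / p_X(2): a proper distribution.
cond = {y: joint[(2, y)] / p_x2 for y in (1, 2, 3)}
print(cond, sum(cond.values()))   # the conditional probabilities sum to 1.0

# P(Y <= 2 | X = 2) = P(1 | 2) + P(2 | 2)
print(cond[1] + cond[2])
```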

Conditional Probability:

During class there was a great question about conditional probability which is not intuitively easy:

\[\begin{equation} \boxed{P(Y>2000 |X>1500) = \frac{P(X>1500 \cap Y>2000)}{P(X>1500)} = \frac{\int\int f(x,y) dydx}{\int_{1500}^{\infty}f_X(x)dx}} \end{equation}\]

First of all, recall that \(f(x,y) = 0.000006e^{-0.001x - 0.002y}\) for \(0<x<y<\infty\) (0 otherwise).

Then, \(f_X(x) = \int_x^{\infty}0.000006e^{-0.001x - 0.002y} dy = 0.003e^{-0.001x} \cdot [-e^{-0.002y}]_x^{\infty} = 0.003e^{-0.003x}, x>0\) (0 otherwise).

Be careful here: we do not want to integrate over \(y\) first (the inner lower limit would be \(\max(x, 2000)\), which forces a case split). Instead, integrate over \(x\) first:

\[\begin{equation} \boxed{P(Y>2000 |X>1500) = \frac{\int_{2000}^{\infty}\int_{1500}^{y} 0.000006e^{-0.001x - 0.002y} dxdy}{\int_{1500}^{\infty}0.003e^{-0.003x}dx} = \frac{\int_{2000}^{\infty}e^{-0.002y} [-0.006e^{-0.001x}]_{1500}^{y}dy}{[-e^{-0.003x}]_{1500}^{\infty}}} \end{equation}\] \[\begin{equation} \boxed{= \frac{\int_{2000}^{\infty}\Big(0.006e^{-0.002y}e^{-1.5} - 0.006 e^{-0.003y}\Big)dy}{e^{-4.5}} = e^{4.5}\Big\{[-3e^{-0.002y}]_{2000}^{\infty} e^{-1.5} + [2e^{-0.003y}]_{2000}^{\infty} \Big\}} \end{equation}\] \[\begin{equation} \boxed{= e^{4.5}[-2e^{-6} + 3e^{-4}e^{-1.5}] = 3e^{-1} - 2e^{-1.5} \approx 0.6574} \end{equation}\]
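If you would like to double-check this number, here is a small SciPy sketch of the same ratio of integrals. The upper limit 20000 is my own truncation of the infinite limits (the exponential tails beyond it are negligible), so treat it as a numerical sanity check rather than part of the derivation.

```python
import numpy as np
from scipy import integrate

def f(x, y):
    """Joint density 0.000006 * exp(-0.001x - 0.002y) on 0 < x < y, else 0."""
    return 0.000006 * np.exp(-0.001 * x - 0.002 * y) if 0 < x < y else 0.0

UPPER = 20_000  # truncation of the infinite limits; exp(-0.002 * 20000) = e^-40

# Numerator: x runs from 1500 to y (inner), y runs from 2000 to UPPER (outer).
num, _ = integrate.dblquad(lambda x, y: f(x, y), 2000, UPPER, 1500, lambda y: y)

# Denominator: P(X > 1500) using the marginal f_X(x) = 0.003 * exp(-0.003x).
den, _ = integrate.quad(lambda x: 0.003 * np.exp(-0.003 * x), 1500, UPPER)

print(num / den)                          # ~0.6574
print(3 * np.exp(-1) - 2 * np.exp(-1.5))  # closed form above, ~0.6574
```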

Definition of Independence:

If random variables \(X_1, X_2, ...... , X_n\) are independent, then (1) If \(X_1, X_2, ......, X_n\) are jointly discrete, the joint pmf is equal to the product of the marginals:

\[\begin{equation} \boxed{p(x_1, x_2, ......., x_n) = p_{X_1}(x_1) \cdot p_{X_2}(x_2) \cdot ...... \cdot p_{X_n}(x_n)} \end{equation}\]
(2) If \(X_1, X_2, ......, X_n\) are jointly continuous, the joint pdf is equal to the product of the marginals:
\[\begin{equation} \boxed{f(x_1, x_2, ......., x_n) = f_{X_1}(x_1) \cdot f_{X_2}(x_2) \cdot ...... \cdot f_{X_n}(x_n)} \end{equation}\]
(3) If \(X,Y\) are independent and jointly discrete, and the marginals satisfy \(p_X(x)>0\) and \(p_Y(y)>0\), then
\[\begin{equation} \boxed{p(x,y) = p_{X}(x) \cdot p_{Y}(y) = p_{Y|X}(y|x) \cdot p_{X}(x) = p_{X|Y}(x|y) \cdot p_{Y}(y)} \end{equation}\]

Therefore, comparing terms, we have \(p_{Y|X}(y|x) = p_{Y}(y)\) and \(p_{X|Y}(x|y) = p_{X}(x)\).

(4) If \(X,Y\) are independent and jointly continuous, and the marginals satisfy \(f_X(x)>0\) and \(f_Y(y)>0\), then
\[\begin{equation} \boxed{f(x,y) = f_{X}(x) \cdot f_{Y}(y) = f_{Y|X}(y|x) \cdot f_{X}(x) = f_{X|Y}(x|y) \cdot f_{Y}(y)} \end{equation}\]

Therefore, comparing terms, we have \(f_{Y|X}(y|x) = f_{Y}(y)\) and \(f_{X|Y}(x|y) = f_{X}(x)\).

Check Independence:

For discrete random variables \(X\) and \(Y\), to check independence we basically need to verify \(p_X(x) \cdot p_Y(y) = p(x,y)\) for every pair \((x,y)\). There are tricky cases where the equality holds for some pairs but not for all pairs. Next week I believe you will encounter the concept of covariance; it is good and important for your preparation to know that, in general, \(cov(X,Y) = 0 \nrightarrow X,Y \ independent\).
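Here is a short sketch of both points. The joint pmf table is made up (and happens to factorize, so the independence check passes for every pair), and the second part uses the classic example of \(X\) uniform on \(\{-1, 0, 1\}\) with \(Y = X^2\): the two are clearly dependent, yet their covariance is 0.

```python
import itertools
import numpy as np

# Independence check on a made-up joint pmf: p(x, y) must equal p_X(x) * p_Y(y)
# for EVERY pair (x, y), not just for some of them.
xs, ys = (0, 1), (0, 1, 2)
joint = {(0, 0): 0.08, (0, 1): 0.12, (0, 2): 0.20,
         (1, 0): 0.12, (1, 1): 0.18, (1, 2): 0.30}
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
independent = all(np.isclose(joint[(x, y)], p_x[x] * p_y[y])
                  for x, y in itertools.product(xs, ys))
print(independent)   # True: this particular table factorizes

# Covariance caveat: X uniform on {-1, 0, 1}, Y = X^2. They are dependent
# (knowing X fixes Y), yet cov(X, Y) = E[XY] - E[X]E[Y] = E[X^3] - 0 = 0.
x_vals = np.array([-1.0, 0.0, 1.0])
y_vals = x_vals ** 2
probs = np.full(3, 1 / 3)
cov = np.sum(probs * x_vals * y_vals) - np.sum(probs * x_vals) * np.sum(probs * y_vals)
print(cov)           # 0.0, even though X and Y are not independent
```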

Key Points during Lecture 7:

Variance:

The variance of \(X_1 + X_2 + X_3\) is generally different from the variance of \(3X\). The reason is:

\[\begin{equation} \boxed{Var(X_1 + X_2 + X_3) = Var(X_1) + Var(X_2) + Var(X_3) + 2cov(X_1,X_2) + 2cov(X_1, X_3) +2cov(X_2, X_3)} \end{equation}\] \[\begin{equation} \boxed{Var(3X) = 3^2Var(X) =9Var(X)} \end{equation}\]
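A quick simulation of the contrast, with independent \(X_1, X_2, X_3\) (so the covariance terms vanish); the normal population with \(\sigma^2 = 4\) is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 500_000
sigma2 = 4.0   # illustrative population variance (normal with sd = 2)

# Three independent copies X1, X2, X3 with the same distribution as X.
x1, x2, x3 = rng.normal(0, 2, size=(3, reps))

# Independent copies: variances add, so Var(X1 + X2 + X3) = 3 * sigma^2.
print(np.var(x1 + x2 + x3), 3 * sigma2)   # ~12 vs 12

# Scaling a single copy: Var(3X) = 9 * Var(X) = 9 * sigma^2.
print(np.var(3 * x1), 9 * sigma2)         # ~36 vs 36
```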

Test Reminder:

Do not lose points by omitting the "0 otherwise" case whenever a question asks you to write down a probability distribution.

Conditional Probability:

During the previous class there was a great question about conditional probability: given \(f(x,y) = 0.000006e^{-0.001x - 0.002y}\) for \(0<x<y<\infty\) (0 otherwise), derive \(P(Y>2000 |X>1500)\). From the equation list, you may know that

\[\begin{equation} \boxed{P(Y>2000 |X>1500) = \frac{P(X>1500 \cap Y>2000)}{P(X>1500)} = \frac{\int\int f(x,y) dydx}{\int_{1500}^{\infty}f_X(x)dx}} \end{equation}\]

How can we derive the final answer? After your own calculation, you may refer to the Lecture 6 notes for the solution.

Property of Independence:

Recall that if random variables \(X_1, X_2, ...... , X_n\) are independent, then (1) If \(X_1, X_2, ......, X_n\) are jointly discrete/continuous, the joint pmf/pdf is equal to the product of the marginals.

(2) If \(X,Y\) are independent and (a) jointly discrete, with marginals satisfying \(p_X(x)>0\) and \(p_Y(y)>0\), or (b) jointly continuous, with marginals satisfying \(f_X(x)>0\) and \(f_Y(y)>0\), then by comparing terms we have:
\[\begin{equation} \boxed{p_{Y|X}(y|x) = p_{Y}(y), \ \ p_{X|Y}(x|y) = p_{X}(x)} \end{equation}\] \[\begin{equation} \boxed{f_{Y|X}(y|x) = f_{Y}(y), \ \ f_{X|Y}(x|y) = f_{X}(x)} \end{equation}\]

Check Independence:

For discrete random variables \(X\) and \(Y\), to check independence we basically need to verify \(p_X(x) \cdot p_Y(y) = p(x,y)\) for every pair \((x,y)\). There are tricky cases where the equality holds for some pairs but not for all pairs. Now we summarize the relationship between independence and correlation. If random variables \(X,Y\) are independent, then \(cov(X,Y) = 0\). However, if \(cov(X,Y) = 0\), then \(X,Y\) are not necessarily independent. In short, we have \(X,Y \ independent \rightarrow cov(X,Y) = 0\), but \(cov(X,Y) = 0 \nrightarrow X,Y \ independent\).

Measurement Error Example:

The four readings (in pounds) are 148, 151, 150, and 152; each time the person steps off and back on the scale, the reading changes slightly. We can estimate the uncertainty in a single reading by calculating the standard deviation. Let us break it down.

\[\begin{equation} \boxed{E(X) = \frac{148+ 151+ 150+152}{4} = 150.25} \end{equation}\] \[\begin{equation} \boxed{Var(X) = \frac{1}{4-1}\Big((-2.25)^2 + 0.75^2 + (-0.25)^2 + 1.75^2 \Big) = 2.9167} \end{equation}\] \[\begin{equation} \boxed{SD(X) = \sqrt{Var(X)} = 1.7078 \approx 1.71} \end{equation}\]
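For reference, the same numbers computed with NumPy; note that ddof=1 gives the sample variance with the \(n - 1\) divisor used above.

```python
import numpy as np

readings = np.array([148.0, 151.0, 150.0, 152.0])
print(readings.mean())        # 150.25
print(readings.var(ddof=1))   # 2.9167  (divides by n - 1 = 3)
print(readings.std(ddof=1))   # 1.7078 ~ 1.71
```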

Key Points during Lecture 8,9:

Relationship between Bernoulli and Binomial Distribution:

If \(X_1, X_2, ......,X_n\) are independent Bernoulli trials, each \(Ber(p)\), then the random variable \(Y = X_1 + X_2 + ...... + X_n \sim Bin(n,p)\). Also, since the expectation and variance of the Bernoulli distribution are \(E(X) = p\) and \(Var(X) = p(1-p)\), by independence we obtain that the expectation and variance of \(Bin(n,p)\) are

\[\begin{equation} \boxed{E(Y) = E(X_1 + X_2 + ...... + X_n) = E(X_1) + E(X_2) + ...... + E(X_n) = p + p + ...... + p =np} \end{equation}\] \[\begin{equation} \boxed{Var(Y) = Var(X_1 + X_2 + ...... + X_n) = Var(X_1) + ...... + Var(X_n) = p(1-p) + ...... + p(1-p) =np(1-p)} \end{equation}\]
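A minimal simulation of this relationship, with illustrative values \(n = 20\) and \(p = 0.3\): summing \(n\) independent \(Ber(p)\) variables reproduces the \(Bin(n,p)\) mean \(np\) and variance \(np(1-p)\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 20, 0.3, 200_000   # illustrative values

# Each row is one realization of X_1, ..., X_n ~ Ber(p); the row sum is Bin(n, p).
bernoulli = rng.random((reps, n)) < p
y = bernoulli.sum(axis=1)

print(y.mean(), n * p)            # ~6.0 vs 6.0
print(y.var(), n * p * (1 - p))   # ~4.2 vs 4.2
```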

Optional Question:

Since the probability mass function of the binomial distribution is given by

\[\begin{equation} \boxed{P(Y = y) = \begin{cases} {{n}\choose{y}} p^y(1-p)^{n-y}, y = 0, 1, 2, ......, n-1, n\\ 0, otherwise \end{cases}} \end{equation}\]

Could you compute \(E(Y)\) and \(V(Y)\) by applying the formula (summation over the possible points)? This will give a formal proof of the mean and variance of the binomial distribution.
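As a sanity check for your algebra (not the formal proof itself), here is a direct numerical summation over the support, with illustrative values \(n = 12\) and \(p = 0.4\):

```python
from math import comb

n, p = 12, 0.4   # illustrative values

def binom_pmf(y, n, p):
    """P(Y = y) = C(n, y) * p^y * (1 - p)^(n - y) for y = 0, 1, ..., n."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

support = range(n + 1)
mean = sum(y * binom_pmf(y, n, p) for y in support)
second_moment = sum(y ** 2 * binom_pmf(y, n, p) for y in support)
var = second_moment - mean ** 2

print(mean, n * p)            # 4.8 vs 4.8
print(var, n * p * (1 - p))   # 2.88 vs 2.88
```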

Poisson Distribution:

As discussed in lecture, the Poisson distribution is an important and special distribution because its mean and variance are the same. That is, for a Poisson random variable \(X \sim Poisson(\lambda)\):

\[\begin{equation} \boxed{E(X) = V(X) = \lambda} \end{equation}\]
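A one-line check with scipy.stats (the rate \(\lambda = 3.7\) is illustrative):

```python
from scipy import stats

lam = 3.7   # illustrative rate
mean, var = stats.poisson.stats(lam, moments='mv')
print(mean, var)   # both 3.7: the Poisson mean equals its variance
```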

Test Reminder:

When you pull a constant out of a standard deviation (after taking the square root of the variance), be sure to include the absolute value, because the standard deviation is the nonnegative square root of the variance. For example,

\[\begin{equation} \boxed{SD(\hat p) = SD\Big(\frac{X}{n}\Big) = \Big|\frac{1}{n} \Big| SD(X) = \frac{1}{n} \sqrt{np(1-p)} = \sqrt{\frac{p(1-p)}{n}}} \end{equation}\]

As we do not know p, we use \(\hat p\) to estimate p with uncertainty \(\sqrt{\frac{\hat p (1-\hat p)}{n}}\).
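A simulation check of this uncertainty formula, with illustrative values \(n = 400\) and \(p = 0.25\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 400, 0.25, 200_000   # illustrative values

x = rng.binomial(n, p, size=reps)   # X ~ Bin(n, p)
phat = x / n                        # p-hat = X / n

print(phat.std())                   # empirical SD of p-hat
print(np.sqrt(p * (1 - p) / n))     # sqrt(p(1 - p)/n) ~ 0.0217
```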

Optional Question:

When does the uncertainty achieve its maximum?

Ans: When \(p = \frac{1}{2}\), the uncertainty achieves its maximum, which is \(\sqrt{\frac{1}{4n}} = \frac{1}{2\sqrt{n}}\). This quantity is often mentioned in statistics books as the conservative estimate of the uncertainty.
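A tiny numerical confirmation over a grid of \(p\) values (with an illustrative \(n = 100\)):

```python
import numpy as np

n = 100   # illustrative sample size
p = np.linspace(0.01, 0.99, 99)
unc = np.sqrt(p * (1 - p) / n)
print(p[np.argmax(unc)], unc.max(), 1 / (2 * np.sqrt(n)))   # 0.5, 0.05, 0.05
```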

Last Comment:

Please let me know about any typos or grammatical mistakes so that I can fix them. It is great writing practice for me, and I appreciate your help!