Applying the law of total probability to conditional probability

I was solving problems based on Bayes' theorem from the book "A First Course in Probability" by Sheldon Ross. The problem reads as follows:

An insurance company believes that there are two types of people: accident prone and not accident prone. Company statistics show that an accident-prone person has an accident in any given year with probability $0.4$, whereas the probability is $0.2$ for a person who is not accident prone. If we assume that $30\%$ of the population is accident prone, what is the conditional probability that a new policyholder will have an accident in his or her second year of policy ownership, given that the policyholder has had an accident in the first year?

The solution given is as follows, where $A$ denotes the event that the policyholder is accident prone and $A_i$ denotes the event that he or she has an accident in year $i$:

Book Solution
$$ \begin{align} P(A)=0.3 & & (given)\\ \therefore P(A^c)=1-P(A)=0.7 & & \\ P(A_1|A)=P(A_2|AA_1)=0.4 & &(given)\\ P(A_1|A^c)=P(A_2|A^cA_1)=0.2 & & (given) \end{align} $$ $$ P(A_1)=P(A_1|A)P(A)+P(A_1|A^c)P(A^c) =(.4)(.3)+(.2)(.7)=.26 \\ P(A|A_1)=\frac{(.4)(.3)}{.26}=\frac{6}{13} \\ P(A^c|A_1)=1-P(A|A_1)=\frac{7}{13} $$ $$ \begin{align} P(A_2|A_1)& =P(A_2|AA_1)P(A|A_1)+P(A_2|A^cA_1)P(A^c|A_1) &&...(I)\\ &=(.4)\frac{6}{13}+(.2)\frac{7}{13}\approx .29\\ \end{align} $$

I don't understand statement $(I)$.

My Solution
Shouldn't it be like this: $$P(A_2|A_1)=P(A_2|AA_1)P(AA_1)+P(A_2|A^cA_1)P(A^cA_1)$$ Continuing further:
$$ \begin{align} P(A_2|A_1)&=P(A_2|AA_1)P(A_1|A)P(A)+P(A_2|A^cA_1)P(A_1|A^c)P(A^c)\\ &=(.4)(.4)(.3)+(.2)(.2)(.7)=0.076 \end{align} $$

Am I wrong? If yes, where did I go wrong?

Added Later

After going through the comments and thinking more, it seems that I am struggling to apply the law of total probability (and my above solution may very well be wrong). The basic form of the law of total probability, which I have come across so far, is as follows: $$P(A)=P(A|\color{red}{B})P(\color{red}{B})+P(A|\color{magenta}{B^c})P(\color{magenta}{B^c})$$ This is the first time I am facing an application of this law to a conditional probability, as done in the book's solution: $$P(A_2|A_1)=P(A_2|AA_1)P(A|A_1)+P(A_2|A^cA_1)P(A^c|A_1)$$ as it involves three events ($A,A_1,A_2$). The book did not explain this. Though in the current problem it looks "somewhat" intuitive:

  1. Can someone generalize it, so as to make my understanding clearer? Say, for $n$ events?

  2. Also, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel the red-colored parts should be the same and the magenta-colored parts should be the same, as in the simple form of the law of total probability.

  3. I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here?

  4. For a moment I felt it's related to $P(E_1E_2E_3\ldots E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)\cdots P(E_n|E_1\ldots E_{n-1})$. Is it so?

My confidence in my ability to apply the law of total probability is now shaken. Please enlighten me.


Solution 1:

  1. Can someone generalize it, so as to make my understanding clearer? Say, for $n$ events?

If $(B_k)_{k\le n}$ is a sequence of $n$ events that partition the sample space (or if at least $(B_k\cap A_1)_{k\le n}$ partitions $A_1$), then:
$$\mathsf P(A_2\mid A_1) = \sum_{k=1}^n \mathsf P(A_2\mid A_1\cap B_k)\,\mathsf P(B_k\mid A_1)$$
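In the present problem this is just the case $n=2$, with $B_1=A$ and $B_2=A^\mathsf c$, which recovers statement $(I)$ exactly:
$$\mathsf P(A_2\mid A_1)=\mathsf P(A_2\mid A_1\cap A)\,\mathsf P(A\mid A_1)+\mathsf P(A_2\mid A_1\cap A^\mathsf c)\,\mathsf P(A^\mathsf c\mid A_1)$$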

  2. Also, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel the red-colored parts should be the same and the magenta-colored parts should be the same, as in the simple form of the law of total probability.

They are not the same even in the case of the simple form. So why should they be here?

Where $\Omega$ is the entire sample space:

$$\begin{align}\mathsf P(A_2) &= \mathsf P(A_2\mid \Omega)\\ &=\mathsf P(A_2\mid \color{red}{A}, \Omega)\,\mathsf P(\color{red}{A}\mid \Omega)+\mathsf P(A_2\mid \color{magenta}{A^c}, \Omega)\,\mathsf P(\color{magenta}{A^c}\mid \Omega)\\ &=\mathsf P(A_2\mid \color{red}{A})\,\mathsf P(\color{red}{A})+\mathsf P(A_2\mid \color{magenta}{A^c})\,\mathsf P(\color{magenta}{A^c})\end{align}$$

  3. I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here?

:) Well, I would not say absolutely.   But seriously, it is a rather common misunderstanding.

The conditioning bar is not a set operation.   It separates the event from the condition that the probability function is being measured over.   There can be only one conditioning bar inside any probability expression; they do not nest.
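One way to make this precise (a sketch; the notation $\mathsf P_{A_1}$ is introduced here purely for illustration): conditioning on $A_1$ defines a new probability measure $\mathsf P_{A_1}(\cdot)=\mathsf P(\cdot\mid A_1)$, and statement $(I)$ is nothing but the simple law of total probability applied to that measure:
$$\mathsf P_{A_1}(A_2)=\mathsf P_{A_1}(A_2\mid A)\,\mathsf P_{A_1}(A)+\mathsf P_{A_1}(A_2\mid A^\mathsf c)\,\mathsf P_{A_1}(A^\mathsf c)$$
where, unwinding the definitions, $\mathsf P_{A_1}(A_2\mid A)=\dfrac{\mathsf P_{A_1}(A_2\cap A)}{\mathsf P_{A_1}(A)}=\mathsf P(A_2\mid A\cap A_1)$. So no event of the form "$A_1\mid A$" ever appears; everything is conditioned on intersections of events.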

  4. For a moment I felt it's related to $P(E_1E_2E_3\ldots E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)\cdots P(E_n|E_1\ldots E_{n-1})$. Is it so?

Yes, this is so. Specifically:
$$\begin{align}\mathsf P(A_2,A,A_1)&=\mathsf P(A_2\mid A,A_1)\,\mathsf P(A\mid A_1)\,\mathsf P(A_1)\\ \mathsf P(A_2,A^\mathsf c,A_1)&=\mathsf P(A_2\mid A^\mathsf c,A_1)\,\mathsf P(A^\mathsf c\mid A_1)\,\mathsf P(A_1)\end{align}$$
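Adding these two identities and dividing by $\mathsf P(A_1)$ also shows exactly where your $0.076$ fits in: it is $\mathsf P(A_2\cap A_1)$, not $\mathsf P(A_2\mid A_1)$, and dividing by $\mathsf P(A_1)=0.26$ recovers the book's answer:
$$\mathsf P(A_2\mid A_1)=\frac{\mathsf P(A_2,A,A_1)+\mathsf P(A_2,A^\mathsf c,A_1)}{\mathsf P(A_1)}=\frac{(.4)(.4)(.3)+(.2)(.2)(.7)}{.26}=\frac{.076}{.26}\approx .29$$
In full detail, statement $(I)$ can be derived as follows: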

$$\begin{align}\mathsf P(A_2\mid A_1) ~ & = \mathsf P((A\cup A^\mathsf c){\cap} A_2\mid A_1) && \text{Union of Complements} \\[1ex] & = \mathsf P((A{\cap}A_2)\cup(A^\mathsf c{\cap}A_2)\mid A_1) && \text{Distributive Law} \\[1ex] & = \mathsf P(A{\cap}A_2\mid A_1) + \mathsf P(A^\mathsf c{\cap}A_2\mid A_1) && \text{Additive Rule for Union of Exclusive Events} \\[1ex] & = \dfrac{\mathsf P(A{\cap}A_1{\cap}A_2)+\mathsf P(A^\mathsf c{\cap}A_1{\cap}A_2)}{\mathsf P(A_1)} && \text{by Definition} \\[1ex] & = \dfrac{\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A{\cap}A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c{\cap}A_1)}{\mathsf P(A_1)} && \text{by Definition} \\[1ex] & = {\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A\mid A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c\mid A_1)} && \text{by Definition of Conditional Probability} \end{align}$$
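As a numerical sanity check, one can also estimate $\mathsf P(A_2\mid A_1)$ by simulation. This is a minimal sketch, not part of the book's solution: the function name `simulate` and its parameters are illustrative, the probabilities $0.3$, $0.4$, $0.2$ come from the problem statement, and accidents in successive years are assumed conditionally independent given the policyholder's type (as the book's solution implicitly assumes).

```python
import random

def simulate(trials=1_000_000, seed=0):
    """Estimate P(accident in year 2 | accident in year 1) by Monte Carlo."""
    rng = random.Random(seed)
    first = both = 0
    for _ in range(trials):
        prone = rng.random() < 0.3   # A: 30% of the population is accident prone
        p = 0.4 if prone else 0.2    # yearly accident probability, given the type
        year1 = rng.random() < p     # accidents in successive years are assumed
        year2 = rng.random() < p     # conditionally independent given the type
        if year1:
            first += 1
            both += year2
    return both / first              # estimate of P(A_2 | A_1)

print(simulate())  # prints roughly 0.292, matching (.4)(6/13) + (.2)(7/13)
```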