Show that minimal CFG is undecidable (Sipser 5.36)

Question: Say that a CFG (context-free grammar) is minimal if none of its rules can be removed without changing the language generated. Let $MIN_{\text{CFG}}$ = $\{\, \langle G \rangle$ | $G$ is a minimal CFG$\}$.

a) Show that $MIN_{\text{CFG}}$ is Turing-recognizable.

b) Show that $MIN_{\text{CFG}}$ is undecidable.

Now part a) is relatively straightforward. If all rules are indispensable for some CFG $G$, we just need to find for each rule $R$ of $G$ a string $w_R$ that can be generated by $G$, but not with $R$ omitted. Thus, enumerating all the strings one by one for potential "certificates" for indispensability yields a recognizer.

However, I got stuck on part b). It would be easier (to prove undecidability) if the question concerns a specific rule of $G$, because then it can be reduced from the known undecidable language $ALL_{\text{CFG}} = \{\, \langle G \rangle$ | $G$ generates all strings $\}$. But I'm not sure whether it is possible to reduce between those two variants.

Thank you for any help!


It suffices to focus on the case $\Sigma = \{0,1\}$. We argue that $ALL_{\textsf{minCFG}} \leq_T MIN_{\textsf{CFG}}$. That is, we can determine whether a minimal CFG can generate all possible strings, by using the oracle for $MIN_{\textsf{CFG}}$.

Let $G$ denote the minimal CFG we are interested. Since $G$ is minimal, there exists a length $p$ such that, for each rule $R$ in $G$ there exists a string $w_R$ with $|w_R|<p$ that can be generated by $G$, but not with $R$ deleted. Note here that $p$ is computable in a brute force manner. Construct $G^+_{a_1a_2\cdots a_p}$ where $a_i \in \Sigma$, by adding following rules in $G$: $$ S \to a_1 a_2 \cdots a_pT \\ T \to T0 \ | \ T1 \ | \ \epsilon $$ First, consider what will happen if $G^+_{a_1a_2\cdots a_p}$ is not minimal, thus, there exists a rule $R$ in $G^+$ being dispensable. By our construction it is clear that each rule $R$ in $G$ is indispensable.

  • If $S \to a_1 a_2 \cdots a_pT$ or $T \to \epsilon$ is dispensable, then we have $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$.
  • If $T \to T0$ is dispensable, then we have $(a_1 a_2 \cdots a_p \Sigma^* - a_1 a_2 \cdots a_p 1^*) \subseteq L(G)$. Because whether $a_1 a_2 \cdots a_p1^* \subseteq L(G)$ is easy to determine by some classical algorithm, we can also determine whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$.
  • It is similar for $T \to T1$.

Therefore, if $G^+_{a_1a_2\cdots a_p}$ is not minimal, we can determine whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$. What if it is minimal? In this case we will immediately know $L(G) \neq \Sigma^*$. So we can suppose that all possible $G^+_{a_1a_2\cdots a_p}$ is not minimal. Then by checking whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$ one by one and checking those strings of less-than-$p$ length, we can still decide whether $L(G) = \Sigma^*$.

Update. (Oct 12, 2020) After a short discussion with @xdavidliu I realized that the argument above can directly prove what we want, without a justification of uncomputability of $ALL_{\textsf{minCFG}}$. Recall that the reduction we have established achieves the following: Given a minimal CFG $G$, determine whether $L(G) = \Sigma^*$. Now if we want to decide whether $L(G_0) = \Sigma^*$ for an arbitrary CFG $G_0$, just simply feed into the (reduction) algorithm all minimal sub-CFG (test each sub-CFG by the $MIN_{\mathsf{CFG}}$ oracle to determine whether it is minimal) of $G_0$ and see if there is one that generates all strings.


Since this problem appears in Chapter 5, in which Turing reduction has not introduced. I personally believe that there is an elegant proof using only mapping reduction.


Update. @Sylvain gives an elegant proof on https://cstheory.stackexchange.com/a/39454/46760, by showing $$ PCP \leq_T FullPCP \leq_m MIN_{\textsf{CFG}}$$ in which I personally thought the key is that context-free language is closed under $h^{-1}$ he defined.


Here is @Lwins answer expressed in another way. All credit goes to @Lwins for the idea.

Claim 1: Let $ALL_\mathsf{CFG}$ be the language of CFGs that accept all strings in some finite alphabet $\Sigma$. This language is undecidable.

Proof 1: (also Theorem 5.13 in Sipser) A decider could be used, for any given Turing machine $M$ and input $w$, to construct a CFG that rejects accepting computation histories for that $M$ on that $w$. Running the decider for $ALL_\mathsf{CFG}$ on this CFG would also decide $A_\mathsf{TM}$, the language of pairs $\langle M, w \rangle$ where $M$ accepts $w$. However, this would be impossible because $A_\mathsf{TM}$ is not decidable.

Claim 2: Let $MIN_\mathsf{CFG}$ be the language of minimum CFGs, i.e. CFGs for which all of its rules are necessary, and none can be omitted without losing strings. Let $ALL_\mathsf{MINCFG} \subset MIN_\mathsf{CFG}$ be the language of minimum CFGs that accept all strings. The language $ALL_\mathsf{MINCFG}$ is undecidable.

Proof 2: If we had such a decider, we could also decide $ALL_\mathsf{CFG}$ for any $G$ by running on subsets of rules of $G$.

Claim 3: Let $A_\mathsf{CFG}$ be the language of pairs $\langle G, w \rangle$ where the CFG $G$ accepts the input $w$. This language is decidable. (Proof is omitted here and can easily be found elsewhere).

Claim 4: For any necessary rule $R$ in some CFG $G$, it is possible to compute in finite time some string $w_R \in G$ which could not be generated without $R$.

Proof 4: Lexicographically generate all strings $w$ and use the $A_\mathsf{CFG}$ decider on $w$ and $G$ with and without $R$. Since $R$ is necessary, some $w_R$ exists and this procedure is guaranteed to halt.

Claim 5: Suppose it were possible to decide $MIN_\mathsf{CFG}$. Suppose also that we are given a minimum $G$ and a string $v$ that for all rules $R$ is longer than at least one $w_R$. Then it would be decidable whether $v \Sigma^\star \subseteq G$.

Proof 5: Augment $G$ to $G_v$ by adding the rules $S \rightarrow v T$ (where $S$ is the start variable of $G$ and $T$ is a new variable) along with $T \rightarrow \varepsilon$ and $T \rightarrow T \sigma$ for every $\sigma \in \Sigma$. These new rules generate only $v \Sigma^\star$. The original rules $R$ are guaranteed to still be necessary in $G_v$, since $|w_R| < |v|$.

We then ask if any of the new rules are necessary. To answer this, we repeatedly run our $MIN_\mathsf{CFG}$ decider, each time keeping a different non-empty subset of the $T$ rules. If $MIN_\mathsf{CFG}$ ever accepts, then we know that some strings in $v \Sigma^\star$ are missing in $G$. Otherwise, none of the new rules in $G_v$ are necessary, and thus $v \Sigma^\star \subseteq G$.

Main result:

A decider for $MIN_\mathsf{CFG}$ would imply a decider for $ALL_\mathsf{MINCFG}$ (Hence, $MIN_\mathsf{CFG}$ is undecidable).

Proof

Suppose we had a decider for $MIN_\mathsf{CFG}$ and input $G$. If this decider rejects $G$, then $ALL_\mathsf{MINCFG}$ must also reject.

Otherwise, we know $G \in MIN_\mathsf{CFG}$ and thus for each rule $R$ in $G$, we can (Claim 4) compute a $w_R$ for which that $R$ is necessary. Choose some integer $\ell$ larger than the length of the longest $w_R$ we happened to find, and use $A_\mathsf{CFG}$ to check if $u \subseteq G$ for all $|u| < \ell$, and (Claim 5) also if $v \Sigma^\star \subseteq G$ for all $|v| = \ell$. $ALL_\mathsf{MINCFG}$ accepts if and only if both of these are true.