Substitute for triangle inequality for Kullback-Leibler divergence

I know that the Kullback-Leibler divergence does not satisfy the triangle inequality. But is there a substitute, maybe in one of the forms below? $$ D(P|Q)\leq C\left(D(P|R)+D(R|Q)\right) \quad\text{for some } C > 0, $$ $$ D(P|Q)\leq D(R|P)+D(P|R)+D(R|Q)+D(Q|R), $$ $$ D(P|Q)\leq D(R|P)^a+D(R|Q)^a \quad\text{for some } a > 0. $$


1. Relative entropy does not behave like a distance measure. Regarding your question, see the Pythagorean theorem for relative entropy.

2. One can define the Jensen-Shannon divergence between $P$ and $Q$ as $${{D(P|M)+D(Q|M)}\over2},$$ where $M = \frac{1}{2}(P+Q)$ is the midpoint (mixture) of $P$ and $Q$. It has been proved that the square root of the Jensen-Shannon divergence satisfies the triangle inequality.
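For concreteness, here is a minimal NumPy sketch of this (the helper names `kl` and `js_distance` are mine), with a random spot-check of the triangle inequality for the square root; this is an illustration, not a proof:

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (a metric)."""
    mid = 0.5 * (p + q)                    # midpoint distribution M
    return np.sqrt(0.5 * (kl(p, mid) + kl(q, mid)))

# Spot-check the triangle inequality on random distributions.
rng = np.random.default_rng(0)
for _ in range(10_000):
    p, q, r = rng.dirichlet(np.ones(5), size=3)
    assert js_distance(p, q) <= js_distance(p, r) + js_distance(r, q) + 1e-12
```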

There exist sequences $P_n$, $Q_n$, and $R_n$ such that $D(P_n|Q_n)$ tends to zero as $n$ tends to infinity and $D(Q_n|R_n)$ tends to zero as $n$ tends to infinity, but $D(P_n|R_n)$ does not tend to zero as $n$ tends to infinity. This provides counterexamples to several of the suggested inequalities.

Can you link me to the Pythagorean theorem you mention?

Regarding properties of relative entropy, please see the following discussion of $I$-divergence properties. I was referring to the parallelogram identity (identity 7).

https://www.clsp.jhu.edu/~sanjeev/520.447/Spring00/I-divergence-properties.ps
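For reference, the compensation (parallelogram-type) identity for relative entropy can be stated as $$D(P|T)+D(Q|T)=D(P|M)+D(Q|M)+2\,D(M|T),\qquad M=\tfrac12(P+Q);$$ whether this is exactly "identity 7" in the linked notes is my assumption. A quick numerical check in NumPy (the helper name `kl` is mine):

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Check D(P|T) + D(Q|T) = D(P|M) + D(Q|M) + 2 D(M|T), with M = (P+Q)/2,
# on random distributions.
rng = np.random.default_rng(0)
for _ in range(10_000):
    P, Q, T = rng.dirichlet(np.ones(6), size=3)
    M = 0.5 * (P + Q)
    assert np.isclose(kl(P, T) + kl(Q, T), kl(P, M) + kl(Q, M) + 2 * kl(M, T))
```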

And maybe also to a description of the sequences you mention?

First construct three probability measures $P$, $Q$, and $R$ such that $D(P|Q)$ and $D(Q|R)$ are finite but $D(P|R)$ is infinite. This is possible on a countable sample space by choosing $P$ with a heavier tail than $Q$, which in turn has a heavier tail than $R$. Then let $P_n$ be the probability measure $$\frac{1}{n}\,P+\left(1-\frac{1}{n}\right)R,$$ let $Q_n$ be the probability measure $$\frac{1}{n}\,Q+\left(1-\frac{1}{n}\right)R,$$ and let $R_n = R$. By joint convexity of relative entropy, $D(P_n|Q_n)\leq\frac{1}{n}D(P|Q)\to 0$ and $D(Q_n|R_n)\leq\frac{1}{n}D(Q|R)\to 0$, while $D(P_n|R_n)$ remains infinite for every $n$.
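A concrete choice of tails (my own example, not from the thread): on $\{1,\dots,N\}$ take $P(k)\propto 1/k^2$, $Q(k)\propto 1/k^3$, and $R(k)\propto 2^{-k}$. A small NumPy sketch showing that, as the truncation $N$ grows, $D(P|Q)$ and $D(Q|R)$ stabilise while $D(P|R)$ keeps growing (it is infinite on the full countable space):

```python
import numpy as np

def kl_from_logs(p, log_p, log_q):
    """D(p || q) = sum_k p(k) * (log p(k) - log q(k)), in nats."""
    return float(np.sum(p * (log_p - log_q)))

# P(k) ~ 1/k^2 (heaviest tail), Q(k) ~ 1/k^3, R(k) ~ 2^{-k} (lightest tail),
# truncated to {1, ..., N}; working with log-masses avoids underflow in R's tail.
for N in (10**3, 10**4, 10**5):
    k = np.arange(1, N + 1, dtype=float)
    log_p = -2.0 * np.log(k)
    log_q = -3.0 * np.log(k)
    log_r = -k * np.log(2.0)
    log_p -= np.log(np.exp(log_p).sum())   # normalise each log-mass
    log_q -= np.log(np.exp(log_q).sum())
    log_r -= np.log(np.exp(log_r).sum())
    p, q = np.exp(log_p), np.exp(log_q)
    print(N,
          kl_from_logs(p, log_p, log_q),   # D(P|Q): settles near a finite limit
          kl_from_logs(q, log_q, log_r),   # D(Q|R): settles near a finite limit
          kl_from_logs(p, log_p, log_r))   # D(P|R): keeps growing, roughly like log N
```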


For the binary KL-divergence we have the following "approximate triangle inequality": $$ D(a\,\|\,c)+D(c\,\|\,b)=D(a\,\|\,b) + (c-a)D'(c\,\|\,b), $$ where $D'(c\,\|\,b) = \log\frac{c(1-b)}{(1-c)b}$ is the derivative of $D(x\,\|\,b)$ in its first argument, evaluated at $x=c$.

I wonder if one can show something similar for the case of general probability distributions.
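For what it's worth, a quick numerical check of the binary identity above (the helper names are mine):

```python
import numpy as np

def d(a, b):
    """Binary KL divergence D(a || b) in nats."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def d_prime(c, b):
    """Derivative of D(x || b) in its first argument, evaluated at x = c."""
    return np.log(c * (1 - b) / ((1 - c) * b))

# Check D(a||c) + D(c||b) == D(a||b) + (c - a) * D'(c||b) at random points.
rng = np.random.default_rng(1)
for _ in range(10_000):
    a, b, c = rng.uniform(0.01, 0.99, size=3)
    assert np.isclose(d(a, c) + d(c, b), d(a, b) + (c - a) * d_prime(c, b))
```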