Proving $(p \to (q \to r)) \to ((p \to q) \to (p \to r))$
As you suspected, this can be proved from the first three axioms only. I couldn't find a short proof, though – I tried brute force enumeration of the theorems deducible from the three axioms (by taking all pairs of theorems already proved and unifying one with the premise of the other), but didn't find your target in the first $80000$ theorems proved.
I then found some guidance in the article on relevance logic in the Handbook of Philosophical Logic. Relevance logic focuses on the fragment of logic in which, roughly speaking, the premises are relevant to the conclusions. It doesn't include the axiom $p\to(q\to p)$, which allows us to add an irrelevant premise to a theorem already proved without that premise, and is thus strictly weaker than the system you're using, but we can nevertheless make use of the results cited in that article.
I'll first describe the structure of the proof and how I found it, and then give the proof in detail. Here are the names I'll use for the axioms; the first column names the corresponding axioms of combinator logic, for comparison with the discussion in the comments under the question:
$$ \begin{array}{c|l|l} \mathbf I&\text{self-implication}&p\to p\\ \mathbf K&\text{weakening}& p \to (q \to p) \\ \hline \mathbf B&\text{prefixing}& (p \to q) \to ((r \to p) \to (r \to q)) \\ \mathbf A&\text{suffixing}& (p \to q) \to ((q \to r) \to (p \to r)) \\ \hline \mathbf W&\text{contraction}& (p \to (p \to q)) \to (p \to q) \\ \mathbf S&\text{self-distribution}&(p \to (q \to r)) \to ((p \to q) \to (p \to r))\\ \hline \mathbf C&\text{permutation}&(p\to(q\to r))\to(q\to(p\to r))\\ &\text{assertion}&p\to((p\to q)\to q) \end{array} $$
(The names are the ones used in the article, except I use "weakening" instead of "positive paradox", since it's shorter and makes more sense to me.)
Theorem $1$ of the article states that, with modus ponens (and implicitly universal substitution), the axiom sets formed by self-implication and one each from the three pairs prefixing/suffixing, contraction/self-distribution and permutation/assertion lead to the same theory.
What you have is weakening, suffixing and contraction. Self-implication can be deduced from weakening and contraction in a single step (by substituting $p$ for $q$ everywhere). Thus, if we can deduce assertion in your system, the theorem will tell us that we can deduce everything else, including your target, self-distribution. I did find a proof for assertion by brute force search.
The article doesn't give a proof of its Theorem $1$ and only says that it can be proved by consulting a book that isn't available online and doing some "fiddling", so we still have to show how to get from self-implication, suffixing, contraction and assertion to self-distribution.
I found a deduction of self-distribution online that uses prefixing and permutation. It turns out that prefixing is deducible in a single step from suffixing and permutation, so the problem remained only to deduce permutation. Again, I found a proof for this by brute force search.
So here's the entire proof put together, starting with your three axioms and ending with your target. First, a high-level description similar to the actual calls in my Java code:
assertion = t (t (weakening,suffixing),contraction);
permutation = t (suffixing,m (assertion,suffixing));
prefixing = m (suffixing,permutation);
target = t (m (prefixing,prefixing),t (permutation,m (contraction,prefixing)));
Each call to m
is an application of modus ponens, in which the first argument is $A$, the second argument is $A\to B$ and the most general unifier that makes the $A$s coincide is used. Each call to t
is an invocation of transitivity (i.e. deducing $A\to C$ from $A\to B$ and $B\to C$), which can be implemented as
t (A->B,B->C) = m (B->C,m (A->B,suffixing))
using suffixing, or as
t (A->B,B->C) = m (A->B,m (B->C,prefixing))
once prefixing is available.
Here's the proof spelled out in $14$ steps. The first table shows the theorems used to generate the antecedents $A$ and the implications $A\to B$ for modus ponens, as well as the resulting theorems $B$:
$$ \begin{array}{c|c|c|c|c} &&A&A\to B&B\\\hline \text{(a)}&\text{weakening}&&&p \to (q \to p)\\ \text{(b)}&\text{suffixing}&&&(p \to q) \to ((q \to r) \to (p \to r))\\ \text{(c)}&\text{contraction}&&&(p \to (p \to q)) \to (p \to q)\\ \hline \text{(d)}&\text{*}&\text{(a)}&\text{(b)}&((p \to q) \to r) \to (q \to r)\\ \text{(e)}&&\text{(b)}&\text{(d)}&p \to ((p \to q) \to (r \to q))\\ \text{(f)}&\text{*}&\text{(e)}&\text{(b)}&(((p \to q) \to (r \to q)) \to s) \to (p \to s)\\ \text{(g)}&\text{assertion}&\text{(c)}&\text{(f)}&p \to ((p \to q) \to q)\\ \text{(h)}&&\text{(g)}&\text{(b)}&(((p \to q) \to q) \to r) \to (p \to r)\\ \text{(i)}&\text{*}&\text{(b)}&\text{(b)}&(((p \to q) \to (r \to q)) \to s) \to ((r \to p) \to s)\\ \text{(j)}&\text{permutation}&\text{(h)}&\text{(i)}&(p \to (q \to r)) \to (q \to (p \to r))\\ \text{(k)}&\text{prefixing}&\text{(b)}&\text{(j)}&(p \to q) \to ((r \to p) \to (r \to q))\\ \text{(l)}&&\text{(k)}&\text{(k)}&(p \to (q \to r)) \to (p \to ((s \to q) \to (s \to r)))\\ \text{(m)}&&\text{(c)}&\text{(k)}&(p \to (q \to (q \to r))) \to (p \to (q \to r))\\ \text{(n)}&\text{*}&\text{(j)}&\text{(b)}&((p \to (q \to r)) \to s) \to ((q \to (p \to r)) \to s)\\ \text{(o)}&&\text{(m)}&\text{(n)}&(p \to (q \to (p \to r))) \to (q \to (p \to r))\\ \text{(p)}&\text{*}&\text{(l)}&\text{(b)}&((p \to ((q \to r) \to (q \to s))) \to t) \to ((p \to (r \to s)) \to t)\\ \text{(q)}&\text{self-distribution}&\text{(o)}&\text{(p)}&(p \to (q \to r)) \to ((p \to q) \to (p \to r))\\ \end{array} $$
The asterisks mark intermediate steps in invocations of transitivity. Note that most theorems with more than three variables occur only in such intermediate steps. Substitutions are being performed as late as possible; by performing them as early as possible, the proof could be written using only theorems with at most three variables.
The second table shows the substitutions used; you can also find these automatically by unification. The variables are named such that they appear in alphabetical order in the resulting theorems.
$$ \begin{array}{c|l|l} &A&A\to B\\\hline \text{(d)}& p\mapsto q,q\mapsto p& p\mapsto q,q\mapsto (p \to q),r\mapsto r\\ \text{(e)}& p\mapsto r,q\mapsto p,r\mapsto q& p\mapsto r,q\mapsto p,r\mapsto ((p \to q) \to (r \to q))\\ \text{(f)}& p\mapsto p,q\mapsto q,r\mapsto r& p\mapsto p,q\mapsto ((p \to q) \to (r \to q)),r\mapsto s\\ \text{(g)}& p\mapsto (p \to q),q\mapsto q& p\mapsto p,q\mapsto q,r\mapsto (p \to q),s\mapsto ((p \to q) \to q)\\ \text{(h)}& p\mapsto p,q\mapsto q& p\mapsto p,q\mapsto ((p \to q) \to q),r\mapsto r\\ \text{(i)}& p\mapsto r,q\mapsto p,r\mapsto q& p\mapsto (r \to p),q\mapsto ((p \to q) \to (r \to q)),r\mapsto s\\ \text{(j)}& p\mapsto q,q\mapsto r,r\mapsto (p \to r)& p\mapsto (q \to r),q\mapsto r,r\mapsto p,s\mapsto (q \to (p \to r))\\ \text{(k)}& p\mapsto r,q\mapsto p,r\mapsto q& p\mapsto (r \to p),q\mapsto (p \to q),r\mapsto (r \to q)\\ \text{(l)}& p\mapsto q,q\mapsto r,r\mapsto s& p\mapsto (q \to r),q\mapsto ((s \to q) \to (s \to r)),r\mapsto p\\ \text{(m)}& p\mapsto q,q\mapsto r& p\mapsto (q \to (q \to r)),q\mapsto (q \to r),r\mapsto p\\ \text{(n)}& p\mapsto q,q\mapsto p,r\mapsto r& p\mapsto (q \to (p \to r)),q\mapsto (p \to (q \to r)),r\mapsto s\\ \text{(o)}& p\mapsto q,q\mapsto p,r\mapsto r& p\mapsto q,q\mapsto p,r\mapsto (p \to r),s\mapsto (q \to (p \to r))\\ \text{(p)}& p\mapsto p,q\mapsto r,r\mapsto s,s\mapsto q& p\mapsto (p \to (r \to s)),q\mapsto (p \to ((q \to r) \to (q \to s))),r\mapsto t\\ \text{(q)}& p\mapsto p,q\mapsto (p \to q),r\mapsto r& p\mapsto p,q\mapsto p,r\mapsto q,s\mapsto r,t\mapsto ((p \to q) \to (p \to r))\\ \end{array} $$
It seems to me that a much simpler and human readable proof is possible, unless I’m misunderstanding something. Using the Deduction Theorem, the result is relatively straightforward to prove. This motivated me to prove the Deduction Theorem for this logical system, which I found to be less straightforward, but still not particularly difficult.
To make sure we’re all on the same page, the logical system in question consists of the inference rule modus ponens (MP) and the following three axiom schema:
axiom 1 $\;\;\;\;\; p \; \rightarrow \; (q \rightarrow p)$
axiom 2 $\;\;\;\;\;(p \rightarrow q) \;\; \rightarrow \;\; [\; (q \rightarrow r) \rightarrow (p \rightarrow r) \; ]$
axiom 3 $\;\;\;\;\;[\; p \rightarrow (p \rightarrow q) \; ] \;\; \rightarrow \;\; (p \rightarrow q)$
We want to show that the following wff (well formed formula) is provable in this logical system:
$$[p \rightarrow (q \rightarrow r)] \;\; \rightarrow \;\; [(p \rightarrow q) \rightarrow (p \rightarrow r)]$$
By 3 applications of the Deduction Theorem (proved further below), it suffices to prove $r$ under the following assumptions: $p \rightarrow (q \rightarrow r),$ $p \rightarrow q,$ and $p.$ That is, it suffices to prove
$$ p \rightarrow (q \rightarrow r), \; p \rightarrow q, \; p \;\vdash \; r$$
(1) $\;\;\;p \rightarrow q$
(2) $\;\;\;$(line 1) $\rightarrow [\;(q \rightarrow r) \rightarrow (p \rightarrow r)\;]$
(3) $\;\;\;(q \rightarrow r) \rightarrow (p \rightarrow r)$
(4) $\;\;\;p \rightarrow (q \rightarrow r)$
(5) $\;\;\;p$
(6) $\;\;\;q \rightarrow r$
(7) $\;\;\;p \rightarrow r$
(8) $\;\;\;r$
Reasons for the above steps
(1) $\;\;\;$assumption
(2) $\;\;\;$axiom 2
(3) $\;\;\;$MP (lines 1, 2)
(4) $\;\;\;$assumption
(5) $\;\;\;$assumption
(6) $\;\;\;$MP (lines 5, 4)
(7) $\;\;\;$MP (lines 6, 3)
(8) $\;\;\;$MP (lines 5, 7)
In trying to prove the Deduction Theorem for this logical system (i.e. $\Gamma, \;p \vdash q$ implies $\Gamma \vdash p \rightarrow q$), I simply followed the standard proof (which makes use of axiom 1 and the wff we originally wanted to prove), and noted that the standard proof only requires us to make use of the following 3 results:
If $q$ is an axiom or a member of $\Gamma$, then for any wff $p$ we can prove $p \rightarrow q$ in our logical system.
We can prove $p \rightarrow p$ in our logical system.
Given $p \rightarrow r$ and $p \rightarrow (r \rightarrow q)$, we can prove $p \rightarrow q$ in our logical system.
proof of 1: $\;\;\;$Apply MP to $q$ and $q \rightarrow (p \rightarrow q)$ (axiom 1).
proof of 2: $\;\;\;$Apply MP to $p \rightarrow (p \rightarrow p)$ (axiom 1) and axiom 3.
proof of 3: $\;\;\;$This is the difficult part. Below is a proof of what’s needed, namely
$$ p \rightarrow r, \; p \rightarrow (r \rightarrow q) \;\vdash \; p \rightarrow q$$
(1) $\;\;\;p \rightarrow r$
(2) $\;\;\;p \rightarrow (r \rightarrow q)$
(3) $\;\;\;$(line 1) $\rightarrow \; [(r \rightarrow q) \rightarrow (p \rightarrow q)]$
(4) $\;\;\;(r \rightarrow q) \rightarrow (p \rightarrow q)$
(5) $\;\;\;$(line 2) $\;\;\rightarrow \;\; \{\;$(line 4)$ \rightarrow [p \rightarrow (p \rightarrow q)] \; \}$
(6) $\;\;\;$(line 4) $\;\rightarrow \; [p \rightarrow (p \rightarrow q)]$
(7) $\;\;\;p \rightarrow (p \rightarrow q)$
(8) $\;\;\;[p \rightarrow (p \rightarrow q)] \; \rightarrow \; (p \rightarrow q)$
(9) $\;\;\;p \rightarrow q$
Reasons for the above steps
(1) $\;\;\;$assumption
(2) $\;\;\;$assumption
(3) $\;\;\;$axiom 2 ($r$ is $q$)
(4) $\;\;\;$MP (lines 1, 3)
(5) $\;\;\;$axiom 2 ($r$ is $p \rightarrow q$)
(6) $\;\;\;$MP (lines 2, 5)
(7) $\;\;\;$MP (lines 4, 6)
(8) $\;\;\;$axiom 3
(9) $\;\;\;$MP (lines 7, 8)
Here is how I discovered the above proof. Working backwards, I noticed that the conclusion of axiom 3 was what I wanted, so I made note of the fact that it would be enough to obtain $p \rightarrow (p \rightarrow q).$ Then I tried working forward. First, I applied axiom 2 followed by MP to the assumption $p \rightarrow r,$ using $q$ as the introduced suffix. (Since I already had $p$ and $r$ appearing, this seemed to be a natural way to get $q$ to appear.) Then I tried applying axiom 2 followed by MP to the assumption $p \rightarrow (r \rightarrow q).$ At some point (perhaps my 3rd attempt), I used $p \rightarrow q$ as the introduced suffix, motivated by the fact that this got line 4 to show up. After this, the proof immediately fell into place, since in line 5 the conclusion of the conclusion is $p \rightarrow (p \rightarrow q),$ which I had previouly noted was sufficient.
Incidentally, the logical system above is the same (in the sense of having the same set of provable wffs) as the logical system with the inference rule MP and the following two axioms: axiom 1 and the wff we originally wanted to prove. Each of these logical systems is also equal to the logical system with MP and Deductive Theorem as inference rules and no axioms (thus, one might call this system “DT Logic”). I think logicians call this the positive implicational fragment of intuitionistic propositional logic, but I like “DT Logic” better. Other axiomatizations of DT Logic can be found at the Wikipedia page “List of logic systems” under the category “Positive implicational calculus”.
For completeness, here’s a proof that DT Logic can be characterized by no axioms along with the inference rules MP and DT (and also the Rule of Assumptions, I suppose). It suffices to prove, in this no-axiom logical system, axiom 1 and the wff we were proving in this thread.
$\;\;\;p,\; q \vdash p\;$ implies $\;p \vdash q \rightarrow p\;$ implies $\;\vdash p \rightarrow (q \rightarrow p)$
$\;\;\;p \rightarrow (q \rightarrow r), \; p \rightarrow q, \; p \; \vdash \; r\;\;\;$ (MP, 3 times)
implies $\;p \rightarrow (q \rightarrow r), \; p \rightarrow q \; \vdash \; p \rightarrow r\;\;\;$ (DT)
implies $\;p \rightarrow (q \rightarrow r) \; \vdash \; (p \rightarrow q) \rightarrow (p \rightarrow r)\;\;\;$ (DT)
implies $\;\vdash \; [p \rightarrow (q \rightarrow r)] \;\; \rightarrow \;\; [(p \rightarrow q) \rightarrow (p \rightarrow r)] \;\;\;$ (DT)
For comparison, here is Joriki's solution in the combinator language we used in the comment thread between me and Zhen Lin: $$\begin{align} \mathbf X &= \mathbf A (\mathbf A \; \mathbf K \; \mathbf A) \mathbf W \\ \mathbf C &= \mathbf A \; \mathbf A (\mathbf A \; \mathbf X) \\ \mathbf B &= \mathbf C \; \mathbf A \\ \mathbf S &= \mathbf A (\mathbf B \; \mathbf B) \; (\mathbf A \; \mathbf C (\mathbf B \; \mathbf W)) \end{align}$$ where $\mathbf X$ is an ad-hoc name for Joriki's "assertion" formula.
Zhen Lin's constuction for the final line $$\mathbf{S} = \mathbf{A A} ( \mathbf{A} ( \mathbf{B W} ) ( \mathbf{A A} ) )$$ is slightly more efficient than Joriki's because it contains only one $\mathbf B$ that needs to be unfolded. This yields the final term $$ \mathbf S = \mathbf A\; \mathbf A\;(\mathbf A ( \mathbf A \; \mathbf A (\mathbf A \; (\mathbf A (\mathbf A \; \mathbf K \; \mathbf A) \mathbf W) ) \mathbf A \; \mathbf W) \; (\mathbf A\; \mathbf A)) $$ which encodes a Hilbert-style proof with 15 fully substituted axiom instances and 14 modus ponens steps.