What is "magic" about the combination of addition and multiplication in formal arithmetic?
Looking back at this question I'm quite dissatisfied with my original answer: it wasn't wrong, but it missed some important content. I believe this new version is much better.
The end of your post focuses on the second incompleteness theorem, but you open with the first, so let me split this question into two parts:
- How much arithmetic do we in fact need for the first incompleteness theorem?
- How much arithmetic do we in fact need for the second incompleteness theorem?
Briefly, the answer is this:
Only the simplest properties of addition and multiplication are needed for the first incompleteness theorem. However, the second incompleteness theorem is more complicated.
To get into the right spirit, we need to realize that these are different theorems: although the usual treatment of GFIT yields GSIT as a corollary, there's no reason to believe that this usual treatment is optimal in terms of axiomatic overhead, and hence no reason to believe that GFIT and GSIT require the same amount of arithmetic. In fact, GSIT is much more "expensive," while GFIT uses only the very basic additive/multiplicative structure of $\mathbb{N}$ (and is consequently much better understood).
First, GFIT. Here's a more-or-less optimal version:
Suppose $T$ is a computably axiomatizable and consistent theory in the language $\{0,1,+,\cdot,<\}$ such that each of the following is provable in $T$:
- Each true quantifier-free sentence.
- For each $n\in\mathbb{N}$, the sentence $$\lambda_n\equiv\forall x(\underline{n}<x\vee \bigvee_{i\le n}x=\underline{i}),$$ where "$\underline{k}$" is the numeral corresponding to $k$.
Then $T$ is incomplete.
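To make the two bullet points concrete, here is a small Python sketch (my own illustration, not part of the standard presentation): a brute-force evaluator that decides atomic quantifier-free sentences of the language in $\mathbb{N}$, plus a generator that prints the discreteness axiom $\lambda_n$. Terms are nested tuples, with Python ints standing in for the numerals $\underline{k}$.

```python
def ev(t):
    # Evaluate a variable-free term: an int (shorthand for the numeral
    # built from 0, 1, and +) or a tuple ('+', s, t) / ('*', s, t).
    if isinstance(t, int):
        return t
    op, l, r = t
    return ev(l) + ev(r) if op == '+' else ev(l) * ev(r)

def holds(phi):
    # Decide an atomic sentence ('=', s, t) or ('<', s, t) in N.
    op, l, r = phi
    return ev(l) == ev(r) if op == '=' else ev(l) < ev(r)

def lambda_n(n):
    # The n-th "discreteness" axiom, rendered as a LaTeX string.
    eqs = r" \vee ".join(rf"x = \underline{{{i}}}" for i in range(n + 1))
    return rf"\forall x(\underline{{{n}}} < x \vee {eqs})"

# 1 + 1 < (1 + 1) * (1 + 1), a true quantifier-free sentence R must prove:
two = ('+', 1, 1)
print(holds(('<', two, ('*', two, two))))  # True
print(lambda_n(1))
```

The point of the sketch is just that both axiom families are mechanically checkable/generable, which is why such a weak theory suffices.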
In snappier jargon, Robinson's arithmetic $R$ is essentially incomplete; usually the better-known $Q$ is mentioned here, but it's overkill. If we like we can use the relational version, replacing each operation with its graph - this matches Willard's approach; we just have to add "single-valuedness" axioms, at least for tuples of numerals. There is still more to be said on exactly how optimal this is - see e.g. Section $4$ of this paper of Beklemishev. However, I would confidently say that this is optimal amongst "natural" theories.
So, as you expected, the very bare minimum of arithmetic is enough for at least one Godelian phenomenon to occur, and consequently for GFIT we have a very snappy characterization of what "enough arithmetic" means: more or less, we need addition, multiplication, and "discreteness," and that's it. We don't even need addition to be commutative, or there to be no largest number, or $<$ to be a linear order, and so on! (Note, however, that we can't drop the second bulletpoint above, since $Th(\mathbb{R};0,1,+,\cdot,<)$ is computable.)
For GSIT, on the other hand, per Willard things are much messier.
For any Godel-y argument to apply, we need to be able to code sequences appropriately and prove basic facts about the (image within the theory of the) coding apparatus, and the meaning of "basic facts" varies between GFIT and GSIT. For GFIT, it's basically enough for computable functions to be representable; for GSIT, the manipulations we need to perform are much more intricate and don't (as yet) lead to a snappy characterization. In particular, GFIT doesn't need addition and multiplication to be total, just defined on each specific pair of actual natural numbers; by contrast, dropping totality is exactly what opened the door to Willard's self-verifying theories.
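To give a flavor of the coding apparatus involved, here is a Python sketch of Godel's $\beta$-function trick for coding finite sequences of naturals as pairs of numbers. The function names are mine; the construction (take $b = k!$ so the moduli $1+(i+1)b$ are pairwise coprime and apply the Chinese Remainder Theorem) is the classical one.

```python
from math import factorial

def beta(a, b, i):
    # Godel's beta function: recovers the i-th entry of a coded sequence.
    return a % (1 + (i + 1) * b)

def encode(seq):
    # Code a finite sequence as a pair (a, b). Taking b = k! with
    # k >= len(seq) and k >= every entry makes the moduli 1 + (i+1)*b
    # pairwise coprime and larger than the entries (classical lemma).
    b = factorial(max([len(seq)] + list(seq)))
    mods = [1 + (i + 1) * b for i in range(len(seq))]
    # Chinese Remainder Theorem: find a with a % mods[i] == seq[i].
    M = 1
    for m in mods:
        M *= m
    a = 0
    for r, m in zip(seq, mods):
        Mi = M // m
        a += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return a % M, b

a, b = encode([3, 1, 4, 1, 5])
print([beta(a, b, i) for i in range(5)])  # [3, 1, 4, 1, 5]
```

The GFIT/GSIT contrast above is about how much the theory must *prove* about such a coding, not about the coding itself - representability of each concrete computation versus the intricate internal arguments GSIT demands.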
To see why GSIT plausibly requires more axiomatic power, recall its proof: from a putative proof of "I am consistent," we extract a proof of the Godel-Rosser sentence $\sigma$ by arguing within the theory that the failure of $\sigma$ - namely, the existence of a proof of $\sigma$ shorter than any disproof of $\sigma$ - would entail a contradiction in our theory. But this hides a totality assumption! Specifically, we would need to argue within the theory that we can "search through all possible proofs of a given length in finite time," and while this may appear trivial at first, it's really not.
If we have a proof of $\sigma$ of length $m$ shorter than any disproof of $\sigma$, the corresponding disproof of $\sigma$ would naively have length something like $\sum_{i<m} i2^i$ (since for each $i$ we need to check each of the exponentially-many-in-$i$ possible disproofs of $\sigma$ of length $i$). We can do somewhat better, but we're never (per Willard) going to be able to get around the issue that the exhaustive proof $p'$ of $\neg\sigma$ which we get from a putative proof $p$ of $\sigma$ with no shorter disproof will in general have to be much longer than $p$, so our theory can't prove that we can actually get $p'$ from $p$ without appropriate totality assumptions.
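As a back-of-the-envelope illustration of that blowup (using the rough count from above, with $2^i$ as a stand-in for the number of candidate proofs of length $i$ - the exact count depends on the proof system):

```python
def naive_disproof_bound(m):
    # Rough length of the exhaustive disproof p': for each length i < m,
    # inspect each of the ~2^i candidate proofs of that length.
    return sum(i * 2**i for i in range(m))

for m in (10, 20, 40):
    print(m, naive_disproof_bound(m))
```

The sum has closed form $(m-2)2^m+2$, so $p'$ is exponentially longer than the length $m$ of the original proof $p$ - exactly the kind of growth a theory without totality assumptions can fail to prove is attainable.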
I think this paper of Willard - though technical - is well worth your time, since it really dives into some of the fine points of exactly what needs to break for what else to break. Visser's Another look at the second incompleteness theorem is also quite good; and, while not directly related, Visser's delightfully-titled paper Oracle bites theory discusses another situation where an "obvious" result about proof construction breaks down since we can't prove the necessary totality principle.
(And although it's orthogonal to your question, we can also ask what basic properties of the underlying logical system are necessary. This isn't something I know much about, but see here.)