[Firstly, please excuse the lazy IPA: I'm writing this on my phone, and sadly IPA input is not possible.]

Two things conspire to make this happen. Both are quite normal processes in everyday speech, with its slurring and its tendency to take shortcuts wherever possible:

  1. The diphthong /aɪ̯/ of ‘I’ is reduced to the monophthong /a/, particularly when unstressed, which it usually is as the subject of a clause.

  2. Between vowels, the alveolar stop /d/ is reduced to an alveolar flap /ɾ/. In pretonic position (i.e., right before a stressed or semi-stressed syllable), this flap can weaken further, to the point of deletion. The ‹nt› of ‘don’t’ is reduced as well. Between vowels it usually becomes a nasalised version of the same flap, while before most consonants it mostly becomes /ʔn/. Here ‘ʔ’ stands for a glottal-stop-like consonant, or simply for creaky phonation of the neighbouring sound, and in fast speech it may be lost entirely.

The result is a monophthong directly followed by a diphthong, and it is quite natural to simply run the two together into a single diphthong.

So the development goes something like /aɪ̯ doʊn(t)/ → /a doʊʔn/ → /a ɾoʊn/ → /a oʊn/ → /aʊn/ or /aon/.