Pronouncing -ed when it comes after a voiced final consonant
Short Answer
The /d/ at the end of the word phoned will indeed be devoiced when the word is said in isolation and will therefore sound similar to a [t] - if one is listening very carefully. If you teach -ed endings you'll probably end up clearly enunciating the final consonant, whereas it would usually have no audible release in normal speech. This is why the Original Poster is getting a noise similar to aspiration at the end of the word. One should still teach the full -ed ending rules even though the /d/ will be devoiced, because this will affect how the rest of the word is pronounced and perceived.
Full Answer
The pedagogy
As pointed out in Mari-Lou's comment under the Original Question, the main point for learners here is that they should not add an extra syllable to words like lived, begged, liked, which would make them sound like /lɪvɪd, begɪd, laɪkɪd/ (livid, beggid and likid).
However, teaching students about voicing in general is very useful and productive (it helps in many other areas) and it's also helpful for instructors to understand the real processes involved here so that they don't mislead their students or focus on the wrong details.
It's important to note here that the OP is demonstrating quite some integrity when it comes to the teaching of this point. A lot of ESL teachers will bend their descriptions of pronunciation and indeed their pronunciation itself to fit with what they presume to be the facts. This is even more frequent when some coursebook or other has told us what these facts are. In contrast, the OP is reflecting on their careful observation of the actual natural sound and calling into question their previous assumptions. This is the mark of a reflective and insightful language teacher.
The phonetics
So here is what's going on, and what we should expect to hear at the end of words like phoned.
Voiced consonants may become DEVOICED in various environments in a word or in connected speech. Different types of consonant will be affected to different degrees and in different contexts, but all voiced consonants will be either fully or partially devoiced in some specific environments.
Importantly for us, obstruent consonants (these are basically the real consonant-like consonants, not namby-pamby nasals or semivowels) become partially or often fully devoiced when not surrounded by voiced sounds. So they will become partially or fully devoiced after voiceless sounds and after silence. They also become partially or fully devoiced when occurring before voiceless sounds, including silence. So, for example, if we say friend without any following words, the final /d/ will be devoiced in normal speech. This is not "bad English"; this is what actually happens with all speakers when speaking naturally.
So, this should lead us to ask, is the devoiced /d/ at the end of such a word actually a /t/; should we represent it phonetically like this [frent]? The answer is an emphatic NO! This would be a very misleading thing to do.
The reason is this: there are many different subtle characteristics of the different sounds that we use when we speak. For the purposes of being able to classify them and talk about them easily we talk about consonants in terms of their "voice, place and manner". However when, for example, a [d] becomes devoiced, it still keeps all its other characteristics in terms of the many different articulatory settings involved, the length of the consonant and also, very importantly, the effect that these factors have on the surrounding sounds. So a fully devoiced [d] at the end of the word friend is still a [d] and not a [t]. In phonetics, if we want to do a narrow transcription so that the detail of the devoicing is shown, we need to use a devoicing diacritic [d̥] (the diacritic is that little circle which should be directly underneath the 'd' - unfortunately Stack Exchange software doesn't seem to like this diacritic and kicks it over to the right).
Fortis and lenis consonants
For the reasons described above, phonologists and phoneticians of English prefer the terms lenis and fortis to refer to consonants that we think of as usually being voiced or unvoiced. This is because, for example, lenis consonants retain all of their lenis characteristics even when they become devoiced, and fortis consonants also retain their characteristics even if they become voiced for any reason.
Pre-fortis clipping
So why is it important to maintain the distinction between a [t] and a devoiced [d]? Well, you will notice if you are a native speaker (and probably also if you aren't), that you can clearly distinguish between the two words fond and font. This is the case even though, if you say the two words in isolation, the [d] at the end of the word fond is likely to be devoiced. The reason for this is that a real [t], unlike a devoiced [d], is a fortis consonant. Fortis consonants at the end of a syllable cause the preceding vowel, and any voiced consonants between that vowel and the fortis consonant, to be radically shortened, or to use the technical term, clipped. So if you listen very carefully you will notice that the vowel plus /n/ section in font is much shorter than the one in fond. Similarly you will find that the vowel in bead is approximately double the length of the vowel in beat. This phenomenon is known as pre-fortis clipping. It is a cross-linguistic phenomenon.
Now, it is the pre-fortis clipping of the preceding vowel that tells a native speaker that the consonant at the end of a syllable is voiceless, or rather fortis. If the vowel isn't clipped we know that the last phoneme in the syllable is lenis. The actual voicing of the last sound itself is entirely irrelevant.
The Original Poster's observations
The /d/ in phoned will indeed be realised by a devoiced lenis consonant, [d̥]. If we focus on the release of the [d], it will definitely have a [t]-like quality, because there will be no vocal fold vibration and we will be able to hear the release of the air - there being no voicing to drown this out. Notice that this will only happen if we carefully pronounce the end stage of the consonant. Usually syllable final plosives such as /d/ or /t/ will not have an audible release if not followed by a vowel. Of course, if we are teaching past tense endings to students we are going to carefully enunciate the ending.
This brings up the question of whether we should teach the voice matching rule for -ed endings to students at all. The answer is yes. Students who understand that there is a /t/ at the end of ceased and a /d/ at the end of seized will automatically pronounce ceased with a shorter vowel. In addition, just the practice of working out which sounds are voiced and which aren't is useful for them, especially if they have preexisting difficulties with pairs of voiced and unvoiced sounds.
Just understanding that some sounds are voiced and some aren't will increase students' awareness of their problems in other areas. For example, Spanish speakers are unlikely to be aware that /z/ and /s/ are two distinct phonemes in English. The same will be true for Arabic speakers who will have difficulty in distinguishing /b/ and its fortis partner /p/ and in pronouncing them differently.
Of course, the other reason for teaching the rule is that we do not usually say words in isolation! As the Original Poster observed, when a /d/ is preceded and followed by a voiced sound, it will remain fully voiced. So in "We phoned a lot of people" we need to produce and recognise /fəʊnd ə lɒt/, not /*fəʊnt ə lɒt/. The latter would give students a very marked non-native pronunciation.
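For anyone who likes to see the voice matching rule for -ed endings written out as a procedure, here is a minimal sketch in Python. It assumes the usual textbook statement of the rule: /ɪd/ after /t/ or /d/, /t/ after any other fortis (voiceless) consonant, and /d/ everywhere else. The function name, the phoneme set and the extra example want are my own illustrations, and the rule chooses the phoneme only; it says nothing about the devoicing described above.

```python
# Hypothetical helper illustrating the textbook -ed rule (phonemic level only).
# Fortis (voiceless) consonants that can end an English verb stem.
FORTIS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def ed_ending(final_phoneme: str) -> str:
    """Return the phonemic -ed ending for a stem ending in final_phoneme."""
    if final_phoneme in {"t", "d"}:
        return "ɪd"  # extra syllable only after /t/ or /d/
    if final_phoneme in FORTIS:
        return "t"   # fortis ending after a fortis consonant
    return "d"       # lenis ending after vowels and all other consonants


# Stems paired with their final phoneme (transcribed by hand).
examples = {
    "phone": "n",
    "live": "v",
    "beg": "g",
    "like": "k",
    "cease": "s",
    "seize": "z",
    "want": "t",
}

for stem, final in examples.items():
    print(f"{stem} + -ed -> /{ed_ending(final)}/")
```

So cease takes /t/ and seize takes /d/, which is exactly the contrast that triggers the shorter vowel in ceased.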
I think it is pretty common in various English dialects for /b/ /d/ /g/ at the end of a word to be pronounced as unaspirated voiceless [p] [t] [k], by which I mean consonants said without vibration of the vocal cords, but without the little puff of air referred to as aspiration. I don't know about British dialects (I'm from Ohio, USA). I haven't heard a dialect with word final voiceless aspirated stops for these consonants, but it is certainly possible that you speak one. (We might not mean exactly the same thing by the term "aspirate".)