Differentiate between past and present just by pronunciation when the word is followed by a d- or similar sound

We can distinguish them because they are pronounced differently, and "-ed" is past tense. I don't see what is confusing about this.

Edited to add: I see that I may not have addressed part of your question. You wrote:

Native speakers pronounce both sentences so that we just hear one [d], so we don't know the tense by pronunciation in these cases. Am I right?

The answer is: No, you are not right. I am a native speaker and would never say "killed the" and not pronounce the "-ed". To be sure, "killed" is not pronounced with two syllables, like "kill-ed", but neither is the "d" silent. The "e" is silent, however. The pronunciation goes like this: "killd".

When I reflect upon it, I don't know why the "e" is silent. With many verbs ending in "-ed" the "e" IS pronounced. Some examples:

hated
waited
extended
painted
tooted

Examples where the "e" is silent and the final "d" is not:

killed
tooled
pained
warred
tried

I am sure a linguist could come up with a general rule, but I'm not one of those, so I must defer to an expert. It does seem like verbs whose base ends in "d" or "t" will have pronounced "e" in their "-ed"s.

Full answer and experiments you can try at home

[For short answer see above/below]

An important part of your mouth:

If you feel behind your top teeth with your tongue, you should be able to feel a little shelf-like part right there behind your top teeth. Now if you run your tongue backwards behind that, you'll feel that your mouth suddenly arches upwards to form the so-called roof of your mouth.

That little shelf-like bit behind your teeth is your alveolar ridge. Now, if you're a native English speaker, then if you make a /t/ sound ('tuh', 'tuh', 'tuh'), you should be able to feel the top of the tip of your tongue making contact with your alveolar ridge to articulate the sound. If you try it with /d/, /n/ or /l/ you should find the same thing.

Place of articulation

Now try saying this phrase a few times:

hid us

You should still be able to feel your tongue making contact with the alveolar ridge for the /d/ there, hopefully. The sound of the /d/ might be a little different for you if you speak an American variety of English.

Now, if you've done that a few times try this sequence:

hid them

What you'll find when you say this (probably) is that you aren't making your /d/ on your alveolar ridge any more. Instead, you'll be making it on your teeth, in the same place that you make the 'th' sound, [ð]. The /d/ has moved. It has become dental (made on the teeth).

Now - if you don't replace your /t/ with a glottal stop at the end of a word - you should find that the same thing happens with the following sequences too:

hit us (/t/ on the alveolar ridge)
hit them (/t/ on the teeth, dental)

All the sounds that we make on the alveolar ridge in English are highly unstable. One reason for this is that it is difficult to move quickly enough from an alveolar contact to another contact nearby. There are many other reasons too, which we won't go into here. You can try this kind of thing with other alveolar sounds such as /l/ or /n/:

all of them (alveolar l)
all the time (dental l)
ban us (alveolar n)
ban them (dental n)
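
If it helps to see the pattern from these experiments written down as a rule, here is a rough sketch in Python. It is only an illustration of what the examples above show, not a full model of English: the function name `dentalize` and the little symbol sets are made up for this post, and only [ð] is treated as a trigger because that is the only one tried above.

```python
# Illustrative sketch only: names and symbol sets are assumptions, not a standard API.
ALVEOLARS = {"t", "d", "n", "l"}   # the alveolar sounds discussed above
DENTAL_TRIGGERS = {"ð"}            # the 'th' of "them" and "the"
DENTAL_MARK = "\u032a"             # IPA diacritic (bridge below) marking a dental sound


def dentalize(phones):
    """Return a copy of phones where an alveolar right before [ð] is marked dental."""
    result = []
    for i, phone in enumerate(phones):
        following = phones[i + 1] if i + 1 < len(phones) else None
        if phone in ALVEOLARS and following in DENTAL_TRIGGERS:
            result.append(phone + DENTAL_MARK)
        else:
            result.append(phone)
    return result


hid_them = dentalize(["h", "ɪ", "d", "ð", "ə", "m"])  # the /d/ gets the dental mark
hid_us = dentalize(["h", "ɪ", "d", "ʌ", "s"])         # unchanged: a vowel follows
```

Since this kind of assimilation is optional in real speech, the function describes what can happen, not what must.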

So what we have been experimenting with is the fact that /t/ and /d/, when we articulate them, tend to move position quite a lot. In fact they will sometimes move so far that we will recognise them as a completely different consonant. So try saying the following as spelled:

Its a very goob book.

This phrase should be straightforwardly recognisable to you as "It's a very good book". This kind of transformation (called assimilation) is not obligatory, but it happens all the time, even in slow careful speech.

Dropping sounds altogether: elision

So we've worked out that /t/ and /d/ are unstable in that they move around a lot. But they also get freely omitted, or elided in certain environments too. Consider the following phrases:

Mind the gap.
Mind Amy.

You should find the following pronunciations of these two phrases very different in terms of their acceptability:

Mine the gap. /maɪn ðə gæp/
Mine Amy. /maɪn eɪmi/

The first will probably sound ok to you. The second will be considered unacceptable by most speakers. Now some speakers will balk at the idea that they ever elide a /d/ in phrases like "mind the gap" and find it hard to believe that it is a quite natural part of the language. If you find yourself among this number, then there's a little experiment that you can do. Find a native speaker (right now, if you can) and ask them what you're saying. Use the phrases above, "mine the gap" and "mine Amy", or "mine a child". For the last two, you'll find that they don't understand what you're saying. But for the first one they'll come back to you immediately with "mind the gap". A nice second experiment is to tell them you didn't say a /d/ there - and then listen to them tell you that you most definitely did!

That little experiment we've just done shows us that we cannot freely omit /d/ before a vowel. The same thing goes for /t/ too. Try we buss them wide open and a buss of Mozart for we bust them wide open and a bust of Mozart. The first should be acceptable, the second definitely isn't.

Now we will quickly find that we can't freely drop /d/ or /t/ when preceded by a vowel either. We need consonants on both sides. Consider:

bind them: "bine them" /baɪn ðəm/
abide them: *"abie them" /əbaɪ ðəm/

The first should work for you, but the second definitely is not an acceptable substitute.

There's one last experiment we need to do. Try out the following pronunciations and decide if any are acceptable substitutes for you (or someone sitting nearby):

cole weather for cold weather
hol the advance for halt the advance
loss the plot for lost the plot

The first and third examples should be fine. The second is unacceptable. There is a simple reason for this, but it's difficult to just guess it. Some consonants that we make involve vibration of the vocal folds. They are "voiced". An example would be /m/. You can sing tunes with these kinds of sound. Other sounds just involve the expulsion of air from the mouth. An example would be the 'sh' sound we hear in "sshh, don't wake the baby". If you try to hum a tune using 'sh', you'll find it has no pitch, so you can't get any real musical note going (try it!).

Now, /d/ is voiced and /t/ isn't. A /t/ just involves the movement of air. If we want to drop a /t/ or a /d/, then the previous consonant must match for voicing. In cold weather the /l/ like the /d/ is voiced. In lost the plot, the /s/, like the /t/, is unvoiced. So in these two cases we can felicitously lose the /t/ or /d/. However, in halt the advance the /l/ is voiced and the /t/ is not. For this reason we cannot drop the /t/ here.

So, in order to be able to freely drop a /t/ or /d/ the following conditions need to be met:

It must be at the end of a syllable
It must be surrounded by consonants
The preceding consonant must match it in terms of voicing.

One last thing: the conditions above will not be met if one of the surrounding consonants is /r/ or /h/. "Boil ham" does not work as a pronunciation of "boiled ham", and neither does "boil rice" for "boiled rice".
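
For anyone who likes to see rules spelled out mechanically, the conditions above can be sketched as a small Python check. Treat this as an illustration under assumptions: the function name `can_elide` and the consonant sets are invented for this post, the consonant inventory is simplified, and the caller is assumed to pass a /t/ or /d/ that really is at the end of a syllable.

```python
# Illustrative sketch only: simplified consonant inventory, hypothetical names.
VOICED = {"b", "d", "g", "v", "ð", "z", "ʒ", "m", "n", "ŋ", "l", "r", "w", "j"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "h"}
CONSONANTS = VOICED | VOICELESS
BLOCKERS = {"r", "h"}  # neighbours that prevent dropping, per the note above


def can_elide(previous, target, following):
    """Say whether a syllable-final /t/ or /d/ may be dropped between its neighbours.

    Any symbol not listed as a consonant (for example "aɪ" or "eɪ") counts as a vowel.
    """
    if target not in {"t", "d"}:
        return False
    # Condition: it must be surrounded by consonants.
    if previous not in CONSONANTS or following not in CONSONANTS:
        return False
    # Exception: /r/ or /h/ on either side blocks the elision.
    if previous in BLOCKERS or following in BLOCKERS:
        return False
    # Condition: the preceding consonant must match the target in voicing.
    return (previous in VOICED) == (target in VOICED)


print(can_elide("n", "d", "ð"))   # mind the gap
print(can_elide("n", "d", "eɪ"))  # mind Amy
print(can_elide("l", "t", "ð"))   # halt the advance
```

Run against the phrases in this answer, it allows "mind the gap", "cold weather" and "lost the plot", and rejects "mind Amy", "abide them", "halt the advance" and "boiled ham".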

Past simple endings

For regular verbs, the past tense suffix appended to verbs in English is represented in the orthography by -ed. In actual speech the situation is slightly more complicated. When the base of the word ends in a voiced sound (or lenis consonant), we usually add a /d/, which is voiced. If the base ends in an unvoiced sound we usually add a /t/, which is unvoiced. But if the base itself already ends in a /t/ or /d/ we add a vowel before the ending so that the past tense morpheme is distinguishable. Because there is now a vowel before the suffix, and vowels in English are voiced, the final consonant in such situations is a /d/. We therefore see the following types of endings:

claim: kleɪm --> kleɪmd (voiced consonant, /m/, voiced d)
drape: dreɪp --> dreɪpt (unvoiced consonant, /p/, unvoiced t)
rate: reɪt --> reɪtɪd (t + ɪd)
fade: feɪd --> feɪdɪd (d + ɪd)
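
The choice between the three endings can be sketched in a few lines of Python. This is a minimal illustration, not a complete phonology: the names `past_suffix` and `past_form` and the list of unvoiced sounds are assumptions made for this post, and a base is given as a list of sounds so that a diphthong like "eɪ" counts as one item.

```python
# Illustrative sketch only: hypothetical names and a simplified list of unvoiced sounds.
UNVOICED = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ", "h"}


def past_suffix(final_sound):
    """Pick the regular past tense ending from the last sound of the base."""
    if final_sound in {"t", "d"}:
        return ["ɪ", "d"]   # extra vowel keeps the ending audible
    if final_sound in UNVOICED:
        return ["t"]        # unvoiced base, unvoiced ending
    return ["d"]            # voiced consonants and vowels take /d/


def past_form(base):
    """Return the sounds of the regular past tense form of base."""
    return base + past_suffix(base[-1])


for base in (["k", "l", "eɪ", "m"], ["d", "r", "eɪ", "p"],
             ["r", "eɪ", "t"], ["f", "eɪ", "d"]):
    print("".join(base), "-->", "".join(past_form(base)))
```

The loop reproduces the four examples above: claim, drape, rate and fade.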

The Original Question

When the base form of a regular verb ends in a vowel then we will normally be able to tell that the verb is past tense from the (approach phase of the) /t/ or /d/ suffix.

When the regular past tense form of a verb is followed by a vowel, we will likewise be able to distinguish the form by its suffix, which will be clearly audible because of its release.

However, when the base form of a regular past tense verb ends in a consonant and is also followed by a consonant (not including /h/ or /r/ in either case) then things may be significantly more complicated. I say may be because it might simply be the case that the /t/ or /d/ is pronounced in a canonical fashion and is clearly distinguishable from the consonants surrounding it. For example in the following sequence, if the /d/ is clearly pronounced it will be clearly audible:

... bowled when ...

The /d/ here may be clearly articulated and, if it is, will be easily discernible.

However, there are two other possibilities. The first is that the /t/ or /d/ may be subject to assimilation, or similar processes. So for example, in the string:

billed them

... the /d/ might change its place of articulation to the back of the teeth to match the following /ð/ ('th' sound) at the beginning of them. This is what we saw with hid them further above. In this situation it may be much more difficult, if it is actually possible at all, to audibly discern the /d/ or /t/ suffix.

Also, in theory, we should be able to drop the /t/ or /d/ ending altogether for regular verbs in these circumstances. If the end of the base of the verb and the beginning of the following word are both consonants, then the /t/ or /d/ ending is surrounded by consonants. Now, for regular verbs, the /t/ or /d/ always matches the preceding consonant, because as we saw above we choose /t/ or /d/ precisely on this basis. There is never a mismatch. In practice, there do seem to be some exceptions to our being able to drop the /d/ or /t/ ending - but in general the rule holds good. Now, if the /d/ or /t/ has been dropped altogether, then obviously we cannot distinguish between a past tense and present tense verb by the sound. What we hear isn't the crucial factor here.

So these two factors, namely that the place of articulation may change to match the following consonant and secondly that the past tense suffix may be elided, mean that we have to depend on the context to tell us whether we have just heard a present or past tense verb. And it follows that - if the context is not clear enough - we may not know!

Consider the following examples:

I push them out the window.
I pushed them out the window.

These two sentences will be indistinguishable for most speakers by sound alone. Here's another experiment for you. Find a subject, ask them what you're saying and then say:

Yesterday, I push them out the window.

You're guaranteed to be heard saying pushed them out the window. When you think about this carefully, it is not so surprising. The tense of many irregular verbs is similarly indistinguishable by sound alone. Consider:

I put them on the floor.

This might mean that you do it every day or that you did it yesterday. If you stick a "usually" or a "last week" onto the sentence, the tense will become clear. But we don't find cost, let or put, for example, to be particularly problematic even though their past and present tense forms are the same.

As is very often the case in language, we often know what one sound or word is, not because of the form of that actual sound or word, but because of what's surrounding it.

Farid, Cyberherbalist and Drew have all given good explanations of how the words should be pronounced and that native speakers should generally hear a difference. I'd like to talk about why an English-learner might not hear that difference.

When we each learn our respective native languages, we learn to differentiate between certain sounds. The specific sets of sounds differ widely between languages, such that a learner of a new language may find that there are sounds that seem identical to the learner, where native speakers can distinguish them.

Certainly there are cases where these sounds ARE identical and even the native speaker must rely on context to differentiate between intended words, but it is also often the case that a learner's ear is not attuned to the differences between these sounds.

For example, in learning Hindi, I have had some difficulty in learning to distinguish between dental and retroflex sounds, which to me really just sound almost identical unless the speaker is specifically trying to enunciate to show the difference. But it can be a big difference: for example, the words "fat/thick" and "pearl" are identical in Hindi apart from this precise difference. To me, they both sound like "moti", but the T sound I hear is either त (dental t) or ट (retroflex ṭ). To a native speaker, these sounds are quite distinctive, and let me tell you, it can be embarrassing to say "fat" when one means "pearl".

Similarly, in English we distinguish between W (/w/ - voiced labial-velar approximant) and V (/v/ - voiced labiodental fricative), but in Hindi (and many other languages), these are used interchangeably, making it potentially hard for a native Hindi-speaker to distinguish between "vine" and "wine". Of course, since the word "wine" came from the same root as "vine", these are related sounds, but to my ear they are completely distinct and it's hard to understand how someone can hear them as interchangeable. In the same way, the Hindi speaker hears a very different sound between त and ट and has trouble understanding my difficulty in distinguishing it. (Interestingly, it is sometimes easier to learn to produce the correct sound than to recognize the correct sound.)

The difference between the English d /d/ and the voiced English th /ð/ is another of these cases where a sound (/ð/) just doesn't exist in many other languages, and the closest approximation (/d/) is what the brain fills in. Over time, it is possible to adjust and learn to hear these sounds, but it takes time and a lot of exposure and practice. Until then, context helps.

I think the answer is that they sound different.

And I think the difference is this: When "killed" is pronounced in this context, your mouth is positioned as if it were going to pronounce the d, but it is not, or is hardly, pronounced. Because your mouth pronounces "the" coming from the position for pronouncing the d, the resulting sound is different from what happens for "kill the".

The same kind of thing happens, I think, for distinguishing can't from can. Your mouth is positioned to pronounce the sound for the letter t when the following word is pronounced. This is different enough from moving from can to that following word -- the difference can be heard, at least by native speakers.

Someone knowledgeable can speak to whether any of what I'm guessing is true. ;-)

When you say kill them you put two voiced sounds together: /kɪlðəm/ whereas when you say killed them your breath channel becomes blocked between these two sounds for a tiny fraction of a second: /kɪl.ðəm/. The /d/ sound is not fully pronounced, but the blockage is still there (denoted by a dot).

Although /ð/ and /d/ are pronounced from about the same place in your mouth, surely the latter causes a stronger friction or even a blockage in the air passage. Carefully observe the air pressure in your mouth and throat and notice any fluctuation thereof when you say "kill them" and when you say "killed them". Isn't the pressure slightly higher in the second one, between the sounds /l/ and /ð/? This pressure difference may cause a slight change in the rhythmical pattern of the sounds and may result in a more intensified /ð/. This is what a skilled ear catches and thereby tells one set of words from the other.

The bottom line of my answer is that there is not 100% resemblance between kill the and killed the (and between some other resembling sets of words), and sometimes it is possible to hear those differences without the help of a context.
