What exactly is an idiom?
Solution 1:
Thanks for the fascinating question. I'll start out by referencing the definition in this paper on the process of identifying Japanese idioms. I think it describes idioms in a fairly elegant and straightforward way:
Phrases that (i) consist of more than one word that tend to behave as a single syntactic unit and (ii) take on a fixed, conventional meaning.
The paper goes on to explain that idioms can be both literal and metaphorical, and it attempts to classify idiom usages into those categories. It also describes the process of teaching computers to identify idioms and of building a database like this one to flag non-literal usages of phrases (likely idioms). I couldn't find a similarly rigorous study of English-language idioms, but presumably something similar could be constructed that breaks idioms down into syntactic archetypes. For the most literal answer to 'what is an idiom?' or, even better, a Plato-esque 'what is an idiom in the truest sense?', a computational definition like the one found here is probably the best you are going to get.
You seem to argue, or at least allude to the argument, that some idioms are not metaphors and that this creates a sort of schism between true idioms and false idioms. If I might be a little radical: I think there's a strong argument to be made that no idioms are metaphorical. From Wearing, C. (2012), Metaphor, Idiom, and Pretense. Noûs, 46: 499–524. doi: 10.1111/j.1468-0068.2010.00819.x:
Intuitions about how pretense might be involved in understanding uses of figurative language such as metaphor and idiom seem to pull simultaneously in two directions. On the one hand, it seems almost obvious that pretending or something very much like it should be involved in understanding (at least) metaphors, for highly creative metaphors in particular seem to provoke or involve a species of vivid imagining which we might think, prima facie, involves make-believe. On the other hand, we hear metaphors (and other instances of figurative language) all the time, and on many of those occasions it's not at all clear that any experience of deliberately or consciously pretending is involved.
It could thus be persuasively argued that idioms have no imaginative or non-literal meaning at all: each one functions as a single atom with a unique meaning. To put it as elegantly as I can:
An idiom is a word with spaces.
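To make the "word with spaces" idea concrete, here is a minimal sketch in Python. It assumes a tiny, made-up lexicon (`LEXICON`) and a hypothetical `tokenize` helper; neither comes from the papers cited here. The point is only that a tokenizer can treat a stored idiom as one lexical item instead of three separate words.

```python
# Hypothetical mini-lexicon: each key is ONE lexical entry, spaces and all.
# The glosses are illustrative assumptions, not data from any cited paper.
LEXICON = {
    ("kicked", "the", "bucket"): "died",
    ("by", "and", "large"): "mostly",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tokenize(sentence):
    """Greedy longest-match: merge known idioms into single tokens."""
    words = sentence.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if n > 1 and chunk in LEXICON:
                tokens.append(" ".join(chunk))  # one token, with spaces
                i += n
                break
        else:
            # No stored idiom starts here: emit the plain word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize("He kicked the bucket yesterday"))
```

On this view 'kicked the bucket' is looked up exactly as 'died' would be. A reordered string such as 'the bucket was kicked' matches no entry and falls back to ordinary words.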
Of course, this is far from universally accepted, although I find it to be the Occam's razor of idiom theory. For a solid defense of the pretense interpretation of idioms you might want to check out Egan, A. (2008), Pretense for the Complete Idiom. Noûs, 42: 381–409. doi: 10.1111/j.1468-0068.2008.00686.x, which presents an entirely different and equally interesting theory of idiom, arguing that idioms have two properties. First:
UNPREDICTABILITY: The meaning of a sentence in which an idiom occurs is different from the meaning you'd get by applying the usual compositional rules to the usual semantic values of its (apparent) constituents.
Second:
INFLEXIBILITY: Idioms are frozen in ways that other expressions are not. Apparently innocent changes in wording or structure often make the idiomatic reading of a sentence in which an idiom occurs unavailable, or at least strained.
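A toy Python sketch can illustrate these two properties. The word glosses, the stored idiom entry, and the function names are all assumptions made up for the example. It illustrates only the two properties, not the pretense account itself.

```python
# Hypothetical word-level glosses and one stored idiom meaning.
WORD_MEANING = {"kick": "strike with foot", "the": "", "bucket": "pail"}
IDIOM_MEANING = {"kick the bucket": "die"}


def compositional(phrase):
    """Naive composition: join the glosses of the individual words."""
    glosses = (WORD_MEANING.get(word, word) for word in phrase.split())
    return " ".join(gloss for gloss in glosses if gloss)


def idiomatic(phrase):
    """Stored meaning, available only for the exact frozen wording."""
    return IDIOM_MEANING.get(phrase)


# UNPREDICTABILITY: composing the parts does not yield the idiom's meaning.
print(compositional("kick the bucket"), "vs", idiomatic("kick the bucket"))

# INFLEXIBILITY: an apparently innocent change loses the idiomatic reading.
print(idiomatic("kick the pail"))
```

The exact-match lookup is deliberately crude. Real idioms tolerate some variation, such as tense, so a serious model would need something looser than string equality.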
The above article thoroughly lampoons my own views and makes for a delightful read, although I still fundamentally disagree that people go through an accelerated process of imagining an idiom before they interpret it. Its eventual conclusion is:
The parts of sentences containing idioms all retain their usual semantic values, and are composed in the usual way, but the sentence is assigned nonstandard truth-conditions by processing its literal content through a pretense.
So yeah, the short answer, like that for most interesting questions, is that there is no answer. However, there are certainly arguments from a variety of perspectives, and I find "'kicked the bucket' is an alternative pronunciation of 'died'" to be the most persuasive.
Solution 2:
A still more radical approach to idioms, one that has given rise to a big and growing contingent in current linguistics, comes from Charles Fillmore. His lecture on Idiomaticity is a classic:
https://www1.icsi.berkeley.edu/~kay/bcg/lec02.html
Fillmore starts by supposing that idioms are conventional pairings of form, meaning, and use that you have to learn individually, since they can't be predicted from other conventions in the language (from words and rules of grammar, in mainstream models). But this means that all of our linguistic competence - from morphemes and words up to the most general phrase structure rules like predication, NPs and VPs, etc. - can be modeled as a hierarchical network of idioms of various abstractness and complexity. (In part because it feels weird to call something as general as predication an idiom, most linguists call the conventional units in these networks "constructions.")
See also the blockbuster 1988 essay by Fillmore, Kay, and O'Connor,
https://user.phil-fak.uni-duesseldorf.de/~filip/fillmore+88.pdf
which ends by suggesting that the "structure building principles of the so-called core" of our linguistic competence are "degenerate instance[s]" of idiomatic phrasal units. Needless to say, this view, and the views of Fillmore's students and successors, remain quite controversial.
Solution 3:
In Rosamund Moon's 'Fixed Expressions and Idioms in English: A Corpus-Based Approach' (freely downloadable, but I won't link as the links seem to change), she starts with a four- to five-page discussion of the terminology involved, conceding that there is considerable disagreement and potential confusion, and stating that in most cases one needs to define which sense one is using.
Three examples that some, but not others, would call idioms are:
'by and large' (extragrammatical),
'skate on thin ice' (highly transparent!), and
'move heaven and earth to ...' (no possible literal meaning).