Why are polynomials defined to be "formal" (vs. functions)?
Solution 1:
Algebraists employ formal (vs. functional) polynomials because this yields the greatest generality. Once one proves an identity in a polynomial ring $\rm\ R[x,y,z]\ $ it remains true for all specializations of $\rm\,x,y,z\,$ in any ring where the coefficients can be interpreted (commutatively), i.e. any ring containing a central image of $\rm\,R,\,$ that is, any $\rm\,R$-algebra. Thus we can prove important identities such as the Binomial Theorem, Cramer's rule, Vieta's formulas, etc. once and for all, and later specialize the indeterminates as needed for applications in specific rings. This allows us to interpret such polynomial identities in the most universal ring-theoretic manner - in greatest generality.
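To see such a specialization in action, here is a minimal Python sketch (the particular matrices are arbitrary choices for illustration): an identity proved formally in $\rm\,\mathbb Z[x,y]\,$ holds under any commuting evaluation, e.g. $\rm\,x,y\mapsto A,B\,$ where $\rm\,B\,$ is a polynomial in the matrix $\rm\,A$.

```python
# A minimal sketch: the formal identity (x + y)^2 = x^2 + 2xy + y^2 in
# Z[x, y] specializes to any pair of commuting values, e.g. a matrix A
# and a polynomial in A (both chosen arbitrarily for illustration).
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = A @ A + 3 * np.eye(2, dtype=int)   # B is a polynomial in A, so AB = BA
lhs = (A + B) @ (A + B)
rhs = A @ A + 2 * (A @ B) + B @ B
assert (lhs == rhs).all()              # the binomial identity survives
```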
For example, when we are solving recurrences over a finite field $\rm\,\mathbb F = \mathbb F_p\,$ it is helpful to employ "operator algebra", working with characteristic polynomials over $\rm\,\mathbb F,\,$ i.e. elements of the ring $\rm\,\mathbb F_p[S]\,$ where $\rm\,S\,$ is the shift operator $\rm\ S\ f(n)\, =\, f(n+1).\,$ These are not polynomial functions on $\rm\,\mathbb F_p,\,$ e.g. generally $\rm\ S^p \ne S\ $ since generally $\rm\ f(n+p) \ne f(n+1).\,$ But any polynomial identity of $\rm\,\mathbb F[x]\,$ specializes to this operator algebra by way of the evaluation map $\rm\,x\mapsto \,S,\,$ e.g. we can specialize universal polynomial factorization identities in order to factor the characteristic polynomial, e.g. difference of squares $\rm\ x^2\! - y^2 = (x\!-\!y)\ (x\!+\!y)\,$ $\,\Rightarrow\,$ $\rm\,S^2\!-\! c^2 = (S\!-\!c)\ (S\!+\! c)\ $ via $\rm\,x,y\mapsto S,c,\:$ and we can specialize cyclotomic polynomial factorizations, etc. This would not be possible if we instead employed the much less general ring of polynomial functions over $\rm\,\mathbb F,\,$ since its specializations of $\rm\,x\,$ must satisfy $\rm\, x^p = x.\,$ Similarly, we can factor differential operators (with constant coefficients, for commutativity).
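Here is a small Python sketch of this operator specialization (the modulus $p = 5$, the constant $c$, and the test sequence are arbitrary choices for illustration):

```python
# Sequences f : N -> F_p as Python functions, with the shift operator
# S f(n) = f(n + 1). We check that the formal identity
# x^2 - y^2 = (x - y)(x + y) specializes under x, y -> S, c.
p = 5
c = 3
f = lambda n: (n * n + 1) % p                    # an arbitrary test sequence

S   = lambda g: (lambda n: g(n + 1) % p)         # the shift operator
mul = lambda k, g: (lambda n: (k * g(n)) % p)    # scaling by a constant
sub = lambda g, h: (lambda n: (g(n) - h(n)) % p)
add = lambda g, h: (lambda n: (g(n) + h(n)) % p)

Spc = add(S(f), mul(c, f))                       # (S + c) f
lhs = sub(S(Spc), mul(c, Spc))                   # (S - c)(S + c) f
rhs = sub(S(S(f)), mul(c * c, f))                # (S^2 - c^2) f
assert all(lhs(n) == rhs(n) for n in range(30))

# But S is not a polynomial function on F_p: S^p != S, since in general
# f(n + p) != f(n + 1).  (Here S^p f = f for this polynomial sequence.)
Sp = f
for _ in range(p):
    Sp = S(Sp)                                   # S^p f
assert any(Sp(n) != S(f)(n) for n in range(30))
```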
A simple yet striking example of the power of polynomial universality is this slick folklore proof of Sylvester's determinant identity. It employs "generic" matrices (i.e. having indeterminate entries) and exploits to the hilt the fact that the determinant has polynomial form. Hence to prove $\rm\ det\ (I+AB)=det\ (I+BA)\ $ the proof proceeds by simply cancelling $\rm\ det\ A\ $ from the $\rm\ det\ $ of $\rm\ (I+AB)\,A = A\,(I+BA).\ $ Because $\rm\,det\,A\,$ is a nonzero polynomial in the domain $\rm\,\mathbb Z[a_{ij},b_{ij}],\,$ it is cancellable. By cancelling it universally, i.e. as a formal polynomial in a domain (vs. later as a number, possibly $0$, after evaluating the indeterminate matrix entries), we eliminate the "apparent singularity" when $\rm\ det\ A = 0.$
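For the skeptical reader, here is a sympy sketch of the generic-matrix computation in the case $n = 2$ (the symbol names a0, ..., b3 merely stand for the indeterminate entries $a_{ij}, b_{ij}$):

```python
# A sketch over generic 2x2 matrices: both the cancellation step and
# Sylvester's identity hold as formal polynomials in Z[a_ij, b_ij].
from sympy import Matrix, symbols, eye, expand

n = 2
A = Matrix(n, n, symbols(f'a0:{n*n}'))   # matrices with indeterminate entries
B = Matrix(n, n, symbols(f'b0:{n*n}'))
Id = eye(n)

# The cancellation step: det((I+AB)A) = det(A(I+BA)) as formal polynomials,
# i.e. det(I+AB) * det(A) = det(A) * det(I+BA).
assert expand(((Id + A*B) * A).det() - (A * (Id + B*A)).det()) == 0

# Cancelling the nonzero polynomial det(A) gives the identity itself:
assert expand((Id + A*B).det() - (Id + B*A).det()) == 0
```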
Many folks have problems understanding why this proof does not divide by zero. The difficulty seems to stem from the analytic habit of viewing the determinant as a polynomial function, when one must instead view it more generally as a formal polynomial in the entries of the matrix. This analytic bias is so strong that it is difficult for some folks to shift to the formal algebraic viewpoint. I was shocked to observe that even some folks who have completed graduate algebra courses had great difficulty believing the validity of such a formal algebraic proof, thinking instead that one must resort to alternative arguments employing topological notions (e.g. density). Analogously, one can find (older) published papers by distinguished mathematicians disputing the validity of proofs using formal power series (G. C. Rota often joked about such).
To master abstract algebra it is crucial to develop a powerful sense of abstraction. This permits one to exploit many powerful analogies, e.g. viewing "functions" as "numbers" or vice versa. Indeed, the interplay between number fields and function fields is the source of many fruitful ideas in algebra and number theory.
Solution 2:
$p(x) = x^3 + 2x$ gives the zero function on the finite field $\mathbb{F}_3$, but it does not give the zero function on its field extensions, such as $\mathbb{F}_9$. A polynomial with coefficients in a field $F$ actually gives a well-defined function over any extension of $F$, and in this generality it's true that distinct polynomials give distinct functions.
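Here is a quick computational sketch (representing $\mathbb{F}_9 = \mathbb{F}_3[t]/(t^2+1)$ by pairs $(a,b) = a + bt$ is one convenient choice): $p$ vanishes on the subfield $\mathbb{F}_3$ but not at $t$.

```python
# A sketch of F_9 = F_3[t]/(t^2 + 1), with t^2 + 1 irreducible over F_3;
# elements are pairs (a, b) standing for a + b*t.
def add(x, y):
    return ((x[0] + y[0]) % 3, (x[1] + y[1]) % 3)

def mul(x, y):
    a, b = x
    c, d = y
    # (a + bt)(c + dt) = (ac - bd) + (ad + bc) t, using t^2 = -1
    return ((a * c - b * d) % 3, (a * d + b * c) % 3)

def p_of(x):                            # evaluate p(X) = X^3 + 2X
    return add(mul(mul(x, x), x), mul((2, 0), x))

# p is the zero function on the subfield F_3 = {(a, 0)} ...
assert all(p_of((a, 0)) == (0, 0) for a in range(3))
# ... but not on all of F_9: p(t) = t^3 + 2t = -t + 2t = t != 0.
assert p_of((0, 1)) == (0, 1)
```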
This isn't really the reason, though. To my mind, the main reason is that polynomials satisfy a universal property: for a commutative ring $R$, the ring $R[x_1, \dots, x_n]$ is the free commutative $R$-algebra on $n$ generators. In other words, if $S$ is any other commutative $R$-algebra, then there is a natural bijection between the set $$\text{Hom}_R(R[x_1, \dots, x_n], S)$$
of $R$-algebra homomorphisms $R[x_1, \dots, x_n] \to S$ and the set $$S^n$$ of $n$-tuples of elements of $S$. This universal property fails if the polynomial ring is replaced by any quotient of it, since the values of $x_1, \dots, x_n$ will be constrained by any additional relations.
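To see the universal property in miniature, here is a Python sketch (the target $S = \mathbb{Z}/7\mathbb{Z}$ and the helper name eval_hom are arbitrary choices for illustration): a $\mathbb{Z}$-algebra map out of $\mathbb{Z}[x, y]$ is exactly a choice of where to send $x$ and $y$, and it automatically respects every formal identity.

```python
# A sketch: every pair (sx, sy) in S^2 determines the evaluation
# homomorphism Z[x, y] -> S with x -> sx, y -> sy (here S = Z/7).
from sympy import symbols

x, y = symbols('x y')

def eval_hom(poly, sx, sy):
    """The Z-algebra homomorphism determined by x -> sx, y -> sy."""
    return int(poly.subs({x: sx, y: sy})) % 7

# Since it is a ring homomorphism, any formal identity such as
# (x - y)(x + y) = x^2 - y^2 is preserved by every choice of (sx, sy):
f = (x - y) * (x + y)
g = x**2 - y**2
assert all(eval_hom(f, a, b) == eval_hom(g, a, b)
           for a in range(7) for b in range(7))
```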
Solution 3:
Disclaimer: I am not an algebraist so I do not have a "professional" perspective on this.
Suppose (1) you are only concerned with polynomials as functions, and (2) you are only concerned with situations where the correspondence between lists of coefficients and functions is one-to-one. For specificity, let's say you only care about polynomials with real coefficients, as functions on $\mathbb{R}$.
[This is a huge restriction, as algebraists are not only concerned with polynomials as functions, and as your example shows, there are situations where the correspondence is not one-to-one. And for better or worse, definitions of algebraic objects as given at the university level tend to reflect the needs of future algebraists, not the general mathematical public. But let's ignore this.]
Given your perspectives (1) and (2), you have at least two choices when defining a polynomial:
One, to define a polynomial as a certain kind of function. In short, "A polynomial is a function from $\mathbb{R} \to \mathbb{R}$ whose rule can be expressed in a certain form." In more detail, you might say "a polynomial is a function $f: \mathbb{R} \to \mathbb{R}$ with the property that there is a nonnegative integer $n$ and a list of real numbers $c_0, \dots, c_n$ with the property that $f(x) = \sum_{j=0}^n c_j x^j$ holds for all real $x$."
Two, to specialize the algebraist's definition to the case of $\mathbb{R}$. (Define a polynomial "formally", point out how a formal polynomial induces a function, and then define "polynomial function" from that.)
Granting (1) and (2) above, these are both perfectly fine definitions. The first, in particular, is given in every high-school-level book and every university-level calculus book that I have ever seen.
The second has obvious drawbacks--- if you only care about polynomials as functions--- because what a polynomial "is" has been divorced from its corresponding function. One must prove, for example, that if two polynomials induce the same function, they are the same polynomial. Why bother?
Given (1) and (2), this is a valid point and I cannot really argue against it. But I can argue that the first definition also has drawbacks. To the extent that it seems easier or more "natural", I'd argue it is because we generally meet it first, and therefore have a much longer time to get used to it. (If you ever teach beginning algebra, you will be cured of the idea that any definition of polynomial is "natural" to most people.)
Consider the notion of the degree of a polynomial, which hopefully you agree is a useful notion. How to define the degree? Well, if $f$ is given by a formula $f(x) = \sum_{j=0}^n c_j x^j$, and $c_n$ is nonzero, then it's this number $n$ appearing in the formula.
... but why is this well defined? How do we know that $f(x)$ can't be given by two different expressions, $\sum_{j=0}^n c_j x^j$ and $\sum_{k=0}^m d_k x^k$, with $c_n \neq 0$, $d_m \neq 0$, and $n \neq m$? We can get around this in at least two ways:
We can say that the degree of $f$ is the minimal possible value of $n$, over all conceivable formulas $x \mapsto \sum_{j=0}^n c_j x^j$ that could represent the rule of the function $f$. This makes the definition of the degree very easy, but does not make degrees easy to compute. (By this definition, we know that $x^5 + 3x + 1$ certainly has a well defined degree, and that it is at most $5$. But to know that it is equal to $5$ we need to know that no expression $\sum_{j=0}^4 c_j x^j$ can give rise to the same function. How do we know that?) This feeds into the second way of defining the degree:
We can prove that if the rule of $f$ can be given by a formula $\sum_{j=0}^n c_j x^j$, with $c_n$ nonzero, then $n$ is uniquely determined, so it makes sense to say the degree of $f$ is that value of $n$.
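For what it's worth, here is a short Python sketch of why the second approach works over $\mathbb{R}$ (the helper coeffs_from_values is hypothetical, written just for this illustration): the values at $n+1$ distinct points pin down the coefficient list, because the Vandermonde system is invertible.

```python
# A sketch: a polynomial function given by a formula of degree <= n is
# determined by its values at n + 1 distinct points, so its coefficient
# list (hence its degree) is forced. Fractions give exact arithmetic.
from fractions import Fraction

def coeffs_from_values(points, values):
    """Solve the Vandermonde system sum_j c_j x^j = f(x) (points distinct)
    by Gauss-Jordan elimination over the rationals."""
    n = len(points)
    M = [[Fraction(x) ** j for j in range(n)] + [Fraction(v)]
         for x, v in zip(points, values)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]            # pivot into place
        for r in range(n):
            if r != col:                           # clear the column
                t = M[r][col] / M[col][col]
                M[r] = [a - t * b for a, b in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

f = lambda x: x**5 + 3*x + 1
pts = list(range(6))                               # 6 points for degree <= 5
coeffs = coeffs_from_values(pts, [f(x) for x in pts])
print([int(c) for c in coeffs])                    # [1, 3, 0, 0, 0, 1]
```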
Even if you don't care about the degree, you will run into this issue. Roughly speaking, whenever you want to conclude that two polynomials aren't the same because they don't have the same list of coefficients--- and not by explicitly exhibiting a value of $x$ for which they don't evaluate the same--- you run into this technicality.
I don't mean to imply that it's a particularly arduous issue to deal with. It's possible to give short proofs that the degree is well-defined. My point is only that it requires proof. High school textbooks (and university-level calculus books) generally circumvent this issue by ignoring the fact that it requires proof. Lots of things become more "natural" if you treat them this way.
I'm not saying we should all be "for" the second definition or "against" the first. But I'd argue that however you deal with the actual issues involved in defining polynomials, even for polynomial functions $\mathbb{R} \to \mathbb{R}$, you are dealing with the "list of coefficients" vs "function" distinction whether you want to or not. It is not something you avoid with the first definition of "polynomial", and introduce arbitrarily with the second. It is always there. So it's no mystery that some people prefer to build it into the definition. That is how I think about it anyway.