How is the codomain for a function defined?
Solution 1:
Often what matters is not individual functions but collections of functions, especially subsets of the set of functions between two sets $X$ and $Y$, some of which will be surjective and some of which won't. Something else that doesn't get emphasized much at the level of elementary set theory is the compositional structure of functions: a function $f : X \to Y$ can be composed with a function $g : Y \to Z$ to give a function $g \circ f : X \to Z$. For the purposes of studying this compositional structure (for example when $X = Y = Z$), it is generally important not to require that $f$ and $g$ be surjective, or you will miss out on structure.
For example, a simple way to define a dynamical system is just as a function $f : X \to X$. Whether $f$ is surjective or not is an important aspect of classifying the dynamics of $f$, or in other words of classifying the behavior of the sequences $\{ x, f(x), f^2(x), f^3(x), \dots \}$ for various $x$. The iterates $f^n$ are not well-defined if you pretend that the domain and codomain of $f$ are different just because the range and the codomain aren't equal.
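As a small illustration of the dynamical-system picture, here is a minimal Python sketch. The set `X`, the map `f`, and the helper `orbit` are all hypothetical choices made for this example; the point is only that `f` maps `X` into `X` without being surjective, and iterating it still makes sense.

```python
def orbit(f, x, steps):
    """Return the list [x, f(x), f^2(x), ..., f^steps(x)]."""
    points = [x]
    for _ in range(steps):
        x = f(x)
        points.append(x)
    return points


# Hypothetical example: X = {0, ..., 5} and f(x) = 2x mod 6.
X = set(range(6))


def f(x):
    return (2 * x) % 6


# The image of f is a proper subset of X, so f is not surjective,
# yet every value f(x) lies in X, so f can be applied again.
image = {f(x) for x in X}
first_points = orbit(f, 1, 4)
```

Here `image` is a proper subset of `X`, while `orbit(f, 1, 4)` is still a perfectly good sequence in `X`, because the codomain is taken to be all of `X` rather than the image.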
This is another way of saying that a function really consists of three pieces of data (a domain, a codomain, and the mapping from one to the other), but the reason these three pieces of data are all important is really the compositional structure.
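To make the role of the codomain in composition concrete, here is a rough Python sketch. The names `Fn` and `compose`, and the strict rule that the codomain of the first function must equal the domain of the second, are assumptions made for this illustration rather than anything forced by the discussion above.

```python
from typing import Callable, NamedTuple


class Fn(NamedTuple):
    """A function as three pieces of data: domain, codomain, and a rule."""
    domain: frozenset
    codomain: frozenset
    rule: Callable


def compose(g, f):
    """Return g after f; only defined when f's codomain is g's domain."""
    if f.codomain != g.domain:
        raise TypeError("codomain of f must equal domain of g")
    return Fn(f.domain, g.codomain, lambda x: g.rule(f.rule(x)))


# Hypothetical finite sets and maps, chosen only for illustration.
X = frozenset({0, 1, 2})
Y = frozenset({0, 1, 2, 3, 4})
Z = frozenset({0, 1})

f = Fn(X, Y, lambda x: x + 1)   # f : X -> Y, not surjective
g = Fn(Y, Z, lambda y: y % 2)   # g : Y -> Z

h = compose(g, f)               # h : X -> Z
values = [h.rule(x) for x in sorted(h.domain)]

# Same rule as f, but with the codomain shrunk to the image {1, 2, 3}.
f_onto = Fn(X, frozenset({1, 2, 3}), lambda x: x + 1)
try:
    compose(g, f_onto)
    shrunk_composes = True
except TypeError:
    shrunk_composes = False
```

Under this strict convention, shrinking the codomain of `f` to its image produces a different object that no longer composes with `g`, even though its values are unchanged.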
Solution 2:
This is something that trips people up the more they dwell on it. And depending on how you like to think about mathematical objects, you have different preferred answers.
More often than not, I tend to think of a function as consisting of three things: a domain $A$, a range $B$ (what the question calls the codomain), and a subset $S$ of $A \times B$ satisfying the condition that for every $a \in A$ there is a unique $b \in B$ such that $(a,b) \in S$. To be precise, a "function" $f$ is a triple $(A,B,S)$ where $S \subset A\times B$ satisfies the above criterion. We write $f : A \to B$, and $f(a) = b$ provided $(a,b) \in S$.
In the above formalism, you can't describe a function without specifying the domain and range beforehand. To be specific, here are two different ways of specifying the function that in a calculus class would just be called $x^2$: $f=(\mathbb R, \mathbb R, \{(x,x^2) : x \in \mathbb R\})$ and $g=(\mathbb R, [0,\infty), \{(x,x^2) : x \in \mathbb R\})$. In this formalism, $f$ is not an onto function, while $g$ is an onto function. Although $f(x)=g(x)$ for all $x$ in the common domain of $f$ and $g$, we have $f \neq g$ since they have different ranges.
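The triple formalism can be mimicked on finite sets. The following is only a sketch: the class name `Function`, its methods, and the finite stand-ins for $\mathbb R$ and $[0,\infty)$ are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """A function as a triple (domain, codomain, graph)."""
    domain: frozenset
    codomain: frozenset
    graph: frozenset  # set of pairs (a, b)

    def __post_init__(self):
        # The graph must be a subset of domain x codomain.
        for a, b in self.graph:
            if a not in self.domain or b not in self.codomain:
                raise ValueError("graph must lie in domain x codomain")
        # Each a in the domain must have exactly one b with (a, b) in graph.
        for a in self.domain:
            if len({b for x, b in self.graph if x == a}) != 1:
                raise ValueError("each input needs exactly one output")

    def __call__(self, a):
        (b,) = {y for x, y in self.graph if x == a}
        return b

    def image(self):
        return frozenset(b for _, b in self.graph)

    def is_onto(self):
        return self.image() == self.codomain


# Finite stand-ins: A plays the role of the reals.
A = frozenset(range(-2, 3))
squares = frozenset((x, x * x) for x in A)

f = Function(A, frozenset(range(-4, 5)), squares)  # larger codomain
g = Function(A, frozenset({0, 1, 4}), squares)     # codomain = image

same_values = all(f(x) == g(x) for x in A)
```

Here `same_values` holds and `g.is_onto()` is true while `f.is_onto()` is false, and `f == g` is false because the dataclass compares all three fields, including the codomain.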
If you don't use a formalism like the above, then yes, the range (or co-domain as you say) is just any arbitrary set containing the image. Frequently this is taken care of by considering the range to be a typical "universal" set. In a standard calculus course, people never really mention the range; they only talk about the image.
Solution 3:
Here is an answer which directly addresses the question in the title: the codomain has to be given as part of the information telling you what the function is; it can't be deduced (or in the language of the question, "defined") if all you know is the domain and values of the function. It is an extra piece of data. (This is why it seems arbitrary to you; you are thinking about how to determine the codomain from the other data, which can't be done! You have to be told what it is as part of the initial description of the function.)
First note that this is incompatible with one traditional definition of a function as being a set of ordered pairs. The set of ordered pairs definition determines the domain and values of the function, but not the codomain. (I guess that some people do use this definition of function; for them, a function doesn't have a codomain separate from its image.)
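A short Python sketch of this point, using a hypothetical finite graph: from the set of ordered pairs alone one can recover the domain and the image, but any set containing the image is an equally valid candidate for the codomain.

```python
# A function given only as a set of ordered pairs (hypothetical example).
graph = {(x, x * x) for x in range(-2, 3)}

# The domain and the image can be read off from the pairs...
domain = {a for a, _ in graph}
image = {b for _, b in graph}

# ...but the pairs cannot distinguish between these candidate codomains:
# every superset of the image is consistent with the same graph.
candidate_codomains = [image, set(range(0, 5)), set(range(-4, 5))]
all_consistent = all(image <= c for c in candidate_codomains)
```

Nothing in `graph` selects one of the candidates over the others, which is why the codomain has to be supplied as additional data.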
To define a function which has a domain and a codomain, one should instead use the scheme described by Ryan: a function is a triple (domain $A$, codomain $B$, set of elements of $A\times B$ determining its values).
As to why we introduce the concept of codomain, Qiaochu's answer describes this.