Why is the topological definition of continuous the way it is?

The "normal" definition goes like this:

It is claimed that, at a fixed point, for any given ball $B_\epsilon$ of radius $\epsilon$ in the image, there exists a ball $B_\delta$ of radius $\delta$ in the preimage such that $f(B_\delta) \subset B_\epsilon$. This is the implication $$(...) < \delta \implies (...) < \epsilon $$

Very informally, you could compare the statement, for continuous $f$,

For any ball $B_\epsilon$ in the image, you can find a ball $B_\delta$ mapping into $B_\epsilon$

and

For any ball $B_\epsilon$ in the image, its preimage contains a ball $B_\delta$

and

The preimages of open sets are open.

In topological spaces, the last one is often taken as a definition.


Regarding your interpretation

IF $U \subseteq Y$ is open THEN $f^{-1}(U)$ is open

This is perfectly valid and translates as "IF you give me an $\epsilon$ THEN I can find you a corresponding $\delta$".


Regarding the implication, let me show what happens with it:

Let $U \subset Y$ be open. This set has a preimage $f^{-1}(U) \subset X$, which is the set characterized by $$x \in f^{-1}(U) \iff f(x) \in U. $$ So now you can freely say:

For any open $U \subset Y$, there is a set $f^{-1}(U) \subset X.$

If it just so happens that $f^{-1}(U)$ is open for any open $U$, then we call $f$ continuous. Translating, this means that if it just so happens that for any given radius $\epsilon$ you can find a corresponding $\delta$ such that $$ x\in B_\delta \implies f(x) \in B_\epsilon, $$ then $f$ is continuous.


A few more details:

You have to be rather careful when you state exactly what you mean by mapping "nearby points to nearby points".

Given a metric, we can always form balls as subsets of that space. The open sets are precisely those that, for each point $x$ in the set, contain some ball around $x$. This is true regardless of whether the open set is a union of open intervals, the whole space, a single interval, or any other open set.

To say that $f$ maps "nearby points to nearby points" means to say that, if you fix a point $x_0$ and look at what happens to points near $x_0$, they will all be mapped to points close to $f(x_0)$. The exact meaning of this is: for each fixed $x\in f^{-1}(U)$ and any ball $B_\epsilon$ around $f(x)$ (by openness of $U$, one exists that satisfies $B_\epsilon \subset U$), there is a ball $B_\delta$ around the point $x$ that maps into $B_\epsilon$. Since $B_\epsilon \subset U$, we have $B_\delta \subset f^{-1}(U) $, which by definition makes the preimage open: it is a ball around an arbitrary point that lies completely in $f^{-1}(U) $.

Whatever open set you have, all of its points will be interior, so continuity (finding matching balls $B_\delta$ and $B_\epsilon$) works one point at a time, so to speak. And now it almost rolls off the tongue: $$\forall x \ \forall \epsilon \ \exists \delta \ (...) $$

To me, it is somehow intuitively clear that if you want a statement about how some values of $f(x)$ behave, you would start with something about its target set. Maybe that's just me. You sort of start with the question "How close to $f(x_0)$ do you want the outputs of $f$ to be", which is a question about the target set.


The definition of continuity at a point $a$ for a function $f\colon A\to B$ (say between metric spaces) is: for all $\varepsilon >0$ there exists $\delta>0$ such that if $d(x,a)<\delta$, then $d(fx,fa)<\varepsilon$. Now, notice that the $\varepsilon$ is used for a condition in the codomain and the $\delta$ is used for a condition in the domain. So the order of quantification is: for all something in the codomain, there is a something in the domain such that blah blah blah. The topological definition of continuity reads: for all open sets in the codomain, the inverse image is open in the domain. This shows that in fact the variance in both definitions is the same: continuity of a function $f\colon A\to B$ means you can pull information back from $B$ to $A$. So, the contravariance in the definition of topological continuity is not anything you haven't seen in the metric definition already. You just always thought the metric definition was covariant, but it was contravariant all the time. The topological formulation simply makes it unavoidable to notice.


I think in the translation, it might help to separate the direct generalization of the notion of "continuity at a point" from the general topological argument that this generalization holding at every point is equivalent to the condition on inverse images of open sets.

So, recall that for a map $f : X \to Y$ between metric spaces, and $x_0 \in X$, we have $f$ is continuous at $x_0$ if and only if: $$ \forall \epsilon > 0, \exists \delta > 0, \forall x \in X, d(x, x_0) < \delta \rightarrow d(f(x), f(x_0)) < \epsilon. $$ Now let us express what this condition is saying in terms of open balls: first, $d(f(x), f(x_0)) < \epsilon$ is equivalent to $f(x) \in B_\epsilon(f(x_0))$, which is further equivalent to $x \in f^{-1}(B_\epsilon(f(x_0)))$. On the other hand, $d(x, x_0) < \delta$ is equivalent to $x \in B_\delta(x_0)$. Therefore, $f$ is continuous at $x_0$ if and only if: $$ \forall \epsilon > 0, \exists \delta > 0, \forall x \in X, x \in B_\delta(x_0) \rightarrow x \in f^{-1}(B_\epsilon(f(x_0))). $$ Now, the $\forall x \in X$ part is equivalent to a subset condition, so $f$ is continuous at $x_0$ if and only if: $$ \forall \epsilon > 0, \exists \delta > 0, B_\delta(x_0) \subseteq f^{-1}(B_\epsilon(f(x_0))). $$ Now, note that the $\exists \delta > 0, \ldots$ part is precisely equivalent by definition to: "$f^{-1}(B_\epsilon(f(x_0)))$ is a neighborhood of $x_0$." Furthermore, the collection of $B_\epsilon(f(x_0))$ for $\epsilon > 0$ is precisely the neighborhood basis at $f(x_0)$ coming from the metric on $Y$. To summarize, we have seen that more or less directly:

$f$ is continuous at $x_0$ if and only if for all basic neighborhoods $N$ of $f(x_0)$, we have $f^{-1}(N)$ is a neighborhood of $x_0$.
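The subset condition $B_\delta(x_0) \subseteq f^{-1}(B_\epsilon(f(x_0)))$ can be spot-checked numerically. The following is only a sketch under assumptions that are not in the answer above: the function $f(x) = x^2$ on $\mathbb R$ is chosen purely for illustration, the helper names `delta_for` and `ball_maps_into_ball` are hypothetical, and the choice $\delta = \min(1, \epsilon/(2|x_0|+1))$ is one delta that works for this particular $f$. Sampling finitely many points illustrates the inclusion; it does not prove it.

```python
def f(x):
    # Illustrative function only (an assumption, not from the text above).
    return x * x


def delta_for(x0, eps):
    """One delta that works for f(x) = x**2 at x0.

    If |x - x0| < delta <= 1, then |x + x0| < 2|x0| + 1, so
    |f(x) - f(x0)| = |x - x0| * |x + x0| < delta * (2|x0| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x0) + 1))


def ball_maps_into_ball(x0, eps, samples=1001):
    """Sampled check that B_delta(x0) lies inside f^{-1}(B_eps(f(x0)))."""
    delta = delta_for(x0, eps)
    for i in range(samples):
        # t runs over [-0.999, 0.999], so x stays strictly inside B_delta(x0).
        t = -0.999 + 1.998 * i / (samples - 1)
        x = x0 + t * delta
        # x is in the preimage of B_eps(f(x0)) exactly when this distance is < eps.
        if not abs(f(x) - f(x0)) < eps:
            return False
    return True
```

Calling `ball_maps_into_ball(3.0, 0.1)` asks, for the basic neighborhood $B_{0.1}(f(3))$, whether the sampled points of the chosen ball around $3$ all land in its preimage, which is the "$f^{-1}(N)$ is a neighborhood of $x_0$" statement in miniature.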


Now, not all topological spaces in general will have a natural system of neighborhood bases, so usually the generalization of continuity at a point to general maps of topological spaces will look something like:

Definition: Let $f : X \to Y$ be a map between topological spaces, and $x_0 \in X$. Then $f$ is continuous at $x_0$ if and only if one of the following equivalent statements is true:

  1. For every neighborhood $N$ of $f(x_0)$, we have that $f^{-1}(N)$ is a neighborhood of $x_0$.
  2. For every open neighborhood $N$ of $f(x_0)$, we have that $f^{-1}(N)$ is a neighborhood of $x_0$.
  3. (In the presence of a given system of neighborhood bases on $Y$:) For every basic neighborhood $N$ of $f(x_0)$, we have that $f^{-1}(N)$ is a neighborhood of $x_0$.

(Of course, I think in practice, most textbooks will likely just choose one of these conditions as the definition - in my experience, usually either (1) or (2) - and then prove the equivalence to the other conditions as separate results.)

Also, we have the general topological fact: "For any subset $U \subseteq X$, $U$ is open if and only if $U$ is a neighborhood of all of its elements." Using this, it is easy to prove the first equivalence in the below revised definition of continuity:

Definition: Let $f : X \to Y$ be a map between topological spaces. Then $f$ is continuous if and only if one of the following equivalent statements is true:

  1. $f$ is continuous at every point of $X$.
  2. For every open subset $V \subseteq Y$, we have that $f^{-1}(V)\subseteq X$ is open.
  3. (In the presence of a given basis for the topology of $Y$:) For every basic open subset $V \subseteq Y$, we have that $f^{-1}(V) \subseteq X$ is open.

(Of course, again most textbooks will present (2) as the definition of continuity, and then prove equivalence to (1) and (3) as separate results.)
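On a finite topological space both definitions can be checked by brute force, which makes the equivalence of (1) and (2) easy to experiment with. The sketch below is an illustration under assumptions: the three-point space `X`, the two-point space `Y`, their topologies, and the maps `f` and `g` are all made up for the example, and "continuous at a point" is implemented via the open-neighborhood form of the pointwise definition.

```python
def preimage(f, domain, subset):
    """Points of `domain` that the map f (a dict) sends into `subset`."""
    return frozenset(x for x in domain if f[x] in subset)


def is_continuous(f, domain, open_X, open_Y):
    """Condition (2): every open subset of Y has an open preimage."""
    return all(preimage(f, domain, V) in open_X for V in open_Y)


def is_continuous_at(f, domain, open_X, open_Y, x0):
    """Pointwise condition: for every open V containing f(x0), the preimage
    of V is a neighborhood of x0, i.e. contains an open set containing x0."""
    for V in open_Y:
        if f[x0] not in V:
            continue
        pre = preimage(f, domain, V)
        if not any(x0 in U and U <= pre for U in open_X):
            return False
    return True


# Hypothetical example spaces (assumptions made for illustration).
X = frozenset({0, 1, 2})
open_X = {frozenset(), frozenset({0}), frozenset({0, 1}), X}
Y = frozenset({"a", "b"})
open_Y = {frozenset(), frozenset({"a"}), Y}

f = {0: "a", 1: "a", 2: "b"}  # preimage of {"a"} is {0, 1}, which is open
g = {0: "b", 1: "a", 2: "a"}  # preimage of {"a"} is {1, 2}, which is not open
```

Here `is_continuous(f, X, open_X, open_Y)` holds and `f` is continuous at every point, while `g` fails condition (2) and is also not continuous at the point `1`, so the two conditions agree on both maps.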


Now, according to the translation above, the $\epsilon$-$\delta$ definition of continuity is most closely related to (1) above, with the continuity at a point $x_0 \in X$ being expanded from (3). Looking more closely at the initial expansion, we see that the overall structure "if $V$ is a basic open neighborhood of $f(x_0)$ then $f^{-1}(V)$ is a neighborhood of $x_0$" expands to the $\forall \epsilon > 0, \exists \delta > 0, \ldots$ part. Whereas the part the question is about, the part $d(x, x_0) < \delta \rightarrow d(f(x), f(x_0)) < \epsilon$, is actually part of the expansion of "$f^{-1}(V)$ is a neighborhood of $x_0$."


The two definitions are equivalent to each other for metric spaces. To see that the topological definition implies the $\epsilon$-$\delta$ one, let $\epsilon>0$ and $y=f(x)$. The open ball $B_\epsilon(y)$ is open in $Y$. Therefore $f^{-1}(B_\epsilon(y))$ must be open in $X$. Since $x$ lies in this open set, it contains the open ball $B_\delta(x)$ for small enough $\delta>0$. Since $B_\delta(x)\subset f^{-1}(B_\epsilon(y))$, we have found $\delta>0$ such that $c\in X, d(x,c)<\delta \implies d(f(x),f(c))<\epsilon$.

The reverse implication is proved by a similar argument with open balls.


I would have expected the definition to be the other way round

I take you to be proposing this:

$f\colon X\to Y$ is continuous if $f(U)$ is open for every open $U\subseteq X$

But that does not serve. In particular, consider constant functions. Constant functions are among those that meet our expectations for continuity, and constant functions over metric spaces are in fact continuous by the metric-space definition of continuity. But if $f\colon X\to Y$ is a constant function and $V \subseteq X$ is nonempty then $f(V) = \{k\}$ for some $k \in Y$, and in many cases we care about, such singleton sets are closed, not open.

On the other hand, consider a constant function $f$ defined as above, and let $U\subseteq Y$ be open. The preimage $f^{-1}(U)$ of $U$ is either $\emptyset$ or $X$, which are both open by definition in every topology over $X$, so the definition you started with serves for this example.
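The contrast between images and preimages for a constant function can be seen on a small finite example. This is a sketch under assumptions: the spaces `X` and `Y`, their topologies, and the constant value `"b"` are invented for illustration, and `Y` is chosen so that the singleton `{"b"}` is closed but not open.

```python
# Hypothetical finite spaces (assumptions made for illustration).
X = frozenset({0, 1, 2})
open_X = {frozenset(), frozenset({0}), frozenset({0, 1}), X}
Y = frozenset({"a", "b"})
open_Y = {frozenset(), frozenset({"a"}), Y}  # {"b"} is closed, not open


def constant(x):
    # Constant map X -> Y with value "b".
    return "b"


def image(f, subset):
    """Set of values f takes on `subset`."""
    return frozenset(f(x) for x in subset)


def preimage(f, domain, subset):
    """Points of `domain` that f sends into `subset`."""
    return frozenset(x for x in domain if f(x) in subset)


# Images of the nonempty open subsets of X: each one is {"b"}, which is not open.
images_open = [image(constant, U) in open_Y for U in open_X if U]

# Preimages of the open subsets of Y: only the empty set and X ever appear.
preimages = {preimage(constant, X, V) for V in open_Y}
```

In this example every entry of `images_open` is `False`, so the "images of open sets are open" proposal would reject the constant map, while `preimages` contains only the empty set and `X`, both of which are open.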

On the third hand, consider $f\colon \mathbb R \to \mathbb R$ defined by $f(x) = -1$ if $x \lt 0$ and $f(x) = 1$ if $x \ge 0$. To demonstrate that it is discontinuous, choose, say, the open interval $\left(\frac{1}{2},\frac{3}{2}\right)$. The preimage of that open set is the closed set $\left[0,\infty\right)$, which is not open in $\mathbb R$ because no ball around $0$ is contained in it.
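The failure of openness at $0$ can be made concrete: $0$ belongs to the preimage, yet every ball around $0$ contains a point that does not. The sketch below uses hypothetical helper names (`in_preimage`, `witness_outside`) and simply exhibits, for a given radius, one point of the ball that leaves the preimage.

```python
def f(x):
    # The step function from the text: -1 for x < 0, 1 for x >= 0.
    return -1.0 if x < 0 else 1.0


def in_preimage(x, lo=0.5, hi=1.5):
    """True when x lies in the preimage of the open interval (lo, hi)."""
    return lo < f(x) < hi


def witness_outside(delta):
    """A point of the ball B_delta(0) that is not in the preimage of (1/2, 3/2)."""
    return -delta / 2
```

For any `delta > 0`, the point `witness_outside(delta)` is within distance `delta` of `0` but is mapped to `-1`, outside the chosen interval, so no ball around `0` fits inside the preimage.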

More generally, the definition captures the idea of a point of discontinuity in the range of the function, and that should seem natural, because that's what you look for when visually inspecting the graph of a function for discontinuities.