Why are continuous functions the "right" morphisms between topological spaces?
Solution 1:
I had the same question as you when I was studying topological spaces: in particular, it annoyed me that the definition didn't look like "preserves some structure" in the sense that I'd become familiar with in abstract algebra, e.g. preserving a group operation in the case of morphisms of groups. Here are two proposals I currently have for answers to this question.
Kuratowski
There is an alternative and equivalent axiomatization of topological spaces called the Kuratowski closure axioms. Here a topology on a space $X$ is described in terms of the operation $\text{cl}$ on the power set that sends a subset of $X$ to its closure in the topology, and continuity becomes "preserves the closure operator" in the sense that $f(\text{cl}(A)) \subseteq \text{cl}(f(A))$.
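For finite spaces the closure condition can be checked by brute force. The following is a minimal sketch, not a general-purpose library: the function names (`closure`, `preserves_closure`, `preimages_of_opens_are_open`) and the choice of the Sierpinski space as a test case are my own illustrative assumptions. It compares the closure condition with the usual "preimages of open sets are open" definition.

```python
from itertools import combinations

def powerset(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def closure(A, points, opens):
    """cl(A): the intersection of all closed sets (complements of opens) containing A."""
    result = frozenset(points)
    for U in opens:
        closed = frozenset(points) - U
        if A <= closed:
            result &= closed
    return result

def preserves_closure(f, X, opens_X, Y, opens_Y):
    """Check f(cl(A)) ⊆ cl(f(A)) for every subset A of X (f is a dict)."""
    for A in powerset(X):
        image_of_closure = frozenset(f[x] for x in closure(A, X, opens_X))
        closure_of_image = closure(frozenset(f[a] for a in A), Y, opens_Y)
        if not image_of_closure <= closure_of_image:
            return False
    return True

def preimages_of_opens_are_open(f, X, opens_X, opens_Y):
    """The usual definition: the preimage of every open set is open."""
    return all(frozenset(x for x in X if f[x] in U) in opens_X for U in opens_Y)

# Illustrative example (an assumption, not from the answer): the Sierpinski
# space on {0, 1}, whose open sets are {}, {1} and {0, 1}.
S = frozenset({0, 1})
opens_S = {frozenset(), frozenset({1}), S}
identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}   # the preimage of {1} is {0}, which is not open
```

On this example the two tests agree: `identity` passes both and `swap` fails both.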
Vickers
General topology is actually a kind of logic. I don't know who this insight is due to, but see Vickers' Topology via Logic for much more on this theme. In particular, the open subsets of a topological space should be thought of as axiomatizing semidecidable properties: properties that you can confirm but not necessarily disconfirm, given limited tools (e.g. finite time and precision).
For example, you can confirm whether two things are less than $5$ inches apart by measuring the distance between them to finite precision and seeing if it's less than $5$, so an open ball of radius $5$ in a metric space describes a semidecidable property, but you can't confirm whether two things are less than or equal to $5$ inches apart by measuring the distance between them to finite precision because if you get $4.99 \pm 0.2$ inches you don't know whether that's over or under $5$.
Semidecidability can be used to justify all of the topological space axioms, which is a nice exercise. For example, arbitrary unions of open sets are open because, given a method of confirming whether you're in each of those open sets, you get a method of confirming whether you're in any of them by running all of the methods simultaneously and waiting for one to finish. But you only get finite intersections when you try to do the same thing by waiting for all of the methods to finish, because method $n$ might take $n$ seconds to finish.
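The "run all of the methods simultaneously" idea can be sketched with Python generators. This is only an illustration under assumptions of my own: a real number is modelled as a function `approx(k)` returning a rational within $2^{-k}$ of it, a semi-decider is a generator that finishes when the property is confirmed and otherwise keeps yielding, and all names (`in_open_interval`, `in_union`, `confirmed_within`) are hypothetical.

```python
from fractions import Fraction
from itertools import count

DONE = object()  # sentinel: the generator finished, i.e. the property was confirmed

def in_open_interval(a, b):
    """Semi-decider for 'x lies in (a, b)' using ever finer measurements of x."""
    def semidecide(approx):
        for k in count():
            q, err = approx(k), Fraction(1, 2 ** k)
            if a < q - err and q + err < b:
                return      # confirmed at precision 2^-k
            yield           # not confirmed yet; one unit of work done
    return semidecide

def in_union(family):
    """Semi-decider for the union of the sets decided by family(0), family(1), ...

    At stage n we start method n and advance every method started so far by
    one step, so any method that would eventually finish is eventually reached.
    """
    def semidecide(approx):
        running = []
        for n in count():
            running.append(family(n)(approx))
            for g in running:
                if next(g, DONE) is DONE:
                    return  # some method confirmed membership
            yield
    return semidecide

def confirmed_within(semidecide, approx, steps):
    """Run a semi-decider for at most `steps` steps; True if it confirmed."""
    g = semidecide(approx)
    return any(next(g, DONE) is DONE for _ in range(steps))

def exactly(x):
    """A perfectly measured real: every approximation is x itself."""
    return lambda k: Fraction(x)

less_than_5 = in_open_interval(Fraction(0), Fraction(5))
cover = in_union(lambda n: in_open_interval(Fraction(n), Fraction(n + 2)))
```

With these definitions, `less_than_5` confirms for $4.99$ after finitely many steps but never confirms for exactly $5$, matching the inches example. The analogous construction for an infinite intersection would have to wait for every method to finish, which need not happen in finite time.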
Continuous functions then axiomatize "computable functions": for $f$ to be continuous means that it should be possible to compute $f(x)$ "to arbitrary precision" by computing $x$ "to arbitrary precision," where, going off of the example of metric spaces, "to arbitrary precision" means "to within an arbitrary open set," since it's semidecidable whether $f(x)$ is contained within an open set. In other words, to locate $f(x)$ within some open set $U$, it suffices to locate $x$ within some open set $V$. After a moment's thought you'll see that this is precisely the condition that $f^{-1}(U)$ is open.
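Here is a sketch of that "moment's thought", using the notation $V_x$ (my own, not from the answer above) for an open set chosen at each point. Suppose that for every $x$ with $f(x) \in U$ there is an open set $V_x \ni x$ with $f(V_x) \subseteq U$. Then each $V_x$ is contained in $f^{-1}(U)$, so
$$f^{-1}(U) = \bigcup_{x \in f^{-1}(U)} V_x$$
is a union of open sets and hence open. Conversely, if $f^{-1}(U)$ is open, then for every $x \in f^{-1}(U)$ we can take
$$V = f^{-1}(U), \qquad f(V) \subseteq U.$$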
(I particularly like this justification of topological spaces and continuity because, unlike the justification coming from thinking about metric spaces, it continues to apply to spaces that aren't Hausdorff, and in fact it tells you what it means for a space to not be Hausdorff. One equivalent definition of being Hausdorff is that the diagonal $\{ (x, x) \in X \times X \}$ is closed in $X \times X$. This is equivalent to "$x \neq y$" being semidecidable, so a space fails to be Hausdorff precisely when "$x \neq y$" fails to be semidecidable.)
Solution 2:
I am a little late in answering this question, which already has an answer, but I want to clarify some points that were left unclear, for the benefit of future readers.
First, a function $f:X\to Y$ induces not two but three functions between $P(X)$ and $P(Y)$. That is, there are three functors $P_i$ from $\mathbf{Set}$ to $\mathbf{Pos}$, one contravariant and two covariant: \begin{align} P_1:\mathbf{Set} &\to \mathbf{Pos} & P_2:\mathbf{Set}^{op} &\to \mathbf{Pos} & P_3:\mathbf{Set} &\to \mathbf{Pos}\\ f&\mapsto\exists f& f&\mapsto I(f)& f&\mapsto\forall f, \end{align} where $\exists f(U)=f(U)$ is the image, $I(f)(U)=f^{-1}(U)$ is the inverse image and $\forall f(U)=(f(U^c))^c$ is something different. When you embed $\mathbf{Pos}$ in $\mathbf{Cat}$, that is, when you regard posets as categories, there is a nice adjoint relation between these three functors: $$\exists f\dashv I(f)\dashv\forall f.$$ From this relation one recovers the usual facts: the image commutes with unions, and the inverse image commutes with both unions and intersections.
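For finite sets the two adjunctions can be checked by brute force. A minimal sketch follows. The helper names and the example map are illustrative assumptions, and an adjunction between posets is read in the usual way: $\exists f(A)\subseteq B \iff A\subseteq f^{-1}(B)$ and $f^{-1}(B)\subseteq A \iff B\subseteq \forall f(A)$.

```python
from itertools import combinations

def powerset(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def image(f, A):
    """∃f(A) = f(A), the direct image (f is a dict)."""
    return frozenset(f[a] for a in A)

def preimage(f, X, B):
    """I(f)(B) = f^{-1}(B), the inverse image."""
    return frozenset(x for x in X if f[x] in B)

def forall_image(f, X, Y, A):
    """∀f(A) = (f(A^c))^c: the points of Y whose whole fibre lies in A."""
    return frozenset(Y) - image(f, frozenset(X) - A)

def adjunctions_hold(f, X, Y):
    """Check ∃f ⊣ I(f) ⊣ ∀f on every pair of subsets A ⊆ X, B ⊆ Y."""
    for A in powerset(X):
        for B in powerset(Y):
            if (image(f, A) <= B) != (A <= preimage(f, X, B)):
                return False
            if (preimage(f, X, B) <= A) != (B <= forall_image(f, X, Y, A)):
                return False
    return True

# Illustrative example: a non-injective map from a 3-point set to a 2-point set.
X = frozenset({0, 1, 2})
Y = frozenset({"a", "b"})
f = {0: "a", 1: "a", 2: "b"}
```

The same example shows why the image is only a left adjoint: with $A=\{0\}$ and $B=\{1\}$ we get $f(A\cap B)=\emptyset$ while $f(A)\cap f(B)=\{a\}$, so the image does not commute with intersections.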
The second point in this answer is that you look at morphisms between objects of the form $P(P(X))$, because they contain, among other things, the topologies on the set $X$. By the same reasoning, you would look at the set $G(X)$ of all possible group structures on a set $X$ and, for morphisms, at functions between $G(X)$ and $G(Y)$. Such functions clearly associate to a group structure another group structure and do not even begin to "preserve" anything inside the group, since they send structure to structure.
Now I'd like to answer the question in two ways, one is very satisfactory and the other is not.
The first and satisfactory one is that the definition of a topological space via the open sets is not the only feasible one. We can define a topological space in at least 8 different ways. I would like to look at two of them to see that continuous functions preserve some structure.
The first one is that a topological space is characterized by its convergent nets. If you do not know what a net is, we can change the aforementioned statement to: "first-countable topological spaces are characterized by their convergent sequences."
I'll list the axioms because they are not easy to find in the literature. In the following we will use the notion of a directed set $\Lambda$, that is, a partially ordered set such that $$\forall a,b\in\Lambda\quad\exists c\in\Lambda\ \text{ s.t. }\ a\leq c,\ b\leq c.$$ A net is just a function $s:\Lambda\to X$ from a directed set $\Lambda$ to a set $X$ (taking $\Lambda=\mathbb{N}$ gives the usual definition of a sequence). The first difference arises in the definition of a subnet, because we need a function respecting the directed structure: given two directed sets $\Lambda$ and $M$, a morphism of directed sets is a function $f:\Lambda\to M$ such that $\forall \mu\in M\ \exists \lambda\in\Lambda$ such that $\mu\leq f(\lambda)$.
1) For all $x\in X$, the constant net $s:\Lambda\to X$, $s_\lambda=x$ for all $\lambda$, converges to $x$.
2) If $s:\Lambda\to X$ is a net converging to $x$, then all its subnets converge to $x$.
3) If $s:\Lambda\to X$ is a generic net (not a priori convergent) and all its subnets admit a subnet converging to $x$, then $s$ converges to $x$.
4) If $s:\Lambda\to X$ is a net converging to $x$ and for each $\lambda$ we take a net $m^\lambda:M_\lambda\to X$ converging to $s_\lambda$, then there exists a diagonal net $t:\Lambda\to X$ defined by $t_\lambda=m^\lambda_{\mu_\lambda}$ converging to $x$.
From this point of view, given two sets $X$ and $Y$ and families $S(X)$ and $S(Y)$ of nets satisfying the above axioms, the continuous functions are those functions $f$ between $X$ and $Y$ that send converging nets to converging nets, that is, each net converging to $x$ to a net converging to $f(x)$.
The second point of view is that of neighbourhoods. I won't state the axioms since Wikipedia gives a definition: https://en.wikipedia.org/wiki/Neighbourhood_(mathematics)
A "little" informally, but not that much, we can say that a topological space is a set with a directed set $N_x$ at each point $x\in X$ (saying that the elements of this directed set live in a certain $P(X)$ is not really necessary, for the same reason that we do not need all the neighbourhoods but just a basis of them). So a continuous function between $X$ and $Y$ is a function between the sets that gives a morphism of directed sets at each point (when we look at the neighbourhoods as elements of a set $P(X)$, this morphism of directed sets must be the image function).
In this case it is even "more" evident how a continuous function preserves the defined structure on the space.
The less satisfactory way to address your old question is by looking at pointless topology. Someone pointed it out but didn't explain why. Formally, the situation is the same as for morphisms of schemes: you take the category of commutative rings and pass to its opposite. On commutative rings you have structure-preserving morphisms, and on schemes you have continuous functions and natural transformations of sheaves.
In pointless topology you define a frame to model the set of open sets: a frame is a partially ordered set with suprema of all subsets and infima of finite subsets, in which finite infima distribute over arbitrary suprema.
You define a function between frames to be a morphism if it preserves arbitrary suprema and finite infima. The point is that you now take the opposite category to get what are called locales, which contain the category of sober topological spaces and continuous functions as a full subcategory.
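To see why the direction reverses, note that a continuous $f:X\to Y$ gives a frame morphism going the other way, from the open sets of $Y$ to the open sets of $X$, by taking inverse images. Below is a small brute-force sketch on finite spaces. The names `preimage_map` and `is_frame_hom`, and the example spaces, are illustrative assumptions. On a finite example "arbitrary suprema" reduces to unions of all subfamilies.

```python
from itertools import combinations

def preimage_map(f, X, opens_Y):
    """The map U -> f^{-1}(U) on the open sets of Y, as a dict (f is a dict)."""
    return {U: frozenset(x for x in X if f[x] in U) for U in opens_Y}

def is_frame_hom(h, opens_Y, top_X, top_Y):
    """Check that h preserves finite meets (top and binary intersections)
    and all joins (unions of every subfamily, including the empty one)."""
    if h[top_Y] != top_X:
        return False
    if any(h[U & V] != h[U] & h[V] for U in opens_Y for V in opens_Y):
        return False
    for r in range(len(opens_Y) + 1):
        for family in combinations(opens_Y, r):
            join = frozenset().union(*family)
            if h[join] != frozenset().union(*(h[U] for U in family)):
                return False
    return True

# Illustrative spaces: a 3-point space with nested opens, and the Sierpinski space.
X = frozenset({"a", "b", "c"})
opens_X = {frozenset(), frozenset({"a"}), frozenset({"a", "b"}), X}
Y = frozenset({0, 1})
opens_Y = {frozenset(), frozenset({1}), Y}

f = {"a": 1, "b": 0, "c": 0}   # continuous: f^{-1}({1}) = {"a"} is open
g = {"a": 0, "b": 0, "c": 1}   # not continuous: g^{-1}({1}) = {"c"} is not open
h = preimage_map(f, X, opens_Y)
```

Inverse image always preserves unions and intersections of subsets. What continuity adds is that `h` takes values in `opens_X`, so it is a morphism between the two frames of open sets. For `g` the preimage map still preserves the lattice operations but lands outside `opens_X`.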
To filter only those objects which are topological spaces, you should consider injective morphisms to frames of the type $P(X)$ and then make the same construction.
It is not in any way satisfactory, but it is natural in some way.