Motivation and examples for ramification
Ok, this is certainly a non-trivially complicated question.
I chose to give some sort of geometric intuition. I'm not totally sure this is what you were after, but hopefully it's of some use to you!
Geometric prelude
So, the best place to start thinking about ramification is in terms of maps of Riemann surfaces. While this may seem unrelated at first, bear with me, and I promise (hopefully!) it will make sense at the end.
So, a Riemann surface is something like a one-dimensional manifold, but instead of our space 'locally looking like an open subset of $\mathbb{R}$' it locally looks like an open subset of $\mathbb{C}$. Moreover, whereas 'locally looking like' in the context of real manifolds is meant in the smooth (i.e. $C^\infty$) sense, in the case of Riemann surfaces, it's meant in the holomorphic (i.e. analytic) sense.
Good examples of Riemann surfaces are things like the sphere $S^2$ for which at each point $p\in S^2$ we can think of the sphere as locally looking like $\mathbb{C}$. Moreover, as we change these charts (local ways of looking like $\mathbb{C}$), the coordinates they give vary in a holomorphic way. Another example would be the torus $\mathbb{C}/\Lambda$ (for $\Lambda\subseteq\mathbb{C}$ a lattice).
Now, if we have two Riemann surfaces $X$ and $Y$, we can look at holomorphic maps $f:X\to Y$. Given a point $x\in X$ and its image $y=f(x)\in Y$ we can choose charts around $x$ and $y$, so homeomorphisms $U\to X$ (whose image contains $x$), and $V\to Y$ (whose image contains $y$), where $U,V\subseteq \mathbb{C}$ are open. We can then use these to think (locally around $x$/$y$) of $f$ as being just a holomorphic map $f:U\to V$.
Now, it's a standard fact from the theory of one complex variable (sometimes called the 'local normal form theorem') that, as long as $f$ is not constant near $x$, there exists a unique integer $n\geq 1$ such that for sufficiently small neighborhoods $W$ of $x$, the mapping $f:W\to V$ looks like (i.e. up to pre/post composition with a biholomorphism is) the mapping $z\mapsto z^n$ from the disc $D(0,1)$ to $\mathbb{C}$.
Now, let's think about what the $n^\text{th}$-power map $D(0,1)\to\mathbb{C}$ looks like. Well, if we write an element of $D(0,1)$ as $z=r e^{i\theta}$, for $r<1$, we see that the $n^\text{th}$-power map sends this to $r^n e^{i n\theta}$. So, it shrinks the length of the vector $z$, and also multiplies its angle by $n$. In particular, for $r\ne 0$, we see that a point $w= r e^{i\theta}\in D(0,1)$ has exactly $n$ preimage points: $r^{\frac{1}{n}}e^{i \omega_k}$, where $\displaystyle \omega_k=\frac{\theta+2\pi k}{n}$, for $k=0,\ldots,n-1$.
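If you want to see this count concretely, here is a minimal numerical sketch. The function name `nth_power_preimages` and the sample point are my own illustrative choices; it just lists the solutions of $z^n=w$ using the polar form described above.

```python
import cmath
import math

def nth_power_preimages(w, n):
    """Return the distinct solutions z of z**n == w."""
    if w == 0:
        return [0j]  # only z = 0 maps to 0
    r, theta = cmath.polar(w)  # w = r * e^{i*theta}
    # The n preimages have modulus r^(1/n) and angles (theta + 2*pi*k)/n.
    return [
        cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]

w = cmath.rect(0.5, 1.0)             # a nonzero point of the unit disc
roots = nth_power_preimages(w, 4)
print(len(roots))                    # number of preimages of w
print(len(nth_power_preimages(0, 4)))  # number of preimages of 0
```

Away from $0$ the list always has $n$ entries, while over $0$ it has a single entry. That gap is exactly what the rest of this section is about.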
However, look what happens when $r=0$ (i.e. when $z=0$)--the $n^{\text{th}}$-power map does nothing. If you think about the previous paragraph as saying that the $n^{\text{th}}$-power map causes the disc to 'cover itself $n$ times', and each covering is a 'sheet', then all of these sheets meet at precisely one place--the fixed point $0$. So, if $n>1$, then somehow $0$ is this degenerate point which stops the $n^{\text{th}}$-power map from being precisely an $n$-sheeted covering of itself. It ties all the sheets together, and since it lies in multiple sheets, it 'counts' for more than just a point: it counts for how many sheets it lies in.
Now, let's bring this back to the original context of Riemann surfaces. This local analysis tells us that for our map of Riemann surfaces $f:X\to Y$, and our points $x\in X$ and $y\in Y$, the mapping $f$ locally (around the points) looks something like some $n$-sheeted covering, where $n$ is the same as the one given to us by the local normal form theorem. But, as we said above, it's not precisely an $n$-sheeted covering (if $n>1$) since the point $x$ (which corresponds to $0$ in the disc) doesn't get covered $n$ times, it just stays stationary--it lies in $n$ sheets (as opposed to the other points around it, which lie in $1$ sheet). Let's call this integer $e_x$--it's how many 'sheets' $x$ lies inside of.
Now, why is this integer something of importance to us? Well, first and foremost, it allows us to give a very simple answer to an (otherwise complicated) question:
How many points are in $f^{-1}(y)$, for $y\in Y$?
In other words, how many points are in the preimage of a point in $Y$?
Now, taken literally, this question is hard. The answer could really be anything, and greatly depends on which point $y\in Y$ we take. But, if we're clever, and tweak the question ever so slightly, we get a much more satisfying answer.
For example, let's consider the $n^\text{th}$-power map $g:D(0,1)\to D(0,1)$ again. If we take an arbitrary point $y\in D(0,1)$, we can ask how many points are in the preimage. Well, the answer breaks into two cases:
$$\# g^{-1}(y)=\begin{cases}1 & \mbox{if}\quad y=0\\ n & \mbox{if}\quad y\ne 0\end{cases}$$
But, if we use our intuition that we should count the point $0\in D(0,1)$ not just as itself, but for the $n$ sheets it's contained in, we do get a constant answer of $n$.
Said differently, instead of asking for $\#f^{-1}(y)$, let's instead ask for
$$\sum_{x\in f^{-1}(y)}e_x$$
so, we're not just asking for how many points map to $y$, but weighting these points by how many sheets they're in. This makes the answer a constant number, at least when $f$ is proper (for example, when $X$ is compact). This is a theorem you have to prove, but it's not hard. The map $Y\to \mathbb{Z}$ sending $y$ to this sum can be shown to be locally constant (think about the local normal form theorem!), and so if $Y$ is connected, it must, in fact, be constant.
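Here is a small sanity check of this weighted count, as a sketch under my own assumptions. The polynomial map $f(z)=z^3-3z$ from $\mathbb{C}$ to itself is an example I picked (it is not from the discussion above), and the fibers are entered by hand rather than computed. The helper names are hypothetical. The multiplicity $e_x$ is computed as the order of vanishing of $f(z)-f(x)$ at $z=x$.

```python
def derivative(coeffs):
    """Differentiate a polynomial; coeffs[k] is the coefficient of z**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    """Evaluate the polynomial at z by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def multiplicity(f, x, tol=1e-9):
    """e_x: the order of vanishing of f(z) - f(x) at z = x."""
    g = list(f)
    g[0] -= evaluate(f, x)  # g = f - f(x), so g(x) = 0
    e = 0
    while g and abs(evaluate(g, x)) < tol:
        g = derivative(g)
        e += 1
    return e

def weighted_fiber_count(f, fiber):
    """Sum of e_x over the given points x of one fiber."""
    return sum(multiplicity(f, x) for x in fiber)

F = [0, -3, 0, 1]  # f(z) = z^3 - 3z (illustrative choice)
FIBERS = {          # fibers entered by hand, not computed
    2: [2, -1],                      # z^3 - 3z - 2 = (z - 2)(z + 1)^2
    -2: [-2, 1],                     # z^3 - 3z + 2 = (z + 2)(z - 1)^2
    0: [0, 3 ** 0.5, -(3 ** 0.5)],   # z^3 - 3z = z(z^2 - 3)
}
for y, fiber in FIBERS.items():
    print(y, len(fiber), weighted_fiber_count(F, fiber))
```

The literal number of preimages changes from fiber to fiber, but the weighted count stays equal to the degree of $f$.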
So, it clearly is nicer, and fits better the 'global properties', to not count total preimages, but these weighted preimages based on the 'multiplicities' $e_x$ of points.
Thus, the natural thing then is to study the points with multiplicity greater than $1$: $x$ such that $e_x>1$. These are the points where multiple sheets come together.
An important thing to notice is that these points are the points where $f:X\to Y$ is NOT an isomorphism on a neighborhood of $x$ (think about the local normal form theorem again!). Translating this into a form which will be more useful later, they are the points where $f$ doesn't send charts (around $x$) to charts (around $y$).
These points, these 'bad' but significant points, are called the ramification points of $f:X\to Y$.
Relation to number theory
Somewhat surprisingly, this is the motivating factor, and intuition, for the notion of ramified primes in algebraic number theory.
Just to make sure we're on the same page, let's briefly recall the setup of what ramified vs. unramified primes look like. For the sake of convenience, I'm going to assume that you're working over number fields.
So, we have an extension of number fields $L/K$. We then have associated to this an extension of Dedekind domains $\mathcal{O}_L/\mathcal{O}_K$. For any non-zero prime $\mathfrak{p}$ of $\mathcal{O}_K$, we know that $\mathfrak{p}\mathcal{O}_L\ne\mathcal{O}_L$ (by integrality), and so we can factor this ideal as:
$$\mathfrak{p}\mathcal{O}_L=\mathfrak{P}_1^{e_1}\cdots\mathfrak{P}_m^{e_m}$$
We say that $\mathfrak{P}_i$ is ramified if $e_i>1$.
Now, how are we going to relate this to the geometric setting of maps between Riemann surfaces? Well, the go-between is the subject called algebraic geometry, which allows us to think about the set of prime ideals of $\mathcal{O}_L$, and the set of prime ideals of $\mathcal{O}_K$, as geometric objects unto themselves.
It isn't important if you are familiar with algebraic geometry, as long as you're willing to take (on faith) two things. To each ring $R$ (such as $\mathcal{O}_K$ or $\mathcal{O}_L$) there is associated a geometric object $\text{Spec}(R)$, and to each ring map $f:R\to S$, an associated map of geometric objects (notice the switch in direction!) $\text{Spec}(S)\to\text{Spec}(R)$--on points, this map just takes a prime $\mathfrak{p}$ of $S$ to the prime $f^{-1}(\mathfrak{p})$ of $R$.
So, now to the inclusion of number rings $\mathcal{O}_K\hookrightarrow\mathcal{O}_L$, we get an induced map of geometric spaces $\text{Spec}(\mathcal{O}_L)\to \text{Spec}(\mathcal{O}_K)$. So, now we're in a setup somewhat similar to the case of Riemann surfaces--a map between geometric objects. So, let's do what we did there.
Fix a point $\mathfrak{p}$ of $\text{Spec}(\mathcal{O}_K)$, and let's ask about the set of points in the preimage of $\mathfrak{p}$ under the map $\text{Spec}(\mathcal{O}_L)\to\text{Spec}(\mathcal{O}_K)$. If you think about this for a minute, you'll see that the set of preimage points is EXACTLY the set $\{\mathfrak{P}_i\}$ of primes which divide $\mathfrak{p}\mathcal{O}_L$.
So, we can start asking about how many points are in the preimage of some random $\mathfrak{p}$. Just as before, the answer is hard to get at. There is no satisfactory/uniform way to answer this question as stated. And, moreover, just as before, this is because the 'true answer' comes from not counting literal points in the preimage, but counting them weighted by how many 'sheets' they lie in. As you may have already deduced, these points which are weighted greater than $1$ (the amount contributed just by them being a point), are precisely the primes over $\mathfrak{p}$ that ramify!
How can we connect this notion of ramified prime to the one above though--why are they intuitively connected? Well, let's first do what we did above, and zoom in on the point $\mathfrak{p}$ and one of its preimage points $\mathfrak{P}_i$. In algebraic geometry, this corresponds (see here for some intuition) to localizing our rings at the respective primes. Thus, we're looking at the mapping $(\mathcal{O}_K)_\mathfrak{p}\to (\mathcal{O}_L)_{\mathfrak{P}_i}$.
Now, before, once we shrunk the neighborhoods around the points of our Riemann surfaces small enough, we took charts, and then asked about questions relative to these charts. Now, the objects $\text{Spec}(\mathcal{O}_K)$ and $\text{Spec}(\mathcal{O}_L)$ are '$1$-dimensional' (if you know commutative algebra, this is just the fact that the rings $\mathcal{O}_L$ and $\mathcal{O}_K$ are one-dimensional). So, charts should be like 'one map', which locally dictates the behavior of the space. For us, a chart of $\text{Spec}(\mathcal{O}_K)$ at $\mathfrak{p}$ will correspond to a uniformizer $\pi$ of $(\mathcal{O}_K)_\mathfrak{p}$, and similarly, a chart of $\text{Spec}(\mathcal{O}_L)$ at $\mathfrak{P}_i$ will be a uniformizer $\varpi$ of $(\mathcal{O}_L)_{\mathfrak{P}_i}$ (while this may not mean anything to you, this is more than just an analogy--for smooth curves over $\mathbb{C}$, uniformizers literally become the charts of the analytifications).
So, we said above that what it meant for a point $x$ to be ramified was that the map $f$ did NOT map charts at $x$ to charts at $y$. The same is true here: the chart $\pi$ at $\mathfrak{p}$ gets mapped to the object $\pi\in(\mathcal{O}_L)_{\mathfrak{P}_i}$. This element will be a chart (i.e. a uniformizer) if and only if $\mathfrak{P}_i$ is unramified! In fact, up to units, $\pi$ is $\varpi^{e_i}$, where $e_i$ is the same $e_i$ as above!
In fact, if you think about local rings of Riemann surfaces, that notion of ramification point is precisely the same as uniformizers not going to uniformizers. Also, the number of sheets $e_x$ is precisely the 'power' of a uniformizer at $x$ that a uniformizer at $y$ is sent to. These situations are precisely analogous.
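To make 'up to units, $\pi$ is $\varpi^{e_i}$' concrete, here is a hedged sketch for the extension $\mathbb{Q}(i)/\mathbb{Q}$, where $\mathcal{O}_L=\mathbb{Z}[i]$. This example and the helper names are my own choices, not part of the discussion above. Gaussian integers are stored as pairs `(a, b)` meaning $a+bi$, and the code counts how many times a chosen uniformizer $\varpi$ divides the rational prime $\pi=p$.

```python
def divide_exactly(a, b):
    """Divide Gaussian integers a = (a0, a1) by b = (b0, b1).

    Returns the quotient as a pair, or None if b does not divide a in Z[i].
    """
    a0, a1 = a
    b0, b1 = b
    norm = b0 * b0 + b1 * b1
    re = a0 * b0 + a1 * b1  # real part of a * conj(b)
    im = a1 * b0 - a0 * b1  # imaginary part of a * conj(b)
    if re % norm or im % norm:
        return None
    return (re // norm, im // norm)

def valuation(a, varpi):
    """Largest e with varpi**e dividing a (a nonzero, varpi a non-unit)."""
    e = 0
    while True:
        q = divide_exactly(a, varpi)
        if q is None:
            return e
        a, e = q, e + 1

print(valuation((2, 0), (1, 1)))  # pi = 2, varpi = 1 + i
print(valuation((5, 0), (2, 1)))  # pi = 5, varpi = 2 + i
print(valuation((3, 0), (3, 0)))  # pi = 3, varpi = 3
```

Over $2$ the valuation is $2$, reflecting $2=-i(1+i)^2$, so $2$ is not a uniformizer upstairs and $(1+i)$ is ramified. Over $5$ and $3$ the valuation is $1$, so the chart downstairs is still a chart upstairs.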
Thus, we should picture somehow that $\text{Spec}(\mathcal{O}_L)\to\text{Spec}(\mathcal{O}_K)$ is like a multi-sheeted covering, and, just as in the case of Riemann surfaces, the points of ramification are precisely the points where the sheets come together.
If the analogies are to hold, we'd also like an analogue, as alluded to above, of the fact that, with our notion of ramification points and keeping track of the number of sheets, we can uniformly describe the number of points in the preimage. Not only do we have such a theorem, but it's one you are probably very familiar with:
$$\sum_{\mathfrak{P}\mid \mathfrak{p}}\text{ }e_\mathfrak{P} f_\mathfrak{P}=[L:K]$$
Indeed, we already noted that $\{\mathfrak{P}:\mathfrak{P}\mid \mathfrak{p}\}$ is precisely the preimage of $\mathfrak{p}$ under the map $\text{Spec}(\mathcal{O}_L)\to\text{Spec}(\mathcal{O}_K)$, and that the $e_\mathfrak{P}$ were the analogy of the multiplicity (or number of sheets) that showed up in the Riemann surface case. The only thing which is unexplained are the $f_\mathfrak{P}$, and those are a holdover of 'hidden points'. That's a whole nother story. See my answer here for a taste.
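As a quick check of this formula in the simplest case, here is a sketch for $\mathbb{Z}[i]/\mathbb{Z}$, so $[L:K]=2$. It relies on a tool I haven't mentioned above, the Dedekind-Kummer theorem: since $\mathbb{Z}[i]=\mathbb{Z}[x]/(x^2+1)$ is the full ring of integers, the primes over $p$ and their $(e,f)$ can be read off from how $x^2+1$ factors mod $p$. The function name `primes_above` is hypothetical.

```python
def primes_above(p):
    """Return one (e, f) pair per prime of Z[i] above the rational prime p.

    Assumes p is prime; uses the factorization of x^2 + 1 mod p
    (Dedekind-Kummer, valid here because Z[i] is the full ring of integers).
    """
    roots = [r for r in range(p) if (r * r + 1) % p == 0]
    if len(roots) == 2:
        # x^2 + 1 = (x - r1)(x - r2): two primes, each with e = 1, f = 1
        return [(1, 1), (1, 1)]
    if len(roots) == 1:
        # x^2 + 1 = (x - r)^2: one prime with e = 2, f = 1 (ramified)
        return [(2, 1)]
    # x^2 + 1 irreducible mod p: one prime with e = 1, f = 2
    return [(1, 2)]

for p in [2, 3, 5, 7, 13]:
    data = primes_above(p)
    print(p, data, sum(e * f for e, f in data))
```

The number of primes over $p$ varies (one or two), but the weighted sum is the same for every $p$, just like the weighted fiber count for Riemann surfaces.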
I should probably mention the things I didn't talk about. Totally ramified just means that there is exactly one preimage point, where all $[L:K]$ sheets come together. Tamely and wildly ramified are more technical conditions, which have less of a geometric flair (or, at least, not one as easily yielded by basic words in algebraic geometry).
As for the characters in Hecke's thesis, I would need you to be more specific about what type of characters they are. Are they Grössencharacters, or characters of the (absolute) Galois group?