Intuitively, how should I think of Measurable Functions?
Measurable functions, by definition (from Stein and Shakarchi):
A function $f$ defined on a measurable subset $E$ of $\mathbb{R}^d$ and taking values in $[-\infty,\infty]$ is measurable if, for all $a\in \mathbb{R}$, the set $$f^{-1}([-\infty,a))=\{x\in E: f(x)<a\}$$ is measurable.
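Now a set $E$ is called measurable if for every $\epsilon>0$ there is an open set $O\supseteq E$ with $m_*(O\setminus E)\le\epsilon$. (In particular, every set with $m_*(E)=0$ is measurable.)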
Intuitively, the definition doesn't make much sense to me, and I would appreciate it if someone could explain it to me. A bonus would be if you could give me some simple examples of measurable and non-measurable functions. Thanks.
First, I don't know your definition of measurable sets.
Why do people define measurable functions this way?
A non-mathematical reason
Laziness. Well, this is just an opinion, but I think that when you define measurable functions like this, you don't need to go to the trouble of explaining (or even understanding yourself) where the definition comes from.
A mathematical reason
Point 1. We talk about the probability of subsets of $\Omega$, not elements.
Let's take probability theory as our model of reference. If you have a finite set $\Omega$, you can define a probability $\mu$ on $\Omega$ simply by defining the probability of each element of $\Omega$. But when you have an uncountable set, this approach is no longer viable. I will not go into details... I expect you to agree that for the "uniform probability" on $[0,1]$, the probability of a set $\{x\}$ is $0$ for every $x \in [0,1]$.
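A minimal Python sketch of the finite case, using a fair die as an illustrative $\Omega$ (the names `OMEGA`, `POINT_MASS` and `mu` are ours, not standard): giving a mass to each element is enough to assign a probability to every subset.

```python
from fractions import Fraction

# Illustrative finite sample space: the faces of a fair die.
OMEGA = {1, 2, 3, 4, 5, 6}
# On a finite set it is enough to give the probability of each element.
POINT_MASS = {w: Fraction(1, 6) for w in OMEGA}

def mu(subset):
    """Probability of a subset of OMEGA: the sum of the point masses of its elements."""
    return sum((POINT_MASS[w] for w in subset), Fraction(0))

print(mu({1, 3, 5}))  # probability of an odd face: 1/2
print(mu(OMEGA))      # 1
```

For the uniform probability on $[0,1]$ this recipe breaks down: every point mass would be $0$, and no sum of zeros produces $1$.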
The same reasoning applies if you are considering not probabilities but the length of a set. It is true that if the interval $I$ is the disjoint union of two other intervals $J$ and $K$, then the length of $I$ is the sum of the lengths of $J$ and $K$. But $[0,1]$ is the disjoint union of the (uncountably many) sets of the form $\{x\}$, each of length $0$, and nevertheless the length of $[0,1]$ is not $0$: additivity can only be required for finite or countable disjoint unions. For that reason, we do not talk about the size or the probability of points in $\Omega$. We talk about the probabilities or sizes of subsets of $\Omega$.
Point 2. We know the size of certain sets (think of the intervals).
Usually, we know the measure of certain subsets. For example, in the case of the unit interval $[0,1]$, one usually takes the size of an interval $[a,b]$ to be the value $b - a$.
Point 3. The sets for which we do have a probability defined form the family $\mathcal{B}$. Those are the "measurable" sets.
Based on the size of these simple sets, we can manage to EXTEND our measure to other sets. The next simplest case is when the set is a finite disjoint union of intervals. It turns out that, given the constraints we want the measure to satisfy, it is not always possible to EXTEND the measure to the whole family of subsets of $\Omega$. So we are happy to limit the domain of our measure $\mu$ to some class $\mathcal{B}$ of subsets of $\Omega$. We shall use the notation $(\Omega, \mathcal{B})$ to indicate that we are talking about the family $\mathcal{B}$ of subsets of $\Omega$. So, the measure is a function $$ \mu: \mathcal{B} \to [0,1]. $$
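The first step of that extension, from intervals to finite unions of intervals, can be sketched in Python: merge overlapping intervals into disjoint pieces, then add their lengths. This assumes intervals given as pairs `(a, b)` with `a <= b`; whether endpoints are included does not affect the length. The function name `union_length` is hypothetical.

```python
def union_length(intervals):
    """Length of a finite union of intervals, each given as a pair (a, b) with a <= b.

    Overlapping or touching intervals are merged first, so the total is the sum of
    the lengths of disjoint pieces.
    """
    total = 0
    current_start = current_end = None
    for a, b in sorted(intervals):
        if current_end is None or a > current_end:
            # Disjoint from the current piece: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = a, b
        else:
            # Overlapping or touching: extend the current piece.
            current_end = max(current_end, b)
    if current_end is not None:
        total += current_end - current_start
    return total

print(union_length([(0, 0.25), (0.5, 1)]))  # 0.75
print(union_length([(0, 0.5), (0.25, 1)]))  # 1.0
```

This only covers finite unions; going further (countable unions, complements, limits) is where the real measure-theoretic work starts, and where one has to stop short of all subsets.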
Point 4. A measurable function $f: \Omega \to X$ transports the probability on $(\Omega, \mathcal{B})$ to a probability on $(X, \mathcal{F})$.
Now, suppose that you have a probability $\mu: \mathcal{B} \to [0,1]$ defined on a family of subsets of $\Omega$, and suppose also that you have a function $f: \Omega \to \mathbb{R}$. Then you may wish to TRANSPORT your probability from $\Omega$ to $\mathbb{R}$. For example, suppose that $\Omega = \{1,\dotsc,6\}$ is a die, and you are gambling. If the value of the die is odd, you win BRL 10; if it is even, you lose BRL 10. This defines $f: \Omega \to \{-10,10\}$. Now, instead of talking about a probability on $\Omega$, we can talk about the probability of winning or losing 10 Brazilian Reals in one bet. We transported the probability on $\Omega$ to a probability on $\mathbb{R}$. This is a measurable function! The probability of winning BRL 10 is the probability of the event $f^{-1}(\{10\})$, and the probability of losing BRL 10 is the probability of $f^{-1}(\{-10\})$. The probability of losing money is the probability of the set $f^{-1}((-\infty,0))$.
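The transport in the die example can be sketched in a few lines of Python, assuming a fair die. The helper `pushforward` is a hypothetical name: it assigns to each value $v$ of $f$ the probability of the event $f^{-1}(\{v\})$.

```python
from collections import defaultdict
from fractions import Fraction

# Fair die (illustrative assumption).
OMEGA = [1, 2, 3, 4, 5, 6]
POINT_MASS = {w: Fraction(1, 6) for w in OMEGA}

def f(w):
    """Gain in BRL: win 10 on an odd face, lose 10 on an even face."""
    return 10 if w % 2 == 1 else -10

def pushforward(func, omega, point_mass):
    """Transport the probability on omega to the values of func.

    The mass assigned to a value v is the probability of the event func^{-1}({v}).
    """
    dist = defaultdict(Fraction)  # Fraction() == 0
    for w in omega:
        dist[func(w)] += point_mass[w]
    return dict(dist)

dist = pushforward(f, OMEGA, POINT_MASS)
print(dist)  # {10: Fraction(1, 2), -10: Fraction(1, 2)}
# Probability of losing money: the mass of f^{-1}((-inf, 0)).
print(sum(p for v, p in dist.items() if v < 0))  # 1/2
```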
If you think of $f^{-1}$ as a function that takes subsets of $\mathbb{R}$ to subsets of $\Omega$, then you can TRY to compose $\mu$ with $f^{-1}$ to get $\mu \circ f^{-1}$. For this to work, whenever you want to know the probability of a set $A \subset \mathbb{R}$, you need $f^{-1}(A) \in \mathcal{B}$.
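This also gives a toy non-measurable function. If $\mathcal{B}$ contains all subsets of a finite $\Omega$, every function is measurable; the condition only bites when $\mathcal{B}$ is smaller. A minimal Python sketch, assuming a hypothetical coarse σ-algebra that only records whether the die shows a low face (1 to 3) or a high face (4 to 6): the parity bet is then not measurable, because $f^{-1}(\{10\}) = \{1,3,5\} \notin \mathcal{B}$, so $\mu\circ f^{-1}$ cannot even be evaluated.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3, 4, 5, 6})
# Hypothetical coarse sigma-algebra: we only "see" whether the face is low or high.
LOW, HIGH = frozenset({1, 2, 3}), frozenset({4, 5, 6})
B = {frozenset(), LOW, HIGH, OMEGA}

def preimage(func, values):
    return frozenset(w for w in OMEGA if func(w) in values)

def is_measurable(func, sigma_algebra):
    """On a finite space, func is measurable iff func^{-1}(A) lies in the sigma-algebra
    for every set A of values. Since func^{-1}(A) = func^{-1}(A ∩ image),
    it suffices to check the subsets of the image."""
    image = sorted({func(w) for w in OMEGA})
    subsets = chain.from_iterable(combinations(image, r) for r in range(len(image) + 1))
    return all(preimage(func, set(A)) in sigma_algebra for A in subsets)

def parity_bet(w):
    return 10 if w % 2 == 1 else -10

def high_low_bet(w):
    return 10 if w <= 3 else -10

print(is_measurable(parity_bet, B))    # False: {1, 3, 5} is not in B
print(is_measurable(high_low_bet, B))  # True
```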
Point 5. We want $f^{-1}(I)$ to be measurable.
Finally, since we are talking about a function $f: \Omega \to \mathbb{R}$, it is natural to want probabilities to be defined at least for intervals. That is, given an interval $I \subset \mathbb{R}$, we want $f^{-1}(I)$ to have a probability associated with it.
Point 6. We got to a definition of "measurable function" which is easier to state without appealing to measure theory.
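But preimages commute with complements and countable unions and intersections, and every interval can be built from the rays $[-\infty,a)$ with these operations. So $f^{-1}(I)$ will be measurable for every interval $I$ exactly when $f^{-1}([-\infty,a))$ is measurable for every $a$.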
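Spelled out, here is one way to write the preimage of each kind of bounded interval with endpoints $a<b$ using only the sets $f^{-1}([-\infty,c))$, set differences and countable intersections, operations under which measurable sets are closed:

$$
\begin{aligned}
f^{-1}([a,b)) &= f^{-1}([-\infty,b)) \setminus f^{-1}([-\infty,a)),\\
f^{-1}([-\infty,b]) &= \bigcap_{n\ge 1} f^{-1}\big([-\infty,b+\tfrac1n)\big),\\
f^{-1}((a,b]) &= f^{-1}([-\infty,b]) \setminus f^{-1}([-\infty,a]),\\
f^{-1}([a,b]) &= f^{-1}([-\infty,b]) \setminus f^{-1}([-\infty,a)),\\
f^{-1}((a,b)) &= f^{-1}([-\infty,b)) \setminus f^{-1}([-\infty,a]).
\end{aligned}
$$

Unbounded intervals are handled similarly. Conversely, for real-valued $f$, $f^{-1}((-\infty,a)) = \bigcup_{n\ge1} f^{-1}([a-n,a))$ is a countable union of preimages of intervals.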
Point 7. We can integrate measurable functions (and get the "expected value").
With a function $f$ like this, we can calculate the mean, that is, the integral of the function.
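For the die bet, a minimal Python sketch (fair die assumed, function names hypothetical) computes the mean in two ways. The first, "value times probability of the preimage", is the form that generalizes to the Lebesgue integral and is exactly where $\mu(f^{-1}(\{v\}))$ must make sense; the second, summing over individual points, only works because $\Omega$ is finite.

```python
from fractions import Fraction

# Fair die (illustrative assumption).
OMEGA = [1, 2, 3, 4, 5, 6]
POINT_MASS = {w: Fraction(1, 6) for w in OMEGA}

def f(w):
    """Gain in BRL: win 10 on an odd face, lose 10 on an even face."""
    return 10 if w % 2 == 1 else -10

def expectation_by_values(func):
    """Sum over the values v of func of v * mu(func^{-1}({v}))."""
    values = {func(w) for w in OMEGA}
    return sum(v * sum(POINT_MASS[w] for w in OMEGA if func(w) == v) for v in values)

def expectation_by_points(func):
    """Sum over the points w of func(w) * mu({w}); only meaningful on a finite set."""
    return sum(func(w) * POINT_MASS[w] for w in OMEGA)

print(expectation_by_values(f), expectation_by_points(f))  # 0 0
```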
Now, I realise that you are not talking about probabilities, you are talking about analysis. But then you just have to replace the term "probability" with "measure". For the same reason, technicalities aside, you can integrate measurable functions. It is just a bit harder to picture because now $\Omega = \mathbb{R}$.
Let us see if someone comes up with something better, but there is no easy intuitive picture of what measurability means.
Depending on point of view, measurable functions are really wild, or really well-behaved.
The need for measurability of functions arises when defining the Lebesgue integral. In its simplest form, consider a set $E\subset\mathbb{R}^d$ and its characteristic function $1_E$. Then, following the notion that an integral is in spirit a "sum of (value times volume)", we would want $$\tag{1} \int_{\mathbb{R}^d}\,1_E\,dm=m(E), $$ where $m$ is a measure (the Lebesgue measure in this case). The problem arises when one notices that "Vitali sets" exist, i.e. sets to which no measure can be assigned if that measure is to be countably additive, translation invariant, and compatible with the usual volume of a box or of a ball.
In other words, it could happen that the set $E$ is not measurable, and the integral in ($1$) makes no sense.
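Note that for such a set $E$, $$ 1_E^{-1}((0,\infty))=E, $$ so the function $1_E$ is not measurable. (Equivalently, in terms of the definition in the question, $\{x: 1_E(x)<1\}=\mathbb{R}^d\setminus E$ is not measurable.)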
Anyway, to make a long story short, to define the Lebesgue integral of a function, you need that function to be measurable.
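To make "sum of (value times volume)" concrete, here is a minimal Python sketch of a Lebesgue-style approximation, assuming the illustrative function $f(x)=x^2$ on $[0,1]$ (continuous, hence measurable). Instead of chopping the domain, it chops the range into steps of $2^{-n}$ and weights each level $k/2^n$ by the measure of its preimage. Here those level sets are intervals, so their measure is just a length; for a non-measurable $f$ the quantity $m\big(f^{-1}([k/2^n,(k+1)/2^n))\big)$ would not be defined.

```python
import math

def lebesgue_lower_sum(n):
    """Integral over [0, 1] of the simple function floor(2**n * f) / 2**n, with f(x) = x**2.

    The simple function equals k / 2**n on the level set
    {x in [0, 1] : k / 2**n <= x**2 < (k + 1) / 2**n} = [sqrt(k / 2**n), sqrt((k + 1) / 2**n)),
    whose Lebesgue measure is that interval's length. The point x = 1 (level k = 2**n)
    has measure zero and is skipped.
    """
    N = 2 ** n
    total = 0.0
    for k in range(N):
        level_set_measure = math.sqrt((k + 1) / N) - math.sqrt(k / N)
        total += (k / N) * level_set_measure  # value times measure of its preimage
    return total

for n in (2, 5, 10):
    print(n, lebesgue_lower_sum(n))  # increases towards 1/3
```

Since the simple function is within $2^{-n}$ of $f$, each sum is within $2^{-n}$ of $\int_0^1 x^2\,dx = 1/3$.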
As for examples, any function that you can write down with some kind of explicit formula will be measurable. For instance, any pointwise limit of a sequence of continuous functions is measurable, which gives you a huge family to start with. On the other hand, there are no "easy" examples of non-measurable functions: the $1_E$ I gave you above is probably the easiest, but it depends on constructing a non-measurable set, and these are not obvious either.
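A small illustration of "pointwise limit of continuous functions", assuming the standard example $f_n(x)=x^n$ on $[0,1]$: each $f_n$ is continuous, but the limit is $1_{\{1\}}$, which is discontinuous yet still measurable. This Python sketch just evaluates a large $n$ next to the limit.

```python
def f_n(x, n):
    """The continuous function x**n on [0, 1]."""
    return x ** n

def limit(x):
    """Pointwise limit of f_n on [0, 1]: 0 on [0, 1) and 1 at x = 1, i.e. the indicator of {1}."""
    return 1.0 if x == 1.0 else 0.0

for x in (0.0, 0.5, 0.9, 1.0):
    print(x, f_n(x, 1000), limit(x))
```

The limit is measurable because, for $0<a\le 1$, the set $\{x\in[0,1]: 1_{\{1\}}(x)<a\}=[0,1)$ is an interval (and for other $a$ it is $\emptyset$ or $[0,1]$).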
To make a long story short, the existence of non-measurable subsets of $\mathbb R$ (or, equivalently, of non-measurable real functions) is a technical trade-off due to the highly non-constructive axiom of choice. If you choose not to use the axiom of choice, then you can safely assume that every subset of $\mathbb R$ is measurable: the theory ZF + {all subsets are measurable} is consistent relative to ZF + {there exists an inaccessible cardinal}, and you can even add a weaker version of AC (the axiom of dependent choice). In particular, if you define a function without using the axiom of choice, you will never be able to prove that it is non-measurable.
If you look closely at measure theory, you will realize that measurable sets are really characterized by how they are built: a Borel set is one you can construct from the open sets of $\mathbb R$ using the rules for $\sigma$-algebras (complements and countable unions), and a Lebesgue measurable set differs from a Borel set only by a set of measure zero. This description precisely keeps out the wildly non-constructive subsets.