You've got your head round Basic Category Theory: why look at monads next?
Solution 1:
I found the notion of a monad to be highly non-intuitive as well, until I learned about Lawvere theories.
A Lawvere theory is a small category with finite products in which every object is isomorphic to a finite power $x^n$, $n \in \mathbb N$, of a single distinguished object $x$. We call such an $x$ a generic object. A Lawvere theory is then determined by the sets $hom(x^n,x)$ together with their composition (since a morphism into $x^m$ is the same as $m$ morphisms into $x$). You should think of an element in $hom(x^n,x)$ as an abstract $n$-ary operation.
An $\textit{algebra}$ for a Lawvere theory $L$ is a product preserving functor to Sets, $F : L \rightarrow \text{Sets}$. A morphism between $L$-algebras is a natural transformation between them.
Let me explain that a bit further: such a functor is determined on objects by where it maps the generic object $x$, since $F(x^n) \cong F(x)^n$. Each element of $hom(x^n,x)$ is sent to an actual $n$-ary operation $F(x)^n \rightarrow F(x)$. An algebra is hence the same thing as a set equipped with $n$-ary operations, subject to the equations between them that hold in the Lawvere theory $L$.
Examples
- Take the opposite category of the category of finite sets, $\text{FinSet}^{op}$. Since every finite set is a finite coproduct of sets with a single element, this is a Lawvere theory. A product preserving functor is determined by the set $F(\left\{\star\right\})$, with $\left\{\star\right\}$ being a set with a single element. Hence the category of $\text{FinSet}^{op}$-algebras is equivalent to $\text{Sets}$.
- Define $hom(x^n,x)$ to be $R[X_1,...,X_n]$ for a commutative ring $R$, with composition given by substitution of polynomials. The category of algebras for this Lawvere theory is equivalent to the category of commutative $R$-algebras.
- Take your favorite definition of a group. The opposite category of the category of finitely generated free groups with group homomorphisms between them is a Lawvere theory. The resulting category of algebras is equivalent to the category of groups.
- In general, any variety of algebras (in the sense of universal algebra) corresponds to a Lawvere theory (I won't go into the details here).
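To make "a set equipped with $n$-ary operations" concrete, here is a minimal Python sketch. It uses the theory of monoids, which is an assumption made for simplicity (it avoids inverses) and is not one of the examples listed above. An element of $hom(x^n,x)$ is encoded as a word in the variables $0,\dots,n-1$; the names `interpret` and `substitute` are hypothetical helpers, not part of any library.

```python
from functools import reduce
from operator import add

def interpret(word, mul, unit):
    """Send a word in the variables 0..n-1 to an n-ary operation on a monoid."""
    def operation(*args):
        return reduce(mul, (args[i] for i in word), unit)
    return operation

def substitute(word, words):
    """Composition in the theory: plug words[i] in for variable i."""
    return tuple(v for i in word for v in words[i])

# The abstract binary operation x0 * x1 * x0.
w = (0, 1, 0)

# Two algebras for the same theory: integers under +, strings under concatenation.
add_op = interpret(w, add, 0)
cat_op = interpret(w, add, "")
print(add_op(2, 5))      # 9
print(cat_op("a", "b"))  # aba

# Functoriality on one instance: interpreting a composite word agrees with
# composing the interpreted operations.
composite = substitute(w, [(1,), (0, 0)])            # (1, 0, 0, 1)
lhs = interpret(composite, add, 0)(2, 5)
rhs = add_op(interpret((1,), add, 0)(2, 5), interpret((0, 0), add, 0)(2, 5))
print(lhs == rhs)        # True
```

The same word is interpreted in two different sets, which is the sense in which the theory is independent of any particular algebra.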
Now, given a Lawvere theory $L$, the representable functors $hom(x^n,-)$ preserve products, hence are $L$-algebras. They are called free objects on $n$ generators, and that is exactly what they are in each of the above examples. By the Yoneda embedding, these finitely generated free algebras form a full subcategory equivalent to $L^{op}$.
With the above in mind we get a functor $\text{FinSet} \rightarrow \text{Sets}$ by mapping an $n$-element set to $hom(x^n,x)$ (Something to think about: What is a natural way to define its action on morphisms?). By using coends, we extend it to a functor $M : \text{Sets} \rightarrow \text{Sets}$ [intuitively: write each set as a directed colimit of its finite subsets and extend by preserving these colimits]. This functor carries a monad structure. There is also a converse: finitary monads on $\text{Sets}$ correspond to Lawvere theories, uniquely up to equivalence. As seen in the group example, knowing the free groups (and their homomorphisms) is enough to reconstruct the entire category.
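As a small illustration, take the theory of monoids (chosen here for simplicity; it isn't among the examples above). There $hom(x^n,x)$ is the set of words in $n$ variables, and $M(S)$ works out to the set of finite lists over $S$, i.e. the underlying set of the free monoid on $S$. The sketch below writes out the monad structure on lists and checks the monad laws on sample data only; `unit`, `fmap`, `join` and `total` are names picked for this example.

```python
def unit(s):
    """Unit of the monad: an element becomes a one-letter word."""
    return [s]

def fmap(f, xs):
    """Action of M on a function f: apply it letter by letter."""
    return [f(x) for x in xs]

def join(xss):
    """Multiplication of the monad: flatten a word of words."""
    return [x for xs in xss for x in xs]

xs = [1, 2, 3]
xsss = [[[1], [2, 3]], [[]], [[4, 5]]]

# Monad laws, checked on the samples above.
left_unit = join(unit(xs)) == xs
right_unit = join(fmap(unit, xs)) == xs
assoc = join(join(xsss)) == join(fmap(join, xsss))
print(left_unit, right_unit, assoc)   # True True True

# An algebra for this monad is a structure map M(A) -> A; for the monoid
# (int, +, 0) it is summation. It must be compatible with unit and join.
def total(ns):
    return sum(ns)

xss = [[1, 2], [3], []]
algebra_ok = total(unit(7)) == 7 and total(join(xss)) == total(fmap(total, xss))
print(algebra_ok)                     # True
```

An algebra for the list monad is exactly a monoid, which matches the algebras of the theory it came from.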
One main point of this viewpoint is that it is easy to look at algebras for a Lawvere theory in any other category. You can check that the standard definition of a group object in a category $C$ with finite products is the same as saying that a group object is a product preserving functor $L_{Groups} \rightarrow C$. And there are generalisations for monoidal categories with lax monoidal functors, etc., but again I won't go into detail here.
The other point is that using Lawvere theories frees you from definitions of algebraic objects using $\textit{specific}$ operations. The classical definition of a group involves a $0$-ary, a unary and a binary operation, but nothing is particularly special about them other than that they make certain computations and proofs about groups easy and others hard. The situation is a bit like manifolds, where coordinate charts are used to define and sometimes compute things, but at the end of the day they aren't intrinsic and it doesn't really matter which ones you used. In the same way, presentations in terms of a set of operations and equations between them are still useful but not fundamental.
Monads are a bit more general, but they have the same general features. Another related concept is that of operads.
If you want to read more about all of this, I recommend Qiaochu's blog post and the many resources in this MathOverflow question.
Solution 2:
There is one standpoint that says that in category theory, of course it isn't categories that are important, but functors between them. Once we begin to have functors between categories, we start to want to compose them, and get chains of functors. Now being categorists, it would be nice if we had a setting where we could "internalize" these chains of functors as plain morphisms in a single category. Monads provide a construction which does so. This all somewhat relates to the bar construction (https://golem.ph.utexas.edu/category/2007/05/on_the_bar_construction.html).
Relatedly, an adjunction is a very simple chain of functors of a very special form, and adjunctions give rise to monads: if $F \dashv G$ with unit $\eta$ and counit $\varepsilon$, then $GF$ is a monad with unit $\eta$ and multiplication $G \varepsilon F$. So we can view the theory of monads as describing the structure induced on a single category by the adjunctions it takes part in. With this in hand we can say "give me a category, and an adjunction to another category, and I will tell you something about the structure of objects and morphisms internal to that first category, with no reference at all to any other category." This becomes particularly striking in the common case of adjoint triples, where, having recognized this structure, we immediately get an adjoint monad/comonad pair on each of the two categories involved. In the case where the adjoint triple arises from a "substitution" functor, this provides a characterization of universal and existential quantification.
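The quantifier remark can be checked by brute force in a small finite case. The sketch below assumes the simplest setting, where the "substitution" functor is the preimage map $f^* : P(Y) \rightarrow P(X)$ between powersets ordered by inclusion, for a function $f : X \rightarrow Y$. The sets `X`, `Y` and the map `f` are made-up sample data, and the function names are chosen for this example.

```python
from itertools import combinations

def subsets(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

X = {0, 1, 2, 3}
Y = {"a", "b", "c"}
f = {0: "a", 1: "a", 2: "b", 3: "b"}   # sample map X -> Y; "c" has an empty fibre

def pullback(B):
    """Substitution f^*: the preimage of B under f."""
    return frozenset(x for x in X if f[x] in B)

def exists(A):
    """Left adjoint: y such that SOME x over y lies in A."""
    return frozenset(f[x] for x in A)

def forall(A):
    """Right adjoint: y such that EVERY x over y lies in A."""
    return frozenset(y for y in Y if all(x in A for x in X if f[x] == y))

# In a poset an adjunction is a pair of equivalent inequalities.
adjoint_ok = all(
    (exists(A) <= B) == (A <= pullback(B))
    and (pullback(B) <= A) == (B <= forall(A))
    for A in subsets(X) for B in subsets(Y)
)
print(adjoint_ok)                 # True

A = frozenset({0, 2, 3})
closure = pullback(exists(A))     # the monad on P(X)
interior = pullback(forall(A))    # the comonad on P(X)
print(sorted(closure), sorted(interior))   # [0, 1, 2, 3] [2, 3]
```

Here the monad saturates a subset along the fibres of $f$ and the comonad keeps only the fibres fully contained in it, and both are described purely inside $P(X)$.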