There is a certain monad which might well be considered the "mother of all monads", in much the same way as the endomorphism ring of an abelian group is the "mother of all rings". Indeed, some people even call it the endomorphism monad (especially in operadic circles), but it is perhaps more commonly known as the codensity monad. It has the following universal property:

  • Let $\mathcal{C}$ be a locally small and complete category, let $\mathbb{T}$ be a monad on $\mathcal{C}$, and let $X : \mathcal{J} \to \mathcal{C}$ be a small diagram. Then the category of monad morphisms and monad transformations $\mathbb{T} \to \mathbb{E}\mathsf{nd}(X)$ is isomorphic to the category of factorisations of $X$ through the forgetful functor $U^\mathbb{T} : \mathcal{C}^\mathbb{T} \to \mathcal{C}$.

Now how might $\mathbb{E}\mathsf{nd}(X)$ be defined? The neatest way is by Kan extension: the underlying endofunctor of $\mathbb{E}\mathsf{nd}(X)$ is the right Kan extension of $X$ along $X$. Thus we have the following formula: $$\textrm{End}(X)(Y) = \int_{j : \mathcal{J}} (X j)^{\mathcal{C}(Y, X j)}$$ In particular, when $\mathcal{J}$ is the terminal category $\mathbb{1}$, $X$ just picks out an object of $\mathcal{C}$, and when $\mathcal{C} = \textbf{Set}$, the formula boils down to $$\textrm{End}(X)(Y) = X^{X^Y}$$ and so the double-dualisation monad is a special case of the endomorphism monad. This formula also shows what the algebraic theory corresponding to $\mathbb{E}\mathsf{nd}(X)$ is: it is the theory whose operations are functions $X^Y \to X$, and whose axioms are all the equations that these satisfy.
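For readers who think in code, here is a minimal sketch of the special case $Y \mapsto X^{X^Y}$ on $\textbf{Set}$. It assumes that an element of $X^{X^Y}$ is modelled as a Python function taking a map `k` from $Y$ to $X$ and returning an element of $X$; the names `unit`, `fmap`, `join` and `bind` are my own and do not come from any library.

```python
# Sketch of the double-dualisation monad Y |-> X^(X^Y) on Set.
# An element of End(X)(Y) is a function that eats k : Y -> X and returns an x in X.

def unit(y):
    """eta_Y : Y -> X^(X^Y), sending y to 'evaluate at y'."""
    return lambda k: k(y)

def fmap(f, m):
    """Functor action: for f : Y -> Z, precompose the argument k : Z -> X with f."""
    return lambda k: m(lambda y: k(f(y)))

def join(mm):
    """mu_Y : flatten an element of X^(X^(X^(X^Y))) to an element of X^(X^Y)."""
    return lambda k: mm(lambda m: m(k))

def bind(m, f):
    """Kleisli extension, defined as join after fmap."""
    return join(fmap(f, m))

# Example with X = bool: 'some element of {1, 2, 3} satisfies k'.
exists = lambda k: any(k(y) for y in (1, 2, 3))
is_four = lambda z: z == 4
doubled = bind(exists, lambda y: unit(2 * y))  # quantifies over {2, 4, 6}
print(doubled(is_four))
```

With $X$ the two-element set, elements of $X^{X^Y}$ behave like generalised quantifiers over $Y$, which is why `exists` is a natural test value.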


My favorite monad at the moment is the Giry monad. I am actually not sure if the definition I have in mind is the same as the definition in the literature, but here is one version: there is a monad on $\text{Set}$ which takes a set $X$ to the set of all probability distributions on $X$. Its algebras are an infinitary version of convex spaces, but more importantly, its Kleisli category is the category of sets and random functions (functions $f : X \to Y$ which return, not an element of $Y$, but a probability distribution over elements of $Y$). This is an abstract way of thinking about stochastic matrices.
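Here is a rough sketch of the Kleisli composition, under the simplifying assumption that all distributions are finitely supported (so this is not the measure-theoretic Giry monad from the literature). A distribution is a dict from outcomes to exact probabilities; `unit`, `bind`, `kleisli` and the toy `step` chain are names I made up for illustration.

```python
from fractions import Fraction

def unit(x):
    """Point mass at x."""
    return {x: Fraction(1)}

def bind(dist, f):
    """Push dist through the random function f and average the results."""
    out = {}
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] = out.get(y, Fraction(0)) + p * q
    return out

def kleisli(f, g):
    """Compose random functions f : X -> D(Y) and g : Y -> D(Z)."""
    return lambda x: bind(f(x), g)

# A two-state Markov chain, i.e. a 2x2 stochastic matrix (hypothetical numbers).
def step(state):
    if state == "sunny":
        return {"sunny": Fraction(3, 4), "rainy": Fraction(1, 4)}
    return {"sunny": Fraction(1, 2), "rainy": Fraction(1, 2)}

two_steps = kleisli(step, step)
print(two_steps("sunny"))
```

Composing `step` with itself in the Kleisli category gives the same numbers as squaring the corresponding stochastic matrix, which is the sense in which random functions are stochastic matrices.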


The Powerset Monad

Let $T$ be the powerset endofunctor on the category $\mathsf{Set}.$ For any set $X,~TX=\mathscr P(X)$ is its powerset, and for any map of sets $f:X\to Y,~Tf:\mathscr P(X)\to\mathscr P(Y)$ sends $A\subset X$ to its image $Tf(A)=f(A)$. There are natural transformations $\eta:1_{\mathsf{Set}}\Rightarrow T$ and $\mu:T^2\Rightarrow T$ defined by $$\eta_X(x)=\lbrace x\rbrace\text{ and }\mu_X\Big(\lbrace A_i:i\in I\rbrace\Big)=\bigcup_{i\in I} A_i$$ where $X$ is any set, $x\in X$ is any point of $X$, and $\lbrace A_i: i\in I\rbrace$ is any set of subsets of $X$. This indeed defines a monad.
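The monad laws can be checked by brute force on a small set. The following is only a sketch: subsets are modelled as `frozenset`s, the helper names are my own, and the check is run on a two-element set, so it is a sanity test rather than a proof.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as a frozenset of frozensets."""
    xs = list(xs)
    return frozenset(
        frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)
    )

def eta(x):
    """Unit: x |-> {x}."""
    return frozenset([x])

def fmap(f, a):
    """Functor action: the direct image f(A)."""
    return frozenset(f(x) for x in a)

def mu(family):
    """Multiplication: union of a set of sets."""
    return frozenset().union(*family)

def monad_laws_hold(X):
    PX = powerset(X)
    PPPX = powerset(powerset(PX))
    # Unit laws: mu . eta_T = id and mu . T(eta) = id on P(X).
    unit_laws = all(mu(eta(a)) == a and mu(fmap(eta, a)) == a for a in PX)
    # Associativity: mu . mu_T = mu . T(mu) on P(P(P(X))).
    assoc = all(mu(mu(f)) == mu(fmap(mu, f)) for f in PPPX)
    return unit_laws and assoc

print(monad_laws_hold({0, 1}))
```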

Its Algebras

A $T$-algebra is a set $X$ together with a map of sets $\theta:TX\rightarrow X$, i.e. a map that sends each subset of $X$ to an element of $X$. Being a $T$-algebra means that $\theta$ has to be compatible with $\mu$ and $\eta$ as follows: for any $x\in X$ and for any collection of subsets $\mathscr{A}=\lbrace A_i: i\in I\rbrace$ of $X$,

$$\theta(\eta_X(x))=\theta(\lbrace x\rbrace)=x \text{ and } \theta\big(\mu_X(\mathscr A)\big)=\theta\big(T\theta(\mathscr A)\big)$$ where the last equality means $\theta\Big(\bigcup_{i\in I}A_i\Big)=\theta\Big(\lbrace\theta(A_i):i\in I\rbrace\Big)$


Define a binary relation $\preceq$ on $X$ by $$a\preceq b\Longleftrightarrow \theta\big(\lbrace a,b\rbrace\big)=b$$ Then $(X,\preceq)$ is a complete join-semilattice, and for any subset $A\subset X,~\bigvee A=\theta(A)$.


For any $x\in X,~x\preceq x$ since $\theta(\lbrace x,x\rbrace)=\theta(\lbrace x\rbrace)=x$, so $\preceq$ is reflexive. Also, if $x\preceq y$ and $y\preceq x$, then by definition $x=\theta(\lbrace x,y\rbrace)=y$, so $\preceq$ is antisymmetric. Finally, suppose $x\preceq y$ and $y\preceq z$. Then we have $$\begin{aligned} \theta(\lbrace x,y,z\rbrace)&=\theta(\mu_X(\lbrace \lbrace x,y\rbrace,\lbrace y,z\rbrace\rbrace))\\ &=\theta(T\theta(\lbrace\lbrace x,y\rbrace,\lbrace y,z\rbrace\rbrace))\\ &=\theta(\lbrace y,z\rbrace)=z \end{aligned}$$ and thus $$\begin{aligned}\theta(\lbrace x,z\rbrace)&=\theta(T\theta(\lbrace\lbrace x\rbrace,\lbrace y,z\rbrace\rbrace))\\ &=\theta(\mu_X(\lbrace\lbrace x\rbrace,\lbrace y,z\rbrace\rbrace))\\ &=\theta(\lbrace x,y,z\rbrace)=z\end{aligned}$$ i.e. $x\preceq z$ and so $\preceq$ is transitive. This shows that $(X,\preceq)$ is a poset.


For any subset $A\subset X,~\theta(A)\in X$ is its join. Indeed, there is nothing to show if $A=\emptyset$, and otherwise, for any $a\in A,$ $$\begin{aligned} \theta(\lbrace a,\theta(A)\rbrace)&=\theta( T\theta(\lbrace \lbrace a\rbrace,A\rbrace))\\ &=\theta(\mu_X(\lbrace \lbrace a\rbrace,A\rbrace))\\ &=\theta(\lbrace a\rbrace\cup A)=\theta(A) \end{aligned}$$ so for any $a\in A,~a\preceq \theta(A)$. Now if $b\in X$ has $a\preceq b$ for all $a\in A$, then $$\begin{aligned} \theta(\lbrace \theta(A),b\rbrace)&=\theta(T\theta(\lbrace A,\lbrace b\rbrace\rbrace))\\ &=\theta(\mu_X(\lbrace A,\lbrace b\rbrace\rbrace))\\ &=\theta(A\cup\lbrace b\rbrace)\\ &=\theta\Big(\bigcup_{a\in A}\lbrace a,b\rbrace\Big)\\ &=\theta(\mu_X(\lbrace \lbrace a,b\rbrace:a\in A\rbrace))\\ &=\theta(T\theta(\lbrace \lbrace a,b\rbrace:a\in A\rbrace))\\ &=\theta(\lbrace \underbrace{\theta(\lbrace a,b\rbrace)}_{=b}:a\in A\rbrace)\\ &=\theta(\lbrace b\rbrace)=b \end{aligned}$$ (the fourth equality needs $A\neq\emptyset$; when $A=\emptyset$ the third line already reads $\theta(\lbrace b\rbrace)=b$). That is, for any set $A$ and any upper bound $b,~\theta(A)\preceq b$, so $\theta(A)$ is the least upper bound of $A$, and thus $(X,\preceq)$ is a complete join-semilattice.


Conversely, any complete join-semilattice $(X,\leq)$ gives rise to a map $\bigvee$ from the powerset of $X$ to $X,$ $A\mapsto \bigvee A$, and it is easily seen that this is a $T$-algebra.
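As a concrete sanity check of this correspondence, here is a small sketch using an example of my own choosing: $X$ is the set of divisors of $12$ and $\theta$ is the least common multiple, with $\theta(\emptyset)=1$. The multiplication law is only tested on families consisting of two subsets, so this is an illustration, not a proof.

```python
from functools import reduce
from itertools import combinations
from math import gcd

X = [1, 2, 3, 4, 6, 12]  # divisors of 12, a complete lattice under divisibility

def lcm(a, b):
    return a * b // gcd(a, b)

def theta(subset):
    """Candidate structure map P(X) -> X: the lcm, with theta(empty set) = 1."""
    return reduce(lcm, subset, 1)

def below(a, b):
    """The relation a <= b defined by theta({a, b}) = b."""
    return theta({a, b}) == b

subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# Unit law: theta({x}) = x.
unit_law = all(theta({x}) == x for x in X)
# Multiplication law on two-element families {A, B}:
# theta(A u B) = theta({theta(A), theta(B)}).
mult_law = all(
    theta(a | b) == theta({theta(a), theta(b)}) for a in subsets for b in subsets
)
# The induced order is divisibility.
order_is_divisibility = all(below(a, b) == (b % a == 0) for a in X for b in X)

print(unit_law, mult_law, order_is_divisibility)
```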


It's often helpful to understand a general category-theoretic idea by looking at how it specializes to posets. In this case, a monad on a poset is precisely a closure operator, hence examples are given by any topological space $X$ (where the poset is the poset of subsets of $X$ and the closure operator is taking closures). The algebras are precisely the closed elements of the poset. Note how the relationship to adjoint functors specializes to the relationship between Galois connections and closure operators.

Note also how non-poset examples can be thought of in terms of closure. For example, the List monad, whose algebras are monoids, can be thought of as the result of "closing a set under concatenation."
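To make the poset case concrete, here is a small sketch with a closure operator of my own choosing: the poset is the set of subsets of $\lbrace 1,\dots,6\rbrace$ ordered by inclusion, and the closure of a subset adds every divisor of each of its elements. Monotonicity plays the role of functoriality, extensivity is the unit, and idempotence is the multiplication.

```python
from itertools import combinations

GROUND = range(1, 7)

def closure(a):
    """Close a subset of {1, ..., 6} under taking divisors."""
    return frozenset(d for d in GROUND if any(n % d == 0 for n in a))

subsets = [frozenset(c) for r in range(7) for c in combinations(GROUND, r)]

# <= on frozensets is the subset relation, i.e. the order of the poset.
extensive = all(a <= closure(a) for a in subsets)
monotone = all(
    closure(a) <= closure(b) for a in subsets for b in subsets if a <= b
)
idempotent = all(closure(closure(a)) == closure(a) for a in subsets)

# The algebras are the closed elements.
closed = [a for a in subsets if closure(a) == a]

print(extensive, monotone, idempotent)
print(sorted(closure({6})))
```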


I'll describe the following three examples:

  1. The word monad; its algebras are monoids

  2. The reduced word monad; its algebras are groups

  3. The path monad; its algebras are small categories

Monoids. Let $T\colon \textrm{Set}\to \textrm{Set}$ be the functor taking a set $S$ to the set of words in $S$, i.e. $$ T(S)=\bigsqcup_{n=0}^{\infty}S^n $$ and taking a function $f\colon S\to S'$ to the function $T(S)\to T(S')$ which applies $f$ to each symbol appearing in a word. Let $\mu$ be the natural transformation from $T^2$ to $T$ which concatenates words of words, and let $\eta$ be the natural transformation from the identity functor $T^0$ to $T$ which embeds sets as words of length $1$. Then $(T,\eta,\mu)$ is a monad in Set. Its Eilenberg-Moore category is the category of monoids.
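A quick sketch of the word monad and of the algebra attached to a monoid, with words modelled as Python tuples. The helper `algebra` is a hypothetical name: it turns a binary operation and an identity element into the structure map $\theta\colon T(M)\to M$ that multiplies out a word.

```python
def eta(x):
    """Unit: a symbol becomes a word of length 1."""
    return (x,)

def fmap(f, word):
    """Functor action: apply f to each symbol of the word."""
    return tuple(f(x) for x in word)

def mu(word_of_words):
    """Multiplication: concatenate a word of words."""
    return tuple(x for word in word_of_words for x in word)

def algebra(op, identity):
    """Structure map T(M) -> M of the monoid (M, op, identity)."""
    def theta(word):
        result = identity
        for x in word:
            result = op(result, x)
        return result
    return theta

# The integers under addition, as an algebra for the word monad.
theta = algebra(lambda a, b: a + b, 0)
ww = ((1, 2), (), (3, 4, 5))

print(theta(eta(7)))           # unit law: should give back 7
print(theta(mu(ww)))           # multiply out the concatenation...
print(theta(fmap(theta, ww)))  # ...or multiply out each inner word first
```

The empty word is sent to the identity element, which is how the unit of the monoid is recovered from $\theta$.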

Groups. I'm following the explicit construction of free groups described here. Given a set $S$ consider the disjoint union $S\sqcup S$ and let $i$ be the involution exchanging the two copies of $S$. The graph of $i$, as a subset of $(S\sqcup S)^2$, is the set of reducible words of length $2$, and its complement is the set of reduced words of length $2$. These notions are extended to words on $S\sqcup S$ of arbitrary length, and there is an operation on sequences of reduced words given by concatenation followed by reduction.

Let $T\colon \textrm{Set}\to \textrm{Set}$ be the functor taking a set $S$ to the set of reduced words on $S\sqcup S$ and sending a function $f\colon S\to S'$ to the operation which applies $f\sqcup f$ to each symbol and then reduces the resulting word. Let $\mu$ be the natural transformation from $T^2$ to $T$ which takes a word whose symbols belong to $T(S)\sqcup T(S)$, replaces each symbol coming from the second factor with its involuted reversal, concatenates the resulting words, and finally reduces the result. Let $\eta$ be as before. Then $(T,\eta,\mu)$ is a monad in Set whose Eilenberg-Moore category is the category of groups.
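Here is a minimal sketch of the reduction and of $\mu$. It assumes a letter of $S\sqcup S$ is encoded as a pair `(s, sign)` with `sign` equal to `1` for the first copy and `-1` for the second; that encoding and all function names are my own choices.

```python
def inverse_letter(letter):
    """The involution i: swap the two copies of S."""
    s, sign = letter
    return (s, -sign)

def reduce_word(word):
    """Cancel adjacent pairs (x, i(x)) until none remain, using a stack."""
    out = []
    for letter in word:
        if out and out[-1] == inverse_letter(letter):
            out.pop()
        else:
            out.append(letter)
    return tuple(out)

def invert(word):
    """Involuted reversal of a word."""
    return tuple(inverse_letter(l) for l in reversed(word))

def eta(s):
    """Unit: a symbol becomes a one-letter word in the first copy of S."""
    return ((s, 1),)

def mu(word):
    """Multiplication: word has letters (w, sign) with w a reduced word over S."""
    flat = []
    for w, sign in word:
        flat.extend(w if sign == 1 else invert(w))
    return reduce_word(flat)

a, A = ("a", 1), ("a", -1)
b, B = ("b", 1), ("b", -1)

print(reduce_word((a, b, B, A, a)))
print(mu((((a, b), 1), ((a, b), -1))))
print(mu((((a,), 1), ((b, a), -1))))
```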

Small categories.

Let Quiv denote the category of quivers, i.e. directed multigraphs with multigraph homomorphisms. Let $T\colon\textrm{Quiv}\to\textrm{Quiv}$ be the functor taking a quiver to its path quiver, i.e. the quiver with the same vertex set and edges given by paths in the original quiver, and taking a morphism of quivers to the morphism of path quivers obtained from the original morphism by applying it to each of the edges along a path. Let $\mu$ be the natural transformation from $T^2$ to $T$ which concatenates paths of paths, and let $\eta$ be the natural transformation from $T^0$ to $T$ consisting of quiver morphisms which are the identity on the vertices and map edges to the corresponding paths of length $1$. Then $(T,\eta,\mu)$ is a monad in Quiv whose Eilenberg-Moore category is the category of small categories.
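The path quiver and the concatenation $\mu$ can be sketched as follows. I assume a quiver is given by a function sending each edge to its `(source, target)` pair, and that a path is a pair `(start, edges)`, so that the empty path at each vertex is representable; the example quiver `EDGES` is made up.

```python
EDGES = {"f": ("A", "B"), "g": ("B", "C"), "h": ("C", "A")}

def path_endpoints(endpoints):
    """Endpoint function of the path quiver T(Q), given that of Q."""
    def ends(path):
        start, edges = path
        here = start
        for e in edges:
            src, tgt = endpoints(e)
            if src != here:
                raise ValueError("edges are not composable")
            here = tgt
        return (start, here)
    return ends

def eta(endpoints, edge):
    """Unit: an edge becomes the path of length 1 starting at its source."""
    return (endpoints(edge)[0], (edge,))

def mu(path_of_paths):
    """Multiplication: concatenate a path of paths into a single path."""
    start, paths = path_of_paths
    return (start, tuple(e for (_, edges) in paths for e in edges))

ends0 = EDGES.__getitem__      # endpoints in Q
ends1 = path_endpoints(ends0)  # endpoints in T(Q)
ends2 = path_endpoints(ends1)  # endpoints in T(T(Q))

p = ("A", ("f", "g"))   # A -> B -> C
id_c = ("C", ())        # empty path at C
q = ("C", ("h",))       # C -> A
pp = ("A", (p, id_c, q))

print(ends2(pp))
print(mu(pp), ends1(mu(pp)))
```

An algebra for this monad assigns an edge to every path, and the images of the empty paths are the identities, which is how composition and units of a small category arise.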


There is a notable omission from this list of examples: groupoids. I expect that there is a "reduced path" monad whose algebras are groupoids, but I did not work out the details and would be interested to see them.