Solution 1:

This is an interesting question, as there are a few perspectives we can take. In this answer, I'll provide some insight into the stochastic and causal cases. Whilst I am unaware of a category that meets all four criteria, there do exist categories that meet at least two of them.

From a stochastic perspective, the category $\sf FinStoch$ is a good starting point: it is the category of Markov kernels between finite sets, so Criterion 1. is met. However, it is mostly confined to the discrete case; for instance, in information theory and Bayesian networks it can be applied in the formulation of the discrete (Shannon) entropy and divergence. Besides the thesis by Fong (2012), the Master's thesis by Rischel (2020) is also quite helpful.
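To make Criterion 1. concrete: a morphism $f:X\to Y$ in $\sf FinStoch$ is a row-stochastic matrix $f(y\mid x)$, and composition is the Chapman–Kolmogorov equation $(g\circ f)(z\mid x)=\sum_{y}g(z\mid y)\,f(y\mid x)$. Here is a minimal sketch (my own illustration, representing kernels as nested dicts, not any library's API):

```python
# Sketch: morphisms of FinStoch as row-stochastic matrices, encoded as
# dicts mapping each input element to a distribution over outputs.
# Composition is the Chapman-Kolmogorov sum over the middle set.

def compose(g, f):
    """Compose Markov kernels f: X -> Y and g: Y -> Z into g.f: X -> Z."""
    return {
        x: {
            z: sum(f[x][y] * g[y].get(z, 0.0) for y in f[x])
            for z in {z for y in f[x] for z in g[y]}
        }
        for x in f
    }

# f: a biased coin flip depending on the input bit
f = {0: {"h": 0.9, "t": 0.1}, 1: {"h": 0.5, "t": 0.5}}
# g: a noisy readout of the coin
g = {"h": {0: 0.8, 1: 0.2}, "t": {0: 0.3, 1: 0.7}}

gf = compose(g, f)
# each row of the composite is again a probability distribution
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in gf.values())
```

The associativity and identity laws of $\sf FinStoch$ then amount to the familiar matrix-algebra facts for stochastic matrices.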

More recently, Shiebler (2021) discovered a connection between categorical models and maximum likelihood estimation, building on Fong (2012). This uses the co-Kleisli category $\sf CoKl$ of the comonad $(A\otimes\_)$, where $A$ is an object in a Cartesian monoidal category $C$. In the special case of the real space $\Bbb R^n$, this is termed $\sf CEuc$. This addresses Criterion 1. and Criterion 2., as the setting is wholly continuous.
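For context (my paraphrase of the general co-Kleisli construction, not Shiebler's exact notation): a morphism $X\to Y$ in this co-Kleisli category is a $C$-morphism $f:A\otimes X\to Y$, and composition duplicates the $A$-component via the diagonal $\Delta_A:A\to A\otimes A$, which exists because $C$ is Cartesian: $$g\circ_{\sf CoKl}f:\;A\otimes X\xrightarrow{\ \Delta_A\otimes\mathrm{id}_X\ }A\otimes A\otimes X\xrightarrow{\ \mathrm{id}_A\otimes f\ }A\otimes Y\xrightarrow{\ g\ }Z.$$ Intuitively, $A$ plays the role of a shared environment (e.g. a parameter or sample space) that every morphism in the composite can read.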

As a bonus to Criterion 1., the related category $\sf PEuc$ (which allows stochastic processes over more than one probability space) extends to a Markov category. This is shown in Proposition 5, where each object is equipped with a comultiplication (copy) map $cp$ and a counit (discard) map $dc$ such that \begin{align}cp:1\times\Bbb R^a\to\Bbb R^a\times\Bbb R^a\quad&\text{where}\quad cp(-,x_a)=(x_a,x_a),\\ dc:1\times\Bbb R^a\to1\quad&\text{where}\quad dc(-,x_a)=-.\end{align}

To give a reasonably close answer to Criterion 3., we can link stochastic theory and metric spaces together. The Kantorovich monad comes to mind; the details are provided in the PhD thesis by Perrone (2018). By equipping the category of algebras (derived from $\sf CMet$, the category of complete metric spaces) with both lax morphisms (convex maps) and oplax morphisms (concave maps), it is possible to construct a hexagon equation showing bimonoidality; this is Proposition 2.5.16. Note that Criterion 2. is also satisfied.
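Concretely (hedging on Perrone's exact conventions): if $e_A:PA\to A$ and $e_B:PB\to B$ are algebras of the Kantorovich monad $P$, i.e. expectation maps, then a short map $f:A\to B$ is a lax morphism of algebras precisely when $$f\big(e_A(\mu)\big)\;\le\;e_B\big(Pf(\mu)\big)\quad\text{for all }\mu\in PA,$$ which is Jensen's inequality, i.e. convexity of $f$; reversing the inequality gives the oplax (concave) case. This is what justifies reading "lax = convex, oplax = concave" above.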

I don't know of a useful category that obeys Criterion 4. and at least one other criterion. In terms of recursion, though, it could be worthwhile to study the backpropagation functor through $\sf Learn$, as each morphism in the learning algorithm may be transferable to models involving Bayesian updating, given a well-defined initial state.
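To illustrate what a morphism of $\sf Learn$ looks like (following Fong–Spivak–Tuyéras's "Backprop as Functor", but with a simplified gradient pass-back as the request map, rather than their exact formula): a learner $A\to B$ is a tuple $(P,I,U,r)$ with implementation $I:P\times A\to B$, update $U:P\times A\times B\to P$, and request $r:P\times A\times B\to A$. A toy one-parameter sketch, with the step size and squared-error loss being my choice of illustration:

```python
# Sketch of a Learn-morphism (P, implement, update, request) on the reals.
# The request map here is a simplified gradient pass-back, not the exact
# Fong-Spivak-Tuyeras formula.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Learner:
    p: float                                   # current parameter in P
    implement: Callable[[float, float], float]  # I : P x A -> B
    update: Callable[[float, float, float], float]   # U : P x A x B -> P
    request: Callable[[float, float, float], float]  # r : P x A x B -> A

    def step(self, a: float, b: float) -> float:
        """One training step: return the requested input, then update p."""
        req = self.request(self.p, a, b)
        self.p = self.update(self.p, a, b)
        return req

lr = 0.1
linear = Learner(
    p=0.0,
    implement=lambda p, a: p * a,
    # gradient of (p*a - b)^2 / 2 with respect to p is (p*a - b) * a
    update=lambda p, a, b: p - lr * (p * a - b) * a,
    # gradient with respect to a is (p*a - b) * p, passed back upstream
    request=lambda p, a, b: a - lr * (p * a - b) * p,
)

for _ in range(200):
    linear.step(a=1.0, b=2.0)  # train toward implementing a |-> 2a
```

Composition of learners then corresponds to chaining request maps backwards, which is exactly where the backpropagation functor enters.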

Given the literature discussed above, I think the Markov extension of $\sf PEuc$ is of most relevance to the question, and hopefully Shiebler's work will be developed further in the near future.