Minimal definition of the derivative
Solution 1:
This does appear to be a valid characterisation of the derivative. Let me try to prove that if $f$ is smooth (say, $C^\infty$, to avoid unnecessary complications), then $D(f)$ really is the derivative of $f$, at least if we additionally assume that $D(c) \equiv 0$ for every constant map $c$.
First, I claim that we have the product rule, at least in the following form: if $f: V \to \mathbb{R}$ and $g : V \to W$ then $D(fg) = D(f) g + f D(g)$. For this, let $A : V \to V \times V$ be the diagonal map $x \mapsto (x,x)$, let $F : V \times V \to \mathbb{R} \times W$ be given by $F(x,y) = (f(x), g(y))$, and finally let $B(t,w) = tw$ for $(t,w) \in \mathbb{R} \times W$. Then $fg$ can be written as $B \circ F \circ A$. By the chain rule, it follows that: $$ D(fg)_x = (DB)_{(f(x),g(x))} \circ (DF)_{(x,x)}\circ (DA)_x$$ Looking at compositions of $F$ with projections, it follows that $(DF)_{(x,y)}(u,v) = ((Df)_x(u),(Dg)_y(v))$ (or so it seems to me). From the previous axioms we have $(DA)_x = A$, because $A$ is linear, and $(DB)_{(f(x),g(x))} (u,v) = f(x) v + u g(x)$, because $B$ is bilinear. Combining this all together we conclude that: $$ D(fg)_x(v) = f(x) (Dg)_x(v) + (Df)_x(v) g(x)$$ Hence, the usual multiplication rule holds.
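As a sanity check of the factorisation, here it is for the usual derivative $D_0$ in a toy case (the choices $V = W = \mathbb{R}$, $f(x) = x$, $g(x) = x^2$ are mine, purely for illustration). Then $F(s,t) = (s, t^2)$, and the three factors are $(D_0A)_x(v) = (v,v)$, $(D_0F)_{(x,x)}(u,v) = (u, 2xv)$ and $(D_0B)_{(x,x^2)}(u,v) = x v + u x^2$, so: $$ (D_0(fg))_x(v) = (D_0B)_{(x,x^2)}(v, 2xv) = x \cdot 2xv + v \cdot x^2 = 3x^2 v, $$ which agrees with the derivative of $(fg)(x) = x^3$.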
Secondly, I claim that if $g: V \to W$ is a map such that $g(x) = 0$ and $(D_0g)_x = 0$, where $D_0$ is the usual derivative and $x$ is some point in $V$, then $(Dg)_x = 0$. Indeed, if $g$ and its derivative both vanish at $x$, we can write $g = f\tilde{g}$, where $f: V \to \mathbb{R}$ is a map with $f(x) = 0$, and $\tilde{g}: V \to W$ is a map with $\tilde{g}(x) = 0$. When $\dim V > 1$ a finite sum of such products may be needed, which is harmless because $D$ is additive. (At this point, we are using Taylor expansion, and the assumption that $g$ is at least $C^2$ is needed.) From the product rule, it follows that: $$ (Dg)_x = (Df)_x\, \tilde{g}(x) + f(x)\, (D\tilde{g})_x = 0.$$
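For a concrete instance (my own choice of example, with $V = \mathbb{R}^2$, $W = \mathbb{R}$ and $x = 0$), take $g(y_1,y_2) = y_1^2 + y_2^2$. It vanishes at $0$ together with its usual derivative, and it is a sum of two products of functions vanishing at $0$: $$ g = f_1 \tilde{g}_1 + f_2 \tilde{g}_2, \qquad f_i(y) = \tilde{g}_i(y) = y_i. $$ Applying the product rule to each term and using additivity of $D$: $$ (Dg)_0 = \sum_{i=1}^{2} \Big( (Df_i)_0\, \tilde{g}_i(0) + f_i(0)\, (D\tilde{g}_i)_0 \Big) = 0. $$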
Finally, we strike the killing blow. Let $g: V \to W$ be any map, and fix a point $x \in V$. We can write $g$ in the form $g(y) = g(x) + (D_0g)_x(y - x) + h(y)$, where $h(x) = 0$ and $(D_0h)_x = 0$. Now, the previous considerations show that $(Dh)_x = 0$, so by additivity (and the known behaviour of $D$ on linear and constant functions) we have $(Dg)_x = (D_0g)_x$. Because $x$ was arbitrary, and so was $g$, it follows that $D$ is just the usual derivative.
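To see the decomposition in a simple case (assuming $V = W = \mathbb{R}$ and $g(y) = y^2$, chosen only for illustration): $$ y^2 = \underbrace{x^2}_{g(x)} + \underbrace{2x(y-x)}_{(D_0g)_x(y-x)} + \underbrace{(y-x)^2}_{h(y)}. $$ Here $h$ is the product of $y \mapsto y - x$ with itself, and both factors vanish at $x$, so $(Dh)_x = 0$ by the product rule. The middle term is the linear map $y \mapsto 2xy$ plus the constant $-2x^2$, so $D$ sends it to $v \mapsto 2xv$, and therefore $(Dg)_x(v) = 2xv$, as expected.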
As for the other possible axioms, I think you could assume the product rule and work from there. In particular, you can then drop the assumption about bilinear maps. I don't think the other assumptions can be significantly weakened. Without the assumption of linearity, it doesn't look like a natural question (at least to me). I also think that without the assumption on what happens for linear operators, $D$ given by something like $Df = P^{-1} (D_0f) P$ for some invertible linear $P$ would work.
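To spell out that last remark under one possible reading (an assumption on my part: fix an invertible linear map $P_V$ on each space $V$, and set $(Df)_x = P_W^{-1} \circ (D_0f)_x \circ P_V$ for $f: V \to W$), the chain rule survives because the inner factors cancel. For $g: U \to V$ and $f: V \to W$: $$ D(f \circ g)_x = P_W^{-1} (D_0f)_{g(x)} (D_0g)_x P_U = \big(P_W^{-1} (D_0f)_{g(x)} P_V\big)\big(P_V^{-1} (D_0g)_x P_U\big) = (Df)_{g(x)} \circ (Dg)_x. $$ Additivity is also clear, but a linear map $L: V \to W$ gets $DL = P_W^{-1} L P_V$, which differs from $L$ unless $L P_V = P_W L$.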
A mathematical object that might be relevant is the connection, which is analogous to the derivative but defined more abstractly (by listing required properties, just as in the question) and which exists on differentiable manifolds. Because there are normally a lot of connections on a manifold, this should give some lower bound on how much has to be assumed.