Derivative of a Delta function
Solution 1:
With suitable interpretation (!), it is perfectly ok to compute as $$ {d\over dx}\int_{-\infty}^\infty e^{itx}\;dt \;=\; \int_{-\infty}^\infty {d\over dx}e^{itx}\;dt \;=\; \int_{-\infty}^\infty it\cdot e^{itx}\;dt $$ (Faux-secretly, the integral is the Fourier transform being taken as tempered _distribution_, and not as a limit of Riemann sums...) In this example, one must be prepared to recognize the outcome (which, as a numerical integral, diverges, of course) as simply the derivative of Dirac delta, just as the original was Dirac delta itself.
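One can watch this interpretation at work numerically (a minimal sketch in Python with numpy; the cutoff `eps`, the helper `pair`, and the test function are my own choices, not part of the argument). Regularizing the $t$-integral with a factor $e^{-\varepsilon t^2}$ gives the closed form $\sqrt{\pi/\varepsilon}\,e^{-x^2/(4\varepsilon)}$, which is $2\pi$ times a narrow Gaussian; pairing it and its $x$-derivative against a smooth $f$ recovers $2\pi f(0)$ and $-2\pi f'(0)$.

```python
import numpy as np

# Regularize int e^{itx} dt with a Gaussian cutoff e^{-eps t^2}; the
# t-integral then has the closed form sqrt(pi/eps) * exp(-x^2/(4 eps)),
# i.e. 2*pi times a narrow Gaussian approximating delta.
def kernel(x, eps):
    return np.sqrt(np.pi / eps) * np.exp(-x**2 / (4 * eps))

def pair(g, fx, dx):
    """Riemann-sum pairing int g(x) f(x) dx on a uniform grid."""
    return np.sum(g * fx) * dx

eps = 1e-4
x = np.linspace(-0.5, 0.5, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2) * (1 + x)           # test function with f(0) = 1, f'(0) = 1

k = kernel(x, eps)
dk = -x / (2 * eps) * k               # d/dx of the regularized kernel
print(pair(k, f, dx) / (2 * np.pi))   # ≈ f(0) = 1
print(pair(dk, f, dx) / (2 * np.pi))  # ≈ -f'(0) = -1
```

As $\varepsilon\to 0$ the first pairing tends to $f(0)$ and the second to $-f'(0)$, which is exactly the "derivative of Dirac delta" reading of the divergent integral.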
While it is true that $\delta$ does not have a pointwise value at $0$, it certainly does have pointwise values away from $0$. For that matter, before people decided to formalize "function" as something that should have pointwise values, Euler and many others often treated "function" as sometimes meaning "expression". Meanwhile, an $L^2$ function doesn't really have pointwise values, so in some sense is worse off than $\delta$.
Fourier inversion on $L^2(\mathbb R)$ would seem to involve integrals that needn't converge; with Plancherel's theorem in hand, we continue to write those integrals, with the disclaimer that the notation must be understood as meaning the extension-by-continuity from a smaller space.
I think much of the point of distribution theory is to be able to treat distributions not merely as functionals on classical functions, but as generalized functions, permitting the same operations, if extended suitably.
Operations such as dilation and translation are more easily notated by using "argument" notation $x\to \delta(x-y)$, although, yes, it is risky to too hastily assume that generalized functions share all the properties of classical ones.
Further, the first round of distribution theory is not the end of the story, even to make best legitimate use of $\delta$. Namely, a finer gradation of "distributions" (and classical functions) is very often useful, as a measure of how far a generalized function is from being $L^2$ or continuous, etc. Thus, one can "predict" that a solution $u$ to an equation $u''+q(x)u=\delta$ (for smooth $q$, say) will be continuous.
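The prediction can be checked in the simplest special case (a hypothetical numerical sketch, assuming numpy; the choice $q=0$ and the test function are mine): for $q=0$, the continuous function $u(x)=|x|/2$ solves $u''=\delta$ distributionally, so pairing $u$ with $f''$ (two integrations by parts) should return $f(0)$.

```python
import numpy as np

# For q = 0, u(x) = |x|/2 solves u'' = delta in the distributional sense,
# and u is continuous; the pairing <u, f''> should give back f(0).
x = np.linspace(-8.0, 8.0, 400001)
dx = x[1] - x[0]
f = np.exp(-x**2) * (1 + x)                # smooth, rapidly decaying; f(0) = 1
f2 = np.gradient(np.gradient(f, dx), dx)   # numerical second derivative
u = np.abs(x) / 2
print(np.sum(u * f2) * dx)                 # ≈ f(0) = 1
```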
I only recently learned that Dirac's original intuitive use of $\delta$ did actually resemble what would now be formalized as a "Gelfand triple" $H^{+1}\subset L^2\subset H^{-1}$ of Levi-Sobolev spaces on $\mathbb R$, where $\delta$ lies in $H^{-1}$ but not quite in $L^2$. Thus, general distribution theory made $\delta$ "legal", but in itself did not yet manage to account for Dirac's marvelous intuition.
(Yes, nowadays standard coursework teaches us to have a sometimes-too-narrow viewpoint, providing an excuse to merely dismiss good ideas as "unrigorous" rather than figuring out how to legitimize them. The utility of Heaviside's and Dirac's "non-rigorous" ideas outweighed the difficulty of justification in everyone's eyes except mathematicians', perhaps. That is, if you seem to always be able to "get the right answer" (with physical corroboration), it's hard to take seriously an objection that one wasn't playing fair, all the more when "fair" is according to rules that were made up by someone.)
Solution 2:
First of all: forget everything you "know" about $\delta$. What follows is nothing rigorous; it seems that you are not used to rigorous math, probably being a physicist or the like, so I will spare you the technical details for that reason (if you insist, I will of course supply them). By a function I mean a function defined on the whole real line, with values in the real line as well.
Let me say from the outset that there is no function $\delta$ such that $$f(0)=\int_{\mathbf{R}} f(x)\delta(x)dx$$ for all functions $f$; the proof is not too difficult (it follows, for instance, from the fundamental lemma of the calculus of variations).
Nevertheless, many physics textbooks define a function $\delta(x)$ by this property (or by something like $\delta(x)=0$ for $x\neq 0$ and $\delta(0)=\infty$, and then somehow argue that such a function has the above property, which is false: the integral would always be zero, since an integral does not care about the value at a single point such as $0$), even though no such function exists. The same goes for your "definition" (by means of this integral, which does not exist). I cannot tell why they do this; after all, it is fiction.
The point is that the $\delta$ function is no "ordinary" function (defined on the real line), but a distribution. Now a distribution is a linear map $\varphi:X\rightarrow\mathbf{R}$ (also continuous, in a suitable sense) defined on a space $X$, say, of so-called test functions. (Linearity simply means $\varphi(\alpha f+\beta g)=\alpha\varphi(f)+\beta\varphi(g)$ for all real $\alpha$ and $\beta$, and all $f,g\in X$; continuity is somewhat more delicate.) These test functions are "nice" in the sense that they are always taken to be infinitely differentiable and to satisfy some decay condition. Two spaces of test functions are the following: (i) the Schwartz space $S$ (whose decay condition is essentially that the functions in $S$, and all their derivatives, vanish faster than the inverse of any polynomial), for example $\exp(-x^2)\in S$; and (ii) the space $C^\infty_0$ of infinitely differentiable functions with compact support (compact support means that the function vanishes identically outside some bounded set). Notice $C^\infty_0\subset S$. One requires the test functions to have these rather restrictive properties because one wants to have as many distributions as possible: the smaller the space of test functions, the larger the space of functionals on it. The "usual" distributions are those defined on $C^\infty_0$, and those defined on $S$ are said to be "tempered". Every tempered distribution is a usual distribution, but not conversely. The tempered distributions are useful, since one can define their Fourier transforms, say.
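For concreteness, here is the standard example of a bump function in $C^\infty_0$ (a small Python sketch; the name `bump` is of course arbitrary): it is infinitely differentiable, strictly positive on $(-1,1)$, and identically zero outside.

```python
import numpy as np

# The classical C^infinity bump: exp(-1/(1 - x^2)) on (-1, 1), zero outside.
# All derivatives vanish at x = ±1, so the function is smooth everywhere
# and has compact support [-1, 1].
def bump(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside]**2))
    return out

print(bump(0.0)[0])              # e^{-1} ≈ 0.3679 at the center
print(bump(1.0)[0], bump(2.0)[0])  # identically zero outside (-1, 1)
```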
Now we can define a distribution $\delta$ on a given space of test functions $X$ by $\delta(f)=f(0)$. $\delta$ thus acts on a test function $f$ by evaluating it at $0$. Now if for some distribution $D$ there is an ordinary function $d$ such that $D(f)=\int_\mathbf{R}d(x)f(x)dx$ (this integral must of course exist for all $f\in X$; for this reason, $C^\infty_0$ is much more convenient here), then $D$ is said to be regular. Now $\delta$ is known to be non-regular (which is simply a restatement of what I wrote above).
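In code the distinction is almost trivial to state (a toy Python sketch; `delta` and `regular` are made-up names, and the integral is truncated to a finite grid): $\delta$ just evaluates its argument at $0$, while a regular distribution pairs by integration against an ordinary function $d$.

```python
import numpy as np

# delta as a functional on test functions: evaluation at 0.
delta = lambda f: f(0.0)

# A "regular" distribution induced by an ordinary function d, pairing by
# integration (Riemann sum on a truncated grid, for illustration only).
def regular(d):
    x = np.linspace(-10.0, 10.0, 200001)
    dx = x[1] - x[0]
    return lambda f: np.sum(d(x) * f(x)) * dx

D = regular(lambda x: np.exp(-x**2))
print(delta(np.cos))                    # f(0) = cos(0) = 1.0
print(D(lambda x: np.ones_like(x)))     # int e^{-x^2} dx ≈ sqrt(pi) ≈ 1.7725
```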
Now we can define distributional derivatives. If $D$ is a distribution, we want to define another distribution $D'$, its distributional derivative. This is done by setting $(D')(f)=-D(f')$; more generally, the $n$-th distributional derivative $D^{(n)}$ of $D$ is defined by $(D^{(n)})(f)=(-1)^n D(f^{(n)})$. This is ok, since we assumed the test functions $f$ to be infinitely differentiable; it follows that distributions are infinitely differentiable (in this distributional sense). Notice the minus sign. It is there because we want the distributional derivative to extend the ordinary derivative: if $d$ is differentiable, then $\int_\mathbf{R}d'(x)f(x)dx=-\int_\mathbf{R}d(x)f'(x)dx$, since the boundary term vanishes by the decay condition imposed on the test functions $f$.
So we may differentiate $\delta$ as follows: $(\delta')(f)=-\delta(f')=-f'(0)$.
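One can watch this identity emerge numerically (a hypothetical check, assuming numpy; the width `sigma` and the test function are arbitrary choices of mine): replace $\delta$ by a narrow Gaussian, differentiate it, and pair with a smooth $f$.

```python
import numpy as np

# A narrow Gaussian standing in for delta; its derivative, paired with a
# smooth f, should approximate (delta')(f) = -f'(0) as sigma shrinks.
sigma = 1e-3
x = np.linspace(-0.05, 0.05, 200001)
dx = x[1] - x[0]
delta_approx = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
ddelta = -x / sigma**2 * delta_approx     # derivative of the Gaussian
f = np.cos(x) + np.sin(2 * x)             # f(0) = 1, f'(0) = 2
print(np.sum(ddelta * f) * dx)            # ≈ -f'(0) = -2
```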
Solution 3:
The $\delta$ function is not strictly a function. If you use it as an ordinary function, you are not guaranteed consistent results. Also, the mathematically rigorous $\delta$ function is usually not what physicists want: a physicist's $\delta$ function is a peak of very small width, small compared to the other scales in the problem but not infinitely small. So what I do about such inconsistencies of the $\delta$ function is to fall back to a peak of finite width, say a Gaussian or a Lorentzian, do the integrals, and take the limit width $\to 0$ only at the last step.
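That recipe can be sketched in Python with numpy (the function name `pair_with_gaussian` and the particular widths are my own choices, nothing canonical): pair $f$ with a Gaussian of finite width $\sigma$, then shrink $\sigma$ and only read off the limit at the end.

```python
import numpy as np

# The physicist's recipe: replace delta by a Gaussian of finite width
# sigma, evaluate the integral, and shrink sigma only at the last step.
def pair_with_gaussian(f, sigma, half_width=0.5, n=400001):
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return np.sum(g * f(x)) * dx

f = lambda x: np.exp(x)                  # f(0) = 1
for sigma in [0.1, 0.01, 0.001]:
    print(sigma, pair_with_gaussian(f, sigma))   # → f(0) = 1 as sigma → 0
```

For $f(x)=e^x$ the finite-width answer is exactly $e^{\sigma^2/2}$, so the printed values visibly converge to $f(0)=1$ from above.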