Why should I "believe in" weak solutions to PDEs?
Solution 1:
First, you should not believe in anything in mathematics, in particular weak solutions of PDEs. They are sometimes a useful tool, as others have pointed out, but they are often not unique. For example, one needs an additional entropy condition to obtain uniqueness of weak solutions for scalar conservation laws, like Burgers' equation. Also note that there are weak solutions of the Euler equations that are compactly supported in space and time, which is absurd (a fluid that starts at rest, no force is applied, and then it does something crazy and comes back to rest). They are a useful tool, connected to physics sometimes, but that is it.
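As a concrete instance of this non-uniqueness (a standard textbook example, not part of the original argument), consider Burgers' equation $u_t + (u^2/2)_x = 0$ with initial data $u(x,0)=0$ for $x<0$ and $u(x,0)=1$ for $x>0$. Both $$u_1(x,t) = \begin{cases} 0, & x < t/2,\\ 1, & x > t/2, \end{cases} \qquad u_2(x,t) = \begin{cases} 0, & x < 0,\\ x/t, & 0 \le x \le t,\\ 1, & x > t, \end{cases}$$ are weak solutions. The jump in $u_1$ moves with speed $s = \frac{f(u_r)-f(u_l)}{u_r-u_l} = \frac{1}{2}$, where $f(u)=u^2/2$, exactly as the Rankine-Hugoniot condition requires. The entropy condition rejects $u_1$, because characteristics leave the discontinuity instead of running into it, and keeps the rarefaction wave $u_2$.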
In general, it is naive to ignore applications when studying or looking for motivations for theoretical objects in PDEs. Nearly all applications of PDEs are in physical sciences, engineering, materials science, image processing, computer vision, etc. These are the motivations for studying particular types of PDEs, and without these applications, there would be almost zero mathematical interest in many of the PDEs we study. For instance, why do we spend so much time studying parabolic and elliptic equations, instead of focusing effort on bizarre fourth order equations like $u_{xxxx}^\pi = u_y^2e^{u_z}$? (hint: there are physical applications of elliptic and parabolic equations). We study an extremely small sliver of all possible PDEs, and without a mind towards applications, there is no reason to study these PDEs instead of others.
You say you do not know anything about physics; well I would encourage you to learn about some physics and connections to PDEs (e.g., heat equation or wave equation) before learning about theoretical properties of PDEs, like weak solutions.
PDEs are only models of the physical phenomenon we care about. For example, consider conserved quantities. If $u(x,t)$ denotes the density (say heat content, or density of traffic along a highway) of some quantity along a line at position $x$ and time $t$, then if the quantity is truly conserved, it satisfies (trivially) a conservation law like $$\frac{d}{dt} \int_a^b u(x,t) \, dx = F(a,t) - F(b,t), \ \ \ \ \ (*)$$ where $F(x,t)$ denotes the flux of the density $u$, that is, the amount of heat/traffic/etc flowing to the right per unit time at position $x$ and time $t$. The equation simply says that the only way the amount of the substance in the interval $[a,b]$ can change is by the substance moving into the interval at $x=a$ or moving out at $x=b$.
The function $u$ need not be differentiable in order to satisfy the equation above. However, it is often more convenient to assume $u$ and $F$ are differentiable, set $b = a+h$ and send $h\to 0$ to obtain (formally) a differential equation $$\frac{\partial u}{\partial t} + \frac{\partial F}{\partial x} = 0. \ \ \ \ \ (+)$$ This is called a conservation law, and we can obtain a closed PDE by taking some physical modeling assumption on the flux $F$. For instance, in heat flow, Fourier's law of heat conduction says $F=-k\frac{\partial u}{\partial x}$ (or for diffusion, Fick's law of diffusion is identical). For traffic flow, a common flux is $F(u)=u(1-u)$, which gives a scalar conservation law.
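To spell out the formal step from (*) to (+), assume $u$ and $F$ are smooth enough to differentiate under the integral sign. Taking $b=a+h$ in (*) and dividing by $h$ gives $$\frac{1}{h}\int_a^{a+h} \frac{\partial u}{\partial t}(x,t)\,dx = -\frac{F(a+h,t)-F(a,t)}{h}.$$ As $h\to 0$ the left side tends to $\frac{\partial u}{\partial t}(a,t)$ and the right side to $-\frac{\partial F}{\partial x}(a,t)$, which is (+) at the point $x=a$. Conversely, integrating (+) over $[a,b]$ recovers (*), so the two agree whenever the derivatives exist. Only (*) still makes sense when they do not.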
Whatever physical model you choose, you have to understand that (*) is the real equation you care about, and (+) is just a convenient way to write the equation. It would seem absurd to say that if one cannot find a classical solution of (+), then we should throw up our hands and admit defeat.
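One way to see that (*) is perfectly usable on its own is to discretize it directly. The following is a minimal sketch, not a reference implementation: it assumes the traffic flux $F(u)=u(1-u)$, a periodic road of length 1, and the Lax-Friedrichs numerical flux (my choice for illustration; any conservative scheme would do). The names `lax_friedrichs_step` and `simulate` are hypothetical. The initial density is discontinuous, yet the update is just (*) applied to each grid cell.

```python
def flux(u):
    """Traffic-flow flux F(u) = u(1 - u)."""
    return u * (1.0 - u)


def lax_friedrichs_step(u, dx, dt):
    """Advance the cell averages u by one time step on a periodic grid."""
    n = len(u)
    # interface[i] approximates the flux through the edge between cell i and cell i+1.
    interface = []
    for i in range(n):
        left, right = u[i], u[(i + 1) % n]
        interface.append(
            0.5 * (flux(left) + flux(right)) - 0.5 * (dx / dt) * (right - left)
        )
    # Discrete form of (*): change in cell i = flux in on the left - flux out on the right.
    # interface[i - 1] with i = 0 wraps around to the last edge (periodic road).
    return [u[i] - (dt / dx) * (interface[i] - interface[i - 1]) for i in range(n)]


def simulate(n_cells=200, t_final=0.5, cfl=0.5):
    """Evolve a discontinuous initial density; return (u, initial mass, final mass)."""
    dx = 1.0 / n_cells
    dt = cfl * dx  # stable because |F'(u)| = |1 - 2u| <= 1 for 0 <= u <= 1
    # Assumed initial data: dense traffic (0.8) in the middle half, light (0.2) elsewhere.
    u = [0.8 if 0.25 <= (i + 0.5) * dx < 0.75 else 0.2 for i in range(n_cells)]
    mass_before = sum(u) * dx
    for _ in range(int(round(t_final / dt))):
        u = lax_friedrichs_step(u, dx, dt)
    mass_after = sum(u) * dx
    return u, mass_before, mass_after


if __name__ == "__main__":
    u, m0, m1 = simulate()
    print("mass before:", m0, "mass after:", m1)
```

Because every interface flux is added to one cell and subtracted from its neighbour, the total amount of traffic is conserved up to rounding, with no derivative of $u$ ever being taken.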
Most applications of PDEs, such as optimal control, differential games, fluid flow, etc., have a similar flavor. One writes down a function, like a value function in optimal control, and the function is in general just Lipschitz continuous. Then one wants to explore more properties of this function and finds that it satisfies a PDE (the Hamilton-Jacobi-Bellman equation), but since the function is not differentiable we look for a weak notion of solution (here, the viscosity solution) that makes our Lipschitz function the unique solution of the PDE. The point is that without a mind towards applications, one is shooting in the dark and will not find elegant answers to such questions.
Solution 2:
Reason 1. Even if you actually care only about smooth solutions, in some cases it is much easier to first establish that a weak solution exists and then separately show that the structure of the PDE actually forces it to be smooth. Existence and regularity are handled separately, using different tools.
Reason 2. There are physical phenomena which are described by discontinuous solutions of PDEs, e.g. hydrodynamical shock waves.
Reason 3. Discontinuous solutions may be used as a convenient approximation for describing macroscopic physics while neglecting some details of the microscopic theory. For example, in electrodynamics one derives from the Maxwell equations that the electric field of an electric dipole behaves at large distances in a universal way, depending only on the dipole moment but not on the details of the charge distribution. At distances comparable to the dipole size these microscopic details start to become important. If you don't care about these small distances, you may work in the approximation in which the dipole is a point-like object, with charge distribution given by a derivative of the delta distribution. Even though the actual charge distribution is given by a smooth function, it is more convenient to approximate it by a very singular object. One can still make sense of the Maxwell equations, and the results obtained this way turn out to be correct (provided that you understand the limitations of the approximations made).
Reason 4. It is desirable to have "nice" spaces in which you look for solutions. In functional analysis there are many features you might want a topological vector space to have, and among these one of the most important is completeness. Suppose you start with the space of smooth functions on, say, $[0,1]$ and equip it with a certain topology. In this case it is completely natural to pass to the completion. For many choices of the topology you will find that the completed space contains objects which are too singular to be considered as bona fide functions, e.g. measures or distributions. Just to give you an example of this phenomenon: if you are interested in computing integrals of smooth functions, you are eventually going to consider gadgets such as $L^p$ norms on $C^{\infty}[0,1]$. Once you complete (for $1 \le p < \infty$), you get the famous $L^p$ spaces, whose elements are merely equivalence classes of functions modulo equality almost everywhere. The space of distributions on $[0,1]$ may be constructed very similarly: instead of $L^p$ norms you consider the seminorms $p_f$ given by $p_f(g)= \left|\int_{0}^1 f(x) g(x) dx\right|$ for $f,g \in C^{\infty}[0,1]$. If you can justify to yourself that it is interesting to look at this family of seminorms, then distributions (and also weak solutions of PDEs) become an inevitable consequence.
Solution 3:
Let's have a look at the Dirichlet problem on some (say smoothly) bounded domain $\Omega$, i.e. $$ \begin{aligned} -\Delta u &= f \quad \text{in } \Omega,\\ u &= 0 \quad \text{on } \partial \Omega \end{aligned} $$ for $f \in \text{C}^0(\overline{\Omega})$. Then, Dirichlet's principle states that a classical solution is a minimizer of an energy functional, namely $E(u):=\dfrac{1}{2}\int_\Omega \left|\nabla u\right|^2 \mathrm{d}x-\int_\Omega f u ~\mathrm{d}x$. (Here we need some boundary condition on $\Omega$ for the first integral to be finite).
So the question one may ask is: if I have some PDE, why not just take the corresponding energy functional, minimize it in the right function space and obtain a solution of the PDE? So far so good. But the problem that may occur is finding this minimizer. It can be shown that such functionals are bounded from below, so we have some infimum. As also stated in the Wikipedia article, it was just assumed (e.g. by Riemann) that this infimum will always be attained, which, as Weierstrass showed, is unfortunately not always the case (see also this answer on MO).
Hence, we find differentiable functions which are "close" (in some sense) to a "solution" of the PDE, but no actual differentiable solution. I feel that this is quite unsatisfactory.
So how could we save this? We can multiply the PDE (take the Poisson equation above for simplicity) by some test function and integrate by parts to obtain $$ \int_\Omega \nabla u \cdot \nabla v~\mathrm{d}x= \int_\Omega fv~\mathrm{d}x $$ for all test functions $v$. But what space should $u$ come from? What do we need to make sense of the integrals?
Well, $\nabla u \in \text{L}^2(\Omega)$ would be nice, because then the first integral is well-defined via Cauchy-Schwarz. But as shown by Weierstrass, classical derivatives are not enough, so we need some weaker sense. And here we get to Sobolev spaces, and looking again at the last formula, we see the weak formulation.
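The weak formulation is also what makes the problem computable. Below is a minimal sketch, under assumptions that are mine rather than from the course: the one-dimensional model problem $-u''=f$ on $(0,1)$ with $u(0)=u(1)=0$, a uniform mesh, and piecewise-linear "hat" functions used both for $u$ and for the test functions $v$. Hat functions have no classical second derivative, but their gradients are in $\text{L}^2$, which is all the formula above needs. The function name `solve_poisson_p1` is hypothetical, and the load integral is approximated by $h\,f(x_i)$, which is exact only for constant $f$.

```python
def solve_poisson_p1(n_elements, load=lambda x: 1.0):
    """Solve the weak form of -u'' = load on (0, 1), u(0) = u(1) = 0, with hat functions.

    Returns the nodal values of u at the n_elements + 1 mesh points.
    """
    if n_elements < 2:
        raise ValueError("need at least two elements to have an interior node")
    h = 1.0 / n_elements
    m = n_elements - 1  # number of interior nodes

    # Stiffness matrix entries: integral of phi_i' * phi_j' is 2/h on the
    # diagonal and -1/h for neighbouring hat functions (tridiagonal matrix).
    diag = [2.0 / h] * m
    off = -1.0 / h
    # Load vector: integral of load * phi_i, approximated by h * load(x_i).
    rhs = [h * load((i + 1) * h) for i in range(m)]

    # Thomas algorithm, forward elimination.
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]

    # Back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]

    # Attach the boundary values u(0) = u(1) = 0.
    return [0.0] + u + [0.0]


if __name__ == "__main__":
    n = 8
    values = solve_poisson_p1(n)
    for i, value in enumerate(values):
        x = i / n
        print(x, value, x * (1.0 - x) / 2.0)  # compare with the exact solution for f = 1
```

For $f \equiv 1$ the exact solution is $u(x)=x(1-x)/2$, and the computed nodal values agree with it up to rounding, even though the approximating function is only piecewise linear.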
I am aware that this does not give a full explanation of why one should "believe" in weak solutions, Sobolev spaces and so on. What I stated above is a quick run-through of how the step from classical to weak theory was motivated in my course on PDE, and at least I was quite happy with it.