Why limits work
Solution 1:
A function $f:\>x\mapsto f(x)$ given by some expression has a "natural" domain of definition $D(f)$: the set of all $x$ in the realm of discourse (${\mathbb R}$ or ${\mathbb C}$, say) for which $f(x)$ can be evaluated without asking questions. In most cases $f$ is continuous throughout $D(f)$, which means that for each $x_0\in D(f)$, whenever $x$ is sufficiently near $x_0$, the value $f(x)$ is very near $f(x_0)$.
Now some $f$'s may have "exceptional points" where they are not continuous, e.g., the sign function, which is defined on all of ${\mathbb R}$ but is discontinuous at $0$. Moreover, the set $D(f)$ may have "real" or "virtual" boundary points where $f$ is a priori undefined, and yet we have the feeling that $f$ behaves "reasonably" in the neighborhood of such a point. Examples are $x\mapsto{\sin x\over x}$ at $x=0$ (a "real" boundary point of $D(f)$), or $x\mapsto e^{-x}$ when $x\to\infty$ (here $\infty$ is a "virtual" boundary point of $D(f)$).
All in all, the concept of "limit" is a tool to handle such "exceptional", or "limiting", cases. An all-important example is of course the following: When $f$ is defined in a neighborhood of $x_0$ we are interested in the function $$m:\quad x\mapsto{f(x)-f(x_0)\over x-x_0},$$ which has an "exceptional" point at $x_0$. It is impossible to plug $x:=x_0$ into the definition of $m$.
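To make the exceptional point tangible, here is a minimal numerical sketch in Python (my illustration, with the hypothetical choices $f(x)=x^2$ and $x_0=1$): evaluating $m$ at $x_0$ itself fails, while nearby inputs produce values approaching $2$.

```python
def f(x):
    return x * x  # hypothetical example function

x0 = 1.0

def m(x):
    # The difference quotient of f at x0; its definition breaks down at x = x0.
    return (f(x) - f(x0)) / (x - x0)

for x in [1.1, 1.01, 1.001]:
    print(x, m(x))  # the printed values approach 2.0

# m(x0) would raise ZeroDivisionError: x0 is the "exceptional" point of m.
```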
This brings me to your point 4, which gets to the heart of the matter. I'd rewrite the central sentence as follows: In the definition of the limit of $f(x)$ for $x\to c$ it says that I can make $f(x)$ as close to the value $L$ as I wish, as long as I'm willing to make $x$ sufficiently close to $c$. The idea is: While it is in most cases impossible to put $x:=c$ in the definition of $f$, we want to describe how $f$ behaves when $x$ is very close to $c$.
You then go on to say that "this definition is supposed to be mathematically rigorous, but using these as close and sufficiently close don't look rigorous to me".
The whole $\epsilon$-$\delta$ business serves exactly this purpose: to make rigorous the colloquial handling of as close and sufficiently close that you are lamenting.
Life would be simpler if we could define $\lim_{x\to c}f(x)=L$ by the condition $|f(x)-L|\leq |x-c|$, or maybe $|f(x)-L|\leq 100|x-c|$. But four centuries of dealing with limits have taught us that the $\epsilon$-$\delta$ definition of limit, arrived at only around 1870 or so, captures our intuition about them in an optimal way. It also takes care of the unforeseeable cases where the error $|f(x)-L|$ can be made as small as we want, but an extra effort in the nearness of $x$ to $c$ is needed, e.g., $|x-c|<\epsilon^2$ instead of $|x-c|<{\epsilon\over100}$.
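To see this "extra effort" concretely (a standard illustration, not taken from the question): for $f(x)=\sqrt{x}$, $c=0$, $L=0$, the requirement $|\sqrt{x}-0|<\epsilon$ forces $$0<x<\delta:=\epsilon^2,$$ and no response of the form $\delta={\epsilon\over100}$ can ever work, because no bound $|\sqrt{x}|\leq100\,|x|$ holds near $0$: the quotient $\sqrt{x}/x=1/\sqrt{x}$ is unbounded as $x\to0^+$.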
Solution 2:
For definiteness, let's start with the modern definition of limits. If $f$ is a real-valued function defined on some deleted neighborhood of the real number $c$, then we say "$\lim(f, c) = L$" if:
For every $\varepsilon > 0$, there exists a $\delta > 0$ such that if $0 < |x - c| < \delta$, then $|f(x) - L| < \varepsilon$.
Many people find it helpful to view this definition as a set of rules for an adversarial game. A function $f$, a location $c$, and a prospective limit $L$ are given as above. Player $\varepsilon$ issues a "challenge" in the form of a positive real number. To "meet" the challenge is to ensure that $|f(x) - L| < \varepsilon$ for all $x$ lying in some deleted neighborhood of $c$. The opponent, Player $\delta$, then attempts to issue a "response": to specify a positive real number $\delta$ such that every location $x \neq c$ with $|x - c| < \delta$ satisfies $|f(x) - L| < \varepsilon$.
To say "$\lim(f, c) = L$" is to say Player $\delta$ has a winning strategy against a perfect opponent; that is, Player $\delta$ can respond to an arbitrary challenge. This is precisely what is meant by saying, "We can make $f(x)$ as close to $L$ as we like by taking $x$ sufficiently close to $c$ (not equal to $c$)."
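If it helps to see the quantifiers in action, here is a small Python sketch of this game (entirely my own illustration; the function and all names are invented). For $f(x)=3x+1$, $c=2$, $L=7$, Player $\delta$'s winning strategy is the function $\varepsilon\mapsto\varepsilon/3$, and we spot-check the required implication by random sampling:

```python
import random

def f(x):
    return 3 * x + 1  # hypothetical example; the limit as x -> 2 should be 7

c, L = 2.0, 7.0

def delta_strategy(eps):
    # Since |f(x) - 7| = 3|x - 2|, responding with eps/3 meets every challenge.
    return eps / 3

for eps in [1.0, 0.1, 0.001]:
    delta = delta_strategy(eps)
    for _ in range(1000):
        x = c + random.uniform(-delta, delta) * 0.99  # 0 <= |x - c| < delta
        if x != c:
            assert abs(f(x) - L) < eps
print("Player delta met every sampled challenge.")
```

Of course, the sampling by itself proves nothing (see the remark on finitely many points below); only the algebra $|f(x)-7|=3|x-2|$ certifies the strategy.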
Limits are unique: If $f$ and $c$ are given, at most one number $L$ satisfies the preceding definition. Indeed, if $L_1$ and $L_2$ both satisfy the definition, then for every $\varepsilon > 0$, there exists a $\delta > 0$ such that if $0 < |x - c| < \delta$, then $$ |f(x) - L_1| < \varepsilon/2\quad\text{and}\quad |f(x) - L_2| < \varepsilon/2. $$ (This is a standard analytic idiom; pick a $\delta_1 > 0$ that "works" for $L_1$, pick a $\delta_2 > 0$ that "works" for $L_2$, and let $\delta = \min(\delta_1, \delta_2)$.)
Now pick an arbitrary $x$ with $0 < |x - c| < \delta$. By the triangle inequality, $$ |L_1 - L_2| = |L_1 - f(x) + f(x) - L_2| \leq |f(x) - L_1| + |f(x) - L_2| < \varepsilon/2 + \varepsilon/2 = \varepsilon. $$ But this inequality is a statement about two (fixed) real numbers, and if $|L_1 - L_2| < \varepsilon$ for every $\varepsilon > 0$, then $L_1 = L_2$.
Practically speaking, if we show that some limit is $2x$, then that same limit cannot also be $2x + \Delta x$ unless $\Delta x = 0$.
Here's how the formal definition works for computing the derivative of $g(x) = x^2$: Fix a real number $c$, define $$ f(h) = \frac{g(c + h) - g(c)}{h},\quad h \neq 0, $$ and put $L = 2c$. (The definition only allows us to "test" a prospective limit; to use the definition we must guess the limit in advance. Here we might notice that $f(h) = 2c + h$ for $h \neq 0$; if we wishfully set $h = 0$, we obtain our guess for $L$. As yet we've proven nothing; for all we know, this guess is incorrect.)
Now let's play the formal limit game: Player $\varepsilon$ issues a challenge. Player $\delta$'s goal is to find $\delta > 0$ such that if $h \neq 0$ and $|h| < \delta$, then $$ |f(h) - L| = \left|\frac{g(c + h) - g(c)}{h} - 2c\right| = \left|\frac{(c + h)^2 - c^2 - 2ch}{h}\right| = \left|\frac{h^2}{h}\right| = |h| < \varepsilon. $$ From this "scratch work"/strategizing, Player $\delta$ discovers they can win by echoing the challenge back. That is, if $\varepsilon > 0$ is arbitrary, there exists a $\delta > 0$ (specifically, $\delta = \varepsilon$ in this example) such that if $0 < |h| < \delta$, then $$ |f(h) - L| = \cdots = |h| < \varepsilon $$ (because $\delta = \varepsilon$).
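A quick numerical spot-check of this strategy, again only a sanity check and not a proof (my own sketch, reusing $g(x)=x^2$ with an arbitrarily chosen $c$):

```python
def diff_quotient(c, h):
    # This is the f(h) of the text: the difference quotient of g(x) = x**2 at c.
    return ((c + h) ** 2 - c ** 2) / h

c = 3.0
L = 2 * c

for eps in [0.5, 0.01, 1e-4]:
    delta = eps  # Player delta's response in this example
    for h in [delta * 0.9, -delta * 0.9, delta / 2, -delta / 7]:
        assert 0 < abs(h) < delta
        assert abs(diff_quotient(c, h) - L) < eps  # |f(h) - L| = |h| < eps
print("delta = eps met every sampled challenge.")
```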
If you're dealing with an infinite series, the role of the function is played by a partial sum of the series, viewed as a function of the index: $$ s(n) = s_n = \sum_{k=1}^n a_k. $$ To say the series has sum $s$ is to say that for every $\varepsilon > 0$, there exists a positive integer $N$ such that if $n \geq N$, then $|s - s_n| < \varepsilon$. This definition is a challenge-response game of exactly the same type as the "real" limit game. The "function value" $f(x)$ becomes $s_n$, the prospective "limit" is $s$ (which, again, must be known in advance), the "challenge" is $\varepsilon > 0$, and the "response" is an $N$; the response "wins" if $n \geq N$ implies $|s - s_n| < \varepsilon$.
(The condition "$n \geq N$" replaces "$0 < |x - c| < \delta$"; very loosely, this condition asserts "$n$ is closer to $\infty$ than $N$ is". The condition $|s - s_n| < \varepsilon$ is the direct analogue of $|f(x) - L| < \varepsilon$.)
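Here is the analogous sketch for a series (my illustration, using $\sum 1/k^2$, whose sum $\pi^2/6$ is quoted below). The tail estimate $s - s_n < 1/n$ is my addition; it shows that responding with any $N \geq 1/\varepsilon$ wins:

```python
import math

def s(n):
    # n-th partial sum of the series sum 1/k^2
    return sum(1.0 / k ** 2 for k in range(1, n + 1))

total = math.pi ** 2 / 6  # the known sum, playing the role of L

def N_response(eps):
    # Tail estimate: total - s(n) < 1/n, so any N >= 1/eps meets the challenge.
    return math.ceil(1 / eps)

for eps in [0.1, 0.01, 0.001]:
    N = N_response(eps)
    for n in [N, N + 1, 10 * N]:
        assert abs(total - s(n)) < eps
print("Every sampled challenge was met.")
```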
In case these remarks are helpful:
A value $f(x)$ may or may not be equal to the limit $L$. Similarly, a partial sum $s_n$ of an infinite series may or may not be equal to the sum $s$.
You cannot determine whether a series converges or diverges by looking at a bounded number of terms; you must consider partial sums of arbitrarily large index. This is analogous to the impossibility of determining a limit of a function by examining only finitely many points of the domain. (Calculus exercises that ask you to "evaluate" a limit by plugging small numbers into a calculator are intuitively compelling, but logically without content. Existence and evaluation of $\lim(f, c)$ can never be rigorously determined by looking at values of $f$ at finitely many points.)
You've doubtless seen "convergence tests" for infinite series, and you may know that while $$ \sum_{k=1}^\infty \frac{1}{k^2}\quad\text{and}\quad \sum_{k=1}^\infty \frac{1}{k^3} $$ both converge (i.e., have finite sums), the sum of the first is $\pi^2/6$ while the sum of the second is "unknown" (at this writing). But recall, the definition of convergence requires that the limit be known in advance. The loophole is this: Consider the set of real numbers $S = \{s_n\}_{n=1}^\infty$, with $$ s_n = \sum_{k=1}^n \frac{1}{k^3}. $$ It's easy to prove that $S$ is non-empty and bounded above. It follows by the "completeness property" of the real number system that $S$ has a "supremum" or "least upper bound" $s$, the smallest real number that is greater than or equal to every number in $S$. It's also straightforward to prove $s$ is the sum of the series. When we say "$s$ is unknown", we simply mean there is no known formula for $s$ in terms of familiar numbers (such as $e$, $\pi$, or roots of rational numbers).
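A short numerical look at this supremum argument (my sketch; the bound $s_n < 1.5$ comes from the integral comparison $\sum_{k\geq2} 1/k^3 < \int_1^\infty x^{-3}\,dx = 1/2$, which I've added here):

```python
def s(n):
    # n-th partial sum of sum 1/k^3; the sequence s(1), s(2), ... is increasing
    return sum(1.0 / k ** 3 for k in range(1, n + 1))

# Every s(n) lies below 1.5 by the integral comparison, so the set S is
# bounded above and its supremum s exists; numerically s is about 1.2020569.
for n in [10, 100, 10000]:
    sn = s(n)
    assert sn < 1.5
    print(n, sn)
```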
Solution 3:
I have read the answers and most of them are very high quality, but still I think I can contribute a little bit. I hope this is helpful.
1– I suppose you know the rigorous proof that the derivative of $x^2$ (that is, the limit of $((x+h)^2-x^2)/h$ as $h\to 0$) is $2x$. Suppose that the position of a particle in some reference frame is given by $x^2$ when exactly $x$ seconds have passed.
The reason why you can say that its velocity is $2x$ and not $2x+\Delta x$ is a matter of the definition of the word velocity. In theoretical physics the velocity is defined as the limit just described, as $h\to0$, and not as the value of the ratio for any particular positive $h$.
It is a bit paradoxical that you will never be able to really measure this velocity: even if you measure the position at two very, very close moments and compute the ratio, you will always get that small $\Delta x$ as a perturbation in your calculation. The surprising part is that if you improve your measurement techniques and decrease the interval between measurements, you will get something closer to $2x$ every time.
The fact that you are certain about this, that the values will get closer and closer to $2x$ as you compute the ratio with smaller and smaller $\Delta x$ (which is what you prove mathematically with the $\epsilon$-$\delta$ definition), is what motivates you to define the "real", "instantaneous" velocity to be equal to $2x$. This velocity is not a ratio of any distance actually traveled in any time interval, but a limit value of such ratios.
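A tiny numerical sketch of this point (mine, not the answerer's): the measured ratio is algebraically exactly $2x+\Delta x$, so halving the measurement interval halves the perturbation:

```python
x = 5.0  # the instant at which we want the velocity; the position at time t is t**2

for dt in [0.1, 0.01, 0.001]:
    ratio = ((x + dt) ** 2 - x ** 2) / dt  # algebraically equal to 2*x + dt
    print(dt, ratio, ratio - 2 * x)  # the perturbation is dt itself, shrinking
```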
Why that mathematical, abstract definition works in practice is a totally different question, and I'm afraid no one has a complete answer for it. After all, it is a product of reason, not of observation. The physicist Eugene Wigner wrote a famous essay on these topics, called The Unreasonable Effectiveness of Mathematics in the Natural Sciences. Some religious people like to think that God actually created the physical world with mathematical rules and laws that can be discovered by us; others accept that it just works in most cases (not quantum or relativistic scenarios), as practice shows, and are happy with that; others simply avoid the question.
Just a thought: isn't the assertion "the position of the particle at time $x$ is $x^2$" equally abstract? Even if you measured the position a billion times per second, and ignoring the measurement errors, you would be taking a leap of faith by believing that the position at the infinitely many moments you didn't measure obeys the same rule. What is usually "best possible" in the natural sciences is to be sure of things up to a certain point and then to assume, by inductive reasoning, that they will always hold.
2– Again, it's not that you "can do this"; it's that what you quote is the definition of an infinite sum. No one can count up to infinity, and no one can sum an infinite number of terms. The symbol $\sum_0^\infty a_n$ is nonsense until we agree on what we mean by it! If you think of that symbol as representing the limit of the sequence of partial sums, when it exists, you can assign a meaning to it: then you have a new toy to play with, and you can learn how to play with it.
What you want when defining something is for the definition to capture desirable properties: that you can treat and operate with infinite sums just the way you treat and operate with normal, finite sums, and that it will work. As an engineer, you will probably also want it to be useful, to help you solve problems like differential equations, for example.
If you prove that this definition shares the same properties as the normal, finite sum, then you can play with this new toy just the same way as with your old one. To make sure that this is the case, you prove the theorems that say that $\sum_0^\infty(a_n+b_n)=\sum_0^\infty a_n + \sum_0^\infty b_n$ (when both series on the right converge), and so on. If you study and understand the proofs of these theorems, you will see where and how the definition is used, and you will be able to appreciate why it was created the way it was.
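As one concrete instance (my example; geometric series are used so that both sums are known exactly): $\sum_{n=0}^\infty (1/2)^n = 2$ and $\sum_{n=0}^\infty (1/3)^n = 3/2$, so the theorem predicts that the termwise sum converges to $7/2$, and the partial sums bear this out:

```python
def partial(r, n):
    # Partial sum of the geometric series sum r**k for k = 0, ..., n - 1.
    return sum(r ** k for k in range(n))

for n in [5, 20, 60]:
    lhs = partial(0.5, n) + partial(1 / 3, n)             # sum of the two series
    rhs = sum(0.5 ** k + (1 / 3) ** k for k in range(n))  # series of termwise sums
    print(n, lhs, rhs)  # both columns approach 2 + 3/2 = 3.5
```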
3– I think intellectual curiosity is a very important thing to have. You don't necessarily have to end your search with your professors; you will probably find that different people you meet can provide answers to different kinds of questions. In that sense, it is good that you are asking this here. It will never hurt if an engineer knows his mathematics well, has asked himself profound questions, and has tried to answer them seriously. I think it will probably make you a better professional (or a better human being) if you find your own path through learning.
4– This fourth question has been addressed by several people here, and I may not have a better answer than many of them. I can only insist that the rigor in the as close as you wish comes from the $\epsilon$-$\delta$ definition of the limit and from the fact that you can prove that the limit is a unique, well-defined object.