What's really going on behind calculus? [closed]
Firstly I'll go over the definition of the derivative, and why the derivative of $x^n$ is $n x^{n-1}$. Then I'll try to explain what the derivative is trying to capture, and why the definition makes sense.
For a function $f(x)$, its derivative is defined as being $$f'(x) = \lim_{h\to0} \frac{f(x+h)-f(x)}{h}$$ If you haven't met limits before (as I suspect an A-level student may not have), the idea is that the limit tries to capture what a function looks like near a point, in particular here what $\frac{f(x+h)-f(x)}{h}$ is like near $0$, even though at $0$ it's not defined.
For $f(x) = x^n$, we have
$$f'(x) = \lim_{h\to0} \frac{f(x+h)-f(x)}{h}$$ $$f'(x) = \lim_{h\to0} \frac{(x+h)^n-x^n}{h}$$ $$f'(x) = \lim_{h\to0} \frac{x^n + n h x^{n-1} + \frac{n(n-1)}{2} h^2 x^{n-2} + ... + h^n -x^n}{h}$$ $$f'(x) = \lim_{h\to0} \left(n x^{n-1} + \frac{n(n-1)}{2} h x^{n-2} + ... + h^{n-1}\right)$$ Now you can see that every term except the first contains a factor of $h$, so each of them vanishes as $h\to0$, while the first term is independent of $h$. This gives $$f'(x) = n x^{n-1}.$$
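As a quick sanity check, the limit above can be explored numerically. The sketch below (with a made-up helper name `diff_quotient`) computes the secant slope for shrinking $h$ and compares it with $nx^{n-1}$:

```python
# Numerically explore the limit defining f'(x) for f(x) = x^n.

def diff_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

n, x = 5, 2.0
f = lambda t: t ** n
exact = n * x ** (n - 1)  # 5 * 2^4 = 80

for h in (1e-1, 1e-3, 1e-6):
    print(h, diff_quotient(f, x, h))  # approaches 80 as h shrinks
```

Each smaller $h$ brings the quotient closer to the exact value, exactly as the limit predicts.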
Now let's talk about the way you should think about the derivative of a function. The derivative of a function tries to say how much the value of the function changes when you change the input by a tiny amount - and you'll consider the ratio of the change, in the same way as you'd consider a percentage change. In particular, you might want to know how much $x^2$ changes when you move from $x=2$ to $x=2.01$, for instance. Another way of thinking about this is the tangent line to the function at a point, so we might want to know what the slope of the tangent to $y=x^2$ is when $x=2$, and it doesn't take much thought to see why these are both the same idea. To look at the tangent line, it makes sense to instead consider the point $(2, 2^2)$ and another point really close to it, say $(2.01, 2.01^2)$, or $(2.0001, 2.0001^2)$, or $(2+h, (2+h)^2)$, where $h$ is really close to $0$ (possibly negative), and look at what the slope of the line connecting these two is for tiny $h$. This starts to look like the definition of the derivative, because the slope of the line connecting $(2+h, (2+h)^2)$ and $(2, 2^2)$ is $$\frac{(2+h)^2 - 2^2}{2+h-2} = \frac{(2+h)^2 - 2^2}{h}$$
and just like above, you can compute the limit of this expression to be $4$.
This easily generalises to any point $x$ (instead of $2$), and to any (differentiable) function (instead of $x^2$), to give a limit as defined above.
It's important to realise that the derivative is itself another function, and the meaning of $f'(x) = 2x$ is that for any point $t$, the slope of the tangent line to $y=f(x)$ at $x=t$ is $2t$.
The derivative of any differentiable function could, in theory, be computed directly from the limit, but that's often tedious. So, we use tricks like linearity (sometimes called the sum rule), the product and quotient rules, and the chain rule. It might be instructive to try to prove linearity and the product rule yourself, directly from the definition of the limit.
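Before proving the product rule, it can be reassuring to see it hold numerically. The sketch below (helper name `num_deriv` is my own) compares $(fg)'$ against $f'g + fg'$ for two functions whose derivatives we already know:

```python
# Check the product rule (fg)' = f'g + fg' numerically for
# f(x) = x^2 and g(x) = x^3.

def num_deriv(f, x, h=1e-6):
    # central difference: a more accurate estimate of the limit
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2   # f'(x) = 2x
g = lambda x: x ** 3   # g'(x) = 3x^2

x = 1.5
lhs = num_deriv(lambda t: f(t) * g(t), x)   # (fg)'(x)
rhs = 2 * x * g(x) + f(x) * 3 * x ** 2      # f'(x)g(x) + f(x)g'(x)
print(lhs, rhs)  # both close to 5 * 1.5^4 = 25.3125
```

Of course $f(x)g(x) = x^5$, so both sides also agree with the power rule's $5x^4$.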
Geometrically, the derivative is the slope of the best linear approximation of a function $f$ at a particular point $x_0$.
Recall that a linear function over the real numbers has the form $L(x) = ax+b$ for some constants $a$ and $b$. In this equation, $a$ is the slope of the line and $b$ is the $y$-intercept. If we were to choose the best line to approximate a function $f$ near the point $x_0$, then we'd probably want them both to pass through the point $(x_0,f(x_0))$. One way to ensure this is to write $L$ as $L(x) = a(x-x_0) + f(x_0)$. Then when $x=x_0$, we'd get $L(x_0) = f(x_0)$. Note that $L$ is still a linear function because $a(x-x_0) + f(x_0) = ax + (-ax_0+f(x_0))$ where $-ax_0 + f(x_0)$ is a constant.
Also if $L$ really is the best linear approximation to $f$ at $x_0$, then $L$ should probably have the same slope as $f$ at $x_0$. Here's the good thing, though: while the slope of $f$ may be constantly changing, $L$ is a line and so it has the same slope everywhere -- that slope being $a$. We know that $f'(x_0)$ should be the slope of $f$ at $x_0$ so we just plug that in for $a$. So the best linear approximation for $f$ near $x_0$ should be $$\bbox[5px,border:2px solid blue] {L(x) = f'(x_0)(x-x_0)+f(x_0)}$$
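The boxed formula is easy to experiment with. The sketch below (helper name `linear_approx` is my own) builds $L$ for $f(x)=x^3$ at $x_0=1$ and measures how far it drifts from $f$ at nearby points:

```python
# Best linear approximation L(x) = f'(x0)(x - x0) + f(x0)
# for f(x) = x^3 at x0 = 1, where f'(x0) = 3 * x0^2 = 3.

def linear_approx(x, x0, fx0, slope):
    return slope * (x - x0) + fx0

x0 = 1.0
f = lambda x: x ** 3
slope = 3 * x0 ** 2  # f'(x0)

for x in (1.1, 1.01, 1.001):
    err = abs(f(x) - linear_approx(x, x0, f(x0), slope))
    print(x, err)  # the error shrinks much faster than x - x0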
Now, here's how we use this. Imagine we could rewrite $f$ everywhere in some small interval around $x_0$ as $$f(x) = L(x) + r(x-x_0)$$ where $r$ is an error function that gives the error between $f$ and $L$ at each point. Note that at $x_0$, $r$ should just be $0$ because $L$ and $f$ pass through the same point. As it turns out, we can say something even stronger about $r$. The error function $r$ should go to $0$ faster than $x-x_0$ at $x_0$. I.e. $$\lim_{x\to x_0} \frac{r(x-x_0)}{x-x_0} = 0\tag{*}$$ In fact this can be used to uniquely define the derivative in this way:
If, for any $x$ near $x_0$, we can write the function $f$ as $f(x) = f(x_0) + B(x-x_0) + r(x-x_0)$ where $r$ satisfies $(*)$, then $B$ must equal $f'(x_0)$.
I provide a proof in this answer (but depending on your level of mathematical maturity you might just want to take it as a fact for now).
With all that out of the way, we can use this to figure out that the derivative of $f(x)=x^n$ at $x_0$ is $n{x_0}^{n-1}$. We just have to see if we can somehow write $f(x)$ in the form ${x_0}^n + B(x-x_0) + r(x-x_0)$ where $B$ is a constant and $r$ is some function that satisfies the limit condition in $(*)$. Then $B$ will be the derivative at $x_0$. To do so we'll use the binomial theorem and, just to make it a little easier, we'll rewrite $x$ as $x_0 + \Delta x$. Then we see that:
$$\begin{align}f(x) = x^n &= (x_0 + \Delta x)^n = {x_0}^n + n{x_0}^{n-1}\Delta x + \cdots + nx_0\Delta x^{n-1} + \Delta x^n \\ &= {x_0}^n + n{x_0}^{n-1}\left[(x_0 +\Delta x) - x_0\right] + \cdots + nx_0\left[(x_0 +\Delta x) - x_0\right]^{n-1} + \left[(x_0 +\Delta x) - x_0\right]^n \\ &= {x_0}^n + n{x_0}^{n-1}(x - x_0) + \cdots + nx_0(x - x_0)^{n-1} + (x - x_0)^n\end{align}$$
Notice that every term after $n{x_0}^{n-1}(x - x_0)$ has at least a power of $(x - x_0)^2$ in it. So these go to $0$ faster than $x - x_0$ (in the sense of $(*)$). So this is really saying $$f(x) = x^n = {x_0}^n + n{x_0}^{n-1}(x - x_0) + r(x-x_0)$$ where $\lim_{x\to x_0}\frac{r(x-x_0)}{x-x_0} = 0$. Thus $$\bbox[5px,border:2px solid red] {f'(x_0) = n{x_0}^{n-1}}$$
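The claim that $r$ vanishes faster than $x - x_0$ can also be checked numerically for $f(x) = x^n$: the ratio $r(\Delta x)/\Delta x$ below visibly shrinks as $\Delta x \to 0$ (a sketch, with the helper name `remainder` my own):

```python
# For f(x) = x^n, the remainder after the linear part is
# r(dx) = (x0 + dx)^n - x0^n - n * x0^(n-1) * dx, and r(dx)/dx -> 0.

n, x0 = 4, 2.0

def remainder(dx):
    return (x0 + dx) ** n - x0 ** n - n * x0 ** (n - 1) * dx

for dx in (1e-1, 1e-3, 1e-5):
    print(dx, remainder(dx) / dx)  # ratio tends to 0 as dx -> 0
```

For this $f$ the leading part of $r$ is the $\binom{n}{2}x_0^{n-2}(x-x_0)^2$ term, so the ratio shrinks linearly in $\Delta x$.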
Recall the definition of the derivative of a function $f$: $$\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$$
Plug $x^n$ into the formula, and you get:
$$\frac{dy}{dx}x^{n}=\lim_{h\to 0}\frac{(x+h)^n-x^n}{h}$$
$$=\lim_{h\to 0}\frac{x^{n}+\binom{n}{1}hx^{n-1}+\binom{n}{2}h^{2}x^{n-2}+...+\binom{n}{n-1}h^{n-1}x^1+h^n-x^n}{h}$$ This simplifies to $$=\lim_{h\to 0}{\binom{n}{1}x^{n-1}+\binom{n}{2}hx^{n-2}+...+\binom{n}{n-1}h^{n-2}x^1+h^{n-1}}$$ Now you can plug in 0 for h and every term containing h disappears, leaving you with $$\frac{dy}{dx}x^{n}=\binom{n}{1}x^{n-1}$$ Since $$\binom{n}{1}=n$$ $$\frac{dy}{dx}x^{n}=nx^{n-1}$$
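The binomial expansion used above can be double-checked with Python's standard `math.comb`:

```python
from math import comb

# Verify (x + h)^n = sum_{k=0}^{n} C(n, k) * h^k * x^(n-k),
# the expansion used in the derivation, and that C(n, 1) = n.

def binomial_expand(x, h, n):
    return sum(comb(n, k) * h ** k * x ** (n - k) for k in range(n + 1))

x, h, n = 3.0, 0.5, 6
print(binomial_expand(x, h, n), (x + h) ** n)  # both equal 3.5^6
print(comb(n, 1) == n)  # True: the surviving coefficient is n
```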
Let me try to answer some of your questions.
- The derivative of a function is the slope of the tangent line to the graph of the function.
- To compute the derivative of $x\mapsto x^n$ you need to know that \begin{align*} \frac{x^n-x_0^n}{x-x_0}=\sum_{j=0}^{n-1}x^jx_0^{n-1-j} \end{align*} for $x\neq x_0$. (You can check the formula by multiplying both sides with $x-x_0$ and computing the right hand side.) Therefore you get \begin{align*} \lim\limits_{x\to x_0}\frac{x^n-x_0^n}{x-x_0} = \lim\limits_{x\to x_0}\sum_{j=0}^{n-1}x^jx_0^{n-1-j} = \sum_{j=0}^{n-1}x_0^jx_0^{n-1-j}=nx_0^{n-1}, \end{align*} which confirms the well known rule $(x^n)' =nx^{n-1}$.
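The factorization identity behind this computation is easy to verify numerically (a sketch with arbitrary test values; `geometric_sum` is a name of my own):

```python
# Check (x^n - x0^n) / (x - x0) == sum_{j=0}^{n-1} x^j * x0^(n-1-j)
# for x != x0, the identity used to compute the limit.

def geometric_sum(x, x0, n):
    return sum(x ** j * x0 ** (n - 1 - j) for j in range(n))

x, x0, n = 1.7, 1.2, 5
lhs = (x ** n - x0 ** n) / (x - x0)
rhs = geometric_sum(x, x0, n)
print(lhs, rhs)  # equal up to rounding

# At x = x0 the sum has n equal terms x0^(n-1), giving n * x0^(n-1):
print(geometric_sum(x0, x0, n), n * x0 ** (n - 1))
```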
- Since differentiating is linear, i.e. $(f+g)' =f' +g'$ and $(cf)'=cf'$ you can apply part 2 on every polynomial of the type $p(x)= \sum_{j=0}^n a_j x^j$ to get $$ p'(x) = \sum_{j=1}^n a_j j x^{j-1} .$$
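In code, differentiating a polynomial term by term via linearity is a one-liner over its coefficient list (storing coefficients lowest power first is my own representational choice):

```python
# Differentiate p(x) = sum_j a_j x^j term by term using linearity:
# p'(x) = sum_{j>=1} a_j * j * x^(j-1).
# Coefficients are stored lowest power first: coeffs[j] = a_j.

def poly_deriv(coeffs):
    return [j * a for j, a in enumerate(coeffs)][1:]

# p(x) = 3 + 2x + 5x^2  ->  p'(x) = 2 + 10x
print(poly_deriv([3, 2, 5]))  # [2, 10]
```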
- The definition of higher derivates is just a definition. If you like you can write $f^{(n)}$ for the $n$-th derivate aswell. An other common notation is $D^n f$.
- And what about an unknown function? Let's call the function $f$. First we have to check whether the function is differentiable (at a point $x_0$), i.e. you have to check whether the limit $$ \lim\limits_{x\to x_0}\frac{f(x) -f(x_0)}{x-x_0}$$ exists or not. If the limit exists, we call it (by definition) $f'(x_0)$. A typical example of a function which is not differentiable at every point is $x\mapsto |x|$, since one can show that the limit $$ \lim\limits_{x\to 0}\frac{|x|}{x}$$ does not exist.
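The failure of this limit for $|x|$ at $0$ shows up immediately in a numerical experiment: the difference quotient is $+1$ from the right and $-1$ from the left, so no single limiting value exists.

```python
# The difference quotient |h| / h at 0 has no single limit:
# it equals +1 for h > 0 and -1 for h < 0.

def abs_quotient(h):
    return abs(h) / h

for h in (1e-3, 1e-6, -1e-3, -1e-6):
    print(h, abs_quotient(h))  # +1 from the right, -1 from the left
```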
Concerning your question of why higher derivatives are denoted by $$\frac{d^ny}{dx^n}$$ there is a reason for this notation. Now $$\begin{align}\frac{d^2y}{dx^2} &= \lim_{h \to 0} \frac{y'(x + h) - y'(x)}{h}\\&=\lim_{h \to 0}\frac{\lim_{h_1 \to 0} \frac{y(x + h + h_1) - y(x + h)}{h_1} - \lim_{h_2 \to 0} \frac{y(x + h_2) - y(x)}{h_2}}h\\&=\lim_{h \to 0}\lim_{h_1 \to 0}\lim_{h_2 \to 0} \frac{ \frac{y(x + h + h_1) - y(x + h)}{h_1} - \frac{y(x + h_2) - y(x)}{h_2}}h\end{align}$$
Now assuming both the iterated and the combined limit exist, we can set the three values $h, h_1, h_2$ to be equal without changing the value. So $$\begin{align}\frac{d^2y}{dx^2} &=\lim_{h \to 0} \frac{ \frac{y(x + 2h) - y(x + h)}{h} - \frac{y(x + h) - y(x)}{h}}h\\&=\lim_{h \to 0} \frac{ y(x + 2h) - 2y(x + h) + y(x)}{h^2}\end{align}$$
The expression on the top is the "2nd difference". It is what you get from taking a difference of a difference, so it is the taking of differences that is squared in the short-hand notation: $d^2y$. But the denominator is just $h^2$, which is the square of a single difference in $x$. Thus $dx^2$ (which is intended to mean $(dx)^2$, not $d(x^2)$).
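The second-difference formula can be checked against a known second derivative, e.g. $y = x^4$ with $y'' = 12x^2$. A numerical sketch (beware of round-off for extremely small $h$; `second_difference` is a name of my own):

```python
# Approximate y''(x) with the second difference
# (y(x + 2h) - 2*y(x + h) + y(x)) / h^2 for y(x) = x^4.

def second_difference(y, x, h):
    return (y(x + 2 * h) - 2 * y(x + h) + y(x)) / h ** 2

y = lambda x: x ** 4
x = 1.5
exact = 12 * x ** 2  # y''(1.5) = 27

for h in (1e-2, 1e-4):
    print(h, second_difference(y, x, h))  # approaches 27
```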