Intuition and derivation of the geometric mean
Took some time to condense this answer and, in the process, make it more readable.
If I may join in the festivities, here are my two cents!
It seems to me that you have an issue with the general formalization of mathematical concepts/constructs and the expected intuition behind them (inferred from KCd's comment). You're not the only one; this is caused by lousy school systems which portray many mathematical concepts as if they were pulled out of someone's behind. And this bugs people: some turn to malicious attempts at disproving mathematics (to the point of bitter hate) and some invest the time to investigate the background and find out whether that's truly the case.
It most certainly is not. Let me address the problem in logical chunks relating to your questions. To understand the geometric mean, let us make sure you understand the arithmetic mean.
The Arithmetic Mean
Let's take a step back and take a look at the arithmetic mean, from which you should be able to understand the need for multiple means and how they can be formulated:
$$A = \frac{1}{k}\sum_{i=1}^{k}x_i$$
It is astonishing how many people use this, without having the slightest idea of what it is. Of what it represents. You often hear "Add all the elements together and divide by the number of elements - that's the average."
Ask that person what is this average, where does this relationship stem from - you'll probably get a wall of silence - or even worse - more explanations which use the thing they're actually trying to explain.
So, let's explore this organically, which means our logic will drive us into the wall before reaching the correct answer. Say we have an array of a few rather random (accidentally odd) numbers:
$$R = [3, 5, 11]$$
They could be anything. Money, unit-less values, grades in school - irrelevant. Now, the problem:
How do we find a number $n$ which lies as close as possible to all inputs?
What does it mean to lie as close as possible? It means that the absolute difference between the input and a supposed value $n$ is as low as possible. Let's formalize such a statement by trying to estimate how $n=6$ relates to all of our values:
$$\Delta x_1 = | 3 - 6 |= 3 $$
$$\Delta x_2 = | 5 - 6 |= 1 $$
$$\Delta x_3 = | 11 - 6 |= 5 $$
$$\Delta = [3, 1, 5]$$
Now we can see how far $n$ is from every input. We want minimal numbers here, as small as the inputs permit (because they modulate the final output). We could spend all day searching for the smallest possible $\Delta$-difference array, which is given as a function of $n$, or we could do something clever. We want a number whose distance from every input is as small as possible. Well, we could add up all the differences:
$$ d(n) = \sum_{i=1}^{k}( | x_i - n | )$$
Where $k$ is the number of inputs. In the previous example, that gives us a total "miss" of $d(6) = 9$. We want $d(n)$ to be as small as possible, which simply means being maximally close to all of the inputs.
You could look for the solution as the lowest point which the function $d(n)$ reaches, but since the function is piecewise linear (a sum of absolute values), its derivative jumps at every kink. The lowest point sits at exactly such a kink, so we can't find it by simply setting the derivative to zero.
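If you'd like to check the arithmetic above yourself, here is a minimal Python sketch. The function name `total_miss` is my own label for $d(n)$, not anything standard.

```python
def total_miss(values, n):
    """Sum of absolute distances between each input and the candidate n."""
    return sum(abs(x - n) for x in values)

R = [3, 5, 11]

# Per-input distances for the candidate n = 6
misses = [abs(x - 6) for x in R]
print(misses)            # [3, 1, 5]

# Total "miss" d(6)
print(total_miss(R, 6))  # 9
```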
Well, the only reason we used absolute values was to denote that we desire positive values, because negative distances don't make sense. What else can give us positive values, preserve order and relieve our problem of a non-continuous first derivative? Raise the equations to the power of $2$!
$$ d(n) = \sum_{i=1}^{k}(x_i - n )^2$$
Well, that was quick. What have we gained? We now have our distances squared and summed, but still positive. And guess what, we now have a unique minimum, sitting where the derivative vanishes:
$$\frac{d}{dn}d(n) = 0$$
So, let's first expand our $d(n)$ before differentiating and solving for the $n$ at which $d'(n) = 0$. Here are the steps:
$$ d(n) = kn^2 - 2n\sum_{i=1}^{k}(x_i) + \sum_{i=1}^{k}(x_i^2) $$
Differentiating:
$$ \frac{d}{dn}d(n) = 2kn - 2\sum_{i=1}^{k}(x_i) $$
And the winner is:
$$ 2kn - 2\sum_{i=1}^{k}(x_i) = 0 $$
$$ n = \frac{2\sum_{i=1}^{k}(x_i)}{2k} $$
$$ n = \frac{\sum_{i=1}^{k}(x_i)}{k} $$
$$ n = \frac{1}{k}\sum_{i=1}^{k}(x_i) = A $$
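A quick numerical sanity check of this result, as a sketch only: it brute-forces the squared miss over a grid of candidates for the example array $R = [3, 5, 11]$ and compares the best candidate with the arithmetic mean. The grid range, the step of $0.01$ and the name `squared_miss` are my own choices for illustration.

```python
def squared_miss(values, n):
    """Sum of squared distances between each input and the candidate n."""
    return sum((x - n) ** 2 for x in values)

R = [3, 5, 11]
mean = sum(R) / len(R)  # 19/3, roughly 6.333

# Try every candidate from 0.00 to 12.00 in steps of 0.01
candidates = [i / 100 for i in range(0, 1201)]
best = min(candidates, key=lambda n: squared_miss(R, n))

print(mean)  # the arithmetic mean
print(best)  # the grid point with the smallest squared miss
```

The grid point that wins is the one closest to the arithmetic mean, which is what the derivation predicts.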
Could it be? Oh, yes. That's the elusive, proper definition of an arithmetic mean. Nobody woke up one day and simply wrote out the final equation. It was an important issue to resolve, to find what is the value around which all inputs tend to be accumulating.
And that's the whole point of it: when you try to reduce the sum of accumulated distances while respecting all the inputs, you will find that the value lands around the most populated areas of the number line. Also, when you have only two values, that simply resolves to the midpoint between the two points on the number line. The more inputs you have, the more precisely you can define the central tendency.
The Geometric Mean
We have covered a good deal of the background concerning the arithmetic mean, which answers the question we posed as: "What single number best describes a set of numbers, a number which therefore lies as close as possible to all inputs?"
We interpreted it as the least sum of squared errors (geometrically, a point on a number line which is at the bare minimum distance from all inputs, naturally moving to the highest concentration of inputs). The derived expression is as follows (from the previous discussion):
$$A = \frac{1}{k}\sum_{i=1}^{k}x_i$$
Verbosely, it adds up all the elements and divides by the number of elements. You have a total sum of all the inputs and you divide it in $k$ equal parts. The number that you get from the former definition is one which when added to itself $k$ times gives the original sum (which is simply inferred by the definition of basic arithmetic operations).
So basically, it replaces the varying elements with a constant term which preserves the sum. And this is the average by the very definition of the concept. If every day is the same, it's considered average. If every day is legendary then every day is, again, average, there is no oscillation. It's the same.
And this is facilitated simply by having a total sum of all the inputs and dividing that sum into $k$ equal parts. This is also why, when you take the average between two points, it's the middle point. The total length is divided into two equal parts, defining a point on the number line directly in between that separates the two partitions.
So, in simplest terms, arithmetic mean replaces all the inputs with one constant and requires it to add up to the original total sum. In contrast, geometric mean answers the question: "If all inputs were, again, the same value, what would be that value to multiply up to the original product." It is expressed as:
$$G = \sqrt[n]{\prod_{i=1}^{n}x_i} = (\prod_{i=1}^{n}x_i)^{\frac{1}{n}} $$
Looks wild? It's just simple algebra which drives a simple concept, as I hope you'll see further below.
I hope everyone now sees why I emphasized understanding the arithmetic mean first. The two concepts share the same goal: providing an average value that can stand in for the inputs in a calculation while preserving the property specific to the problem at hand.
So, what is the problem of the geometric mean? Relative values. When each of the inputs or elements of the set is defined in terms of the previous. That is at the heart of percentages, stating how much of the total you have.
The percentage, $\%$ - that's a unit. It's defined as:
$$\% = \frac{1}{100}$$
$\%$ is a hundredth. Standing alone, it means you've got one hundredth of the base unit $1$. That doesn't make much sense on its own. It needs to be defined in terms of a definite initial value. Then you can say that you've got $20\%$ of it, which amounts to $20/100$ or $2/10$ or $1/5$ of the total value.
Imagine now a problem that involves relative growth, which is quite common. In the first year, you gain 30%, second - 40%, third - 50%. The naive approach would be to calculate the arithmetic mean to return the average change over the 3 years - returning a value of $40\%$.
But here you're making an assumption that for each year you use the original total multiplied by respectively $1.3$, $1.4$, $1.5$. Let's assume that the initial value is $v_0$:
$$\frac{1.3v_0 + 1.4v_0 + 1.5v_0}{3} = A$$
You see how the actual value is the same for all elements? What actually happens is this ($v_0$ is constant, we can factor it out):
$$v_0\frac{1.3 + 1.4 + 1.5}{3} = 1.4v_0 = v_0 + 0.4v_0 $$
And this simply isn't right. Each year's percentage applies to the value at the end of the previous year, not to the initial value, so the elements depend on one another: after the first year, $v$ is no longer $v_0$. And this is where the arithmetic mean breaks down.
We need to understand the motivation for the geometric mean, like we did with the arithmetic mean. When considering the geometric mean, most get fooled by its name. To someone new to the concept, such a term might be intimidating.
So, we've talked about relative growth, expressing values as percentages which depend upon an undisclosed value (when evaluating the geometric term, we are agnostic of the value because we don't need to know it). If they indeed depend on one another and must be expressed in terms of the previous, we can express it as a product whose very definition lends itself to the problem:
$$v_0*1.3*1.4*1.5 = v_0x^n$$
We are, essentially looking for a value which, when multiplied by itself $3$ times (the number of percentages per respective years), gives the same final value as if we simply repeatedly multiplied each year with the respective $1.3$, $1.4$ and $1.5$. Now, since we don't care about the $v_0$ and we are trying to express the values in relative notions of percentages, we can just get rid of it:
$$1.3*1.4*1.5 = x^3$$
Let's be even more explicit:
$$1.3*1.4*1.5 = x*x*x$$
I hope this drives the notion home. One, average $x$ that does the same job as every other element on the left side. Now, we've defined the concept of a root as the number which when multiplied by itself $n$ times gives the value of which we're taking the root in the first place. So, take the $n$-th root of both sides, where in this case $n=3$:
$$\sqrt[3]{1.3 * 1.4 * 1.5} = \sqrt[3]{x^3}$$
What is the number which multiplied by itself three times gives $x^3$? Well, that's $x$, right? And that gives the following expression:
$$x = \sqrt[3]{1.3 * 1.4 * 1.5} \approx 1.3976$$
That comes to an average growth of $39.76\%$ per year over the period of three years. See how it is just a little bit smaller than the arithmetic mean? That's actually quite a useful property called the arithmetic-geometric mean inequality, which states that, for non-negative inputs, the geometric mean is always less than or equal to the arithmetic mean. The actual differences can be quite drastic, especially when dealing with ratios, relative values, growth etc.
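Here's a small Python sketch of the growth example, in case you want to see the numbers side by side. It only uses the three yearly factors from above; the variable names are mine.

```python
import math

factors = [1.3, 1.4, 1.5]  # yearly growth factors for +30%, +40%, +50%

product = math.prod(factors)                   # total growth over three years
arithmetic = sum(factors) / len(factors)       # the naive average factor
geometric = product ** (1 / len(factors))      # cube root of the product

print(round(product, 4))          # 2.73   (what actually happens)
print(round(arithmetic ** 3, 4))  # 2.744  (the arithmetic mean overshoots)
print(round(geometric, 4))        # 1.3976
print(round(geometric ** 3, 4))   # 2.73   (the geometric mean preserves the product)
```

Applying the arithmetic mean three times gives more than the real total, while the geometric mean reproduces it.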
And this simply generalizes to the first expression:
$$G = \sqrt[n]{x_1x_2 \ldots x_n}= \sqrt[n]{\prod_{i=1}^{n}x_i}$$
As the arithmetic mean protects the total sum, the geometric mean protects the total product (and solves our problem of dependent values). We mentioned that the arithmetic mean can be visualized as something that divides a number line in $n$ equal parts, which when added together give the original total.
EQUAL, you say?
Indeed. You could imagine the four sides of a rectangle (which come in two pairs of equal length). If you add them up, you get the perimeter of the rectangle. Now, divide it into four equal parts $a$. Woah, what's that? Add them together to make sure we have the sum:
$$a + a + a + a = 4a = S$$
That seems awfully like the perimeter of a square. And that's exactly what it is. It has the same perimeter as the rectangle, but just one value is needed to describe the same thing.
Now, onto the geometric mean. If you had, as Day Late Don described, a rectangle with sides $a$ and $b$, their product is the area (as per agreement when the multiplication was defined, that's the geometric interpretation). So, which number when multiplied by itself $n = 2$ times gives the same area as the rectangle $ab$? That's right, $G = \sqrt{ab}$.
Do you see what we have here? $G$ is the side of a square which, when multiplied by itself, gives the same area as the more complicated product of two different values. We have found a central value that does the job of all the other values. As the arithmetic mean preserves the imaginary perimeter, the geometric mean preserves the imaginary area. Respectively, that's protecting the sum and protecting the product.
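To make the rectangle-versus-square picture concrete, here is a tiny Python sketch. The side lengths $2$ and $8$ are hypothetical values I picked so the numbers come out clean.

```python
import math

a, b = 2.0, 8.0  # hypothetical rectangle sides, chosen for illustration

arith = (a + b) / 2      # side of the square with the same perimeter
geo = math.sqrt(a * b)   # side of the square with the same area

print(arith, 4 * arith, 2 * (a + b))  # 5.0 20.0 20.0  (perimeter preserved)
print(geo, geo * geo, a * b)          # 4.0 16.0 16.0  (area preserved)
```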
Amazing, huh? By trying to figure out a completely different problem without even contemplating the applications within geometry, we have acquired a new useful tool should we ever need it in our investigations.
Mr. Ayman Hourieh has addressed the correspondence between the arithmetic mean and the geometric mean that appears when we take the logarithm of both sides. The product reduces to a sum because of the nature of logarithms ($\log{(ab)} = \log{a} + \log{b}$). Should any questions arise, again, don't hesitate to ask.
Here is one motivation for the geometric mean: By taking the logarithm of both sides of the geometric mean definition, you'll find that the logarithm of the geometric mean is the mean of logarithms of values:
\begin{align*} G &= \left(\prod_{i=1}^{n}a_{i}\right)^\frac{1}{n} \\ \Rightarrow \log{G} &= \frac{1}{n}\sum_{i=1}^{n}\log{a_{i}} \end{align*}
When the numerical ranges of values are too large, you may want to use logarithms of values, and hence the geometric mean.
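As a rough illustration of both points, here is a Python sketch that computes the geometric mean directly and via the mean of logarithms, and then shows a case where the direct product is too large for a floating-point number. The values `1e200` and `1e250` are made up purely to force that overflow.

```python
import math

def geometric_mean_via_logs(values):
    """exp of the arithmetic mean of the logarithms (values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

values = [1.3, 1.4, 1.5]
direct = math.prod(values) ** (1 / len(values))
via_logs = geometric_mean_via_logs(values)
print(direct, via_logs)  # the two agree up to rounding error

# Hypothetical huge inputs: the plain product overflows to infinity...
big = [1e200, 1e250]
naive_product = big[0] * big[1]
print(math.isinf(naive_product))  # True

# ...while the log route stays finite (the exact answer is 1e225).
big_mean = geometric_mean_via_logs(big)
print(big_mean)
```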
For example, let's say you're studying a value that grows exponentially over time (like human population, compound interest, etc). The geometric mean makes more sense when studying this value, as illustrated by this example in Wikipedia.
As for why the geometric mean works better for values that grow exponentially: When a value grows exponentially, its logarithm grows linearly. Therefore, linear approximation via the geometric mean works, because it's based on the logarithms of values.
Essentially, the arithmetic mean is the mean according to addition, and the geometric mean is the mean according to multiplication. What I mean by this is that, if we have some numbers $a_1,a_2,\ldots,a_n$ and we are interested in only addition, then we might seek to find some number $\mu$ such that $$a_1+a_2+\ldots+a_n=\underbrace{\mu+\mu+\ldots+\mu}_{\text{$n$ times}}$$ - that is, if we replaced every $a_i$ with $\mu$, we would get the same answer when we summed all of them. To imagine this, consider if we had a bunch of glasses of water. Transferring water from one to another doesn't change the overall sum - so if $a_i$ were the amount of water in each glass to start with, $\mu$ would be the amount in each glass after transferring water until all the amounts were equal. Notice the above equation can be rearranged to $$\frac{a_1+a_2+\ldots+a_n}{n}=\mu$$ which is the standard definition.
The geometric mean is just the same, except we use multiplication instead of addition. So, we want to solve $$a_1\cdot a_2\cdot\ldots\cdot a_n=\underbrace{\mu\cdot\mu\cdot\ldots\cdot \mu}_{\text{$n$ times}}$$ where we get that the product of every element is preserved when we replace all the elements with $\mu$. This could be imagined as follows: if we had a rectangular prism in $n$ dimensions (that's easy to imagine, right? Just let $n$ be not-very-big), then the product of all its side lengths $a_i$ would be its volume. Moreover, if we stretched one dimension by a factor $c$ while compressing another by the same factor, the volume would be preserved - so the geometric mean $\mu$ would be the side length of the hypercube with the same volume. The above formula, of course, simplifies to $$\sqrt[n]{a_1\cdot a_2\cdot\ldots\cdot a_n}=\mu.$$
A nice note to add to this answer is that a weighted average with weights $w_i$ can just be interpreted as the solution to $$w_1a_1+w_2a_2+\ldots+w_na_n=w_1\mu+w_2\mu+\ldots+w_n\mu$$ where, again, we replace every $a_i$ with $\mu$. I suppose this generalizes to create an "average" with respect to any $f$, where the average satisfies $f(a_1,a_2,\ldots,a_n)=f(\mu,\mu,\ldots,\mu)$ - where "arithmetic mean" is the case where $f$ is the sum and "geometric mean" is the case where $f$ is the product (just the same as the difference between "arithmetic" and "geometric" progression) - and "weighted average" is the case of a weighted sum, and so on.
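For the weighted case, a minimal Python sketch of the "replace every $a_i$ with $\mu$" idea might look like this. The values and weights are hypothetical, and `weighted_mean` is just my name for solving the equation above for $\mu$.

```python
def weighted_mean(values, weights):
    """Solve w_1*a_1 + ... + w_n*a_n = (w_1 + ... + w_n) * mu for mu."""
    return sum(w * a for w, a in zip(weights, values)) / sum(weights)

values = [3, 5, 11]   # hypothetical inputs
weights = [1, 2, 1]   # hypothetical weights

mu = weighted_mean(values, weights)

# Replacing every a_i with mu leaves the weighted sum unchanged
lhs = sum(w * a for w, a in zip(weights, values))
rhs = sum(w * mu for w in weights)
print(mu, lhs, rhs)  # 6.0 24 24.0
```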