Motivation for definition of free group?
Let $S$ be a set and $F_S$ be the equivalence classes of all words that can be built from members of $S$. Then $F_S$ is called the free group over $S$.
I don't understand the motivation for this definition. Since each word $w$ in $F_S$ is a finite product of elements of $S$, it uniquely identifies to an element $s\in S$, so if $S$ were a group then clearly $S$ and $F_S$ would be isomorphic. What makes the free group an interesting object? I assume it is the case when $S$ is not a group, but instead some arbitrary set closed under some binary operation. I suppose the most general type of set we could define a free group over would be a magma, then? What group axiom should we leave out to construct an interesting (nontrivial) example of a free group? I suppose it would be associativity, but I am not sure.
The idea of a free group is a way of turning sets (without any structure) into groups. So, a word in $F_S$ does not uniquely identify an element of $S$, because there is no way given to multiply elements of $S$. In particular the word "set" comes with no implications of any operations on that set.
A somewhat modern way to express the motivation behind free groups is via the following universal property:
The free group $F_S$ has the property that there is a function $i:S\rightarrow F_S$ given by sending each element of $S$ to the word consisting of that element alone and that, for any group $G$ and any function $i':S\rightarrow G$, there is a unique homomorphism $f:F_S\rightarrow G$ such that $i'=f\circ i$.
I highlight the kinds of maps involved in this definition, to note that the free group converts the set theoretic notion of a function into the group theoretic homomorphism. This definition is terse and takes some time to appreciate, but let's think about a particular example:
Consider the free group $F_1$ on a single element $\{x\}$. There has to be some element in this group called $x$, since we must be able to embed $x$ into our group. Moreover, we can work out that an equation in $x$ holds in $F_1$ if and only if it holds for every element $\bar x$ of every group $G$ - which is exactly what the homomorphisms capture in the definition. So, for instance $$x\cdot x^{-1} = e$$ must be true, because it is true in every group, and so must $$x\cdot (x\cdot x) = (x\cdot x)\cdot x.$$ On the other hand, $$x\cdot x = e$$ must not be true in $F_1$ because it is not true of the element $1$ in $\mathbb Z/3\mathbb Z$, for instance. This is what is meant by saying that an equation follows from the group axioms: it is true in every group. Through examples like these we come to suspect that $F_1\cong \mathbb Z$, since every product involving only $x$ and $x^{-1}$ reduces to $x^n$ for some $n$.
In fact, once we've proved that the definition, if it identifies a group, identifies a group uniquely up to isomorphism (not too terrible of an exercise, but not obvious - it's good to think of why $F_1$ can't be $\mathbb Q$), we can see that setting $F_1=\mathbb Z$ and selecting $x=1\in \mathbb Z$, the definition above is indeed satisfied: for any group $G$ and any $i'$, the required homomorphism $f:F_1\rightarrow G$ is the one that takes $n\in\mathbb Z$ to $i'(x)^n$.
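To make the universal property a little more concrete, here is a minimal Python sketch of how a choice of images for the generators extends to a map on words. The representation of a word as a list of `(generator, exponent)` pairs and the names `extend` and `images` are assumptions made for this illustration; the target group $\mathbb Z/3\mathbb Z$ is just the example used above.

```python
# A word is a list of (generator, exponent) pairs with exponent +1 or -1,
# e.g. [("x", 1), ("x", 1)] stands for x·x.  (Illustrative representation.)

def extend(images, mul, inv, identity):
    """Extend a function on generators to a function on words.

    `images` maps each generator to an element of the target group,
    which is described by its multiplication, inversion and identity.
    """
    def f(word):
        result = identity
        for gen, exp in word:
            g = images[gen] if exp == 1 else inv(images[gen])
            result = mul(result, g)
        return result
    return f

# Target group: Z/3Z under addition, with x sent to 1.
f = extend(
    {"x": 1},
    mul=lambda a, b: (a + b) % 3,
    inv=lambda a: (-a) % 3,
    identity=0,
)

print(f([("x", 1), ("x", -1)]))  # 0: x·x^{-1} is sent to the identity
print(f([("x", 1), ("x", 1)]))   # 2: x·x is not sent to the identity
```

Since $x\cdot x$ is sent to a non-identity element of some group, it cannot equal $e$ in $F_1$, whereas $x\cdot x^{-1}$ is sent to the identity no matter which group and image are chosen.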
Things work out similarly if we have multiple elements; for instance, we would see that equations such as $$x\cdot y = y\cdot x$$ do not hold in every group $G$ and for every $x,y\in G$, and in fact, the only equations that do hold are those that only involve easy cancellations, as in the free group - and to prove that, we just note that the set of words under this cancellation law is a group in which only these relations hold and that the group axioms imply that these relations must hold in every group. Formally, this gives another notion of a free group:
The free group on a set $S$ is the set of expressions built from multiplication and inversion using elements of $S$ (and an added identity element), where two expressions are considered equivalent if their equivalence follows from the group axioms (i.e. holds in every group).
A nice generalization of this is that you can then go on to define group presentations like $$D_{16}\cong\langle x,y \mid xy = y^{-1}x,\ x^2 = e,\ y^8 = e\rangle$$ similarly to be a group where an equation holds if and only if it follows from the group axioms and the given relations. Equivalently, you can define it as a group $G$ with identified elements $x,y$ such that, for any group $H$ and any elements $\bar x,\bar y\in H$ satisfying all the desired relations, there is a unique homomorphism $f:G\rightarrow H$ taking $x$ to $\bar x$ and $y$ to $\bar y$ - and with a bit more work, you can see that this is also just the quotient of the free group on $\{x,y\}$ by the normal subgroup generated by the set $\{xyx^{-1}y,x^2,y^8\}$.
Okay, but your question is somewhat implicitly asking about what happens when $S$ was already a group - since then, when we have a word in $S$, we already know how to multiply it together. This leads to an interesting thought: $F_S$ is a group built by forgetting how to multiply, then putting a new multiplication rule generated by this set. In fact, the prior definition leads us to a nice fact: there is a homomorphism $\epsilon:F_S\rightarrow S$ that takes a free group on a group back into the group. This is, in category theory parlance, called the counit, but that's not so important.
This map $\epsilon$ is not the identity, nor is it ever an isomorphism - for instance, if we started with the trivial group $(\{e\},\cdot)$ and take the free group, we get that the free group on $\{e\}$ is $\mathbb Z$ with members of the form $e^n$ - all of which, when multiplied out, give $e$. So, somehow, the members of this free group are "unevaluated" expressions in the prior group. What this also tells you is that, since $\epsilon$ is clearly surjective, it must be that $S$ is a quotient group of $F_S$ - telling us that every group is a quotient of some free group.
What's really neat about these maps is that you can define the notion of a group by thinking about them carefully. In particular, if we know how to form the set of reduced words over a set, but forget how to multiply them, we can define a group as follows:
Let $S$ be a set and let $FS$ be the set of reduced words over $S$. A group $G$ is a set $G$ along with a map $f:FG\rightarrow G$ such that, treating $g$ as a one letter word in $FG$, we have $f(g)=g$ and for any element $\omega$ of $FFG$ (i.e. a reduced word whose letters are each reduced words), the following processes yield the same result: (1) take the word $\omega$ and apply $f$ to each letter in the word, yielding a word in $FG$ after reduction. Then apply $f$ again to this word. (2) append all the reduced words in $\omega$ together to get a reduced word in $FG$. Apply $f$ to this.
For instance, if we want to define the group $\mathbb Z/2\mathbb Z$ using this definition, we would start with the set $G=\{e,x\}$ and then define a map $f:FG\rightarrow G$ by saying $f(w)$ is $e$ if an even number of $x$'s appear in $w$ (counting both $x$ and $x^{-1}$) and is $x$ otherwise. It's clear that $f(e)=e$ and $f(x)=x$ for the first axiom. For the second, we would consider words like $$(ex)\cdot(xx)^{-1}\cdot(xe)$$ and note that applying $f$ to each "letter" (parenthesized expression) in this word gives $$x\cdot e^{-1} \cdot x$$ which gives $e$ when we apply $f$. If we instead concatenated the word together first and cancelled, we would get $$exx^{-1}x^{-1}xe\rightarrow ee$$ which then, applying $f$, gives $e$. One can figure out that this process really does define a group, so we can then say that a group is precisely a rule for transforming words in $F_S$ back into $S$. This process generalizes into the notion of an algebra over a monad, but that's more category theory nonsense we don't need to worry about.
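The $\mathbb Z/2\mathbb Z$ check above can be replayed mechanically. The following is a rough Python sketch, assuming words are stored as lists of `(letter, exponent)` pairs with exponent $\pm 1$; the helper names (`reduce_word`, `inverse`, `letters_first`, `concatenate_first`) are made up for this illustration.

```python
def reduce_word(word):
    """Freely reduce a word by cancelling adjacent a·a^{-1} and a^{-1}·a."""
    out = []
    for letter, exp in word:
        if out and out[-1] == (letter, -exp):
            out.pop()
        else:
            out.append((letter, exp))
    return out

def inverse(word):
    """Inverse of a word: reverse the order and flip every exponent."""
    return [(letter, -exp) for letter, exp in reversed(word)]

def f(word):
    """Structure map FG -> G for G = {e, x}: parity of the number of x's."""
    count = sum(1 for letter, _ in word if letter == "x")
    return "e" if count % 2 == 0 else "x"

def letters_first(omega):
    # Process (1): apply f to each letter of omega, reduce, then apply f.
    return f(reduce_word([(f(w), exp) for w, exp in omega]))

def concatenate_first(omega):
    # Process (2): append the inner words (inverted where needed), reduce, apply f.
    flat = []
    for w, exp in omega:
        flat.extend(w if exp == 1 else inverse(w))
    return f(reduce_word(flat))

ex = [("e", 1), ("x", 1)]
xx = [("x", 1), ("x", 1)]
xe = [("x", 1), ("e", 1)]
omega = [(ex, 1), (xx, -1), (xe, 1)]   # (ex)·(xx)^{-1}·(xe)

print(letters_first(omega), concatenate_first(omega))  # e e
```

Note that the letter `"e"` here is an ordinary generator of $FG$, not the empty word, which is why $ee$ does not reduce any further.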
To finish, it's worth considering what happens when we take away some axioms of groups; if you remove inverses, then you just get that the free monoid on a set $S$ is just the set $S^*$ of all words in $S$, under the operation of concatenation - where you still have relations like $$(xy)z = x(yz)$$ but almost nothing else. If you get rid of associativity and identity, you end up with the free magma on a set... which is just the set of all fully parenthesized expressions with one operator over that set (i.e. the set of rooted ordered binary trees whose leaves are labeled with the set and where the operation is taking two trees and building a new one whose root's left child is the left argument and right child is the right argument).
A bit more illuminating is actually to add structure. For instance, we can get a ring by sensibly axiomatizing addition and multiplication - and then the free ring on a single element set $\{x\}$ is every expression one can write in terms of $x$ and the terms $0$ and $1$ with multiplication, addition and negation - so expressions like $1+x+x\cdot (x+1)$. These all reduce down to some polynomial with integer coefficients - and one can prove that the free ring on a single element is just $\mathbb Z[x]$: the ring of polynomials with integer coefficients. This also has the significance that you can essentially evaluate these polynomials in any ring by looking at the homomorphism that takes $x$ to what you want to evaluate at, then seeing where that same homomorphism takes the polynomial in question. For instance, if you want to evaluate $x^3-2$ at $\sqrt{2}\in\mathbb R$, you can send $x$ to $\sqrt{2}$ and see that $x^3-2$ must go to $2\sqrt{2}-2$.
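As a small illustration of "evaluation is a homomorphism out of $\mathbb Z[x]$", here is a hedged Python sketch. It assumes a polynomial is stored as a list of integer coefficients (constant term first) and that the target ring is described by its addition, multiplication, and the canonical map from the integers; the name `evaluate` and these parameters are inventions for this example. Exact integer targets are used instead of $\sqrt 2\in\mathbb R$ to avoid floating-point noise.

```python
def evaluate(coeffs, a, add, mul, from_int):
    """Image of the polynomial sum(coeffs[k] * x**k) under x -> a (Horner's rule)."""
    result = from_int(0)
    for c in reversed(coeffs):
        result = add(mul(result, a), from_int(c))
    return result

p = [-2, 0, 0, 1]  # x^3 - 2

# Target ring Z, sending x to 3.
in_Z = evaluate(p, 3,
                add=lambda u, v: u + v,
                mul=lambda u, v: u * v,
                from_int=lambda n: n)

# Target ring Z/5Z, sending x to 3.
in_Z5 = evaluate(p, 3,
                 add=lambda u, v: (u + v) % 5,
                 mul=lambda u, v: (u * v) % 5,
                 from_int=lambda n: n % 5)

print(in_Z, in_Z5)  # 25 0
```

The same list of coefficients is interpreted in two different rings, which is the sense in which one element of the free ring can be "evaluated anywhere".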
There are also some examples where "freeness" doesn't work out; for instance, a field has multiplication, division, addition, and subtraction. However, there is no free field on any set. For instance, the equation $$1+1=0$$ is not true in every field, so it couldn't hold in any free field - but then we're confronted with the fact that a field in which this equation doesn't hold admits no homomorphism into a field in which it does, because, for instance, $\frac{1}2$ makes no sense in $\mathbb Z/2\mathbb Z$.
You will also find examples where you start with some structure, and then freely add some more structure - this is most common with rings (e.g. you can start with a monoid for multiplication and extend it to a ring), but it also can apply to groups - for instance, you can start with a group and freely "extend" to an abelian group (giving the process of abelianization) or start with a monoid and turn it into the freest group possible. There are also analogous notions in fields such as topology - in general, these ideas fall under the general heading of adjunctions from category theory (but let's still not worry about that).
You said: "if $S$ were a group then clearly $S$ and $FS$ would be isomorphic." This is not correct. $FS$ is an object constructed considering $S$ only as a set, ignoring any structure (whether group, magma, etc.) on $S$. The free group over $S$ can be defined over any set $S$, even if it has no additional structure on it.
For example, if $x$ and $y$ are two elements of $S$, then $xy$ will be an element of $FS$. If $S$ happens to be a group, then we could multiply $x$ and $y$ within $S$, but this is not equal to $xy$ in the free group $FS$. In the free group, $xy$ is a brand new element not in $S$. Similarly, the free group $FS$ will contain an element $x^{-1}$, but this is not equal to the inverse of $x$ in $S$ even if $S$ happens to be a group. In the free group, all inverses of elements of $S$ are new elements not in $S$.
The free group is important because it is the "simplest way to make a group out of a set". We start with the elements of $S$ and do the minimal amount of work to make it a group. We add inverses $x^{-1}$ for each $x \in S$, because the group axioms say we have to. (Emphasis: these elements $x^{-1}$ are new elements not in $S$.) We add an identity (which I will call 1), because the group axioms say so. We force $xx^{-1} = x^{-1}x = 1$ and $1x=x1=x$, again because the group axioms say so. And a word like $xyz$ can be written without parentheses because of the group axioms (associativity). But that's it. Two words in the free group are equal if they can be made to appear identical by canceling $xx^{-1} = x^{-1}x = 1$, but that's it. Thus if $a,b,c,d$ are distinct elements of $S$, then $abb^{-1}c = ac$, but $ab \ne cd$ in $FS$ because no cancellation is possible. (Note that if $S$ were an arbitrary group, we could have $ab = cd$ even if $a,b,c,d$ are distinct, but not in the free group on $S$.)
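The cancellation rule can be sketched in a few lines of Python. This is only an illustration under an assumed representation: a word is a list of `(generator, exponent)` pairs with exponent $\pm 1$, and `reduce_word`, `multiply`, and `inverse` are names chosen here, not standard library functions.

```python
def reduce_word(word):
    """Cancel adjacent x·x^{-1} and x^{-1}·x until none remain."""
    out = []
    for gen, exp in word:
        if out and out[-1] == (gen, -exp):
            out.pop()               # the new letter cancels the previous one
        else:
            out.append((gen, exp))
    return out

def multiply(u, v):
    """Product in the free group: concatenate, then reduce."""
    return reduce_word(u + v)

def inverse(u):
    """Inverse in the free group: reverse the word and flip each exponent."""
    return [(gen, -exp) for gen, exp in reversed(u)]

a, b, c, d = [("a", 1)], [("b", 1)], [("c", 1)], [("d", 1)]

print(reduce_word(a + b + inverse(b) + c) == reduce_word(a + c))  # True
print(multiply(a, b) == multiply(c, d))                           # False
print(multiply(a + b, inverse(a + b)))                            # []
```

Two words are equal in the free group exactly when their reduced forms are identical, so comparing the outputs of `reduce_word` decides equality; the empty list plays the role of the identity.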
If $S$ contains $n$ elements, then any group generated by $n$ elements is a quotient of $FS$. This gives rise to the construction of groups by generators and relations, which is important.
Perhaps after reading the other answer, it makes sense to think about some other free objects as well. I hope the examples below make it clear that free constructions are indeed useful!
Free monoid
Recall that monoids are groups without inverses. That is, they have a set, an associative binary operation and a neutral element. The free monoid on $\{0, 1\}$ is the set of all finite, possibly empty binary strings -- also known as the Kleene closure. It is commonly written $\{0, 1\}^* = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, \ldots\}$.
Notice how the construction again started with $\{0, 1\}$ and added a required neutral element $\varepsilon$ -- the empty string. Furthermore, we added all possible variations we can construct with the binary operation $\circ$. Note that we left out $\circ$ in the notation above. Actually, you should read it as $\{\varepsilon, 0, 1, 0\circ 0, 0\circ 1, 1\circ 0, 1\circ 1, 0\circ 0\circ 0, 0\circ 0\circ 1, \ldots\}$. Last but not least, we omitted parentheses since we have associativity in monoids.
Prime numbers
Take the prime numbers $2, 3, 5, 7, \ldots$ and construct the free Abelian monoid on them. You end up with (a monoid isomorphic to) $\mathbb{N}_{\geq 1}$ under multiplication, because every positive integer can be written as a product of primes in exactly one way, up to the order of the factors.
If you have, say, $n = p_1 p_2 p_3$, then
- $p_2 p_3 p_1$ represents the same number, hence the Abelianness condition
- $(p_1 p_2) p_3$ represents the same number, hence the associativity in the monoid
- $p_1 p_2 p_3 p_4$ always represents a different number, even if $p_4 = p_1$, hence the condition of freeness. It ensures that terms are identified if and only if this is due to some law of Abelian monoids -- since I said free Abelian monoid
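One way to see this concretely is to model an element of the free Abelian monoid on the primes as a multiset of primes. The sketch below uses Python's `collections.Counter` for the multisets; the function names `to_number` and `factor` are assumptions for this example, and `factor` uses plain trial division.

```python
from collections import Counter
from math import prod

def to_number(m):
    """Send a multiset of primes to the positive integer it represents."""
    return prod(p ** k for p, k in m.items())

def factor(n):
    """Inverse direction: the multiset of prime factors of n (trial division)."""
    m = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            m[p] += 1
            n //= p
        p += 1
    if n > 1:
        m[n] += 1
    return m

u = Counter({2: 1, 3: 1})   # represents 6
v = Counter({2: 1, 5: 1})   # represents 10

# Adding multisets corresponds to multiplying numbers.
print(to_number(u + v) == to_number(u) * to_number(v))  # True
print(factor(60) == u + v)                              # True
```

The order of the factors is forgotten by the multiset (Abelianness), while multiplicities are kept (freeness), and the empty multiset corresponds to $1$.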
Construction of $\mathbb{Z}$ and $\mathbb{Q}$
Look at the monoid $(\mathbb{N}_0, +)$ and think about why $(\mathbb{Z}, +)$ is "bigger"/"richer" in structure. Namely, the integers have inverses wrt. $+$. In a sense, transforming $(\mathbb{N}_0, +)$ to $(\mathbb{Z}, +)$ is just "groupifying a monoid".
Now consider the multiplicative monoid $(\mathbb{Z}\setminus \{0\}, \cdot)$ and think about why $(\mathbb{Q}\setminus \{0\}, \cdot)$ is "bigger"/"richer" in structure. Namely, the rationals have inverses wrt. $\cdot$. In a sense, transforming $(\mathbb{Z}\setminus \{0\}, \cdot)$ to $(\mathbb{Q}\setminus \{0\}, \cdot)$ is just "groupifying a monoid".
The last two examples are not exactly instances of the free construction I have been discussing; however, they are closely related, if you ask me. They are instances of the Grothendieck group construction, which constructs a commutative group out of a commutative monoid.
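For $(\mathbb{N}_0, +)$ the construction can be sketched directly: an integer is modelled as a pair $(a, b)$ of naturals standing for the formal difference $a - b$. The Python below is a minimal sketch with names chosen for this example (`equivalent`, `add`, `neg`, `normalize`). It relies on the assumption that the monoid is cancellative, as $\mathbb{N}_0$ is; for a general commutative monoid the equivalence needs an extra summand on both sides.

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) iff a + d == b + c, i.e. both stand for the same difference."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def add(p, q):
    """Componentwise addition of formal differences."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def neg(p):
    """The inverse of a - b is b - a."""
    a, b = p
    return (b, a)

def normalize(p):
    """Canonical representative: at least one component is zero."""
    a, b = p
    m = min(a, b)
    return (a - m, b - m)

three = (3, 0)
print(equivalent(add(three, neg(three)), (0, 0)))  # True: every element has an inverse
print(normalize(add((2, 5), (4, 0))))              # (1, 0): (2 - 5) + 4 = 1
```

Replacing addition of naturals by multiplication of nonzero integers in the same recipe gives fractions $a/b$, which matches the passage from $\mathbb{Z}\setminus\{0\}$ to $\mathbb{Q}\setminus\{0\}$.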
F-Algebras, construction of $\mathbb{N}$
If you fix a signature, i.e. constructors, say, $\{z^0, s^1\}$ for zero and successor where the superscripts indicate their arity as a function symbol, and then seek the free-est algebra there is on this signature, it turns out it is (isomorphic to) the natural numbers.
If you instead fix the signature $\{\mathrm{leaf}^0, \mathrm{bin}^2\}$ for leaves and binary branches, you get the algebra of (possibly unbalanced) binary trees as the free-est (term) algebra. Namely, the resulting set is $\{\mathrm{leaf}, \mathrm{bin}(\mathrm{leaf}, \mathrm{leaf}), \mathrm{bin}(\mathrm{bin}(\mathrm{leaf}, \mathrm{leaf}), \mathrm{leaf}), \ldots\}$.
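The tree example can be sketched in Python as follows. The class names `Leaf` and `Bin` mirror the signature above, while `fold` is a name chosen here for the map that interprets a term in some other algebra for the same signature (one value for `leaf`, one binary function for `bin`).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Leaf:
    pass

@dataclass(frozen=True)
class Bin:
    left: object
    right: object

def fold(tree, leaf_value, bin_op):
    """Interpret a term by replacing leaf with leaf_value and bin with bin_op."""
    if isinstance(tree, Leaf):
        return leaf_value
    return bin_op(fold(tree.left, leaf_value, bin_op),
                  fold(tree.right, leaf_value, bin_op))

t = Bin(Bin(Leaf(), Leaf()), Leaf())

leaves = fold(t, 1, lambda l, r: l + r)           # interpret in (N, 1, +)
height = fold(t, 0, lambda l, r: 1 + max(l, r))   # interpret in (N, 0, 1 + max)

print(leaves, height)  # 3 2
```

Freeness shows up in the fact that `fold` is determined entirely by the choice of `leaf_value` and `bin_op`: distinct terms are never identified, so any such choice gives a well-defined interpretation.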
Both claims are made precise, and given a good framework, by F-algebras. This requires a bit of category theory, mind you.
Interestingly, if you work with the almost dual F-coalgebras, you can describe automata with the unfree-est co-algebras.