Confusion about the meaning of the variables in Big-O notation.

Here $f(n)$ and $g(n)$ are arbitrary positive functions. For instance, $f(n)$ may describe the running time of a certain algorithm on an input of size $n$, or the amount of memory that algorithm uses on an input of size $n$, but it may also be entirely unrelated to algorithms -- this asymptotic notation is just a piece of mathematical notation used throughout analysis.

To say that $f(n) \in O(g(n))$ means, intuitively, "$f(n)$ does not grow faster than a constant times $g(n)$". For instance, $n^2 \in O(n^3)$: we can take $n_0=1$ and $c=1$, since $0\leq n^2\leq n^3$ for all $n\in\mathbb{N}$. We also have $5n^2 \in O(n^2)$ with $n_0=1$ and $c=5$. On the other hand, we cannot say that $n^3\in O(n^2)$, because for every constant $c$ there is some $m\in\mathbb{N}$ (any $m\geq c$ works) such that $n^3 > cn^2$ for every $n>m$.
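If it helps to see the definition in action, here is a minimal Python sketch that numerically checks the witness constants above (the helper name `is_bounded` and the finite check range are my own; a finite check can only illustrate the bound, not prove the asymptotic claim):

```python
def is_bounded(f, g, c, n0, n_max=10_000):
    """Check 0 <= f(n) <= c*g(n) for all n0 <= n <= n_max (finite evidence only)."""
    return all(0 <= f(n) <= c * g(n) for n in range(n0, n_max + 1))

# Witnesses from the text: n^2 in O(n^3) with c=1, n0=1, and 5n^2 in O(n^2) with c=5, n0=1.
print(is_bounded(lambda n: n**2, lambda n: n**3, c=1, n0=1))      # True
print(is_bounded(lambda n: 5 * n**2, lambda n: n**2, c=5, n0=1))  # True

# No constant works for n^3 in O(n^2); e.g. with c=100 the bound fails as soon as n > 100.
print(is_bounded(lambda n: n**3, lambda n: n**2, c=100, n0=1))    # False
```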

The purpose of the constant $c$ is to let us say that functions which agree up to multiplication by a constant "grow at the same asymptotic rate", in the sense that for any $f(n)$ and any $C>0$ we have $f(n)\in O(Cf(n))$ and $Cf(n)\in O(f(n))$. The purpose of the constant $n_0$ is to allow the functions to behave differently for the first few values of $n$. What matters is what happens asymptotically, that is, for large $n$; if the rule has finitely many exceptions, $n_0$ is there to allow for them.
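Concretely, both inclusions in that first claim come with explicit witnesses (taking $n_0=1$ in each case):

$$f(n) \le \tfrac{1}{C}\cdot\big(Cf(n)\big) \quad\text{and}\quad Cf(n) \le C\cdot f(n) \quad\text{for all } n\ge 1,$$

so $f(n)\in O(Cf(n))$ with constant $\tfrac{1}{C}$, and $Cf(n)\in O(f(n))$ with constant $C$.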

Keep in mind that a very common abuse of notation is to write $f(n)=O(g(n))$ instead of $f(n)\in O(g(n))$.


We say that a function $f$ is $O(g)$ if there are constants $C>0$ and $n_0>0$ such that $0\leq f(n)\leq Cg(n)$ for all $n>n_0$, as you have stated. Now we want to relate this to an algorithm: when we say that an algorithm is $O(f)$, we mean that the running time of the algorithm is $O(f)$ in the "worst-case scenario". By worst-case scenario we mean the largest number of computational steps the algorithm can take, so the function we are looking at is the maximum number of computations performed on an input of size $n$.

I think an example might enlighten us. Consider the classical bubble sort, and let us work through a concrete case. Say we wish to sort $3,2,1$. We compare $3$ and $2$ and swap them to get $2,3,1$, then we compare $3$ and $1$ and swap to get $2,1,3$. That is the first pass through the algorithm: the largest entry is now on the right, and we had to do $2$ checks/swaps, so $2$ computations. On the second pass, we compare $2$ and $1$ and swap them, which is $1$ computation (we don't have to compare with $3$, since we know $3$ is already in the correct place). Thus, for bubble sort on $3$ elements we did $2+1$ computations, so $f(3)=2+1$.

If we think about what would happen for a list of $n$ elements, we would do $n-1$ checks/swaps in the first pass, then $n-2$, and so on down to $1$. Thus we find that $f(n)=\sum_{i=1}^{n-1}i=\frac{(n-1)n}{2}=\frac{n^2-n}{2}$. Now we see that this function is $O(n^2)$ (it is a well-known fact that a polynomial $f$ is $O(n^{\deg f})$). So this is what we mean when we say that bubble sort is $O(n^2)$: the function counting the number of computations done when we put in a list of $n$ elements is $O(n^2)$.
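As a sanity check, here is a small Python sketch of the bubble sort just described, with a comparison counter added (the helper name `bubble_sort_count` is mine); on a reverse-sorted input of size $n$ it performs exactly $\frac{n(n-1)}{2}$ comparisons:

```python
def bubble_sort_count(a):
    """Bubble sort that also returns the number of comparisons performed."""
    a = list(a)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):          # pass i puts the (i+1)-th largest element in place
        for j in range(n - 1 - i):  # only compare entries not yet known to be in place
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

n = 10
_, count = bubble_sort_count(range(n, 0, -1))  # worst case: reverse-sorted input
print(count, n * (n - 1) // 2)                 # both print 45
```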

There is one more thing I should mention: there are implementations of bubble sort where, if you make no swaps during a pass, you conclude that the list is already sorted and terminate the algorithm. In that case, if you put in the already sorted list $1,2,3,\ldots,n$, the algorithm makes just a single pass through the loop (i.e. it does only $n-1$ comparisons on this input). This does not mean that bubble sort is $O(n)$ (even though it runs faster than quadratic in this case). That is the meaning of "worst-case scenario": the worst case for a list of length $n$ is having to do every single check.
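For completeness, here is a sketch of that early-exit variant (again the function name is my own); on an already sorted list it stops after one pass, i.e. after $n-1$ comparisons, while the reverse-sorted worst case still costs $\frac{n(n-1)}{2}$:

```python
def bubble_sort_early_exit(a):
    """Bubble sort that stops as soon as a full pass makes no swaps."""
    a = list(a)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:   # no swaps in this pass: the list is already sorted
            break
    return a, comparisons

n = 10
print(bubble_sort_early_exit(range(1, n + 1))[1])  # already sorted: 9 comparisons
print(bubble_sort_early_exit(range(n, 0, -1))[1])  # worst case:    45 comparisons
```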


Let $f(n)$ be the running time of the algorithm as a function of the input size $n$. We are interested in the rate of growth of the running time: as the input gets larger, the running time increases. In the limit as $n \to \infty$ (i.e. for very large inputs), can we find a $g(n)$ such that the growth rate of the running time $f(n)$ is bounded by the growth rate of $g(n)$?

For instance, take insertion sort. It is good to know that insertion sort on a given list of numbers takes at least, say, $t=10$ seconds. However, it would be even better if we could say that insertion sort takes no more than $t=12$ seconds, because everybody loves a guarantee. Thus, we are trying to find an upper bound on the growth of the running time.

Suppose an algorithm has running time $f(n)=999,999 n^2$. We can always choose a constant $c=1,000,000$ so that $f(n) \leq cg(n)$, where $g(n)=n^2$. Thus $999,999 n^2 \in O(n^2)$. Since $\sqrt{n}$, $\log n$, $n$, $n^{3/2}$, $n^{5/3}$, $n^{7/4}$, $\ldots$, $5n^2$ all grow no faster than $cn^2$ (for a $c$ of our choosing), they all belong to $O(n^2)$.
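As a quick numerical illustration in Python (the check range of $n$ is arbitrary and only illustrates the bounds, it does not prove anything asymptotic):

```python
import math

c, n_max = 1_000_000, 10_000
print(all(999_999 * n**2 <= c * n**2 for n in range(1, n_max + 1)))  # True

# The slower-growing functions listed above are all bounded by c*n^2 with c = 5.
slower = [math.sqrt, math.log, lambda n: n, lambda n: n**1.5, lambda n: 5 * n**2]
print(all(h(n) <= 5 * n**2 for h in slower for n in range(1, n_max + 1)))  # True
```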