Scaling property of Fourier series and Fourier Transform
This question about the intuition behind the scaling property of the Fourier transform made me wonder about the corresponding notion for a Fourier series.
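The Fourier transform of $f(ax)$ is $\frac{1}{|a|} \mathcal{F}\left(\frac{u}{a}\right)$, where $\mathcal{F}$ is the Fourier transform of $f$. If $a>1$ then the graph of $f(ax)$ is the graph of $f$ compressed, and so its Fourier transform is spread out toward higher frequencies. However, its values are scaled down in magnitude by the factor $\frac{1}{|a|}$.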
On the other hand, take $f(x)= \cos x$. Its Fourier series is trivially itself, with the coefficient of $\cos x$ being $1$. The scaled function $f(ax) = \cos ax$, where $a>1$ is an integer, still has a trivial Fourier series, and the coefficient of $\cos ax$ is $1$. This is unlike the Fourier transform, where the "coefficients" of each frequency get scaled as the formula shows: $\frac{1}{|a|} \mathcal{F}\left(\frac{u}{a}\right)$.
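The two behaviours can be checked numerically. The sketch below is only an illustration under assumptions that are not in the question: it uses the convention $\int f(x)e^{-2\pi i u x}\,dx$ for the transform, a Gaussian as the test function, $a=3$, and the hypothetical helper names `fourier_transform` and `cosine_coefficient`.

```python
import numpy as np

def fourier_transform(f, u, x):
    """Riemann-sum approximation of the integral of f(x) * exp(-2*pi*i*u*x) dx."""
    dx = x[1] - x[0]
    return np.sum(f(x) * np.exp(-2j * np.pi * u * x)) * dx

def cosine_coefficient(f, n, num=4096):
    """(1/pi) * integral over [0, 2*pi) of f(theta) * cos(n*theta), for n >= 1."""
    theta = 2.0 * np.pi * np.arange(num) / num
    return 2.0 * np.mean(f(theta) * np.cos(n * theta))

a, u = 3, 1.5                      # assumed scale factor and test frequency
x = np.linspace(-20.0, 20.0, 200001)
gauss = lambda t: np.exp(-np.pi * t**2)

# Transform of f(ax) at u versus (1/|a|) * transform of f at u/a.
ft_scaled = fourier_transform(lambda t: gauss(a * t), u, x)
ft_formula = fourier_transform(gauss, u / a, x) / abs(a)

# Coefficient of cos(x) in cos(x), and of cos(ax) in cos(ax).
coeff_original = cosine_coefficient(np.cos, 1)
coeff_scaled = cosine_coefficient(lambda t: np.cos(a * t), a)

print(ft_scaled.real, ft_formula.real)   # the two values agree
print(coeff_original, coeff_scaled)      # both are 1 up to rounding
```

The transform picks up the factor $\frac{1}{|a|}$, while the series coefficient stays at $1$.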
Is there a conceptual way to explain this discrepancy?
Let $a\neq 0$. For a function $f$ define $m_af$ by $(m_af)(x) = f(ax)$. Then the scale factor that appears in the Fourier transform of $m_af$ is directly related to the norm of $m_a$, since $$\|m_af\|_2 = \|m_a\|\cdot \|f\|_2$$ and the Fourier transform on $\mathbb{R}$ or $S^1$ is (up to a normalization factor) unitary. Now on $\mathbb{R}$ we have $\|m_a\| = |a|^{-\frac{1}{2}}$, while on $S^1$ (with $a \in \mathbb{N}_{>0}$) we have $\|m_a\|=1$.
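A quick numerical sanity check of the two operator norms, as a sketch only: the test functions, the grids, the choice $a=3$ and the helper `l2_norm` are all assumptions made for illustration.

```python
import numpy as np

def l2_norm(values, dx):
    """Riemann-sum approximation of the L^2 norm on a uniform grid."""
    return np.sqrt(np.sum(np.abs(values) ** 2) * dx)

a = 3  # assumed positive integer scale factor

# Real line: a rapidly decaying test function on a wide grid.
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]
f_line = lambda t: (1.0 + t) * np.exp(-t**2)
ratio_line = l2_norm(f_line(a * x), dx) / l2_norm(f_line(x), dx)

# Circle: a trigonometric polynomial sampled on [0, 2*pi).
num = 4096
theta = 2.0 * np.pi * np.arange(num) / num
dtheta = 2.0 * np.pi / num
f_circle = lambda t: 0.25 + np.cos(t) + 0.5 * np.sin(2.0 * t)
ratio_circle = l2_norm(f_circle(a * theta), dtheta) / l2_norm(f_circle(theta), dtheta)

print(ratio_line, abs(a) ** -0.5)   # the two values agree
print(ratio_circle)                 # 1 up to rounding
```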
As an amusing aside, if you take the Fourier transform of $\cos$ as a tempered distribution then $$\mathcal{F}(m_a\cos) = \tfrac{1}{2|a|}(m_{a^{-1}}\delta_1 + m_{a^{-1}}\delta_{-1})= \tfrac{1}{2|a|}(|a|\delta_a + |a|\delta_{-a}) = \tfrac{1}{2}(\delta_a + \delta_{-a})$$ by the transformation property of the Dirac distribution as noted in a comment. So in this case there is a factor $|a|^{-1}$ as usual although it is not directly visible.
Since you wanted a conceptual explanation:
the reason is that the action of "scaling" on the circle, and the action of "scaling" on the real line, are two different things.
On the circle, the mapping $\theta \mapsto a\theta$ "wraps around", and for $a$ a nonzero integer, the image of the circle under this map wraps around $|a|$ times.
On the real line, the mapping $x\mapsto ax$ is one-to-one.
And this makes a huge difference.
When you do the rescaling $\cos \theta \to \cos 2\theta$ on the circle, you are not just compressing the function by making the characteristic length-scale smaller, you are also cramming two copies of the rescaled function into the same circle. In fact, this works for any periodic function: the mapping $g(\theta) \to g(a\theta)$ scales spatially and also makes $|a|$ copies of the function.
When you do the rescaling $f(x) \to f(ax)$ on the real line, you are only compressing the function spatially. There still is only one copy of the function. This difference in the number of copies is, morally speaking, why there is a factor of $|a|$ difference in the two formulae.
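The copying shows up directly in the coefficients: if $g(\theta)=\sum_k c_k e^{ik\theta}$, then $g(a\theta)=\sum_k c_k e^{ika\theta}$, so the coefficient of $g(a\theta)$ at $n=ak$ is exactly $c_k$, with no factor of $\frac{1}{|a|}$, and it vanishes when $a$ does not divide $n$. Here is a minimal sketch that checks this with an FFT, assuming an arbitrary smooth periodic test function and $a=3$ (neither is from the discussion above).

```python
import numpy as np

N = 1024                      # number of samples on the circle
a = 3                         # assumed integer scale factor
theta = 2.0 * np.pi * np.arange(N) / N

# Arbitrary smooth 2*pi-periodic test function (an assumption for this demo).
g = lambda t: np.exp(np.cos(t)) + np.sin(2.0 * t)

# Complex Fourier coefficients c_n for n = 0, 1, ..., N-1 (negative n wrap around).
c = np.fft.fft(g(theta)) / N
c_scaled = np.fft.fft(g(a * theta)) / N

k = np.arange(10)
copied = c_scaled[a * k]                                   # coefficients at n = a*k
others = c_scaled[[n for n in range(30) if n % a != 0]]    # n not divisible by a

print(np.max(np.abs(copied - c[k])))   # tiny: same values, just moved to n = a*k
print(np.max(np.abs(others)))          # tiny: nothing at other frequencies
```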
Another way to think about it: when you want to view the Fourier series as a special case of the Fourier transform, a "better" approach (not necessarily better for other applications) is, instead of extending the function on $[0,2\pi]$ periodically, to extend it by $0$ outside of $[0,2\pi]$. Then you see immediately that evaluating the Fourier transform at integer values gives you (up to normalization) precisely the Fourier coefficients for the series on the circle. So define the function $g(x) = \cos(x)$ if $x\in [0,2\pi]$ and $0$ elsewhere. For $a>0$, the rescaling of $g(x)$ is the function $g(ax) = \cos(ax)$ if $x\in [0,2\pi/a]$ and $0$ elsewhere. This is very different from the function $\cos(ax)$ on $[0,2\pi]$.
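To make the comparison concrete, here is a rough numerical sketch. It assumes the normalization $\frac{1}{2\pi}\int g(x)e^{-i\xi x}\,dx$, the value $a=3$, and a hypothetical helper `transform_at`; none of these are fixed by the argument above.

```python
import numpy as np

def transform_at(f, lo, hi, xi, num=200000):
    """(1/(2*pi)) * integral over [lo, hi] of f(x) * exp(-i*xi*x) dx, midpoint rule.

    The function is treated as zero outside [lo, hi].
    """
    dx = (hi - lo) / num
    x = lo + (np.arange(num) + 0.5) * dx
    return np.sum(f(x) * np.exp(-1j * xi * x)) * dx / (2.0 * np.pi)

a = 3  # assumed positive integer scale factor
cos_ax = lambda x: np.cos(a * x)

# g = cos on [0, 2*pi], zero elsewhere, evaluated at frequency 1.
coeff_g = transform_at(np.cos, 0.0, 2.0 * np.pi, 1)

# True rescaling g(ax): supported on [0, 2*pi/a], evaluated at frequency a.
coeff_rescaled = transform_at(cos_ax, 0.0, 2.0 * np.pi / a, a)

# cos(ax) on all of [0, 2*pi] (a copies), evaluated at frequency a.
coeff_copies = transform_at(cos_ax, 0.0, 2.0 * np.pi, a)

print(coeff_g.real)         # 1/2
print(coeff_rescaled.real)  # 1/(2a): the usual 1/|a| factor appears
print(coeff_copies.real)    # 1/2 again: a copies make up for the 1/|a|
```

The values $\frac12$ are the complex coefficients $c_{\pm 1}$ and $c_{\pm a}$; they correspond to a cosine coefficient of $1$.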
This is all to say that what you thought of as rescaling on the circle is not just rescaling, but scaling and copying.
I think I can see what you're getting at -- you want to view the Fourier series as a Fourier transform that happens to consist of $\delta$ peaks instead of a smoothly varying density, and then you wonder why those $\delta$ peaks don't appear to behave the same under scaling as an ordinary smooth Fourier transform does.
First off, I don't think the default formalism for Fourier transforms actually allows for this viewpoint. It can probably be formalized in some way, though, such as by letting the transform produce (or act on?) a complex measure instead of an ordinary function. Never mind; let's imagine it can work and see where that leads us.
Now, intuitively it is tempting to think of $C\delta(x-x_0)$ as a "density" that is $0$ everywhere except at $x_0$ where it is $C\cdot\infty$. But that doesn't really work -- infinities don't work that way in the first place, and in particular the height of a delta peak doesn't behave quite like an infinitary density. And you've chanced upon one of the differences, namely how they react to scaling.
For an ordinary smooth density function $F$ we can scale it by $a>0$ by using $G(t)=\frac{1}{a} F\left(\frac{t}{a}\right)$. The inner factor of $\frac{1}{a}$ does the actual stretching of the horizontal axis, and the outer factor of $\frac{1}{a}$ corrects for the resulting area increase, such that $$\int_{ap}^{aq} G(t)\;dt = \int_{ap}^{aq}\frac1a F\left(\frac ta\right)\;dt = \int_p^q F(u)\;du$$ for all intervals $[p,q]$. On the other hand, a delta peak shows no such outer factor: stretching simply moves the peak from $x_0$ to $ax_0$ and leaves its weight alone, so that $$\int_{ap}^{aq} \delta(x-ax_0)\;dx = \int_p^q \delta(x-x_0)\;dx.$$ In the notation used for $G$, we have $\delta\left(\frac xa-x_0\right) = a\,\delta(x-ax_0)$, so the outer factor $\frac1a$ is exactly cancelled. The peak has width 0, so there is nothing to stretch and no area increase to compensate for.
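One way to check this bookkeeping numerically is to model a peak as a very narrow unit-area bump and apply the density rule $G(t)=\frac1a F\left(\frac ta\right)$ to it. This is only a sketch: the Gaussian bump, its width `eps`, and the values of `a` and `x0` are assumptions chosen for illustration.

```python
import numpy as np

def unit_bump(t, center, eps):
    """Gaussian of total area 1 and width eps, standing in for a delta peak."""
    return np.exp(-0.5 * ((t - center) / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

a, x0, eps = 3.0, 1.0, 0.01          # assumed stretch factor, peak position, width
t = np.linspace(-10.0, 10.0, 400001)
dt = t[1] - t[0]

F = lambda s: unit_bump(s, x0, eps)  # narrow peak at x0
G = lambda s: F(s / a) / a           # stretched by a, with the outer 1/a

weight_before = np.sum(F(t)) * dt    # area (weight) of the original peak
weight_after = np.sum(G(t)) * dt     # area (weight) of the stretched peak
height_ratio = G(t).max() / F(t).max()
peak_location = t[np.argmax(G(t))]

print(weight_before, weight_after)   # both close to 1
print(height_ratio)                  # close to 1/a
print(peak_location)                 # close to a * x0
```

The height of the bump drops by the factor $a$, as it would for any smooth density, but its weight does not change. The weight is what survives in the limit `eps` $\to 0$, and it is the weight that plays the role of a Fourier coefficient.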
And that is why you're seeing a difference in the behavior of Fourier coefficients versus Fourier transforms.