Why can't we formalize the lambda calculus in first-order logic?
I'm reading through Hindley and Seldin's book on the lambda calculus and combinatory logic. In it, the authors state that, although combinatory logic can be presented as an equational theory in first-order logic, the lambda calculus cannot. I sort of understand why this is true, but I can't see what goes wrong with the following approach. Aren't the terms of first-order logic arbitrary (constructed from constants, function symbols, and variables)? So why can't we define composition and lambda abstraction as function symbols and then consider an equational theory over these terms? In which systems of logic can the lambda calculus be formalized as an equational theory?
In particular, we consider a first-order language with a single binary predicate, equality. We take the set $V$ of variables of the $\lambda$-calculus, which we view as constants of the enveloping first-order theory, and for each $x \in V$ we add a unary function symbol $f_x$ corresponding to $\lambda$-abstraction over $x$. We also add a binary function symbol $c$ corresponding to composition. We then add the standard equality axioms, together with the $\alpha$- and $\beta$-conversion rules, which form an infinite family of axioms over the terms built from these constants by $c$ and the $f_x$. I doubt this theory is finitely axiomatizable, but it is still a first-order theory.
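To make this concrete, here is a rough sketch in Haskell of the signature I have in mind (all names here are my own, and the substitution is deliberately naive: it simply refuses whenever $\alpha$-renaming would be needed):

```haskell
import Data.List (delete, union)

type Name = String

-- First-order terms of the proposed signature: a constant V x for each
-- lambda-variable x, a unary function symbol F x for abstraction over x,
-- and a binary function symbol C for composition.
data Term
  = V Name
  | F Name Term
  | C Term Term
  deriving (Eq, Show)

-- Free variables of a term, needed to state substitution.
free :: Term -> [Name]
free (V x)   = [x]
free (F x t) = delete x (free t)
free (C s t) = free s `union` free t

-- Naive substitution t[x := u]: it returns Nothing whenever
-- alpha-renaming would be required to avoid capture.
subst :: Name -> Term -> Term -> Maybe Term
subst x u (V y)
  | y == x          = Just u
  | otherwise       = Just (V y)
subst x u (F y t)
  | y == x          = Just (F y t)      -- x is bound here, nothing to substitute
  | y `elem` free u = Nothing           -- substituting u would capture y
  | otherwise       = F y <$> subst x u t
subst x u (C s t)   = C <$> subst x u s <*> subst x u t

-- One instance of the beta-axiom schema: c(f_x(m), n) = m[x := n].
-- The theory I describe contains one such equation for every choice of
-- x, m and n, hence an infinite family of axioms.
betaInstance :: Name -> Term -> Term -> Maybe (Term, Term)
betaInstance x m n = (,) (C (F x m) n) <$> subst x n m
```

For example, `betaInstance "x" (V "x") (V "y")` produces the axiom instance $c(f_x(x), y) = y$.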
The untyped $\lambda$-calculus and untyped combinatory logic both have a form of the $Y$ combinator, so types are introduced to avoid infinite reductions. As (untyped) systems, they are equivalent only as equational theories. To elaborate:
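To make "infinite reductions" concrete, here is a small reducer for combinatory terms (the datatype and function names are my own). The term $SII(SII)$ is not the $Y$ combinator itself, but it exhibits the same non-terminating behaviour: under the innermost strategy below it reduces back to itself after three steps.

```haskell
-- Combinatory terms built from S, K, I and application.
data CTerm = S | K | I | App CTerm CTerm
  deriving (Eq, Show)

-- Contract a redex at the root, if there is one.
contract :: CTerm -> Maybe CTerm
contract (App I x)                 = Just x
contract (App (App K x) _)         = Just x
contract (App (App (App S x) y) z) = Just (App (App x z) (App y z))
contract _                         = Nothing

-- One innermost reduction step: reduce inside the subterms first,
-- then contract the root.
step :: CTerm -> Maybe CTerm
step (App f a) =
  case step f of
    Just f' -> Just (App f' a)
    Nothing -> case step a of
      Just a' -> Just (App f a')
      Nothing -> contract (App f a)
step _ = Nothing

sii, omega :: CTerm
sii   = App (App S I) I
omega = App sii sii
-- omega --> I(SII)(I(SII)) --> SII(I(SII)) --> SII(SII) = omega, forever:
-- take 4 (iterate (>>= step) (Just omega)) ends where it started.
```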
Take $(\Lambda , \rightarrow_{\beta\eta})$ to be the set of $\lambda$-terms equipped with the $\beta\eta$-reduction relation (abstraction and application only; I am considering the simplest case), and $(\mathcal{C}, \rightarrow_R)$ the set of combinatory terms equipped with reduction. We would like a translation $T$: a bijection between the two sets that respects reduction. The problem is that reduction on combinatory terms happens "too fast" compared with $\beta\eta$-reduction, so such a translation does not exist. When the systems are viewed as equational theories, i.e. $(\Lambda , =_{\beta\eta})$ and $(\mathcal{C}, =_R)$, we can find a translation that respects the equalities.
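One direction of that translation is usually given by bracket abstraction; here is a minimal sketch (datatype and function names are mine). It sends $\beta$-equal $\lambda$-terms to combinatory terms that are provably equal once suitable extensionality axioms are added on the combinatory side, but it does not simulate $\beta$-reduction step by step, which is exactly the mismatch described above.

```haskell
import Data.List (union)

type Name = String

-- Lambda-terms (abstraction and application only) and combinatory terms.
data LTerm = LVar Name | Lam Name LTerm | LApp LTerm LTerm
data CTerm = S | K | I | CVar Name | CApp CTerm CTerm
  deriving (Eq, Show)

-- Free variables of a combinatory term.
freeC :: CTerm -> [Name]
freeC (CVar x)   = [x]
freeC (CApp s t) = freeC s `union` freeC t
freeC _          = []

-- Bracket abstraction: bracket x t is a combinatory term behaving like \x. t.
bracket :: Name -> CTerm -> CTerm
bracket x t
  | x `notElem` freeC t = CApp K t
bracket x (CVar y)
  | x == y              = I
bracket x (CApp s t)    = CApp (CApp S (bracket x s)) (bracket x t)
bracket _ t             = CApp K t   -- unreachable: covered by the first clause

-- The translation T from lambda-terms to combinatory terms.
translate :: LTerm -> CTerm
translate (LVar x)   = CVar x
translate (LApp s t) = CApp (translate s) (translate t)
translate (Lam x t)  = bracket x (translate t)
```

For instance, `translate` sends $\lambda x.x$ to $I$, but it sends $\lambda x.\lambda y.x$ to $S(KK)I$, which is equal to $K$ only with extensionality, not by weak reduction alone: they are distinct weak normal forms.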
An interesting fact is that $=_{\beta\eta} \approx =_{ext}$, where $=_{ext}$ is a form of extensional equality on $\lambda$-terms.
To answer your question: when you view these systems as equational theories, both are expressible in FOL, since one of them is and this translation exists. But with reduction alone, the $\lambda$-calculus has the notion of bound variables, while combinatory logic does not. Bound variables require more expressive power than FOL provides: you need access to the FOL expressions themselves in order to distinguish bound variables from free ones, which you cannot do from inside the system.
Concepts related to (and explaining my abuse of terminology when I say "you need access to the expressions of FOL") are the $\lambda$-cube and higher-order logics on the formal-systems side, and "linguistic levels" in linguistics. Informally, this is the level of precision the system in question has. Take for example the alphabet $\{a,b\}$ and consider all words over this alphabet. This language requires the minimum amount of precision to express, since it is the largest language one can build from this alphabet. Now consider only the words of the form $a^nb^n$. This language requires a higher level of precision from a system that is to express it, because you need to be able to "look inside" a word to tell whether it belongs to the language or not. You can read more about this topic at https://devopedia.org/chomsky-hierarchy
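As a toy illustration of this difference (function names are mine): recognizing an arbitrary word over $\{a,b\}$ needs no inspection of the word's structure, while recognizing $a^nb^n$ requires counting, i.e. looking inside the word.

```haskell
-- Membership in {a,b}* : no inspection of the word's structure is needed.
anyWord :: String -> Bool
anyWord = all (`elem` "ab")

-- Membership in { a^n b^n : n >= 0 } : the a's and b's must be counted.
anbn :: String -> Bool
anbn w = let (as, bs) = span (== 'a') w
         in all (== 'b') bs && length as == length bs
```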