Is many-sorted logic really a unifying logic?
The last two (rhetorical) questions point toward the idea that the goal of a logic should not be to talk in detail about the background set-theoretical environment, the mathematical universe.
It is actually misleading to think of SOL with full semantics as 'allowing talk' about the background mathematical universe. This is because full semantics is really and truly a semantic notion, entirely separate from the (syntactic) second-order system itself. I think it is important to maintain a very clear distinction between syntax and semantics; a system cannot talk about any interpretation of itself. Even if we can interpret it as doing that, it is still just an interpretation, not something the system itself is doing. In other words, the symbols are meaningless until we interpret them. You can choose to interpret an SOL system using full semantics, or you can choose to use Henkin semantics, and neither choice has anything to do with the system itself; it generates the same theorems either way.
Furthermore, a formal system meant for foundational purposes has to be one with a computable proof verifier (or, equivalently, a computable theorem generator). So it would make no sense to ask for a formal system that is complete with respect to SOL with full semantics: any reasonable foundational system must be able to reason about finite program runs, and by the generalized incompleteness theorems this is incompatible with completeness for full-semantics SOL, since PA2 (second-order PA) is categorical under full semantics while the theory of $\mathbb{N}$ is not even computably enumerable.
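To spell out that last step (a standard argument, in my notation): categoricity of PA2 under full semantics means that its semantic consequences, restricted to first-order arithmetic sentences, are exactly the true ones,

$$\{\varphi \text{ first-order arithmetic} : \mathrm{PA2} \models_{\mathrm{full}} \varphi\} \;=\; \operatorname{Th}(\mathbb{N}),$$

and $\operatorname{Th}(\mathbb{N})$ is not computably enumerable (Gödel/Tarski), whereas the theorem set of any system with a computable proof verifier is computably enumerable. So no such system can be complete for full-semantics consequence.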
This means that if we are interested in full-semantics SOL it cannot be for foundational purposes, and so it must simply be a mathematical notion residing in our meta-system (MS). Of course in MS we can discuss formal systems as objects, but for the reasons above, MS cannot be based on any formal system that is complete with respect to full-semantics SOL, and so necessarily no system captured by an object in MS can pin down the 'true' natural numbers, because the mathematical notion $\mathbb{N}$ itself is merely an object in MS, very incompletely pinned down by MS itself. The same goes for anything that appears to be essentially uniquely defined by some second-order system under full semantics.
In other words, a so-called interpretation in the mathematical universe is not necessarily the "practical" interpretation that concerns us, and thus we should not care about the extra "expressive power" of standard SOL.
As explained above, one cannot consider full-semantics SOL to have more "expressive power" in the specific sense of being able to express more, since it does not. Of course, MS may assume (or prove) the existence of a full-semantics model of PA2, and may further prove that this model is essentially unique. But (in my view) it is not full-semantics PA2 that is more expressive than (first-order) PA; it is MS that is more expressive!
To clarify my objection to the use of the word "expressive", consider what you think PA2 is really expressing. Its meaning under full semantics can only be defined with respect to the collections that exist in MS, and which collections exist in MS is obviously governed by MS. So it is misleading to think of PA2 as expressing anything on its own, since its meaning is tied to what MS can express.
In contrast, the notion of models of an FOL system is quite independent of MS. Although the domain still must be a collection in MS, and each symbol must be interpreted by some function or relation in MS, the truth-values of sentences in each model are fixed, independent of MS (as long as you agree that collections, functions and relations are meaningful independently of MS). Similarly with Henkin models of an SOL system. This is not so with full-semantics models of an SOL system, because the collection of sets must be exactly the powerset of the collection of individuals, where "powerset" must exactly coincide with what MS 'thinks'. Clearly then we are importing all the assumptions of MS into the notion of a full-semantics model.
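To make the contrast concrete (a sketch in my notation, with $D$ the domain of individuals and $\mathcal{D}$ the range of the set variables):

$$\text{Henkin model: } \mathcal{D} \subseteq \mathcal{P}(D) \text{ (any collection satisfying the axioms, e.g. comprehension)}, \qquad \text{full model: } \mathcal{D} = \mathcal{P}(D),$$

where $\mathcal{P}(D)$ is the powerset as MS computes it. The first condition can be checked model by model; the second imports MS's notion of 'all subsets'.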
The thesis that is the subject of the question is whether, in practice, we are OK to give up the extra expressive power and other semantic properties of standard SOL. In that sense, is many-sorted logic really a unifying logic?
So (as per above) it does not make sense to say that full-semantics SOL has any extra expressive power over FOL at all. It is analogous to claiming that we can get more by assuming that $\mathbb{N}$ is a standard model of PA than by just assuming it is a model of PA. Of course we can, but this assumption is made in MS, not in PA.
In the same manner, if we are building a foundational higher-order formal system, we should reasonably want it to be complete for Henkin semantics, because that captures more or less which second-order objects we are certain necessarily exist (since the comprehension axioms correspond exactly to the definable subsets over the first-order objects). Thus a formal system that is complete for Henkin semantics is not only more general (unifying) than one for full semantics, but will also be more robust.
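Concretely, the comprehension schema I have in mind is the usual one (a sketch; parameters are allowed in $\varphi$):

$$\exists X\,\forall x\,\big(X(x) \leftrightarrow \varphi(x)\big), \qquad X \text{ not free in } \varphi,$$

one instance per formula $\varphi$, so every Henkin model must contain at least every subset of the first-order domain that is definable in it, and nothing forces it to contain more.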
For example, any reasonable MS can prove (via the incompleteness theorems) that if MS itself is consistent then MS has a model whose interpretation of $\mathbb{N}$ is not isomorphic to the intended one. So it means nothing for second-order PA to be categorical under full semantics. In contrast, there is no 'illusion' of categoricity for Henkin semantics.
Note that what I wrote is based on my interpretation of the English words "expressive" and "talk" and "unifying" and so on, and other logicians may not have the same notion in mind when they use the same words. But anyway the mathematics is the same; MS knows that for every model of itself there will always be another model that disagrees on the full-semantics models of PA2, even though PA2 is categorical under full semantics, and that is the core of what I am getting at when I say that full semantics gives an 'illusion' of expressiveness. It is not that Henkin semantics allows you to 'express less', but rather that it does not give the impression that you can pin down structures that cannot be pinned down. Hope it is clearer. =)
And now I'm going to say something about translation of practical formal systems into FOL.
What is the status of this thesis now? Do we have many logic systems that are used in practice, say in computer science, foundations of physics, etc., that cannot be translated into many-sorted logic? Also, does this thesis say that we could also directly use many-sorted logic in these applications, or is it that a lot is lost in the translation in terms of convenience?
Technically speaking, every possible practical foundational system $S$ can be translated uniformly and easily into FOL. This is because (as stated above) it must have a computable proof verifier, and so even a weak MS such as TC (theory of concatenation) or PA will be able to 'trace' the finite runs of the verifier and hence capture which statements are proven by $S$.
Specifically, there is a $2$-parameter formula $V$ over TC such that for all strings $p,x$ we have that MS proves $V(p,x)$ iff $p$ is a valid proof over $S$ of the statement $x$. (The same holds for PA, except that we have to encode strings, say using Gödel numbering, whereas the encoding is trivial for TC.) Note that MS is a first-order theory, and so we have effectively translated $S$ into a first-order system, since the theorems of $S$ can be recovered directly from the theorems of MS of the form $V(p,x)$. (Note that this depends on the soundness of MS, but almost every logician accepts the soundness of TC and PA.)
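In symbols (a sketch, writing $\overline{p},\overline{x}$ for the terms of MS encoding the strings $p,x$):

$$S \vdash x \iff \exists p\,\big(\text{MS} \vdash V(\overline{p},\overline{x})\big) \iff \text{MS} \vdash \exists p\, V(p,\overline{x}),$$

where the left-to-right direction of the second equivalence is just $\Sigma_1$-completeness of MS, and the right-to-left direction is where the soundness of MS is used.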
Such a translation is not only uniform for all practical formal systems but also easy to perform. However, the catch should be obvious; it is a purely syntactic translation. It fulfills the stated goal of representing any target system within a first-order system, but you may ask whether there is a 'natural' translation that also 'captures' the intended semantics. For example, you might want (what are intended to be interpreted as) collections in $S$ to also be something like collections in the target first-order system $T$.
But what does that actually mean? In the first place, the notion of "collection" will have to be a meta-theoretic notion, namely a concept in MS, otherwise the goal would be no more meaningful than a syntactic translation. Then the answer is no; there cannot always be 'natural' translations, because the intended semantics for collections in $S$ may well be incompatible with the intended semantics for collections in MS. For example, if MS is ZFC and $S$ is NF, then it is impossible for $U := \{ x : x=x \}$ in NF to be a set in ZFC with the same members, since $U \in U$ in NF, contrary to Foundation in ZFC. Although NF is a classical first-order theory, you can imagine non-classical variants of NF that still admit a universal 'set'. The point remains that you cannot expect to be able to 'naturally' translate any given arbitrary $S$ uniformly into FOL, without it degenerating into a syntactic transform.
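To make the clash explicit (both derivations are standard; I am only sketching them): in NF the formula $x = x$ is stratified, so comprehension gives the universal set, while in ZFC Foundation (with Pairing) rules out self-membership:

$$\text{NF: } \exists U\,\forall x\,(x \in U \leftrightarrow x = x) \;\Rightarrow\; U \in U, \qquad \text{ZFC: Foundation} \;\Rightarrow\; \forall x\,(x \notin x).$$

So no ZFC set can play the role of the NF universe while keeping the same membership facts.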
Another way to put it is that 'natural' translations are necessarily ad hoc. For example, you can 'naturally' translate HOA (higher-order arithmetic) into ZFC because we can construct iterated powersets of $\mathbb{N}$ in ZFC, and you can 'naturally' translate plenty of stratified type theories (such as CiC) into ZFC plus some large cardinal (e.g. CiC into ZFC plus countably many inaccessibles), because the stratification ensures that the intended semantics are compatible.
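As a sketch of what the 'natural' translation of HOA into ZFC can look like (my notation): interpret the order-$n$ sort by the $n$-th iterated powerset and read membership between consecutive sorts literally,

$$D_0 = \mathbb{N}, \qquad D_{n+1} = \mathcal{P}(D_n), \qquad x \in_n y \ \text{read as}\ x \in y,$$

so that the comprehension axioms of HOA become instances of Separation in ZFC, and the intended semantics of 'collection' is preserved rather than merely simulated.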
Moreover, I would say that it is impossible to 'naturally' translate even some 'pure logics' into FOL, such as provability logic. This is because the modal operator allows us to drop quotation symbols, which in the first place cannot be represented natively as first-order symbols, since quotations are essentially meta-operators. Observe that the PDF you linked to essentially constructs Kripke frames to translate the modal logics mentioned. This is clearly an instance of importing capabilities of MS, and not a 'natural' translation. It also only works because we have proven that those modal logics correspond to some first-order frame condition! If you actually perform that given translation on provability logic and try working in the resulting first-order system, I am sure you will be forced to agree that it is not 'natural'.
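For reference, the Kripke-frame trick in question is essentially the 'standard translation' into FOL with an accessibility relation $R$ (a sketch):

$$ST_x(p) = P(x), \qquad ST_x(\neg\varphi) = \neg ST_x(\varphi), \qquad ST_x(\varphi \wedge \psi) = ST_x(\varphi) \wedge ST_x(\psi), \qquad ST_x(\Box\varphi) = \forall y\,\big(R(x,y) \rightarrow ST_y(\varphi)\big).$$

For provability logic the matching frames are the transitive, conversely well-founded ones, and converse well-foundedness is not a first-order frame condition, which is one concrete way to see why this route does not yield anything 'natural' there.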
As for foundational systems that are applied (in computer science or physical sciences), the whole purpose of designing stratified type theories was to make them somewhat philosophically justifiable (and hence hopefully not suffer from arithmetical unsoundness). Also, as far as we know, almost every mathematical theorem with direct real-world applications can be proven naturally in HOA (especially if you have the base types $\mathbb{N}$ and $\mathrm{bool}$ and you have the type $S \to T$ for all types $S,T$, rather than just one type for each powerset iterate of $\mathbb{N}$). This is of course trivial to express as a many-sorted FOL system, with one sort for each type.
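A sketch of such a many-sorted presentation (my notation): take the sorts generated by $\sigma ::= \mathbb{N} \mid \mathrm{bool} \mid \sigma \to \tau$, a binary application symbol $\mathrm{app}_{\sigma,\tau} : (\sigma \to \tau) \times \sigma \to \tau$ for each pair of sorts, and the usual extensionality and comprehension axioms per sort, all of which are ordinary first-order axioms once the sorts are fixed; for example, extensionality at a function sort reads

$$\forall f\,\forall g\,\Big[\big(\forall x\; \mathrm{app}(f,x) = \mathrm{app}(g,x)\big) \rightarrow f = g\Big] \qquad (f, g \text{ of sort } \sigma \to \tau,\; x \text{ of sort } \sigma).$$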
It is true that HOA is unable to prove things about arbitrary types, and that perhaps is the most convincing reason we should go beyond it, if only for conceptual 'completeness'. Unfortunately, as Russell's paradox shows, we cannot possibly have both boolean set membership and unrestricted set comprehension. ZFC discards the latter. But since practical applications almost solely involve 'concrete' objects (below some finite order in HOA), it is unlikely that future proof assistants will be based on radically different foundational systems, since mathematicians will not give up HOA, and it is sufficient for 'concrete' mathematics.
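The familiar derivation, for completeness (a sketch): with unrestricted comprehension, form $R := \{x : x \notin x\}$; two-valued (boolean) membership then forces

$$R \in R \;\leftrightarrow\; R \notin R,$$

a contradiction, so at least one of the two principles has to go (ZFC keeps two-valued membership and restricts comprehension to Separation and Replacement).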
Currently, as far as I know all major proof assistants (such as those listed on Wikipedia) are indeed based on either a classical set theory like ZFC plus some large cardinal axiom, or a stratified type theory like CiC. For instance, Mizar is based on ZFC plus Grothendieck universes, and Coq is based on CiC. This at least implies that work done in established proof assistants can all be 'naturally' translated into a suitable first-order system. This is subjective, but you would have to try it yourself to form your own opinion.
Finally, I will note that many apparently modal logics that are widely used in applied computer science are actually easily translatable into FOL. For example, temporal logic is easily expressed using an axiomatization of time and events in time instead of using modal operators for tenses. Sure, it will be less elegant, but it would be far more general and also fit nicely into standard foundational systems.
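For illustration, the usual first-order reading of the temporal operators relative to a time variable $t$ (a sketch; $\varphi^{*}(t)$ is my notation for the relativized formula):

$$(\mathbf{G}\varphi)^{*}(t) = \forall t'\,\big(t \le t' \rightarrow \varphi^{*}(t')\big), \qquad (\mathbf{F}\varphi)^{*}(t) = \exists t'\,\big(t \le t' \wedge \varphi^{*}(t')\big), \qquad (\varphi\,\mathbf{U}\,\psi)^{*}(t) = \exists t'\,\big(t \le t' \wedge \psi^{*}(t') \wedge \forall s\,(t \le s \wedge s < t' \rightarrow \varphi^{*}(s))\big),$$

interpreted over whatever first-order axiomatization of the time order one prefers (e.g. a discrete linear order).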
The question is about the current status of translations of different kinds of logic systems into many-sorted logic. These translations are at the core of Manzano's thesis. This is a partial answer to that question, because it deals only with the issue created by the greater expressive power of the full semantics of second-order logic, but it might be seen by many as the most important and fundamental aspect of the answer. The TLDR answer is that the expressive power of a semantics (in the sense of the truths determined by that semantics) should not be confused with the expressive power of the logic itself.
Originally and fundamentally, as mentioned by Enderton,
Symbolic logic is a model in much the same way that modern probability theory is a model for situations involving chance and uncertainty.
... you select some features of [a real life] object to be represented in the model ... then you build an object that is like the original in some ways. ... [In Logic,] (t)he real-life objects are certain "logically correct" deductions.
-- H.B. Enderton, A Mathematical Introduction to Logic, Second Edition, in the Introduction.
In a way, we can say that we have two notions of truth: the truth as determined by the axioms and the deductive system, and the truth in the "real objects". The idea is that we hope that the deductive system captures all the truths that are normally associated with the counterparts of the axioms in the "real objects". Enderton does not mention it explicitly, but even mathematical notions can be included in this "real world". In fact, as Popper and many philosophers of science would tell you, we can never directly consider the real world in a theoretical manner. The only things that we can consider directly in a theoretical manner are sentences about this real world. A theory always considers relationships between sentences.
The connection between logic and other mathematical models (in the sense of abstract theoretical representations of real-world objects) is simply that logic uses the same languages as any other model, but the focus shifts to the logical relationships between sentences and we are not interested in particular axioms. One key point is that logic should not be tied to a particular real object or collection of real objects. It should be only about the logical relations between sentences.
This original concept of logic precedes modern set theory, which Cantor introduced in the 19th century as a way to investigate infinite sets of real numbers. But,
(A)s with other branches of mathematics [Logic] has grown beyond the circumstances of its birth. -- Enderton, idem.
... and so has set theory. Because of the discovery of paradoxes in set theory in the early 1900s, set theory had to be formulated more rigorously, based on logic. In the following years, having a solid logical foundation, set theory gained in importance as a basis for mathematics. Worthy of mention in this regard is the Principia Mathematica written by Alfred North Whitehead and Bertrand Russell and published in 1910, 1912, and 1913 (see https://en.wikipedia.org/wiki/Principia_Mathematica).
We mention this aspect of the history to emphasize that during the next 20 years or so, logic was the foundation behind set theory, but there was no formal relationship in the other direction. It is only with Tarski and Gödel, in the early 1930s, that a meta-logical notion of truth given by mathematical structures was introduced. In this meta-logical notion, a sentence is considered true (a validity) if it is true in all mathematical structures that respect a given signature fixed by the logic. This was the birth of modern logic as we know it now. The semantics (the collection of mathematical structures with the given signature) is nowadays an integral part of the logic. So, coming with any logic, we now have a third notion of truth. We recall that the first notion of truth is fixed by the deductive system and that the second is the "real world" of logical relationships, not necessarily attached to set theory. The third one is the notion of truth fixed by the chosen semantics.
With this third notion of truth came also a new notion of expressive power, which depends on the chosen semantics. The natural choice is to pick a semantics that corresponds as much as possible to the "real world" of logical relationships. Let's recall that we want a logic that is independent of any specific "real world" object or domain of objects. Therefore, this suggests that the semantics should include all structures that can possibly be described formally. Given the previous argument that we can only directly consider sentences about the real world, if we believe in the general ability of the mathematical universe to express those sentences, the third notion of truth coming with this natural semantics becomes identical to the second notion of truth, the truth in the "real world" of logical relationships. It seems the right choice of semantics. This is the argument of Skolem's address published in 1923, of Manzano in 1996 (a student of Henkin), and of many others.
Now, we need to explain why a different semantics was originally used for SOL instead of this natural choice. The history is important to put things in their context. If the signature of a logic has variables that represent sets and we can quantify over these variables, it is considered a second-order logic. To define the semantics, we must fix the possible domains for such a variable in the different interpretations. If the domain of such a variable is always, i.e., in all interpretations, the collection of all possible subsets of the universe (which is itself fixed in the structure), then it is the standard or full semantics. This is a very natural definition. It also makes the logic very expressive, because there are fewer structures. This is such a natural and attractive definition for the domain of a variable (that represents a set) that it is only in 1950 that a more general semantics was proposed by Henkin. Henkin realized what we discussed before and much more. He proved the interesting result that with a more general and flexible semantics that accepts all structures that can be formally described, we have only one notion of truth! The deductive systems match perfectly with the "real world" of logical relationships as captured by the Henkin semantics!
Still, aren't we losing something by not using the expressive power of the full semantics? This is a bad question. Who said that we should not use it? The mathematical universe that is used to describe the models of the full semantics is itself described in first-order logic. So we have access to this limited collection of structures in the world of first-order logic, and we are not saying that we cannot consider it as a "real world" object and study it, again with a first-order logic. The point here is only that we should not use this limited collection of models to define the semantics of a logic. What is the advantage of using the full semantics to define a logic, when it is known that the deductive system cannot prove the corresponding validities?