How does host key checking prevent man-in-the-middle attacks?
As the host key is publicly available, is it possible for anyone to spoof the server by copying and using this host key?
No. The public key of the server is publicly available, but to spoof the server one needs its private key. The two keys are different; they are mathematically connected and form a pair. The math behind them makes deriving the private key from the public key very, very hard (computationally expensive).
Knowing the public key of the server (from your local ~/.ssh/known_hosts), your SSH client can build a "puzzle" for the server to solve. Solving this puzzle is easy if and only if the server knows the corresponding private key. Therefore, if the server solves the puzzle, your client knows the server has the right private key, so it's the genuine server.
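The host-key "puzzle" can be sketched as a challenge-response signature check. This is a minimal toy, assuming textbook RSA built from tiny, well-known Mersenne primes with no padding, so it is insecure and purely illustrative; the names (make_keypair, sign, verify, known_hosts_entry) are hypothetical. Real SSH uses algorithms such as Ed25519, ECDSA or RSA with SHA-2, and the server signs a hash of the key exchange rather than a separate random challenge.

```python
import hashlib
import secrets

# Toy textbook RSA for illustration only: tiny well-known Mersenne primes,
# no padding. NOT secure and not what SSH actually uses.
E = 65537

def make_keypair(p, q):
    """Return (public, private) = ((n, e), (n, d)) for distinct primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))   # modular inverse, Python 3.8+
    return (n, E), (n, d)

def digest(message, n):
    """Hash the message with SHA-256 and reduce it modulo n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private, message):
    """Solve the puzzle: only the holder of the private exponent can do this."""
    n, d = private
    return pow(digest(message, n), d, n)

def verify(public, message, signature):
    """Check the solution using only the public key."""
    n, e = public
    return pow(signature, e, n) == digest(message, n)

# Genuine server; the client stores only the public half, as in ~/.ssh/known_hosts.
server_public, server_private = make_keypair(2**61 - 1, 2**89 - 1)
known_hosts_entry = server_public

# An impostor has copied the public key but owns only its own private key.
copied_public = server_public
_, impostor_private = make_keypair(2**107 - 1, 2**127 - 1)

# The client builds a fresh puzzle: a random challenge.
challenge = secrets.token_bytes(32)

genuine_answer = sign(server_private, challenge)
impostor_answer = sign(impostor_private, challenge)
# "Signing" with the copied public key just raises the hash to e again.
copied_key_answer = sign(copied_public, challenge)

for label, answer in [("genuine server", genuine_answer),
                      ("impostor's own key", impostor_answer),
                      ("copied public key", copied_key_answer)]:
    print(label, "accepted:", verify(known_hosts_entry, challenge, answer))
```

Run as-is, it should report True for the genuine server and False both for the impostor's own key and for an attempt to answer with the copied public key.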
And the other way around: when you want to authenticate with your private key, the server (which has your public key in the ~/.ssh/authorized_keys file of the user you want to log in as) builds a "puzzle" for your SSH client to solve. Solving this puzzle is easy if and only if the client knows the corresponding private key, i.e. your private key. Your client does know it, so it solves the puzzle and the server knows it's you.
Potential spoofers can know the public key of the server. This allows them to check whether the real server is genuine (by building a puzzle), but they still cannot impersonate the server, because they cannot easily solve somebody else's (e.g. your) puzzle designed to verify the authenticity of the server. They need the private key of the server to do this.
Similarly if somebody knows your public key but not your private key, they are not able to authenticate to servers you can authenticate to (with your private key).
Private keys should be kept secret.
Additionally, the client and the server encrypt the communication using a session key, agreed on through a Diffie-Hellman key exchange (in its classic or elliptic-curve form). They start by each picking a random secret and some information easily derived from that secret. They exchange the derived information and both arrive at the same secret session key. The math behind the process makes deriving the session key (or the starting secrets) from the exchanged information very, very hard (computationally expensive), so an external observer cannot predict the session key. Yet each end's own starting secret plus the information received from the other end is enough to compute the same key on both ends. The key depends on both starting secrets, so neither end can force a particular key.
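The exchange described above can be sketched in a few lines. This is a minimal toy, assuming the Mersenne prime 2**127 - 1 as modulus and 3 as base, chosen only so the numbers stay readable; real SSH typically uses standardized groups of 2048 bits or more, or elliptic curves such as Curve25519, and hashes the shared secret to derive the actual encryption keys. The helper names are hypothetical.

```python
import secrets

# Toy Diffie-Hellman parameters, for illustration only (NOT secure).
P = 2**127 - 1   # modulus: a Mersenne prime
G = 3            # base

def pick_secret():
    """Each side picks a random private exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2

def public_value(secret):
    """The 'information easily derived from the secret' sent over the wire."""
    return pow(G, secret, P)

def session_key(own_secret, peer_public):
    """Combine one's own secret with the value received from the other end."""
    return pow(peer_public, own_secret, P)

client_secret = pick_secret()
server_secret = pick_secret()

# Only these two values cross the network; an observer sees both.
client_public = public_value(client_secret)
server_public = public_value(server_secret)

client_key = session_key(client_secret, server_public)
server_key = session_key(server_secret, client_public)

print("keys match:", client_key == server_key)
```

Both ends compute G raised to client_secret * server_secret modulo P, so the keys always match, while the network only carried client_public and server_public. The result also depends on both secrets, which is why neither end can force a particular key.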
This happens before authentication. In fact, the authentication procedures build their "puzzles" from the result of this key exchange (strictly speaking, from a hash covering it, the exchange hash or session identifier, rather than from the session key itself). Thanks to this, a man in the middle who plays the server in front of the client and the client in front of the server cannot simply relay puzzles and their solutions. His communication with the client will use a different session key than his communication with the server. He could relay data by decrypting with one key and re-encrypting with the other, but if he relays a puzzle, the session-specific value built into the puzzle will not match what the other end expects, and authentication will fail. To relay puzzles, the man in the middle would need to actually solve them (or rather "translate" them from one session to the other), or build independent puzzles. Either way, he needs the private keys to successfully sit in the middle and eavesdrop on, modify, or inject data.
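The relay failure can be simulated by combining a key exchange with a host-key signature. This sketch is hypothetical and heavily simplified: session_id stands in for SSH's exchange hash (which in reality also covers version strings, negotiation messages and the host key), the host key is toy textbook RSA, and the attacker reuses one Diffie-Hellman value for both legs only to keep the code short.

```python
import hashlib
import secrets

# Toy parameters, for illustration only (NOT secure).
P, G = 2**127 - 1, 3                      # toy Diffie-Hellman modulus and base
E = 65537
HOST_P, HOST_Q = 2**61 - 1, 2**89 - 1     # toy RSA primes for the server host key
N_HOST = HOST_P * HOST_Q
D_HOST = pow(E, -1, (HOST_P - 1) * (HOST_Q - 1))   # host private exponent

def dh_pair():
    """Return (secret, public value) for one Diffie-Hellman participant."""
    secret = secrets.randbelow(P - 3) + 2
    return secret, pow(G, secret, P)

def session_id(client_pub, server_pub, shared):
    """Simplified stand-in for SSH's exchange hash: binds both public values and the shared secret."""
    data = b"|".join(str(x).encode() for x in (client_pub, server_pub, shared))
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N_HOST

def host_sign(sid):
    """The genuine server signs its session identifier with its private host key."""
    return pow(sid, D_HOST, N_HOST)

def host_verify(sid, signature):
    """The client checks the signature with the public host key from known_hosts."""
    return pow(signature, E, N_HOST) == sid

# Honest connection: one key exchange, both ends derive the same session identifier.
c_secret, c_pub = dh_pair()
s_secret, s_pub = dh_pair()
client_sid = session_id(c_pub, s_pub, pow(s_pub, c_secret, P))
server_sid = session_id(c_pub, s_pub, pow(c_pub, s_secret, P))
honest_ok = host_verify(client_sid, host_sign(server_sid))

# Man in the middle: a separate key exchange with each side.
c_secret, c_pub = dh_pair()   # real client
m_secret, m_pub = dh_pair()   # attacker: poses as server to the client, as client to the server
s_secret, s_pub = dh_pair()   # real server
client_sid = session_id(c_pub, m_pub, pow(m_pub, c_secret, P))  # client's view
server_sid = session_id(m_pub, s_pub, pow(m_pub, s_secret, P))  # server's view
# The attacker does share a key with the client, so it could decrypt that leg...
attacker_shares_key = pow(c_pub, m_secret, P) == pow(m_pub, c_secret, P)
# ...but the genuine signature it relays covers the wrong session.
relayed = host_sign(server_sid)
mitm_ok = host_verify(client_sid, relayed)

print("honest connection verified:", honest_ok)
print("relayed signature verified:", mitm_ok)
```

honest_ok should be True and mitm_ok False: the server's signature is valid, but for the session it shares with the attacker, not the one the client sees. To pass the check, the attacker would have to sign the client's session identifier itself, which requires the server's private key.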
To summarize: without knowing the private key of the server, attackers cannot impersonate the server on their own, nor can they relay authentication from the genuine server.