What's the advantage of synchronizing UID/GID across Linux machines?

Before I plunge into how to synchronize UIDs/GIDs across my different Linux machines, I would like to know what the actual benefit is.

I know that this keeps file synchronization relatively easy (as ownership is "naturally" retained). However, this can also be achieved in other ways, depending on the transfer service.

Is there anything else that would benefit from consistent UIDs/GIDs?


Solution 1:

technical debt

For the reasons below, it is much simpler to address this problem early on to avoid the accumulation of technical debt. Even if you find yourself already in this situation, it's probably better to deal with it in the near future than let it continue building.

networked filesystems

This question seems to be focused on the narrow scope of transferring files between machines with local filesystems, which allows for machine-specific ownership states.

Networked filesystem considerations are easily the biggest case for trying to keep your UID/GID mappings in sync, because you can usually throw that "achieved otherwise" you mentioned out the window the moment they enter the picture. Sure, you might not have networked filesystems shared between these hosts right now...but what about the future? Can you honestly say that there will never be a use case for a networked filesystem between your current hosts, or hosts that are created in the future? It's not very forward-thinking to assume so.

Assume that /home is a networked filesystem shared between host1 and host2 in the following examples.

  • Disagreeing permissions: /home/user1 is owned by a different user on each system. This prevents a user from being able to consistently access or modify their home directory across systems.
  • chown wars: It's very common for a user to submit a ticket requesting that their home directory permissions be fixed on a specific system. Fixing this problem on host2 breaks the permissions on host1. It can sometimes take several of these tickets to be worked before someone steps back and realizes that a tug of war is in play. The only solution is to fix the disagreeing ID mappings. Which leads to...
  • UID/GID rebalancing hell: The complexity of correcting IDs later grows rapidly with the number of remappings needed to fix a single user across multiple machines. (user1 has the ID of user2, but user2 has the ID of user17...and that's just the first system in the cluster.) The longer you wait to fix the problem, the more complex these chains can become, often requiring application downtime on multiple servers in order to get things properly in sync.
  • Security problems: user2 on host2 has the same UID as user1 on host1, allowing them to write to /home/user1 on host2 without the knowledge of user1. These changes are then evaluated on host1 with the permissions of user1. What could possibly go wrong? (If user1 is an app user, someone in dev will discover it's writable and will make changes. This is a time-proven fact.)

There are other scenarios; these are just examples of the most common ones.
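
To see whether two hosts already disagree, you can compare their passwd data. The following is a minimal sketch in Python, not a finished tool: the function names parse_passwd and compare_hosts are made up for illustration, and the sample data is hypothetical. In practice you would feed it the output of something like `getent passwd` collected from each host.

```python
def parse_passwd(text):
    """Map user name -> UID from passwd-format text (name:passwd:uid:gid:...)."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # skip malformed lines
        ids[fields[0]] = int(fields[2])
    return ids


def compare_hosts(a, b):
    """Compare two name -> UID maps (host a vs. host b).

    Returns (mismatched, collisions):
      mismatched: name -> (uid on a, uid on b) for names whose UIDs differ
      collisions: name on a -> (uid, [different names holding that uid on b])
    """
    mismatched = {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }

    names_by_uid_b = {}
    for name, uid in b.items():
        names_by_uid_b.setdefault(uid, []).append(name)

    collisions = {}
    for name, uid in a.items():
        others = sorted(n for n in names_by_uid_b.get(uid, []) if n != name)
        if others:
            collisions[name] = (uid, others)
    return mismatched, collisions


# Hypothetical sample data mirroring the "security problems" case above.
host1 = parse_passwd("user1:x:1001:1001::/home/user1:/bin/bash\n"
                     "user2:x:1002:1002::/home/user2:/bin/bash\n")
host2 = parse_passwd("user1:x:1005:1005::/home/user1:/bin/bash\n"
                     "user2:x:1001:1001::/home/user2:/bin/bash\n")

mismatched, collisions = compare_hosts(host1, host2)
print("mismatched:", mismatched)
print("collisions:", collisions)
```

An entry in collisions is the dangerous case: files owned by that user on the first host appear to belong to someone else on the second. The same approach works for group data if you parse GIDs instead.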

names aren't always an option

Any scripts or config files written against numeric IDs become inherently unportable within your environment. This is generally not a problem, because most people don't hardcode these unless they're absolutely required to...but sometimes the tool you're working with doesn't give you a choice in the matter. In these scenarios, you're forced to maintain n different versions of the script or configuration file.

Example: pam_succeed_if allows you to use fields of user, uid, and gid...a "group" option is conspicuously absent. If you were put in a position where multiple systems were expected to implement some form of group-based access restriction, you'd have n different variations of the PAM configs. (Or at least a single GID on which you have to avoid collisions.)
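
For contrast, when a tool does let you work with names, a script can resolve them on whichever host it runs on instead of hardcoding numbers. A minimal Python sketch using the standard pwd and grp modules follows; the helper name resolve_ids is made up for illustration. It keeps the script portable but does nothing for tools that only accept numeric IDs.

```python
import grp
import pwd


def resolve_ids(user, group):
    """Return (uid, gid) for the given names as this host maps them."""
    uid = pwd.getpwnam(user).pw_uid   # raises KeyError if the user is unknown here
    gid = grp.getgrnam(group).gr_gid  # raises KeyError if the group is unknown here
    return uid, gid
```

The result can then be passed to calls such as os.chown(path, uid, gid), so the same script works even where the numeric IDs differ between hosts.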

centralized management

natxo's answer has this covered pretty well.

Solution 2:

Once you reach a certain size (and it is always sooner than you think), you will realize that changing passwords or disabling someone's account on all the hosts is a PITA. That's why people use systems with LDAP databases (or NIS, but don't do that; it's not safe nowadays) like OpenLDAP or, nowadays, the excellent FreeIPA.

You maintain all account/group info in a central database, and all hosts share that information. You can do many more things from there: use the user info for file permissions, of course, but also create virtual users for all the applications that have LDAP bindings instead of having to create your users there as well (lots of web applications can use LDAP for their user database), maintain a central sudo rules database, distribute your autofs environment, keep your DNS zones, ...