Why doesn't Canonical use pings from the Update Manager to gauge the size of the Ubuntu install base?
I've heard from many people, Jono Bacon included, that it's impossible to say how many computers are running Ubuntu, since nothing in the OS phones home and the Ubuntu Census package is only installed on OEM machines sold with Ubuntu. This 'phone home' count is important because the market share of an OS is based on the number of units sold, which is rather disingenuous when you consider that many, if not most, Ubuntu installations are running on computers that were purchased running Windows, or even Mac OS.
Doesn't the update manager phone home when it's figuring out if there's anything to download?
There are several issues with this level of tracking:
As others have said in this thread, most people use the most geographically "relevant" mirror possible as it speeds things up for them. These mirrors are mostly not under the control of Canonical.
Even if Canonical could grab the remote logs, there would be international privacy laws to obey, and users might end up having to accept several different agreements depending on the jurisdiction of the server they chose.
Some people don't use public mirrors at all. If you're running Ubuntu in a farm or corporate scenario across dozens or hundreds of machines, you would normally proxy the repositories (anything over three machines makes that worthwhile, IMO), so only the first hit would count.
Some people could just beat on the refresh link all day. Not only would that skew the statistics, it would also load the servers more than necessary.
Sometimes it's better not to know.
It sounds stupid, but one of my jobs is maintaining a webapp for a company that several large businesses use to train their employees. I've talked about adding extra data collection so we have a better idea of what a user is doing, as that would be handy for improvement. But if we collect that data (and advertise that fact, as we're required to by UK law), our clients will expect to know the results of our collection.
That's fine if the data shows strong growth (or lots of their employees using the webapp in my case) but if it doesn't, it can really undermine marketing efforts. Not all data can be spun into something positive.
Because the data would not be a full representation of all users, any statistics drawn would be lower than actual values and Microsoft (et al) could very quickly bomb them with a few sales stats.
There is a package for OEM installs called canonical-census, but as already mentioned this only works for OEM installs. Read the link to see how it works, but I will say it's slightly better than repo-logging.
I guess one question to ask you back would be: Why do Canonical need these numbers? Even if they were great results, the problem still exists that there just isn't that much money to market Ubuntu. And if they weren't as stellar as needed for certain advertising claims (or weren't released for that reason), the collection would certainly undermine the project.
Not all servers are maintained by Canonical. There are many mirrors of the Ubuntu repositories, and which is used varies by geographic location and personal choice. Canonical does not have access to statistics from most of these servers.
Even if they did have information on the number of update checks, there is currently no unique identifier associated with updates that could be used to distinguish between multiple users and a single user checking multiple times. Having such an identifier could pose a privacy risk.
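To make the identifier problem concrete, here is a minimal sketch in Python. The log format (`<client_ip> <requested_path>`), the addresses, and the path are all made up for illustration; real mirror logs look different. It only shows that neither the raw hit count nor the distinct-IP count tells you how many machines there are.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<client_ip> <requested_path>".
# The format, addresses and path are assumptions for illustration only.
SAMPLE_LOG = """\
203.0.113.7 /ubuntu/dists/example/Release
203.0.113.7 /ubuntu/dists/example/Release
203.0.113.7 /ubuntu/dists/example/Release
198.51.100.20 /ubuntu/dists/example/Release
192.0.2.50 /ubuntu/dists/example/Release
"""


def summarize(log_text):
    """Return (total update checks, distinct client IPs)."""
    hits = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ip, _path = parts
        hits[ip] += 1
    return sum(hits.values()), len(hits)


total_checks, distinct_ips = summarize(SAMPLE_LOG)
print(total_checks, distinct_ips)
```

In this sample, `203.0.113.7` could be one person hitting refresh three times, or a proxy sitting in front of a hundred machines. Without a per-install identifier the log can't tell those cases apart, so the raw count may overstate and the distinct-IP count may understate the real number.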
As far as I know, though I can't find the source, it is actually used. There was some report about a year ago that had information on the number of unique hits on archive servers. The issue arises when people use mirrors, which many do. If someone is using a mirror outside of Canonical's control, then Canonical doesn't have logs of any updates. That's why there are ballpark estimates, but never anything concrete.
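As a rough illustration of why such numbers stay ballpark, here is a small Python sketch. The function name, the observed figure, and the share values are all invented for the example; nobody outside Canonical knows the real ones. The idea is simply to scale what the official servers see by a guess at what fraction of users they cover.

```python
def estimate_installs(unique_clients_seen, share_on_observed_servers):
    """Scale an observed count by the assumed share of users it covers.

    Hypothetical helper: both inputs are guesses, not published data.
    """
    if not 0 < share_on_observed_servers <= 1:
        raise ValueError("share must be in (0, 1]")
    return unique_clients_seen / share_on_observed_servers


observed = 1_000_000  # made-up number of unique clients on official servers

# If half of all users hit the official servers, versus only a quarter:
low = estimate_installs(observed, 0.5)
high = estimate_installs(observed, 0.25)
print(low, high)
```

Because the share of users on third-party mirrors is unknown, the estimate doubles just by changing that one guess, which is why nothing concrete comes out of it.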
When using Windows, OS updates come from Microsoft's servers. Additionally, they have sales data. Both can be used to have a closer estimate of usage. Ubuntu doesn't require central update servers and is obviously not sold, so usage estimates are a lot harder to deduce.