There have been a few articles recently trying to estimate the number of Linux users, which is apparently a challenging problem. However, I have to wonder why it can’t be figured out, at least at the distro level, by simply storing hashes of the IP addresses that hit Canonical’s update site and looking at the number of unique ones each week/month.
There are going to be people using mirrors, but this is a small percentage to lose in exchange for a number of the right order of magnitude, and the most popular mirrors could probably do a similar thing and contribute their numbers anyway. The only other main drawback would be multiple Ubuntu machines behind the same IP, which again seems like it would only result in a slight inaccuracy. You’d also miss a small percentage of users who use their computers so infrequently that they don’t update every month, but yearly results would pull back in anyone who uses their computer often enough to warrant counting.
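To make the idea concrete, here is a minimal sketch of the counting step. It assumes the update server can hand over (date, IP) pairs from its logs; the salt, the function names and the sample addresses are all made up for illustration and are not anything Canonical actually runs. The salt is there because unsalted hashes of IPv4 addresses are easy to reverse by brute force.

```python
import hashlib
from collections import defaultdict
from datetime import date

# Hypothetical secret salt; an assumption for this sketch, not a real setting.
SALT = b"example-salt"


def hash_ip(ip):
    """Return a salted SHA-256 digest so raw addresses are never stored."""
    return hashlib.sha256(SALT + ip.encode("ascii")).hexdigest()


def unique_clients_per_month(hits):
    """Count distinct hashed IPs per (year, month) from (date, ip) pairs."""
    seen = defaultdict(set)
    for day, ip in hits:
        seen[(day.year, day.month)].add(hash_ip(ip))
    return {month: len(hashes) for month, hashes in seen.items()}


# Made-up log entries using documentation-range addresses.
sample_hits = [
    (date(2009, 3, 2), "192.0.2.10"),
    (date(2009, 3, 9), "192.0.2.10"),    # same client updating again
    (date(2009, 3, 9), "198.51.100.7"),
    (date(2009, 4, 1), "192.0.2.10"),
]
monthly_counts = unique_clients_per_month(sample_hits)
print(monthly_counts)
```

Switching the key from (year, month) to the year alone would give the yearly view that picks up infrequent updaters.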
Alternatively, as others have suggested, if Google would just release their numbers for browsers hitting google.com, we’d probably have a solid idea as well.
Are there already accurate numbers for Ubuntu, and if not, am I missing something with my proposal?
UPDATE: Jef pointed out that Fedora is already doing this at http://fedoraproject.org/wiki/Statistics#Yum_Data, which is pretty awesome! That shows about 14 million unique repository connections, so making a VERY rough, not remotely scientific estimate, we could use distrowatch to estimate that Ubuntu has 1.68 times as many users as Fedora, and get something on the order of 24 million users that have connected.
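For anyone who wants to check the arithmetic, it is just the two figures above multiplied together. The variable names are mine, and the result is a back-of-the-envelope guess, not a measurement.

```python
fedora_unique_connections = 14_000_000  # figure quoted from the Fedora statistics page
distrowatch_ratio = 1.68                # Ubuntu:Fedora ratio taken from distrowatch

# Scale Fedora's connection count by the assumed ratio.
ubuntu_estimate = fedora_unique_connections * distrowatch_ratio
print(f"{ubuntu_estimate / 1e6:.2f} million")
```

That comes to roughly 23.5 million, which is where the "order of 24 million" comes from.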
I would also like to see a hashed MAC address sent along with upgrades, but Fedora already uses smolt, which generates a unique ID from the system specs and sends it to Fedora. Sadly (or perhaps it is a good thing), it is not enabled by default, so not everyone turns it on. But those figures say that Fedora has more installs than Ubuntu. Sorry about that, you ain’t special, only getting the media attention ;-)
I don’t think that market share figures mean anything for us. (OK, maybe some like to extend their virtual penis.)
We all, whatever distribution we use, are using the same OS. It does not matter what your opinion is about the system, package manager, brand or even freedom. We all use the Linux OS to power all the other software. Without Linux, we would not be here. So kudos to Linus Torvalds for coding the OS in the first place, and kudos to Richard Stallman for starting the GNU project, which gave us the GPL and with it free software, and the Linux OS is licensed under it (GPLv2).
I see more of other distributions than Ubuntu in Finland (the homeland of the Linux OS). It is the same around Europe and Asia where I have traveled.
Ubuntu has a smaller share among technically oriented users (science labs, computer stores, universities, etc.), but that does not matter at all. We are all using the same OS, Linux (the kernel)!
Now, it's 2 years out of date, but back then Fedora had 6% and Ubuntu 30%; if that's still the case today this would put Ubuntu at 5 times Fedora rather than 1.68 times Fedora.
Another metric might be Google Trends (http://www.google.com/trends?q=fedora%2C+ubuntu), which suggests that the Ubuntu:Fedora ratio of searches has been increasing; so either the ratio of usage has been changing or Ubuntu users are becoming even more likely to search for Ubuntu for some reason.
Now, Google Trends and Desktop Linux surveys can be inaccurate for a variety of reasons. I'll note, however, that the Google Trends data is consistent with the Desktop Linux survey data; both imply roughly 5x, and both are doubtless better proxies for usage share than distrowatch.
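To show how much the choice of scaling factor matters, here is the same rough calculation done with both ratios. The 6% and 30% shares and the 1.68 ratio are the figures quoted in this thread, and the 14 million is Fedora's unique repository connection count from the post; the variable names are mine, and none of the outputs is a measurement.

```python
fedora_share = 6    # percent, from the two-year-old survey figures
ubuntu_share = 30   # percent, same survey
survey_ratio = ubuntu_share / fedora_share

fedora_connections = 14_000_000  # Fedora's unique repository connections

# Scale the Fedora count by each candidate ratio and compare.
estimates = {
    "distrowatch": fedora_connections * 1.68,
    "survey": fedora_connections * survey_ratio,
}
for label, value in estimates.items():
    print(f"{label}: ~{value / 1e6:.1f} million")
```

The two proxies land about 46 million users apart, so the scaling factor dominates the answer.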
At no point did I say that all “fun” things can’t be “meaningful.”
What I said was very specific. The distrowatch metric is not meaningful. The Google Trends metric is not meaningful. I am making absolutely no claim about the meaningfulness of any other “fun” activity. I will say that making global maps of client connections to MirrorManager using GeoIP is both “fun for me” and “meaningful” as it gives Fedora an easy to understand snapshot of how globally used Fedora is.
You’ve extrapolated what I said and attempted to apply it beyond the bounds of the original context. Is that “fun” for you as well..making gross generalizations about what other people say? That’s neither friendly nor healthy. You want to keep this friendly..you want to keep this constructive? Then take more care and rein in your tendency to generalize.
I find it really amazing that you can so easily discount accurate ShipIt numbers as a useful rough metric and yet… you reached for distrowatch as a scaling metric. Stop putting the cart before the horse. Make accuracy the primary concern.. then worry about interpretation. Don’t waste your time trying to interpret the meaning of numbers that aren’t even an accurate measure.
If LoCos have hundreds of CDs collecting dust every release…that’s also something you could get accurate stats on…you just have to survey LoCos and ask them. If they are requesting CDs and not giving them out..that is a drain on Canonical resources. It benefits everyone to make sure that’s not happening too much.
-jef
soren, it sounds like you are right, sorry :) The more I look into this, the more I see that mirrors are more common than I thought. Thanks for enlightening me and sharing your knowledge! Though, I still think it would be feasible if enough mirrors participated. Combined with hashed MAC addresses, it has a decent accuracy potential.
Jef, ShipIt is certainly countable but I am not convinced how useful it would be. Surely some people order CDs and never use them, and someone in a shop might order one and install it a hundred times. And each LoCo could have hundreds on hand that never get used. Also, I’m not sure I could agree that something which is fun can have no meaning; fun IS meaning!
Lots of knowledgeable people have shared great stuff here, so that’s awesome! Let’s just try to keep it friendly and healthy :)
Canonical could tell us tomorrow exactly the number of ShipIt disks they have paid for AND the number of disks purchased directly for the Canonical shop. Have they ever done that? Have they ever put any hard numbers out with regard to how active ShipIt is? I haven’t found them. If they haven’t, that’s a pretty remarkable lack of transparency.
How about you press your leadership to publish the no-guesswork numbers for the amount of media sent via ShipIt for Intrepid, on a monthly basis, since the release of Intrepid? What is it, maybe 1% of the total number of Ubuntu users Canonical employees have publicly claimed exist?
You want to haggle over a statistic that is below the noise floor of any overall estimate? Fine..go right ahead…noise seems to be pretty important for Ubuntu supporters…much more than accuracy.
Jef, you’re missing the fact that Ubuntu is freely available from ShipIt, no matter where in the world you are, and no matter what sort of network connectivity you have (as long as you can actually get to ShipIt, of course). Hence, users who can only get Linux by these means will never be counted by a service such as MirrorManager. Since a similar service to ShipIt does not (to my knowledge) exist for any other Linux distro, numbers from MirrorManager-like services will be biased in favour of non-Ubuntu distros. Let’s face it: There’s no way to get hard numbers for the number of Linux installs.
I’m sure there are also organisations that for whatever reason do not want to publish their use of Linux at all, even to an (alleged) anonymous service like MirrorManager. I don’t say “alleged” because I don’t believe it to be anonymous, but because there’s no way for me to know whether that’s the case or not, and for some organisations, that’s simply not good enough.
Do you want solid numbers or not for total linux usage? If you do, then don’t publish goofball numbers yourself.
distrowatch and google trends…while “fun” to look at..have no meaning..have no value..in any well understood sense. You might as well just generate random numbers between 1 and 10 million for all distros and call them a rough estimate with a +/- 9 million errorbar on all the numbers.
MirrorManager and the statistics it generates are a methodical approach that everyone can use..we could get solid consistent numbers across pretty much all Linux distributions if they adopted the MirrorManager approach to handing out mirror information to clients dynamically. There’s real value for everyone in this tech. Users, network admins, and distributors. We don’t have to rely on CEOs making up deployment numbers in press interviews.
-jef
pfrields, thanks for leaving a comment! It is good to hear that you guys are somewhat serious about metrics like this, and are certainly quite ahead of Ubuntu (as far as I can tell) in terms of collecting the data and being open about it. Like Jef said, it is certainly an aspect that is missing from Canonical/Ubuntu which is unfortunate considering their other marketing efforts.
Anyway I definitely appreciate some Fedora folks chiming in here and didn’t mean to offend anyone with my wild extrapolations.
Xandros is ranked pretty low.. and yet it has a significant number of pre-installs via being the Linux Asus uses on its EEE netbooks…for, what is it now, 2 whole years. In fact, going by netbook sales estimates, Xandros is crushing Ubuntu netbook pre-installs.
In no way whatsoever can you reasonably argue that the distrowatch metric correctly places Xandros compared to the 30 or so other distros in front of it. No way.
The distrowatch scaling metric does not stand up to scrutiny.
To understand how to use the distrowatch metric you have to understand why people are going to distrowatch. You also have to understand that distrowatch’s own ranking system has a nonlinear effect on the ranking. Higher ranked distros are going to get more interest from new distrowatch visitors..because they are highly ranked. It’s a feedback loop in the methodology. And it makes for an absolutely crap metric of anything at all.
If you are serious about this you need to find a metric that actually measures what you are interested in.
What you need to do is demand that Shuttleworth, or any other Canonical employee who has been quoting userbase numbers in the press for the last 3 years, actually describe how they get those numbers.
http://www.theregister.co.uk/2008/10/27/shuttleworth_ubuntu_commitment/
“Precise Ubuntu installed base numbers are impossible to obtain, but Shuttleworth said the most recent estimate is about 8 million users for the Linux variant. Ubuntu does not have any call-home features to help Canonical count installations. That’s because Shuttleworth does not want to violate users’ privacy or put up any barriers to adoption for the software. “We actually have no idea,” Shuttleworth admitted.”
Numbers have contexts… methodology has meaning. You can’t just make up numbers and scaling factors just because they seem to fit the argument you are making. You have to test them for sanity. The distrowatch scaling factor is not a sane metric.
Nevertheless, the count I just found from the already collected IP lists was over 12.5 million totally unique IP addresses, out of around 14 million current IP addresses found through simple summing. Obviously this count doesn’t include Fedora derivatives such as CentOS, Scientific Linux, Red Hat Enterprise Linux, and so on.
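The gap between the two figures is what you get when the same address shows up in more than one list. Here is a small sketch of the difference between simple summing and a de-duplicated count. It assumes the lists are kept per release, which is a guess about how the logs are organised, and the release names and addresses are invented.

```python
# Hypothetical per-release lists of client IPs (documentation-range addresses).
ips_by_release = {
    "release_a": ["192.0.2.1", "192.0.2.2", "198.51.100.5"],
    "release_b": ["192.0.2.2", "203.0.113.9"],  # 192.0.2.2 upgraded, so it is listed twice
}

# Simple summing counts an upgraded machine once per release it appears in.
simple_sum = sum(len(ips) for ips in ips_by_release.values())

# A set union counts each address once across all releases.
unique_ips = set().union(*ips_by_release.values())

print(simple_sum, len(unique_ips))
```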
PriceChild, using the security repo seems like a good idea, although it appears that for me I am hitting my mirror for security too. Some users/admins probably disable everything but that anyway.
Jef, that is awesome, thanks for the resource! Do you know if the total across versions is unique across them? That is, if I have Fedora 9 installed and then upgrade to Fedora 10, do I get counted twice? That seems like a common place where people are going to get double counted over the course of a year.
pochu, hopefully getting the main and US repository picks up a large percentage, and making it easy for the other mirrors to participate would make it fairly accurate.
Fedora already has a way to estimate users via IP. We have the dynamic MirrorManager service..and it has logs…
http://fedoraproject.org/wiki/Statistics#Yum_Data
People have put a lot of thought in to what is actually achievable and what is not with regard to Fedora metrics:
http://fedoraproject.org/wiki/Infrastructure/Metrics
There’s absolutely no reason Canonical couldn’t take the MirrorManager codebase and adapt it for Ubuntu’s needs. Unlike Canonical, which spends a lot of time building closed web services codebases they are reluctant to share, all of Fedora’s infrastructure is done in the open..including the MirrorManager service.
MirrorManager is important enough to talk more about. Every Fedora client by default contacts the MirrorManager service asking for which mirrors to use. The MirrorManager service even lets admins on large private networks redirect Fedora clients in their network block to a local private mirror..without client reconfiguration. We still count those clients because they contact MirrorManager instead of having to be manually reconfigured to point to the local mirror. Our MirrorManager service is a benefit to both the user and the local network admin who is trying to conserve bandwidth….and it’s enabled by default.
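Here is a rough sketch of that behaviour, to show why redirected clients still get counted. This is NOT MirrorManager's actual code or API; the table layout, the function name, the mirror URLs and the addresses are all hypothetical, and only the standard-library ipaddress module is real.

```python
import ipaddress

# Hypothetical table: network block -> private mirror registered by its admin.
PRIVATE_MIRRORS = {
    ipaddress.ip_network("198.51.100.0/24"): "http://mirror.campus.example/fedora/",
}
# Hypothetical public mirrors handed to everyone else.
PUBLIC_MIRRORS = [
    "http://mirror1.example.org/fedora/",
    "http://mirror2.example.org/fedora/",
]

seen_clients = set()  # every client that asks is recorded, whichever mirror it gets


def mirrors_for(client_ip):
    """Record the client, then return the mirror list for its network block."""
    addr = ipaddress.ip_address(client_ip)
    seen_clients.add(addr)
    for block, mirror in PRIVATE_MIRRORS.items():
        if addr in block:
            return [mirror]
    return PUBLIC_MIRRORS


print(mirrors_for("198.51.100.42"))  # inside the registered block
print(mirrors_for("203.0.113.7"))    # everyone else
print(len(seen_clients))
```

The point of the design is that the counting happens at the lookup step, so it does not matter which mirror ends up serving the packages.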
-jef
Stefano, I agree, but again this number would be better than throwing around download numbers and random guesses. And conversely to your point, as Drew mentions, there are going to be users on netbooks/laptops updating from many different IPs which will partially offset that issue.
Counting unique IPs is the first thing that can be done, but clearly this is not accurate due to NAT.
Behind a single IP you might have N users, and you don’t know how many there are.
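A toy example of the two effects pulling in opposite directions: NAT hides machines behind one address, while a roaming laptop shows up under several. The machine names and addresses are invented, and in real logs the machine column is exactly what you do not have.

```python
# Hypothetical observations: (machine, public IP it connected from).
connections = [
    ("office-pc-1", "203.0.113.1"),  # three machines behind one NAT gateway
    ("office-pc-2", "203.0.113.1"),
    ("office-pc-3", "203.0.113.1"),
    ("laptop", "198.51.100.20"),     # one laptop seen from two networks
    ("laptop", "192.0.2.77"),
]

machines = {machine for machine, _ in connections}
ips = {ip for _, ip in connections}

print(len(machines), len(ips))
```

Here four machines appear as three IPs, so the two errors only partly cancel, and how far they cancel in practice is unknown.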