2010-08-10

What's all the fuss about canonical-census?

I know I have not updated this blog in quite a long time now, but something caught my attention today: canonical-census.

As slashdot.org reports, Canonical is beginning to track its (OEM) installations. It's obvious that people are uncomfortable with a program running on their system which phones home to their OS vendor, which is why I have had a quick look at what exactly canonical-census does.

Firstly, however, I would like to point out that the report on slashdot.org is very clear about which information is being gathered, namely "the number of times this system previously sent to Canonical [...], the Ubuntu distributor channel, the product name as acquired by the system's DMI information, and which Ubuntu release is being used". And it's perfectly correct. I fetched the canonical-census Debian source package (using dget -u https://launchpad.net/ubuntu/+archive/partner/+files/canonical-census_0.1.dsc), and besides the Debian packaging information it contains two scripts:

  • census (written in Python) and
  • send-census (a GNU bash script).
Now what do those scripts actually do?

send-census is installed in /etc/cron.daily, which means it will be executed once a day by the system's cron daemon. It's a mere 48 lines long and its code is quite simple, so everyone with at least some shell scripting experience can easily check what it's doing. Now guess what: it sends exactly the information reported on slashdot to Canonical. Nothing more and nothing less.

Technically, it keeps a plain text file containing a single number as its call counter in /var/lib/send-install-count/counter, and it reads the distribution channel from /var/lib/ubuntu_dist_channel, a file that does not exist on my Ubuntu Lucid system.
The above-mentioned "system's DMI information" is not the whole bunch of DMI information available, but only the contents of /sys/class/dmi/id/product_name, which strangely enough returns "System Product Name" on my machine. Last but not least, it uses lsb-release to get the distribution release (i.e. 10.04 for my system).
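To make the four data sources a bit more tangible, here is a minimal Python sketch that reads the same pieces of information. It is not the code of send-census (which is a bash script); the function names, the "unknown" and "0" fallback values and the use of `lsb_release -rs` are assumptions made for illustration. It only reads and prints the values and does not send anything anywhere.

```python
import subprocess
from pathlib import Path

# File locations as named in the post.
COUNTER_FILE = Path("/var/lib/send-install-count/counter")
CHANNEL_FILE = Path("/var/lib/ubuntu_dist_channel")
PRODUCT_FILE = Path("/sys/class/dmi/id/product_name")


def read_first_line(path, default):
    """Return the first line of `path`, or `default` if it is missing or empty.

    The fallback values are an assumption of this sketch, not taken
    from the real script.
    """
    try:
        lines = Path(path).read_text().splitlines()
    except (OSError, UnicodeDecodeError):
        return default
    if lines and lines[0].strip():
        return lines[0].strip()
    return default


def ubuntu_release(default="unknown"):
    """Ask lsb_release for the short release string, e.g. '10.04'."""
    try:
        result = subprocess.run(
            ["lsb_release", "-rs"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # lsb_release is not installed or failed.
        return default
    return result.stdout.strip() or default


def gather(counter=COUNTER_FILE, channel=CHANNEL_FILE, product=PRODUCT_FILE):
    """Collect the four values, keyed by the query parameter names."""
    return {
        "count": read_first_line(counter, "0"),
        "dcd": read_first_line(channel, "unknown"),
        "product": read_first_line(product, "unknown"),
        "release": ubuntu_release(),
    }


if __name__ == "__main__":
    for key, value in gather().items():
        print(f"{key}: {value}")
```

Running it on a machine without the counter or channel file simply prints the fallback values for those two entries.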

Now those four pieces of information are sent to http://census.canonical.com/submit via a simple HTTP GET query, using wget. The full URL with all the parameters added is:
http://census.canonical.com/submit?count=count&dcd=dist_channel&product=dmi_product_name&release=ubuntu_release_version
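As a rough illustration of how such a query string is put together, the following Python sketch builds the URL from the four values. The parameter names (count, dcd, product, release) and the base URL are the ones shown above; the function name `build_census_url` is made up for this example, and whether the real bash script escapes spaces the same way is not something I have verified. The sketch only builds the string and does not perform any request.

```python
from urllib.parse import urlencode

SUBMIT_URL = "http://census.canonical.com/submit"


def build_census_url(count, dist_channel, product_name, release):
    """Build the GET URL with the four parameters in the order shown above."""
    params = [
        ("count", count),
        ("dcd", dist_channel),
        ("product", product_name),
        ("release", release),
    ]
    # urlencode escapes characters that are not URL-safe (spaces become '+').
    return SUBMIT_URL + "?" + urlencode(params)


print(build_census_url(3, "unknown", "System Product Name", "10.04"))
# http://census.canonical.com/submit?count=3&dcd=unknown&product=System+Product+Name&release=10.04
```

The example values are invented, apart from the product name and release reported for my machine.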

The second script, census, is the part running on Canonical's side. Basically, census reads in their Apache access log file and creates an SQLite database from the contents of the log file. At 391 lines this script is a bit longer, but it does not end up in the binary Debian package at all.
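The general idea of turning access log entries into database rows can be sketched in a few lines of Python. This is not the code of the census script: the regular expression assumes Apache's common/combined log format, the table and column names are hypothetical, and the sample log line is invented (using a documentation-only IP address).

```python
import re
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Start of an Apache common/combined log line:
# host ident user [timestamp] "GET path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

# Invented example entry, not real log data.
SAMPLE = (
    '192.0.2.1 - - [10/Aug/2010:06:25:01 +0200] '
    '"GET /submit?count=3&dcd=unknown&product=System+Product+Name'
    '&release=10.04 HTTP/1.0" 200 0 "-" "Wget/1.12"'
)


def parse_line(line):
    """Return a row tuple for a /submit request, or None for anything else."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    url = urlsplit(match.group("path"))
    if url.path != "/submit":
        return None
    query = parse_qs(url.query)

    def first(name):
        return query.get(name, [""])[0]

    count = first("count")
    return (
        match.group("host"),
        match.group("time"),
        int(count) if count.isdigit() else None,
        first("dcd"),
        first("product"),
        first("release"),
    )


def load(lines, db):
    """Insert all parsable /submit requests into the database; return the row count."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        "host TEXT, time TEXT, send_count INTEGER, "
        "channel TEXT, product TEXT, ubuntu_release TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    db.executemany("INSERT INTO submissions VALUES (?, ?, ?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load([SAMPLE], connection), "row(s) imported")
```

Note that the client's IP address and the request time end up in the log (and thus potentially in the database) even though the script on the client never sends them explicitly; they are part of every HTTP request.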

Personally I do not see how Canonical or one of their partners could possibly do anything harmful with that information. Comparing this to Debian's popcon reveals that Debian is gathering a lot more information.

Now there are two more things one should consider: census is targeted at OEMs, which means it's unlikely that it will end up on each and every Ubuntu installation, and it can be uninstalled by removing the canonical-census package with your favorite package manager.

Finally, think about this for a second: It's a shell script you can always examine. There is no hidden magic and it's a plain HTTP request the script is sending. No evil things happening there.
And now compare that to what other (often proprietary) software vendors do and how much data they submit, possibly even in encrypted form so you do not know for sure what is being sent to them.

Personally I welcome Canonical's openness in providing their users with the package's code this early and in being straight about what information it submits. They could have silently added it to those installations after all...

Happy hacking!

7 comments:

  1. Stuff like this needs to be opt-in. As much as it sucks for Canonical. They could easily log times and counts and IPs and build a movement profile for my machine (me).

  2. Well, nothing has been said yet about whether this is going to be opt-in only or not. So it's possible they are targeting such an integration anyways.

    As for the movement profile I am not entirely sure. If enough systems are sold I doubt creating a movement profile using log times, counts and IP addresses (and the DMI product name) is actually possible. What they seem to process though is your location, in terms of using GeoIP to build a map of where the systems are distributed. I have only had a quick look at the log parsing code, but they are importing the GeoIP module there.

    Also, what I wanted to get straight using this article was what information gets submitted and that it's better to have this system in the open than having a proprietary one that uses encrypted channels to submit information to the vendor.

    Last but not least: they could also gather times, IPs and counts from apt mirrors, if they wanted to.

  3. Ditto the opt-in approach. This could be a simple popup, "Do you want to send version and model information to Canonical?"... but it MUST be opt-in, or security experts will cry foul.

    Thou shalt not send any information from my pc, to anyone, at any time, without my knowledge and consent.

  4. I think that you are too naive about the intentions of bigger commercial structures like Canonical. They might have chosen an open source method at the moment, but in the future they might change that to an encrypted data scheme and a closed source method for their tracking intentions. They are slowly sneaking in. I was happy to dump Ubuntu recently for other reasons, and after learning about Canonical's future intentions I am even more delighted about my past move.

    Btw it should be enough to provide an email address and a name for posting comments. At the moment your comment script looks for certain profiles from AIM, Google or some others. Why would I need to log in to a service to post comments on a page that talks about tracking user data?

  5. Thanks for your comments.

    @Brett:

    Yes, an opt-in would be great, especially if this should ever be added to the non-OEM distribution of Ubuntu. As for the OEM versions I am sure the OEMs are at least in charge of informing their users that the system is going to report back and/or have an opt-in method.

    @kkkkkk:

    The point that I was trying to make is that the way canonical-census works right now is not bad. We know which kind of information is being transferred, we know that it is being transferred using an unencrypted transport and we know how to get rid of it. I strongly object to any way of transferring such data that is not transparent to us users, such as in encrypted form or from binary programs we cannot have a look at. If at any point in time Canonical should opt for deploying such a program you can rest assured that I will also cry out loud.

    As for requiring a login of some kind for posting comments:

    I have been hit with a lot of spam in the past and thus have decided to turn on user authentication (OpenID is also available, which should give you a better chance at protecting your information, given your OpenID provider asks you which information to submit upon logging in).

    I later turned on comment moderation too, but disabling the authentication option was not possible due to the lack of proper spam detection methods available via blogspot.com.

    I just checked (after reading your comment) and it seems like spam detection has finally been added, so you will be able to provide anonymous comments in the future. However, those comments will still go through moderation until I can verify that the spam detection method works properly.

    Thanks for pointing this out though.

  6. @sp

    It is all dandy and good and I am glad that they are open about their scripts, but the thing is that this does not mean that they will stay this way forever. Extrapolating the future never works this way. The second problem is that not many users are developers or command line savvy. How many of those Ubuntu users will really be happy that the code is open and ready to dig in and hack those scripts?


    thanks for the comments fix btw. Word.

    kkkkkk

  7. Well, there are people like you and me who actually do have a look at those scripts, so users who are not tech-savvy can still read our blogs, emails and so on. However, they have to trust us to really believe that what we are saying is true. It is the same issue as with an OEM, they have to trust the OEM too to be sure they are not doing something evil with their data.

    As I said, the second the script changes and either encrypts data somehow or is not open to the public to read anymore, it's time to have a look at it again. I totally agree that just because those scripts are open and not doing something evil right now doesn't mean they will not do so in the future. It's really something we, the tech-savvy community, must watch.
