git smart protocol via WebSockets - proof of concept

Yesterday an idea came to my mind: let's try running git's smart transport protocol via a WebSocket. In a few hours of work I came up with a solution which works.

But why would one want to do that? Basically the only options for running git's smart protocol you have right now is either using git's own protocol or tunneling it via ssh. The first option leaves you without any ways of authentication - so it's only usable for read-only access to public repositories. The second option involves using an ssh server, which then allows read-write access and authentication, but is quite some work to set up.
As I am working on a university assignment which involves using WebSockets right now it occurred to me that there is no reason for not using WebSockets for this.

The main idea is providing a tunnel, just like the ssh transport does, but this time via a WebSocket. The logic is the same and there is no modification to git itself required.
For now I have only implemented a proof of concept which allows you to update your repository from a remote system, but the approach should work perfectly well for pushing your changes to a remote repository too.

Let's have a look at how this works.
On the local system git-fetch-pack is invoked, which talks to a git-upload-pack process on the remote end. The code I wrote provides a script which acts like an ssh client, but creates a WebSocket connection to the remote end, using Python and the websocket-client Python package. On the other side of the tunnel a simple Python WSGI application, which uses gevent-websocket, provides the server-side implementation.
Now when a WebSocket connection is established the server spawns a git-upload-pack process and redirects its stdout to the WebSocket. Data which is received over the WebSocket is sent to the git-upload-pack's stdin file descriptor.
On the client this logic is reversed, redirecting its stdout to the WebSocket and sending data received over the WebSocket to its stdin file descriptor.

That's about it. Keep in mind this is a proof-of-concept, so there may be rough edges here and there and both stability and performance may be "sub-optimal".
I'd also like to point out that using WebSockets and HTTP as the underlying transport protocol gives one the opportunity to use standard HTTP(s) authentication mechanisms. This means that the WebSocket approach could be useful to git hosting sites, basically removing the need for running an ssh server.

You can find the Python code over at https://github.com/speijnik/gitws. Have fun giving it a try.


ptrace-based security just does not work

In 2009 I wrote about building a ptrace-based sandboxing system named "ujail", including some basic proof of concepts.

I have been thinking about this idea for a long time now, but sadly did not have the time to implement it - until now.
Right now I am working on this idea again and whilst doing some research I came across a thread on the linux-kernel mailing list.
At first a problem with 64-bit binaries trapping into 32-bit syscall handling code via int 80 got me there. While this is awkward and keeps one from implementing a sandbox in userspace (due to not being able to access TS_COMPAT, as described in the thread) it led me to something else - a more severe problem.
Unfortunately I cannot remember who wrote this and am unable to recover the actual mail (if someone finds it I would be happy if you notified me), but someone mentioned race conditions when using ptrace as a security measure.

In short I came up with a proof of concept which works around possible limitations imposed by a ptrace-based security mechanism. For those in a hurry: you can find the code of the proof of concept at github.
In the following parts of this article I would like to elaborate on the problem and how the proof of concept code exploits it.


How to force a local DNS resolver to be used using resolvconf

I know it has been a while, but after reading a blog post by Anand Kumria over at planet.debian.org I decided to have a quick look at one of the problems he described.

Basically, Anand wants to force the local resolver to be used for each and every network connection, may that connection be established manually or via NetworkManager. He wrote that fixing this configuration for every new connection manually is tedious, and I fully agree on that. So here is a solution to do this all automatically, using resolvconf:

After installing the resolvconf package every time /etc/resolv.conf is to be updated resolvconf takes care of that. Using the files in /etc/resolvconf this process can be controlled and the resulting file modified to fit one own's needs.

So at first we would like the local resolver to be used for every connection. This works by simply adding the "nameserver" directive to the /etc/resolvconf/resolv.conf.d/head file. Simple as that. Every time /etc/resolv.conf gets generated the contents of the head file are actually used as /etc/resolv.conf's header.

Using this method the local resolver is used for every connection. But Anand wanted to use only the local resolver and discard any resolvers possibly obtained via DHCP for example. Guess what, this is also possible using resolvconf.

Adding TRUNCATE_NAMESERVER_LIST_AFTER_127="yes" to /etc/default/resolvconf does exactly that. Now every nameserver directive after the one is ignored and will not make it into /etc/resolv.conf. You can of course add more nameservers to the head file above the directive.

Problem fixed I guess.
Don't forget to re-connect to the network or manually force re-creation of /etc/resolv.conf so the changes you made get populated. I really hope this is of use to some of you facing similar problems.


ISC dhcpd and IP assignments from a pool to specific hosts only

Assigning an IP address statically to a host with a given MAC address using ISC dhcpd is quite trivial, one host entry, a hardware ethernet entry and a fixed-address entry and you are up and running.
But what if you want to assign IP addresses from a pool to only a few hosts with specific MAC addresses?

Before you ask yourself why someone might want to do that, have a look at my (very real) use-case.
I am currently working on setting up an installation server for my employer, ANEXIA Internetdienstleistungs GmbH. The server itself uses PXE, TFTP and FAI for installing systems. To be able to do PXE booting one has to set up an DHCP server to provide configuration details, like the TFTP Server Address and the boot filename.

Now what one should consider is that this system is designed to provide automatic installations for internet-facing hosts, namely ones in public IP networks. Running a DHCP server in such a network is not a good idea. We neither want to dish out configurations to each and every hosts that asks for them, neither do not want to do a PXE boot each and every time one of our systems is restarted. Now the combination of FAI and pxelinux allows for default configurations which force local booting, but this still causes the (re-)boot time for those systems to increase and potentially also increases the load on the TFTP server. Also, let's not even consider thinking about whether this setup is "clean" or not. I personally believe that dishing out IP addresses in a public IP network is a bad thing(tm) and I guess a lot of people will be nodding when reading these lines.

What I was asking myself is how to get something like that set up in a cleaner way, and guess what, I found a solution.
The basic idea behind this is only providing IP configuration via DHCP to a specific set of hosts (with a specific set of MAC addresses) and not providing any information to all other hosts. The specific set of hosts are those that we want to do an install run on. This is a no-brainer and I guess the right way to do that, but implementing this approach is not as straight-forward as I initially thought.

Actually the implementation of that idea caused me a bit of a headache and cost me a few work-hours to get right, that's why I'd like to share the configuration details with you.


What's all the fuzz about canonical-census?

I know I have not updated this blog in quite a long time now, but something caught my attention today: canonical-census.

As slashdot.org reports Canonical begins with tracking their (OEM) installations. Now it's obvious that people are uncomfortable with a program running on their system which phones back to their OS vendor, that's why I have had a quick look at what exactly canonical-census does.

Firstly however, I would like to point out that the report on slashdot.org is very clear about which information is being gathered, being "the number of times this system previously sent to Canonical [...], the Ubuntu distributor channel, the product name as acquired by the system's DMI information, and which Ubuntu release is being used". And it's perfectly correct. After getting the canonical-census Debian source package (using dget -u https://launchpad.net/ubuntu/+archive/partner/+files/canonical-census_0.1.dsc) the source package shows, besides the Debian packaging information, two scripts:

  • census (written in Python) and
  • send-census (a GNU bash script).
Now what do those scripts actually do?

send-census is installed in /etc/cron.daily, which means it will be executed once a day by the system's cron daemon. It's a mere 48 lines long, and its code is quite simple. So everyone with at least some shell scripting experience can easily check what it's doing. Now guess what, it sends exactly the information as reported on slashdot to Canonical. Nothing more and nothing less.

Technically it keeps a plain text file containing a single number as its call-counter, residing in /var/lib/send-install-count/counter and uses an on my Ubuntu Lucid system nonexistent /var/lib/ubuntu_dist_channel file for getting information about the distribution channel.
The above mentioned "system's DMI information" is not the whole bunch of DMI information available, but only the contents of /sys/class/dmi/id/product_name, which strangely enough returns "System Product Name" on my machine. Last but not least it uses lsb-release to get the distribution release (ie. 10.04 for my system).

Now those four pieces of information are sent to http://census.canonical.com/submit via a simple HTTP GET query, using wget. The full URL with all the parameters added is:

The second script, census, is the part working on Canonical's script. Basically census reads in their Apache's access log file and creates an SQLite database from the contents of the log file. With 391 lines this script is a bit longer, but it does not end up in the Debian package at all.

Personally I do not see how Canonical or one of their partners could possibly do anything harmful with that information. Comparing this to Debian's popcon reveals that Debian is gathering a lot more information.

Now there are two more things one should consider: census is targeted at OEMs, which means its unlikely that it will end up on each and every Ubuntu installation and can be uninstalled by removing the canonical-census package with your favorite package manager.

Finally, think about this for a second: It's a shell script you can always examine. There is no hidden magic and it's a plain HTTP request the script is sending. No evil things happening there.
And now compare that to what other (often proprietary) software vendors do and how much data they submit, possibly even in encrypted form so you do not know for sure what is being sent to them.

Personally I welcome the openness of Canonical with providing their users with the package's code this early and being straight about what information it submits. They could have silently added it to those installations after all...

Happy hacking!


Rest in peace Flo: 13.11.1986–16.12.2009

Today is a sad day. Everything feels like I am having a bad nightmare. That's because today I learnt from the too early death of my friend Florian Hufksy.

I am sitting here and do not really know what to write. I keep on thinking about the great times we spent together. The time we started programming when we were twelve. The time we spent learning BASIC. All the times you knew more than me and could teach me a thing or two. I remember our geek talks. How we would discuss latest games. How we lost contact and how we met again. I am thinking about how sorry I am for not having met you often enough. I keep on trying to understand what drove you that far. How you could just end it all. More and more memories come to my mind, like the moment when you showed me one of your projects, Super Mario War. The moments we had playing video games together. All those moments, all that time, I miss you my friend. You were a genius, always a step ahead, not only of me, but seemingly the whole world. I can't stop thinking about your brilliant ideas and how you always finished your projects. You were a real hacker, a real genius, a person trying to make the world a better place, a person who will be missed, not only by me.

You were a genius and I always respected you, not only as a hacker, but as a beloved friend. Why did we not spend more time together? Why did you have to go? Why do I have to write this now, sitting here in my chair with tears in my eyes? And all those memories come up again and again. There is so much more that comes to my mind, but I can't keep on writing, it just hurts too much.

The world is a sad place today. I am sad. I am mourning the too early death of my beloved friend, Florian. You will always have a special place in my heart.


ujail: use cases, FAQs, part 1 & proof of concept, part 2

As I ran out of time whilst writing the "introducing ujail" post on monday I would like to further elaborate on the idea, giving you some examples of possible use cases and then having a look at FAQs regarding ujail. Additionally I have created a second proof of concept that should be a lot faster, see below for more details.

Use cases of ujail

Monday's post was rather technical, so let's have a look at possible use cases today.

The main reason for both having the idea of ujail and starting working on it is my web server. I am running quite a few (S)CGI scripts there and, even though running them as different users, on a per-vhost basis, I have the impression of the whole thing being a bit insecure.

Okay, PHP does provide its famous open_basedir feature, but I am also running some Python applications which I simply cannot restrict easily. My first ideas involved adding something similar to open_basedir to Python, followed by the idea of replacing some C library functions, like fopen and friends on startup time.

Whilst the adding open_basedir to Python would have involved changing a lot of Python's internals I soon discarded the library patching idea as those could be worked around by injected code directly invoking syscalls. It didn't take long for me to notice that I have to dig deeper. The idea of ujail was born and after coming up with the proof of concept this seems to be a viable solution.

Now ujail is not only about protecting a web server from its web applications, but could do a lot more, for example:

  • Creating a sandbox for untrusted code (socket&file i/o emulation)
  • Implementing some sort of personal firewall (socket-call only emulation)
  • Testing applications that perform low-level system operations (read: package managers and friends, filesystem emulation)
 I am sure you can come up with even more use-cases. What should be noted is that emulating a system call does not mean that one necessarily needs to emulate the whole filesystem. What can be done, for example, is patching through access to common files (libraries, executables, etc.) whilst maintaining a virtual filesystem for data that will eventually be modified. A copy-on-write approach is possible too, for example. There are multiple methods with which the multiple filesystem could be implemented, the most common would probably be using a state directory.


There have been some questions about ujail in comments to my first post which I would like to answer. Also, I have been thinking about things that are different about ujail compared to other virtualization techniques. Feel free to add additional questions either in a comment or drop me an email: debian at sp dot or dot at.

  1. Could you change the license of ujail to ... ?

    Not likely to happen. The proof of concept's license is GPLv3 and the actual code's license will be too. However, ujail is a userspace application that does not need any modifications to the kernel so there should be no problems with porting ujail from GNU/Linux to any other system.

  2. Does ujail work on operating systems other than GNU/Linux?

    Not yet. If it's technically possible to implement the technique on other operating systems I would be happy to accept patches.

  3. Do I need to patch my kernel for ujail to work?

    No, ujail is running in userspace. The only thing it needs is Linux with support for PTRACE_SYSEMU.

  4. How is this approach different from using LD_PRELOAD?

    With LD_PRELOAD one can replace library functions, but malicious code could still directly invoke syscalls, working around this protection completely. Also, statically linked binaries cannot be restricted with LD_PRELOAD.

  5. How is this approach different from user-mode-linux?

    User-mode-linux (UML) works by emulating a full kernel in userspace and allows you to virtualize a whole Linux instance (including a new init process, etc). ujail is about providing a way of restricting a single process (and its childs) inside a running system in terms of access to syscalls and the partial emulation of those.

  6. How is this approach different from linux-vserver?

    Linux-vserver is a kernel patch and runs in kernel space, as opposed to ujail, which works in userspace.
    Also, linux-vserver works similarly to user-mode-linux, providing a fully virtualized Linux instance.

  7. Does the account running ujail need any special privileges?

    No, the only restrictions that apply are those of ptrace.

  8. Where is the code?

    Right now ujail is in a planning phase, and only the proof of concept code has been written and published. The actual ujail code is yet to be written and the code will be hosted on launchpad.net.

Proof of concept, part 2

An anonymous person (who were you stranger?) added a comment to my first post, suggesting "Also, why patch the process rather than just modifying its state and trapping into the kernel?". I have had a look at this approach earlier, but it didn't work out. However, I decided to give it yet another try and created a second proof of concept. That code does not require patching any code, but only modifies the instruction pointer (eip) and the first register (eax). This should be a lot faster than patching the code.

Technically the new main loop works by calling PTRACE_SYSEMU and waiting for a notification. It then saves the instruction pointer and switches to PTRACE_SYSCALL. As before it waits for the emulated syscall to exit and at this point sets eax from orig_eax and decreases the value of the instruction pointer by the size of the "int $0x80" instruction. Another call to PTRACE_SYSCALL resumes the process. The next event is the process actually entering the real syscall and yet another one leaving the syscall again. These are resumed by PTRACE_SYSCALL and PTRACE_SYSEMU respectively. So, comparing this with the first approach we are only modifying two registers now, instead of writing to the TEXT area of the running process.

Thanks should go to the anonymous commenter for making me give this approach another try.

Questions? Criticism? More ideas? Want to contribute?

Coming to an end I would yet again like to let you know that I am  open for questions, criticism, more ideas and contributions in general. So if you are interested in this topic come join the discussion by either dropping me an email, writing a comment to this post or replying to this post on your own blog.