2012-06-05

git smart protocol via WebSockets - proof of concept

Yesterday an idea came to my mind: let's try running git's smart transport protocol via a WebSocket. In a few hours of work I came up with a solution which works.

But why would one want to do that? Right now the only options for running git's smart protocol are git's own protocol or tunneling it via ssh. The first option leaves you without any means of authentication, so it is only usable for read-only access to public repositories. The second option involves an ssh server, which allows read-write access and authentication but is quite some work to set up.
As I am currently working on a university assignment which involves WebSockets, it occurred to me that there is no reason not to use WebSockets for this.

The main idea is to provide a tunnel, just like the ssh transport does, but this time via a WebSocket. The logic is the same and no modification to git itself is required.
For now, I have only implemented a proof of concept which allows you to update your repository from a remote system, but the approach should work just as well for pushing your changes to a remote repository.

Let's have a look at how this works.
On the local system git-fetch-pack is invoked, which talks to a git-upload-pack process on the remote end. The code I wrote provides a script which acts like an ssh client, but creates a WebSocket connection to the remote end, using Python and the websocket-client Python package. On the other side of the tunnel a simple Python WSGI application, which uses gevent-websocket, provides the server-side implementation.
Now, when a WebSocket connection is established, the server spawns a git-upload-pack process and redirects its stdout to the WebSocket. Data received over the WebSocket is written to git-upload-pack's stdin file descriptor.
On the client the same logic is applied to git-fetch-pack: whatever git-fetch-pack writes to its stdout is forwarded to the WebSocket, and data received over the WebSocket is delivered to its stdin file descriptor.
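
To make the tunnel logic described above a bit more concrete, here is a minimal sketch in Python. It is not the code from the repository, just an illustration under a few assumptions: relay_process_over_socket and QueueSocket are names I made up for this example, and the socket is any object offering send(data) and receive(), where receive() returns None once the other side has closed. It uses plain threads; inside a gevent application one would more likely use greenlets.

```python
import queue
import subprocess
import sys
import threading


class QueueSocket:
    """In-memory stand-in for a WebSocket (hypothetical, for demonstration).

    It replays the given messages from receive(), then returns None to
    signal that the peer closed, and records everything passed to send().
    """

    def __init__(self, incoming):
        self._incoming = queue.Queue()
        for message in incoming:
            self._incoming.put(message)
        self._incoming.put(None)
        self.sent = []

    def receive(self):
        return self._incoming.get()

    def send(self, data):
        self.sent.append(data)


def relay_process_over_socket(argv, sock):
    """Spawn argv and shuttle bytes between the child process and sock.

    Returns the exit code of the child process.
    """
    child = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def socket_to_stdin():
        # Everything received from the tunnel becomes the child's input.
        try:
            while True:
                data = sock.receive()
                if data is None:
                    break
                child.stdin.write(data)
                child.stdin.flush()
            child.stdin.close()
        except BrokenPipeError:
            # The child exited before reading all of its input.
            pass

    feeder = threading.Thread(target=socket_to_stdin, daemon=True)
    feeder.start()

    # Everything the child writes to stdout is sent back through the tunnel.
    while True:
        chunk = child.stdout.read1(65536)
        if not chunk:
            break
        sock.send(chunk)
    child.stdout.close()
    return child.wait()


if __name__ == "__main__":
    # Demo with a harmless child process that upper-cases its input.
    demo_child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
    ]
    demo_socket = QueueSocket([b"hello ", b"tunnel\n"])
    relay_process_over_socket(demo_child, demo_socket)
    print(b"".join(demo_socket.sent))
```

On the server side the child process would be something along the lines of ["git", "upload-pack", repo_path], with repo_path standing in for wherever the repository lives, and the socket would be the real WebSocket object instead of the in-memory stand-in.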

That's about it. Keep in mind that this is a proof of concept, so there may be rough edges here and there, and both stability and performance may be "sub-optimal".
I'd also like to point out that using WebSockets and HTTP as the underlying transport protocol gives one the opportunity to use standard HTTP(S) authentication mechanisms. This means that the WebSocket approach could be useful to git hosting sites, basically removing the need for running an ssh server.
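
As a small illustration of the authentication point, this is roughly what HTTP Basic authentication looks like from the client's side. The proof of concept does not do this; basic_auth_header is a hypothetical helper name, and the credentials are the usual textbook example.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic "Authorization" header line (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("Aladdin", "open sesame"))
```

With the websocket-client package, such a header line could presumably be handed over through the header option of create_connection when opening the tunnel, and the server would check it during the HTTP handshake before spawning any git process. Basic authentication sends the password practically in the clear, so it only makes sense over an encrypted (wss://) connection.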

You can find the Python code over at https://github.com/speijnik/gitws. Have fun giving it a try.