My last post (over 2 years ago) described my initial tests of running my http_flood project under docker. At the time, there was a huge performance impact running through docker: performance dropped by almost 90%.

Two years later, things are a bit different.

Bare metal:

duration=3.364873568s megabytes=10000.0 speed=2971.9MB/s

Inside docker:

duration=7.283130136s megabytes=10000.0 speed=1373.0MB/s

There is still some overhead, but 1373MB/s is still over 10 gigabit.

What is very interesting is that running inside docker using --net=host gives:

inside docker with net=host:

duration=3.329915691s megabytes=10000.0 speed=3003.1MB/s

How does this manage to be faster than the bare metal process? Most likely the google/golang container I am using is running a newer version of go that performs better.
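For context, --net=host makes the container share the host's network stack, so traffic skips the docker bridge and NAT entirely. The exact command isn't shown above, but presumably it is just the same docker run invocation with --net=host in place of the -p port mapping.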

http_flood in docker

I built a (somewhat bloated) docker image for http_flood with a simple Dockerfile.

root@dovm1:~# docker run -p 7070:7070 justinazoff/http-flood
Unable to find image 'justinazoff/http-flood' (tag: latest) locally
Pulling repository justinazoff/http-flood
0d71f044c41f: Download complete
511136ea3c5a: Download complete
adfd622eb223: Download complete
9a776d8a79dd: Download complete
1819c4b85615: Download complete
032f3d3190f6: Download complete
6c027abac266: Download complete
f05032d12482: Download complete
11a10f000aca: Download complete
01df4a932bd2: Download complete
2013/12/01 02:33:20 Listening on :7070

There seems to be a severe performance impact when running through docker, though: running across the loopback interface I get 434.5MB/s, while running through docker I get only 48.6MB/s. Further testing locally is needed. Running bare-metal on my linux laptop it easily pushes over 2000MB/s.

I spent a while the other day figuring out how to get websockets working on heroku, so I thought I’d write it up.

First, Heroku doesn’t actually support websockets, so you must use something like socket.io, which can fall back to various long polling mechanisms.

Step 1, disable websocket support in socket.io

Without this, socket.io tries to connect first using websockets and it takes a while to timeout before switching to long polling.

// remove websocket for heroku
var options = {transports:["flashsocket", "htmlfile", "xhr-polling", "jsonp-polling"]};
var socket = io.connect("http://.../", options);

Step 2, configure tornadio to use xheaders

If you don’t tell tornadio to use xheaders, it will think heroku is trying to hijack sessions and nothing will work. You will get 401 unauthorized responses back from tornado, and the error from this check in tornadio’s source in your logs:

# If IP address don't match - refuse connection
if handler.request.remote_ip != self.remote_ip:
    logging.error('Attempted to attach to session %s (%s) from different IP (%s)'   % (
                  self.session_id,
                  self.remote_ip,
                  handler.request.remote_ip
                  ))

Enabling xheaders is a good idea when deploying to heroku in general and is not tornadio specific.

Add the xheaders option to the main SocketServer initialization, and everything is happy.

SocketServer(application, xheaders=True)
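For reference, here is a rough sketch of where that line fits in a minimal TornadIO app. It follows the get_router/SocketServer pattern from TornadIO's own examples; the EchoConnection class and the os.environ["PORT"] lookup (heroku assigns the port via that variable) are my own additions for illustration:

import os

import tornado.web
import tornadio
import tornadio.router
import tornadio.server

class EchoConnection(tornadio.SocketConnection):
    def on_message(self, message):
        # trivial handler: echo whatever the client sent
        self.send(message)

EchoRouter = tornadio.get_router(EchoConnection)

application = tornado.web.Application(
    [EchoRouter.route()],
    socket_io_port=int(os.environ.get("PORT", 8001)),
)

if __name__ == "__main__":
    # xheaders=True makes tornado trust the X-Forwarded-For header, so
    # tornadio's session IP check sees the real client address instead of
    # heroku's routing layer
    tornadio.server.SocketServer(application, xheaders=True)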

TL;DR

Whatever you do, make sure you are using versioned python packages, even for simple tasks. And use pip+virtualenv.

So you want to program in python..

It seems like only yesterday, and not 7 years ago, that I decided to learn python. I may not be the best python programmer, but I have made probably every mistake you can, so here are a bunch of things not to do, and a few things you should be doing.

Don’t: write python ‘scripts’

Don’t write programs like this:

temp = input("C: ")
print temp*9/5+32

The way you fix that is not by writing the following:

if __name__ == "__main__":
    temp = input("C: ")
    print temp*9/5+32

And don’t write this either:

def main():
    temp = input("C: ")
    print temp*9/5+32
if __name__ == "__main__":
    main()

No matter how good your logic is, if you couple the logic with your input and output you are painting yourself into a corner. I’ve seen people write scripts like this, and then have other scripts call them using os.system. In a loop. Then they wonder why python is so slow.

Do: Write python modules and packages

Minimally this could look something like:

def ctof(temp):
    return temp*9/5+32
def main():
    temp = input("C: ")
    print ctof(temp)
if __name__ == "__main__":
    main()

Even better would be to have main parse sys.argv rather than working interactively; a sketch of that is below. For simple interactive tools it is hard to beat the cmd module.
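For example, an argv-driven main for the temperature converter could look something like this (a minimal sketch, with only trivial argument handling):

import sys

def ctof(temp):
    return temp * 9 / 5 + 32

def main():
    # convert each celsius value given on the command line
    if len(sys.argv) < 2:
        print "usage: ctof CELSIUS [CELSIUS ...]"
        sys.exit(1)
    for arg in sys.argv[1:]:
        print ctof(float(arg))

if __name__ == "__main__":
    main()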

Now you have an (albeit poorly named) python module that can properly be imported from a larger program:

>>> import temp
>>> print temp.ctof(100)
212

Don’t: mess with PYTHONPATH

Now that you have a module you can import, what do you do with it? For years my development/production environment consisted of the following: a lib directory containing modules and packages and a util directory containing scripts that used those modules. This worked fine for a long time, especially when I only had one machine. When I got more systems, I used the high tech method of rsync‘ing the entire directory tree to /srv/python or ~/python/ and mucking with the python path. This system worked, but had a number of problems:

  • If I wanted to run a program on a new system, I had to rsync the entire directory tree.
  • Since there was no dependency information, the first time I wanted to share a program I wrote, I had to figure out the dependencies manually.
  • I had no idea what modules were being used, and which were obsolete.
  • When I started writing test code and documentation, I did not have a good place to store them. I used a single directory for all my tiny modules because one directory per module seemed like overkill at the time.
  • When the version of python on the system was upgraded, bad things happened.

It’s very tempting to simply throw all of your python code into a single directory tree, but that method only causes problems later on.

Do: Create installable python packages

For the example above, we can write a simple setup.py file:

from setuptools import setup

setup(name="temp",
    version="1.0",
    py_modules = ["temp"],
    entry_points = {
        'console_scripts': [
            'ctof   = temp:main',
        ]
    },
)

If you have a full package instead of a single-file module, you should use packages instead of py_modules. The official documentation should be read if you are doing anything more complicated; there are fields for your name, short and long descriptions, licensing information, etc. (Note that entry_points is a setuptools feature, which is why the example imports setup from setuptools rather than distutils.core.) This example was kept purposely short to make it clear that there is not much you actually have to do to get started. Even a barebones setup.py is better than no setup.py.

Don’t: use ‘scripts’ in setup.py (Do: Use entry points)

console_scripts entry_points should be preferred over the ‘scripts’ that setup.py can install. The last time I tried, scripts did not get correctly installed on Windows systems, but console_scripts did. Additionally, the more code you have in scripts, the less testable code you have in your modules. When you use scripts, eventually you will get to the point where they all contain something similar to:

from mypackage.commands import frob
frob()

and at that point, you are just re-implementing what console_scripts does for you.

Do: Version your packages and depend on specific versions.

So, after years of doing-the-wrong-thing, I finally created proper packages for each of my libraries and tools. Shortly after that I started having problems again. While I had been versioning all of my packages, any package that required another package simply depended on the package name and not on any specific version of it. This created problems any time I added new features: I would install the latest version of a utility package on a server, and it would crash because I had forgotten to upgrade the library it depended on. Since I wasn’t syncing the entire directory tree anymore, libraries were becoming out of date.
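Pinning (or at least bounding) the dependency version in setup.py avoids that. A hypothetical sketch, with made-up package names:

from setuptools import setup

setup(
    name="mytool",
    version="2.1",
    packages=["mytool"],
    # depend on a compatible range of the library, not just its name
    install_requires=["mylib>=1.4,<2.0"],
)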

Don’t: install packages system wide (Do: Use virtualenv and pip)

Once you get to the point where you are using versioned packages, you’ll want to be able to install different versions of modules under different python versions. When I was simply sticking everything under /srv/python it was next to impossible to have multiple versions of python. I could change PYTHONPATH to point somewhere else, but there was no easy way to maintain two completely different trees of modules.

It is extremely simple to get started using pip and virtual environments: the -E option to pip creates a virtual environment if one doesn’t already exist, so you can set up the environment and install a package in one command:

justin@eee:~/tmp$ pip  -E python_env install bottle
Creating new virtualenv environment in python_env
  New python executable in python_env/bin/python
  Installing distribute...done........................
Downloading/unpacking bottle
  Downloading bottle-0.9.5.tar.gz (45Kb): 45Kb downloaded
  Running setup.py egg_info for package bottle

Installing collected packages: bottle
  Running setup.py install for bottle

Successfully installed bottle
Cleaning up...
justin@eee:~/tmp$ ./python_env/bin/python 
>>> import bottle
>>> bottle.__file__
'/home/justin/tmp/python_env/lib/python2.7/site-packages/bottle.pyc'
>>> 

I can use that same method to install the toy module I wrote for this post as well:

justin@eee:~/tmp$ pip  -E python_env install ~/tmp/post/temp_mod/
Unpacking ./post/temp_mod
  Running setup.py egg_info for package from file:///home/justin/tmp/post/temp_mod

Installing collected packages: temp
  Running setup.py install for temp

    Installing ctof script to /home/justin/tmp/python_env/bin

Successfully installed temp
Cleaning up...

pip was also nice enough to install my console_script:

justin@eee:~/tmp$ ./python_env/bin/ctof 
C: 34
93

Too long; Did read

The barrier to entry for python is a lot lower compared to a language like java or c++. It’s true that hello world is simply:

print("Hello, World")

However, if you plan on using python for anything more complicated, you will want to learn how to take advantage of modules and packages. Python doesn’t force you to do this, but not doing so can quickly turn into a maintenance nightmare.

os.popen always runs the command through the shell and, unlike subprocess.Popen, offers no way to disable that. Problems can occur when the program you are trying to run does not exist or cannot be run due to a permissions issue.

Consider the following example function:

def logged_in_users():
    users = set()
    for line in os.popen("who"):
        users.add(line.split()[0])
    return users

This runs just fine when everything is working:

In [4]: logged_in_users()
Out[4]: set(['justin'])

But if there is a problem running the command (for the example, let’s change the ‘who’ to ‘whom’):

In [6]: logged_in_users()
sh: whom: not found
Out[6]: set()

What happened was os.popen ran

"sh -c whom"

While sh started fine, the actual command could not be run. Since os.popen also does not pass the exit code back to the parent process, there is no easy way to tell that anything went wrong.

If we switch over to subprocess.Popen, the failure is handled much better:

for line in subprocess.Popen(["whom"], stdout=subprocess.PIPE).stdout:

This call will instead immediately raise an exception:

OSError: [Errno 2] No such file or directory

So using subprocess.Popen and not using os.popen has the following benefits:

  • Is more secure against potential command injection
  • Does not waste a process
  • Returns better error messages to the parent process
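Putting that together, the example function from the top of the post could be rewritten with subprocess.Popen like this (a quick sketch; on python 2.7+, subprocess.check_output is another option):

import subprocess

def logged_in_users():
    users = set()
    # no shell involved, and a missing binary raises OSError immediately
    proc = subprocess.Popen(["who"], stdout=subprocess.PIPE)
    for line in proc.stdout:
        users.add(line.split()[0])
    proc.wait()  # unlike os.popen, the exit status is available here
    return users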

One of the first steps in grokking ipv6 is getting a handle on ipv6 addresses.

The ‘dotted quad’ notation for ipv4 is fairly simple, and other than possible zero padding issues, addresses all look the same. ipv6 addresses are a bit different: rather than a dotted quad they are eight colon-separated hex groups, and there are a lot of rules for how addresses may be displayed. For working with ipv6 addresses there are two options:

  • Convert them to a 16 byte string
  • Normalize them

There are some very nice libraries for working with ip addresses, but the low level socket functions can be used to convert and normalize:

>>> import socket
>>> import binascii
>>> bytes = socket.inet_pton(socket.AF_INET6, "2001:4860:800f:0:0:0:0:0063")
>>> bytes
' \x01H`\x80\x0f\x00\x00\x00\x00\x00\x00\x00\x00\x00c'
>>> # we can see that the data is the same:
>>> binascii.hexlify(bytes)
'20014860800f00000000000000000063'
>>> print socket.inet_ntop(socket.AF_INET6, bytes)
2001:4860:800f::63

We can make a simple function to do that:

import socket

def normalize(ip):
    bytes = socket.inet_pton(socket.AF_INET6, ip)
    return socket.inet_ntop(socket.AF_INET6, bytes)

You can see some of the weird normalization rules in action:

>>> print normalize("2001:4860:800f:0:0:0:0:0063")
2001:4860:800f::63
>>> print normalize("::ffff:c000:280")
::ffff:192.0.2.128
>>> print normalize("ff02:0:0:0:0:0:0:1")
ff02::1
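The other option from the list above, converting to the packed 16-byte form, is useful when you want to compare or sort addresses rather than display them. A small sketch:

import socket

def pack(ip):
    # the 16-byte packed form sorts in true address order
    return socket.inet_pton(socket.AF_INET6, ip)

addrs = ["ff02::1", "2001:4860:800f::63", "::1"]
print sorted(addrs, key=pack)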

Debian kFreeBSD

A few days ago I installed Debian/kFreeBSD on my home server. It had been running opensolaris for years, but doing just about anything on that system was a complete pain in the ass. I had been meaning to give Debian/kFreeBSD a try, but had been putting it off thinking the changeover would break a lot of things, or I would have trouble importing the ZFS pools.

The other day I had some free time so I gave it a go.

I downloaded the mini.iso and dd’d it to a spare usb stick. The kFreeBSD ISOs support both cd and hard disk booting like the linux images. The install took about 40 minutes (including the time taken to download everything).

After that I expected to have a few problems.. but everything worked. I was able to install zfsutils and import the zfs pools. Debian/kFreeBSD doesn’t currently support nfs, but it was easy enough to install samba.

I’m left with a speedy, lightweight system, with thousands of packages and full security support:

root@pip:~# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/ad0s1             35G  596M   32G   2% /

root@pip:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2026        222       1804         17          0          0
-/+ buffers/cache:        222       1804
Swap:            0          0          0

root@pip:~# apt-cache search ""|wc -l
26258

Other than a few utilities working a little differently (the main one I noticed was netstat not taking the same flags) it feels exactly like a debian/linux system. But with ZFS.

Nice page titles

The first thing I wanted to do was fix the page titles. Blog posts should automatically have their page title set. This was a trivial change to head.mako:

-<title>${bf.config.blog.name}</title>
+<title>
+    BB.Net
+%if post and post.title:
+- ${post.title}
+%endif
+</title>

Easy blogging

The second thing I needed to do was write a script for easily adding a new post. newblog.py was the result:

justin@eee:~/projects/bbdotnet$ ./newblog.py
Title: Playing with blogofile
cats: tech,python,blogofile

This drops me into a vim session with the new post’s header pre-filled; all I have to do when I’m done is ‘git commit’.
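The script itself isn't reproduced here, but a hypothetical reconstruction might look roughly like this; it assumes blogofile's _posts/ directory and YAML-style front matter, and the filename scheme and field names are guesses:

import datetime
import os
import subprocess

def main():
    title = raw_input("Title: ")
    cats = raw_input("cats: ")
    slug = title.lower().replace(" ", "-")
    filename = os.path.join("_posts", "%s.markdown" % slug)
    with open(filename, "w") as f:
        f.write("---\n")
        f.write("title: %s\n" % title)
        f.write("categories: %s\n" % ", ".join(cats.split(",")))
        f.write("date: %s\n" % datetime.datetime.now().strftime("%Y/%m/%d %H:%M:%S"))
        f.write("---\n\n")
    # drop into vim to write the post body
    subprocess.call(["vim", filename])

if __name__ == "__main__":
    main()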

Makefile

Finally, I wrote a stupid simple Makefile, that way I can just kick off a :make inside of vim.

all: build

build:
    blogofile build

Shared HTTP Caching

I’ve been wondering why the web doesn’t have a mechanism for uniquely identifying a resource by a means other than its URL. I think if such a thing existed, then HTTP caches for common files could be shared between sites.

There has been a push lately to let Google host common JS libraries for you. The main reason for this is increased performance; there are two cases where this helps:

  • The user has never loaded jQuery before - They get to download it from fast servers
  • The user has visited another site that also hosted jQuery on google - They don’t have to download it at all.

However, there are issues with this:

  • This will not work on a restricted intranet
  • If the copy of jQuery on google was somehow compromised, a large number of sites would be affected.
  • If google is unreachable (it happens!), the site will fail to function properly

There should be a way to include a checksum like so:

<script type="text/javascript"
    src="/js/jquery-1.3.2.min.js"
    sha1="3dc9f7c2642efff4482e68c9d9df874bf98f5bcb">
</script>

(sha1 usage here is just an example, a more secure method could easily be used instead)

This would have two benefits:

  • If the copy of jQuery was maliciously modified, or simply corrupted, the browser would refuse to load it.
  • The browser may be able to use a cached copy of jQuery from another site with the same checksum.
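The verification step itself is trivial; conceptually the browser would just compare digests, something like this Python sketch of the idea (not an actual browser API):

import hashlib

def verify(content, expected_sha1):
    # refuse to use the resource if its digest doesn't match the page's claim
    return hashlib.sha1(content).hexdigest() == expected_sha1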

This sort of fits in with one of the ideas in the A New Way to look at Networking talk by Van Jacobson.

Remap capslock to z

My ‘z’ key has been (physically) broken for a while now. Generally this isn’t a problem because there aren’t that many places where I need to type a ‘z’ that I can’t autocomplete it. Between tab completion in the shell, and the irssi dictcomplete plugin, it hasn’t bothered me that much.

I finally got around to figuring out how to remap Caps lock to ‘z’, the magic lines to add to ~/.Xmodmap are

remove Lock = Caps_Lock
keycode 66 = z
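Running xmodmap ~/.Xmodmap applies the change to the current session, and some X session setups will also load the file automatically at login.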

Most of the examples I found are for swapping capslock with control or escape (which are mostly obsolete now that you can use the Keyboard prefs thing in Gnome and swap keys around with a single click). Remapping caps lock to ‘z’ is still too obscure to be in the nice GUI.

Now, if only two lines in a config file could fix the battery :-)