Debian kFreeBSD

A few days ago I installed Debian/kFreeBSD on my home server. It had been running OpenSolaris for years, but doing just about anything on that system was a complete pain in the ass. I had been meaning to give Debian/kFreeBSD a try, but kept putting it off, thinking the changeover would break a lot of things, or that I would have trouble importing the ZFS pools.

The other day I had some free time so I gave it a go.

I downloaded the mini.iso and dd’d it to a spare USB stick. The kFreeBSD ISOs support both CD and hard disk booting, like the Linux images. The install took about 40 minutes (including the time taken to download everything).
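
Something like the following, with /dev/sdX standing in for whatever the stick shows up as (check dmesg first; dd will happily overwrite the wrong disk):

dd if=mini.iso of=/dev/sdX bs=1M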

After that I expected to have a few problems... but everything worked. I was able to install zfsutils and import the ZFS pools. Debian/kFreeBSD doesn’t currently support NFS, but it was easy enough to install samba.
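
That part really was just the usual two commands; the pool name here is illustrative, and -f is typically needed because the pool was last in use on another OS:

apt-get install zfsutils
zpool import -f tank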

I’m left with a speedy, lightweight system, with thousands of packages and full security support:

root@pip:~# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/ad0s1             35G  596M   32G   2% /

root@pip:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2026        222       1804         17          0          0
-/+ buffers/cache:        222       1804
Swap:            0          0          0

root@pip:~# apt-cache search ""|wc -l
26258

Other than a few utilities working a little differently (the main one I noticed was netstat not taking the same flags), it feels exactly like a Debian/Linux system. But with ZFS.

Nice page titles

The first thing I wanted to do was fix the page titles. Blog posts should automatically have their page title set. This was a trivial change to head.mako:

-<title>${bf.config.blog.name}</title>
+<title>
+    BB.Net
+%if post and post.title:
+- ${post.title}
+%endif
+</title>

Easy blogging

The second thing I needed to do was write a script for easily adding a new post. newblog.py was the result:

justin@eee:~/projects/bbdotnet$ ./newblog.py
Title: Playing with blogofile
cats: tech,python,blogofile

This drops me into a vim session with the following contents:
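
Roughly this, with the date filled in automatically (the exact header fields depend on the blogofile version, and the date shown here is made up):

---
title: Playing with blogofile
categories: tech, python, blogofile
date: 2009/07/23 21:30:00
---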

All I have to do when I’m done is ‘git commit’.

Makefile

Finally, I wrote a stupid simple Makefile so that I can just kick off :make inside vim.

all: build

build:
    blogofile build

Shared HTTP Caching

I’ve been wondering why the web doesn’t have a mechanism for uniquely identifying a resource by a means other than its URL. I think if such a thing existed, then HTTP caches for common files could be shared between sites.

There has been a push lately to let Google host common JS libraries for you. The main reason for this is increased performance; there are two cases where it helps:

  • The user has never loaded jQuery before - They get to download it from fast servers
  • The user has visited another site that also hosted jQuery on Google - They don’t have to download it at all.

However, there are issues with this:

  • This will not work on a restricted intranet
  • If the copy of jQuery on Google was somehow compromised, a large number of sites would be affected.
  • If Google is unreachable (it happens!), the site will fail to function properly

There should be a way to include a checksum like so:

<script type="text/javascript"
    src="/js/jquery-1.3.2.min.js"
    sha1="3dc9f7c2642efff4482e68c9d9df874bf98f5bcb">
</script>

(sha1 usage here is just an example; a more secure hash could easily be used instead)
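
Generating the value to put in such an attribute is trivial, e.g. in Python:

import hashlib

# checksum to embed in the (hypothetical) sha1 attribute above
with open("jquery-1.3.2.min.js", "rb") as f:
    print(hashlib.sha1(f.read()).hexdigest())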

This would have two benefits:

  • If the copy of jQuery was maliciously modified, or simply corrupted, the browser would refuse to load it.
  • The browser may be able to use a cached copy of jQuery from another site with the same checksum.

This sort of fits in with one of the ideas in Van Jacobson’s talk A New Way to Look at Networking.

Remap capslock to z

My ‘z’ key has been (physically) broken for a while now. Generally this isn’t a problem, because there aren’t many places where I need to type a ‘z’ and can’t autocomplete it. Between tab completion in the shell and the irssi dictcomplete plugin, it hasn’t bothered me that much.

I finally got around to figuring out how to remap Caps Lock to ‘z’. The magic lines to add to ~/.Xmodmap are:

remove Lock = Caps_Lock
keycode 66 = z
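
These don’t take effect until the file is loaded; if your session doesn’t do that automatically on login, apply it by hand with:

xmodmap ~/.Xmodmap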

Most of the examples I found are for swapping capslock with control or escape (which are mostly obsolete now that you can use the Keyboard prefs thing in Gnome and swap keys around with a single click). Remapping caps lock to z is still too obscure to be in the nice GUI.

Now, if only two lines in a config file could fix the battery :-)

finding duplicate files

This post is about my duplicate finding program available here. The program is a little bare, and needs a nicer API, but the method it uses is the most efficient one that I am aware of.

There are a couple of different ways you can find duplicate files:

Compute the hash of all the files, and look for duplicates

This method works well if the files on disk are mostly static and new files are added infrequently. In that case you can compute the hashes once and keep them around for later scans. However, if you are only running the scan once, this method is not ideal, since it requires reading the full contents of every file.
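
A minimal sketch of this first approach (whole-file sha1 for brevity; a real scanner would hash in chunks and keep the results around between runs):

import hashlib
import os
from collections import defaultdict

def dupes_by_hash(root):
    # group every file under root by content hash; reads all of every file
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                by_hash[hashlib.sha1(f.read()).hexdigest()].append(path)
    return [group for group in by_hash.values() if len(group) > 1]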

Compute the hash of files with the same size

This is the method that I think fdupes still uses. It first builds a candidate list of files that are the same size, and then computes the checksum of each. This works well if most of the files that are the same size really are duplicates, but otherwise it triggers a lot of unneeded IO.
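
The same sketch with the size pre-filter added, so only size collisions ever get hashed:

import hashlib
import os
from collections import defaultdict

def dupes_by_size_then_hash(root):
    # pass 1: bucket files by size, which only costs a stat() per file
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)
    # pass 2: hash only the files whose size collides
    dupes = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a unique size can't have a duplicate
        by_hash = defaultdict(list)
        for path in candidates:
            with open(path, "rb") as f:
                by_hash[hashlib.sha1(f.read()).hexdigest()].append(path)
        dupes.extend(g for g in by_hash.values() if len(g) > 1)
    return dupes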

Compare all files with the same size in parallel

This is the method my program uses. Like fdupes, it first builds up a candidate list of files with the same size. But instead of hashing the files, it simply reads each of them at the same time, comparing them block by block. This is just like what the cmp(1) program does, but for multiple files at once. The benefit over calculating hashes is that as soon as the files differ, you can stop reading.

Implementation

There are a couple of things you need to keep in mind to implement this method.

Don’t open too many files.

You have to be careful not to try to open too many files at once. If the user has 5,000 files that all have the same size, the program shouldn’t try to open all 5,000 of them. My program uses a simple helper class to handle opening and closing files. The default blocksize in my program would probably waste a bit of memory in this case, but that is easily changed.

Correctly handle diverging sets.

Imagine the filesystem contains 4 files of the same size, ‘a’, ‘b’, ‘c’, and ‘d’, where a==c and b==d. While reading through the files, it will become clear that a!=b, a==c, and a!=d. It is important that at this point the program continues searching, using (a,c) and (b,d) as separate sets of possible duplicates. This is implemented with recursion: the sets (a,c) and (b,d) are fed back into the duplicate finding function, as in the sketch below.
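
A condensed sketch of the method. This is not the real dupes.py: to sidestep the too-many-open-files problem it just reopens each file at the right offset for every block, where the real program uses the helper class mentioned above.

import os
from collections import defaultdict

BLOCKSIZE = 64 * 1024

def dupes_in(paths):
    # paths must all be files of the same size;
    # returns groups of files with identical contents
    return _compare(paths, 0)

def _compare(paths, offset):
    if len(paths) < 2:
        return []
    # partition the candidates by the contents of their next block
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:  # reopen per block to keep few files open
            f.seek(offset)
            groups[f.read(BLOCKSIZE)].append(path)
    results = []
    for block, group in groups.items():
        if len(group) < 2:
            continue  # this file differs from all the rest: stop reading it
        elif block == b"":
            results.append(group)  # hit EOF together: these are identical
        else:
            # diverging sets like (a,c) and (b,d) are fed back in separately
            results.extend(_compare(group, offset + BLOCKSIZE))
    return results

Feeding the (a, b, c, d) example above through this comes out as the two groups (a,c) and (b,d).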

Example run, compared to fdupes

Here is dupes.py running against fdupes on a modestly sized directory. Notice how dupes.py only needs to read about 600K (not counting metadata).

According to iofileb.d from the DTrace toolkit, dupes.py reads 10M of data (which I think includes loading Python itself), and fdupes reads 517M. This alone explains the roughly 20x speedup seen with dupes.py.

justin@pip:~$ du -hs $DIR
15G   $DIR

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m1.224s
user    0m0.234s
sys     0m0.494s

justin@pip:~$ time fdupes -r $DIR
real    0m41.694s
user    0m13.612s
sys     0m7.491s

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m3.662s
user    0m0.256s
sys     0m0.568s

justin@pip:~$ time fdupes -r $DIR
real    0m55.473s
user    0m11.383s
sys     0m6.433s

The problem

The Xen documentation on live migration states:

Currently, there is no support for providing automatic remote access to filesystems stored on local disk when a domain is migrated. Administrators should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure that domain filesystems are also available on their destination node. GNBD is a good method for exporting a volume from one machine to another. iSCSI can do a similar job, but is more complex to set up.

This does not mean that it is impossible, though. Live migration is just a more efficient form of ordinary migration, and a migration can be seen as a save on one node and a restore on another. Normally, if you save a VM on one machine and try to restore it on another machine, it will fail when it is unable to read its filesystems. But what would happen if you copied the filesystem to the other node between the save and the restore? If done right, it works pretty well.

The solution?

The solution is simple:

  • Save running image
  • Sync disks
  • Copy image to other node, restore

This can be somewhat sped up by syncing the disks twice:

  • Sync disks
  • Save running image
  • Sync disks again - only the changes from the last few seconds need to be transferred
  • Copy image to other node, restore

Synchronizing block devices

File backed

If you are using plain files as vbds, you can sync the disks using rsync.
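
For example (paths illustrative; -S keeps sparse files sparse):

rsync -avS /xen/domains/test/disk.img root@192.168.1.2:/xen/domains/test/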

Raw devices

If you are using raw devices, rsync cannot be used. I wrote a small utility called blocksync, which can synchronize two block devices over the network. In my testing it was easily able to max out the network on an initial sync, and max out the disk read speed on a resync.

$ blocksync.py /dev/xen/vm-root 1.2.3.4

This will sync /dev/xen/vm-root onto 1.2.3.4. The device must already exist on the destination and be the same size.

Solaris ZFS

If you are using ZFS, it should be possible to use zfs send to sync the block devices before migration. This would give an almost instantaneous sync time.
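
Untested, but it would presumably look something like this (zvol names illustrative):

zfs snapshot tank/test-root@migrate
zfs send tank/test-root@migrate | ssh 192.168.1.2 zfs recv tank/test-root

with an incremental send (zfs send -i) of a second snapshot for the quick resync after the save.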

Automation

A simple script xen_migrate.sh and its helper xen_vbds.py will migrate a domain to another host. File and raw vbds are supported. ZFS send support is not yet implemented.

Example migration

#migrating a 1G / + 128M swap over the network
#physical machines are 350mhz with 64M of ram,
#total downtime is about 3 minutes

xen1:~# time ./migrate.sh test 192.168.1.2
+ '[' 2 -ne 2 ']'
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump                                       100%   16MB   3.2MB/s   00:05
+ restore_image
+ ssh 192.168.1.2 'xm restore test.dump && rm test.dump'
(domain
    (id 89)
    [domain info stuff cut out]
)
+ rm test.dump

real    6m6.272s
user    1m29.610s
sys     0m30.930s