A few days ago I installed Debian/kFreeBSD on my home server. It had
been running OpenSolaris for years, but doing just about anything on
that system was a complete pain in the ass. I had been meaning to
give Debian/kFreeBSD a try, but had been putting it off thinking the
changeover would break a lot of things, or I would have trouble
importing the ZFS pools.
The other day I had some free time so I gave it a go.
I downloaded the mini.iso
and dd’d it to a spare USB stick. The kFreeBSD ISOs support both CD and hard
disk booting, like the Linux images. The install took about 40
minutes (including the time taken to download everything).
After that I expected to have a few problems, but everything worked.
I was able to install zfsutils and import the ZFS pools.
Debian/kFreeBSD doesn’t currently support NFS, but it was easy enough
to install Samba.
I’m left with a speedy, lightweight system, with thousands of packages and
full security support:
root@pip:~# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/ad0s1             35G  596M   32G   2% /
root@pip:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2026        222       1804         17          0          0
-/+ buffers/cache:        222       1804
Swap:            0          0          0
root@pip:~# apt-cache search ""|wc -l
26258
Other than a few utilities working a little differently (the main one
I noticed was netstat not taking the same flags), it feels exactly like
a Debian/Linux system. But with ZFS.
Nice page titles
The first thing I wanted to do was fix the page titles. Blog posts
should automatically have their page title set. This was a trivial
change to head.mako:
-<title>${bf.config.blog.name}</title>
+<title>
+ BB.Net
+%if post and post.title:
+- ${post.title}
+%endif
+</title>
Easy blogging
The second thing I needed to do was write a script for easily adding
a new post. newblog.py was the result:
justin@eee:~/projects/bbdotnet$ ./newblog.py
Title: Playing with blogofile
cats: tech,python,blogofile
This drops me into a vim session with the new post stubbed out;
all I have to do when I’m done is ‘git commit’.
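The script itself is short; here is a minimal sketch of how such a helper might work (the filename scheme and the YAML-style header are my assumptions for illustration, not necessarily what newblog.py actually does):

```python
import datetime
import os
import re


def make_post(title, categories, posts_dir="_posts"):
    """Write a stub blog post with a YAML-style header and return its path.
    The date-slug filename and header fields are assumptions, not a spec."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    now = datetime.datetime.now()
    path = os.path.join(posts_dir,
                        "%s-%s.markdown" % (now.strftime("%Y-%m-%d"), slug))
    header = ("---\n"
              "title: %s\n"
              "categories: %s\n"
              "date: %s\n"
              "---\n\n" % (title, categories, now.strftime("%Y/%m/%d %H:%M:%S")))
    with open(path, "w") as f:
        f.write(header)
    return path


# Interactive use would then be roughly:
#   path = make_post(input("Title: "), input("cats: "))
#   subprocess.call(["vim", path])
```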
Makefile
Finally, I wrote a stupid-simple Makefile, so I can just kick
off a :make from inside vim.
all: build

build:
	blogofile build
I’ve been wondering why the web doesn’t have a mechanism for uniquely
identifying a resource by a means other than its URL. I think if such a thing
existed, then HTTP caches for common files could be shared between sites.
There has been a push
lately to let Google host common JS
libraries for you. The main reason for this is increased performance;
there are two cases where it helps:
- The user has never loaded jQuery before: they get to download it from fast servers.
- The user has visited another site that also hosted jQuery on Google: they don’t have to download it at all.
However, there are issues with this:
- This will not work on a restricted intranet.
- If the copy of jQuery on Google were somehow compromised, a large number of sites would be affected.
- If Google is unreachable (it happens!), the site will fail to function properly.
There should be a way to include a checksum like so:
<script type="text/javascript"
src="/js/jquery-1.3.2.min.js"
sha1="3dc9f7c2642efff4482e68c9d9df874bf98f5bcb">
</script>
(sha1 usage here is just an example, a more secure method could easily be used instead)
This would have two benefits:
- If the copy of jQuery was maliciously modified, or simply corrupted, the browser would refuse to load it.
- The browser may be able to use a cached copy of jQuery from another site with the same checksum.
This sort of fits in with one of the ideas in the A New Way to look at Networking talk by Van Jacobson.
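To make the idea concrete, here is a rough sketch of the lookup a checksum-aware browser cache could perform; the function and cache names are purely illustrative:

```python
import hashlib

# Hypothetical content-addressed cache: scripts are stored and looked up
# by digest, so a copy fetched from one site can satisfy another site.
cache = {}


def fetch_script(url, expected_sha1, download=None):
    """Return the script body for `url`, verified against `expected_sha1`.
    `download` stands in for a normal HTTP fetch."""
    # Cache hit: any prior copy with the same digest works, regardless of URL.
    if expected_sha1 in cache:
        return cache[expected_sha1]
    body = download(url)
    # Refuse to use content that doesn't match the declared checksum.
    if hashlib.sha1(body).hexdigest() != expected_sha1:
        raise ValueError("checksum mismatch for %s" % url)
    cache[expected_sha1] = body
    return body
```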
My ‘z’ key has been (physically) broken for a while now. Generally this isn’t
a problem, because there aren’t many places where I need to type a ‘z’
and can’t autocomplete it. Between tab completion in the shell and the
irssi dictcomplete plugin, it hasn’t bothered me that much.
I finally got around to figuring out how to remap Caps lock to ‘z’, the
magic lines to add to ~/.Xmodmap are
remove Lock = Caps_Lock
keycode 66 = z
Most of the examples I found are for swapping caps lock with control or
escape (which are mostly obsolete now that you can use the keyboard prefs
thing in Gnome and swap keys around with a single click). Remapping caps
lock to ‘z’ is still too obscure to be in the nice GUI.
Now, if only two lines in a config file could fix the battery :-)
Finding duplicate files
This post is about my duplicate finding program available here.
The program is a little bare, and needs a nicer API, but the method it uses is
the most efficient one that I am aware of.
There are a couple of different ways you can find duplicate files:
Compute the hash of all the files, and look for duplicates
This method works well if the files on disk are mostly static and files are
added infrequently. In this case you can compute the hashes once and keep them
around for later scans. However, if you are only running the scan once, this
method is not ideal, since it requires you to read the full contents of every
file.
Compute the hash of files with the same size
This is the method that I think fdupes still uses. It first builds a candidate
list of files that are the same size, and computes the checksum of each. This
method works well if most of the files that are the same size are really
duplicates, but otherwise triggers too much unneeded IO.
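Building the size-based candidate list is the cheap part, since it only touches metadata. A sketch of that first pass (an illustration, not the actual code from dupes.py):

```python
import os
from collections import defaultdict


def files_by_size(root):
    """Walk `root` and group regular files by size. Only groups with
    more than one member can possibly contain duplicates."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                groups[os.path.getsize(path)].append(path)
            except OSError:
                pass  # vanished or unreadable file; skip it
    return {size: paths for size, paths in groups.items() if len(paths) > 1}
```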
Compare all files with the same size in parallel
This is the method that my program uses. Like fdupes, it first builds up a
candidate list of files with the same size. But instead of hashing the files,
it simply reads each file at the same time, comparing them block by block.
This is just like what the cmp(1) program does, but for multiple files at the
same time. The benefit of this over calculating hashes is that
as soon as the files differ, you can stop reading.
Implementation
There are a couple of things you need to keep in mind to implement this method.
Don’t open too many files.
You have to be careful not to try to open too many files at once. If the user
has 5,000 files that all have the same size, the program shouldn’t try to open
all 5,000 at once. My program uses a simple helper class to handle opening and
closing files. The default blocksize in my program would probably waste a bit
of memory in this case, but that is easily changed.
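One way to cap descriptor usage is a small file wrapper that remembers its offset, so it can be closed at any time and transparently reopened on the next read. This is an illustration of the idea, not the actual helper class:

```python
class LazyFile:
    """File-like reader whose descriptor can be released at any time;
    the next read() reopens the file and seeks back to where it was."""

    def __init__(self, path):
        self.path = path
        self.offset = 0
        self.fp = None

    def read(self, size):
        if self.fp is None:
            self.fp = open(self.path, "rb")
            self.fp.seek(self.offset)
        data = self.fp.read(size)
        self.offset += len(data)
        return data

    def close(self):
        # Free the descriptor; position is preserved in self.offset.
        if self.fp is not None:
            self.fp.close()
            self.fp = None
```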
Correctly handle diverging sets.
Imagine the filesystem contains four files of the same size, ‘a’, ‘b’, ‘c’, and ‘d’,
where a==c, and b==d. While reading through the files, it will become clear
that a!=b, a==c, and a!=d. It is important that at this step the program
continues searching, using (a,c) and (b,d) as possible duplicates. This is
implemented using recursion: the sets (a,c) and (b,d) are fed back into the
duplicate-finding function.
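Putting these pieces together, the comparison can be sketched as a recursive partition: read one block from every candidate, group the candidates by block contents, and recurse into each group that still has more than one member. This is a simplified illustration (the real program also limits how many files are open at once):

```python
def find_dupes(paths, blocksize=65536):
    """Return lists of identical files from `paths`, which must all be
    the same size. Reads in parallel and stops as soon as files diverge."""
    handles = [(p, open(p, "rb")) for p in paths]
    try:
        return _compare(handles, blocksize)
    finally:
        for _path, fp in handles:
            fp.close()


def _compare(handles, blocksize):
    # Read one block from each file and partition by block contents.
    groups = {}
    at_eof = True
    for path, fp in handles:
        block = fp.read(blocksize)
        groups.setdefault(block, []).append((path, fp))
        if block:
            at_eof = False
    if at_eof:
        # Every file reached EOF together: these files are identical.
        return [[p for p, _fp in handles]] if len(handles) > 1 else []
    # Diverging sets: recurse into each group that still has >1 member.
    result = []
    for members in groups.values():
        if len(members) > 1:
            result.extend(_compare(members, blocksize))
    return result
```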
Example run, compared to fdupes
Here is dupes.py running against fdupes on a modestly sized directory.
Notice how dupes.py only needs to read 600K (not counting metadata).
According to iofileb.d from the DTrace toolkit, dupes.py reads 10M of data (which
I think includes Python itself), while fdupes reads 517M. This alone explains the
20x speedup seen with dupes.py.
justin@pip:~$ du -hs $DIR
15G $DIR
justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168
real 0m1.224s
user 0m0.234s
sys 0m0.494s
justin@pip:~$ time fdupes -r $DIR
real 0m41.694s
user 0m13.612s
sys 0m7.491s
justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168
real 0m3.662s
user 0m0.256s
sys 0m0.568s
justin@pip:~$ time fdupes -r $DIR
real 0m55.473s
user 0m11.383s
sys 0m6.433s
The problem
The Xen documentation on live migration states:
Currently, there is no support for providing automatic remote access to
filesystems stored on local disk when a domain is migrated. Administrators
should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure
that domain filesystems are also available on their destination node. GNBD is a
good method for exporting a volume from one machine to another. iSCSI can do a
similar job, but is more complex to set up.
This does not mean that it is impossible, though. Live migration is just a more
efficient form of migration, and a migration can be seen as a save on one node
and a restore on another. Normally, if you save a VM on one machine and try to
restore it on another machine, it will fail when it is unable to read its
filesystems. But what would happen if you copied the filesystems to the other
node between the save and the restore? If done right, it works pretty well.
The solution?
The solution is simple:
- Save running image
- Sync disks
- copy image to other node, restore
This can be somewhat sped up by syncing the disks twice:
- Sync disks
- Save running image
- Sync disks again - only the changes from the last few seconds need copying
- copy image to other node, restore
Synchronizing block devices
File backed
If you are using plain files as vbds, you can sync the disks using rsync.
Raw devices
If you are using raw devices, rsync cannot be used. I wrote a small utility
called blocksync which can synchronize two block
devices over the network. In my testing it was easily able to max out the
network on an initial sync, and max out the disk read speed on a resync.
$ blocksync.py /dev/xen/vm-root 1.2.3.4
This will sync /dev/xen/vm-root onto 1.2.3.4. The device must already exist on the destination and be the same size.
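The core trick is to walk both copies block by block and rewrite only the blocks that differ, so a resync mostly costs reads. Here is a simplified, local-file sketch of the idea (the real tool runs `blocksync.py server` on the remote end over ssh, as seen in the migration transcript below):

```python
def sync_blocks(src_path, dst_path, blocksize=1048576):
    """Copy only the differing blocks from src to dst, which is assumed
    to already exist with the same size. Returns (same, diff) block
    counts, mirroring blocksync's "same: N, diff: M" output."""
    same = diff = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            sblock = src.read(blocksize)
            if not sblock:
                break
            dblock = dst.read(len(sblock))
            if sblock == dblock:
                same += 1
            else:
                diff += 1
                # Seek back over the stale block and overwrite it in place.
                dst.seek(-len(dblock), 1)
                dst.write(sblock)
    return same, diff
```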
Solaris ZFS
If you are using ZFS, it should be possible to use zfs send
to sync the block
devices before migration. This would give an almost instantaneous sync time.
Automation
A simple script xen_migrate.sh and its helper xen_vbds.py will migrate a domain to another host.
File and raw vbds are supported. ZFS send
support is not yet implemented.
Example migration
#migrating a 1G / + 128M swap over the network
#physical machines are 350MHz with 64M of RAM,
#total downtime is about 3 minutes
xen1:~# time ./migrate.sh test 192.168.1.2
+ '[' 2 -ne 2 ']'
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name Id Mem(MB) CPU State Time(s) Console
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump 100% 16MB 3.2MB/s 00:05
+ restore_image
+ ssh 192.168.1.2 'xm restore test.dump && rm test.dump'
(domain
(id 89)
[domain info stuff cut out]
)
+ rm test.dump
real 6m6.272s
user 1m29.610s
sys 0m30.930s