This feed contains pages in the "tech" category.
odd nmap timings
Posted Fri Aug 22 22:02:33 2008
Back story
A section of a web application I maintain pings a list of IP addresses (using a background AJAX request). Most of the time all of these addresses are up; sometimes one or two of them are down. One day I noticed that if all of them were down, nmap would take much longer to ping them all.
The odd part
Let's ping 19 addresses on my home network, none of which exist.
justin@latitude:~$ time nmap -sP 192.168.5.2-20
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:56 EDT
Nmap done: 19 IP addresses (0 hosts up) scanned in 4.072 seconds
real 0m4.081s
user 0m0.068s
sys 0m0.004s
OK... now let's add the router's address, which is pingable.
justin@latitude:~$ time nmap -sP 192.168.5.1-19
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:58 EDT
Host router (192.168.5.1) appears to be up.
Nmap done: 19 IP addresses (1 host up) scanned in 2.258 seconds
real 0m2.259s
user 0m0.048s
sys 0m0.008s
Notice anything odd?
I have experimented with the usual host timeout and max RTT timeout options, but I am not sure what the problem is. As soon as I get a chance I will look into the code. I am not sure if it is a bug or just user error. A simple strace of the two commands shows very different 'select' behaviour.
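To check that the difference is repeatable rather than a one-off, the two scans can be timed several times in a row. The following is only a rough Python 3 sketch; the helper name time_command is made up for this example, and it measures wall-clock time in the same way the "real" figure from time does.

```python
import subprocess
import time

def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings

# Example use (needs nmap installed):
#   time_command(["nmap", "-sP", "192.168.5.2-20"])
#   time_command(["nmap", "-sP", "192.168.5.1-19"])
```

If the all-down range is consistently slower across runs, the strace comparison is worth pursuing further.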
Python Evolution: From Script To Program
Posted Sat Jun 21 23:18:12 2008
The Evolution of a Python Programmer is funny, but it only covers one aspect of programming. Many times I will see code that is fine from a CS point of view, but absolutely horrible when it comes to program structure and module organization.
You often see people saying things like "Hello World in python is just 'print "Hello World"'", and that is true. It is very easy to get started writing python, but if you don't structure your modules correctly, you will be in a world of pain later on. It is something that can be hard to explain, since the results in the short term are the same, and it may not be clear at first why one way of doing things is better than the other.
Instead of Hello World, let's take the example of a program to get stock quotes. The actual implementation here is not relevant; pretend it contacts a web service or database or something.
A common case is the "python script". I HATE python scripts. A "script" almost always ends up being a single file with no entry points and no main function, and it mixes IO with logic.
s = raw_input("symbol:")
if s == 'MSFT':
    print 'price=', 28.23
elif s == 'GOOG':
    print 'price=', 546.43
The first step in fixing this is to define an actual function. Now you can import the module and run get_price().
def get_price():
    s = raw_input("symbol:")
    if s == 'MSFT':
        print 'price=', 28.23
    elif s == 'GOOG':
        print 'price=', 546.43
The (hopefully) obvious problem with this is that the IO is mixed in with the logic. What if you wanted to get the stock price for 1000 stocks and output a nice summary? This next version is slightly better: the input is now a proper parameter, but you still have no control over the output. You could get your 1000 quotes, but you would have no way to report on the results. Again, this should be obvious, but I come across code that does this way too often.
def get_price(s):
    if s == 'MSFT':
        print 'price=', 28.23
    elif s == 'GOOG':
        print 'price=', 546.43
###
if __name__ == "__main__":
    s = raw_input("symbol:")
    get_price(s)
The first respectable version adds a main() function that handles the input and output. The main function should also get the stock from the command line arguments, rather than interactively. I think you tend to see things like this more often from Windows users, who like to double click on things rather than run them from a shell. You could probably write a whole book on this subject, though.
def get_price(s):
    if s == 'MSFT':
        return 28.23
    elif s == 'GOOG':
        return 546.43
###
def main():
    s = raw_input("symbol:")
    print 'price=', get_price(s)

if __name__ == "__main__":
    main()
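The version above still reads the symbol interactively. As a rough sketch of the command-line variant the text asks for, here is one way it could look. It is written for Python 3 (print as a function), the PRICES table just reuses the placeholder numbers from the examples, and the file name quotes.py in the usage message is made up.

```python
import sys

# Placeholder data, same numbers as in the examples above.
PRICES = {"MSFT": 28.23, "GOOG": 546.43}

def get_price(symbol):
    """Pure logic: return the price for symbol, or None if it is unknown."""
    return PRICES.get(symbol)

def main(argv):
    """All IO lives here: symbols come from argv, results go to stdout."""
    if not argv:
        print("usage: quotes.py SYMBOL [SYMBOL ...]", file=sys.stderr)
        return 2
    for symbol in argv:
        print("%s price= %s" % (symbol, get_price(symbol)))
    return 0

# At the bottom of the real module you would add the usual guard:
# if __name__ == "__main__":
#     sys.exit(main(sys.argv[1:]))
```

Because get_price() only returns a value, fetching 1000 quotes and formatting a summary becomes a matter of writing a different main(), without touching the lookup code.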
The final steps are to make a proper python package out of this module, but I'll save that for a later post.
erlang basic distributed application
Posted Mon Jun 9 18:15:31 2008
Erlang with OTP is a fairly powerful framework for creating distributed redundant applications. The basic gen_server behavior can easily be extended to create a redundant server with built-in failover. With Mnesia you also get a replicated database.
I've been trying to figure out how exactly this is supposed to work, so I've been working on a quick application to demonstrate this. It's nothing fancy, just a simple set(k,v) and get(k) API.
The files are available as a tarball: ddict.tgz
-module(ddict).
-behaviour(gen_server).

-export([start/0,stop/0,terminate/2]).
-export([init/1, handle_call/3, handle_cast/2,handle_info/2]).
-export([create_schema/0]).
-export([get/1,set/2]).

-define(GD,{global, ddict}).

-include_lib("stdlib/include/qlc.hrl").

-record(rec, {key, value}).

init_mnesia() ->
    mnesia:start(),
    ok = mnesia:wait_for_tables([rec], 2000).

init(_Arg) ->
    process_flag(trap_exit, true),
    io:format("dict server starting~n"),
    init_mnesia(),
    {ok, []}.

start() ->
    gen_server:start_link(?GD, ddict, [], []).

stop() ->
    gen_server:cast(?GD, stop).

terminate(_Reason, _State) ->
    io:format("dict server terminating~n").

%"model" methods
do_get(Key) ->
    Res = mnesia:dirty_read({rec, Key}),
    case Res of
        [] -> undefined;
        [Rec] -> Rec#rec.value
    end.

do_set(Key, Value) ->
    F = fun() ->
        Row = #rec{key=Key, value=Value},
        mnesia:write(Row)
    end,
    {atomic, ok} = mnesia:transaction(F),
    ok.

%"controller" methods
handle_call({get, Key}, _From, State) ->
    Rec = do_get(Key),
    {reply, Rec, State};
handle_call({set, Key, Value}, _From, State) ->
    Rec = do_set(Key, Value),
    {reply, Rec, State}.

handle_cast(stop, State) ->
    io:format("ddict server stopping~n"),
    {stop, normal, State}.

handle_info(_Info, State) ->
    {noreply, State}.

%"client api" methods
get(Key) ->
    gen_server:call(?GD, {get, Key}).
set(Key,Value) ->
    gen_server:call(?GD, {set, Key, Value}).

create_schema() ->
    mnesia:create_schema([node()|nodes()]),
    mnesia:start(),
    %this is definitely wrong
    lists:foreach(fun(N) ->
        io:format("starting mnesia on ~w~n", [N]),
        rpc:call(N, mnesia, start, []) end, nodes()),
    mnesia:create_table(rec, [
        {disc_copies, [node()|nodes()]},
        {attributes, record_info(fields, rec)}
    ]).
download file "main gen_server file"
The key to this server being distributed is the use of {global, ddict} as the server name, instead of {local, ddict}. This enables other nodes in the cluster to see this server.
do_get() and do_set() are the "model"
like methods that deal with mnesia. handle_call
defines the gen_server api. get() and set() are helper
functions that call the remote gen server. If there was more to
this module, it would be a good idea to put these methods in
separate modules.
The one thing I am not sure about is the create_schema() method. I'm sure there is a proper way to initialize mnesia on a cluster; I just have no idea what it is yet.
To make this into a proper gen server, the supervisor and application need to be defined with the following three files:
-module(ddict_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(ddict_sup, []).

init(_Args) ->
    {ok, {{one_for_one, 10, 60},
          [{ddict, {ddict, start, []},
            permanent, brutal_kill, worker, [ddict]}]}}.
download file "/ramblings/files/erlang/ddict/ddict_sup.erl"
-module(ddict_app).
-behaviour(application).

-export([start/2, stop/1,go/0]).

start(_Type, _Args) ->
    ddict_sup:start_link().

stop(_State) ->
    io:format("ddict server terminating~n"),
    ok.

go() ->
    application:start(ddict).
download file "/ramblings/files/erlang/ddict/ddict_app.erl"
{application, ddict,
[
{mod, {ddict_app,[]}}
]}.
download file "/ramblings/files/erlang/ddict/ddict.app"
To get Erlang to start this application on boot, a config file for each node needs to be written:
[{kernel,
[{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
{sync_nodes_optional, [two@media]},
{sync_nodes_timeout, 5000}
]
}
].
download file "/ramblings/files/erlang/ddict/one.config"
[{kernel,
[{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
{sync_nodes_optional, [one@media]},
{sync_nodes_timeout, 5000}
]
}
].
download file "/ramblings/files/erlang/ddict/two.config"
To create the initial database I ran the ddict:create_schema method, which I'm sure is completely incorrect, but it works:
erl -sname one -config one.config
erl -sname two -config two.config
(one@media)1> ddict:create_schema().
starting mnesia on two@media
{atomic,ok}
(one@media)2> mnesia:info().
...
running db nodes = [two@media,one@media]
disc_copies = [rec,schema]
[{one@media,disc_copies},{two@media,disc_copies}] = [schema,rec]
...
ok
download file "/ramblings/files/erlang/ddict/create_db.txt"
Once that is done, the application can be started with
erl -pa . -sname one -config one.config -s ddict_app go
erl -pa . -sname two -config two.config -s mnesia start -s ddict_app go
I have to start mnesia separately on the second VM because I haven't yet figured out how mnesia should be started when dealing with distributed applications. mnesia needs to be running on both nodes, but not the ddict application itself.
Once it is running, you can call ddict:set("Foo","bar") and ddict:get("Foo"). You can also kill either VM, and it will restart the server after 3 seconds on the other node.
using nmap for network monitoring
Posted Fri May 30 22:34:40 2008
The problem
You need to know if any of 900 IP addresses are unreachable. You also need to know this within about a minute of any outage. Nmap is primarily a security tool, but it can be very helpful when it comes to monitoring as well.
fping
For years I used fping for this. Here is an example of what it can do:
$ wc -l ips.txt
900 ips.txt
$ time fping < ips.txt
...
real 0m41.347s
user 0m0.028s
sys 0m0.248s
Not too bad: 41 seconds to poll 900 devices. It actually seems to finish at around 35 seconds, and then sits there for a bit before exiting.
nmap
Now let's try with nmap. Nmap needs to be run as root to allow it to send ICMP packets; otherwise it will use connect(). In my tests it is actually faster when running in TCP mode, but some devices only respond to ICMP. (It would be best for security to put this into an nmap_ping helper script and put that in sudoers instead of allowing all nmap commands to be run as root. It is probably also possible to use the capabilities system to just allow a normal user to send ICMP packets.)
$ time sudo nmap -n -sP -PE -iL ips.txt
...
real 0m3.961s
user 0m1.072s
sys 0m1.780s
Not bad at all, about 10 times faster than using fping!
Note in these examples, all of the addresses are pingable, so timeouts and retry times do not come into play. My monitoring system maintains separate lists of the reachable and unreachable devices, and pings them from different processes. This prevents unreachable devices from slowing down the normal process of making sure everything else is working. Currently the time between pings to a single device is about 8 seconds.
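Keeping separate reachable and unreachable lists means turning nmap's output back into per-address state. Here is a minimal Python 3 sketch of that step. It assumes the "Host ... appears to be up." line format that Nmap 4.53 prints in the transcript from the "odd nmap timings" post, and it guesses that with -n the line carries a bare address instead of "name (address)". Newer nmap releases print a different report format, so the regular expression would need adjusting there. The function name split_by_reachability is made up for this example.

```python
import re

# Matches both "Host router (192.168.5.1) appears to be up."
# and "Host 192.168.5.1 appears to be up." (assumed -n form).
UP_LINE = re.compile(
    r"^Host (?:\S+ \()?(\d+\.\d+\.\d+\.\d+)\)? appears to be up")

def split_by_reachability(addresses, nmap_output):
    """Return (reachable, unreachable), preserving the input order."""
    up = set()
    for line in nmap_output.splitlines():
        match = UP_LINE.match(line.strip())
        if match:
            up.add(match.group(1))
    reachable = [a for a in addresses if a in up]
    unreachable = [a for a in addresses if a not in up]
    return reachable, unreachable
```

The two lists can then be written back out as separate files for -iL, so the slow retries against dead hosts never delay the check of everything else.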
building pig on debian
Posted Thu May 15 23:52:12 2008
I've been playing with Hadoop and Pig. It is some really neat technology.
I had a bunch of trouble getting Pig to build, though. It seems that this error from ant:
Could not create task or type of type: jjtree
is caused by the missing 'ant-optional' package.
how my dupe finding program works
Posted Thu Feb 21 23:41:03 2008
finding duplicate files
This post is about my duplicate finding program available under Programs. The program is a little bare, and needs a nicer API, but the method it uses is the most efficient one that I am aware of.
There are a couple of different ways you can find duplicate files:
Compute the hash of all the files, and look for duplicates
This method works well if the files on disk are mostly static, and files are added infrequently. In this case you can compute the hashes once, and keep them around for later scans. However, if you are only running the scan once, this method is not ideal since it requires you to read the full contents of every file.
Compute the hash of files with the same size
This is the method that I think fdupes still uses. It first builds a candidate list of files that are the same size, and computes the checksum of each. This method works well if most of the files that are the same size are really duplicates, but otherwise triggers too much unneeded IO.
Compare all files with the same size in parallel
This is the method that my program uses. Like fdupes, it first builds up a candidate list of files with the same size. Instead of hashing the files, it simply reads all of the files in a set at the same time, comparing them block by block. This is just like what the cmp(1) program does, but for multiple files at once. The benefit of this over calculating each file's hash is that as soon as the files differ, you can stop reading.
Implementation
There are a couple of things you need to keep in mind to implement this method.
Don't open too many files.
You have to be careful not to try and open too many files at once. If the user has 5,000 files that all have the same size, the program shouldn't try and open all 5,000 at once. My program uses a simple helper class to handle opening and closing files. The default blocksize in my program would probably waste a bit of memory in this case, but that is easily changed.
Correctly handle diverging sets.
Imagine the filesystem contains 4 files of the same size, 'a', 'b','c', and 'd', where a==c, and b==d. While reading through the files, it will become clear that a!=b, a==c, and a!=d. It is important that at this step the program continues searching using (a,c) and (b,d) as possible duplicates. This is implemented using recursion, the sets (a,c) and (b,d) are fed back into the duplicate finding function.
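To make the idea concrete, here is a small Python 3 sketch of the technique. It is not the code from dupes.py, just an illustration written for this post; the names read_block, identical_sets and find_duplicates are made up. To stay clear of the open-file limit it reopens each file for every block, which is a cruder approach than a helper class that manages open handles, and it keeps one block per file in memory.

```python
import os
from collections import defaultdict

def read_block(path, offset, blocksize):
    """Open, seek, read one block, close: never holds files open."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(blocksize)

def identical_sets(paths, blocksize=65536, offset=0):
    """Yield lists of identical files; all paths must have the same size."""
    while len(paths) > 1:
        by_block = defaultdict(list)
        for path in paths:
            by_block[read_block(path, offset, blocksize)].append(path)
        offset += blocksize
        if len(by_block) == 1:
            if len(next(iter(by_block))) < blocksize:
                yield sorted(paths)      # hit end of file without differing
                return
            continue                     # still identical, keep reading
        # The set diverged: feed each sub-set back in from the next block.
        for members in by_block.values():
            yield from identical_sets(members, blocksize, offset)
        return

def find_duplicates(root, blocksize=65536):
    """Group files under root by size, then compare each group by content."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    for paths in by_size.values():
        yield from identical_sets(paths, blocksize)
```

In the a, b, c, d example, the first differing block splits the set into (a, c) and (b, d), and each pair is then read to the end on its own. A file that ends up alone in its sub-set is dropped immediately and never read again.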
Example run, compared to fdupes.
Here is dupes.py running against fdupes on a modestly sized directory. Notice how dupes.py only needs to read 600K (not counting metadata).
According to iofileb.d from the dtrace toolkit, dupes.py reads 10M of data (which I think includes python), and fdupes reads 517M. This alone explains the 20x speedup seen in dupes.py.
justin@pip:~$ du -hs $DIR
15G $DIR
justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168
real 0m1.224s
user 0m0.234s
sys 0m0.494s
justin@pip:~$ time fdupes -r $DIR
real 0m41.694s
user 0m13.612s
sys 0m7.491s
justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168
real 0m3.662s
user 0m0.256s
sys 0m0.568s
justin@pip:~$ time fdupes -r $DIR
real 0m55.473s
user 0m11.383s
sys 0m6.433s
regex with named groups
Posted Wed Feb 20 11:42:21 2008
As I mentioned in a comment at Some more tweaks to my Python script, there are a lot of ways you can use the re module. If you need to match multiple expressions against each line, you can build up a single regular expression that includes all the patterns, and use named groups to tell them apart.
import re

#if you were matching many of these it would be a good idea
#to make a function that simply fills in '%s>(?P<%s>[^<]+)<'
cpattern = 'total_credit>(?P<credit>[^<]+)<'
opattern = 'os_name>(?P<os>[^<]+)<'
pattern = '(%s)|(%s)' % (cpattern, opattern)
search = re.compile(pattern).search

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'blah blah blah total_credit>20< blah blah',
]

for line in lines:
    r = search(line)
    if r:
        print r.groupdict()
Running this gives
{'credit': '10', 'os': None}
{'credit': None, 'os': 'win'}
{'credit': '20', 'os': None}
In this case you could even generalize the regular expression further, like so:
pattern = '\s(?P<key>[^\s>]+)>(?P<value>[^<]+)<'
Running that (probably less than optimal) regular expression over the input gives
{'key': 'total_credit', 'value': '10'}
{'key': 'os_name', 'value': 'win'}
{'key': 'total_credit', 'value': '20'}
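The comment in the first example suggests a function that fills in the '%s>(?P<%s>[^<]+)<' template when there are many patterns. A possible Python 3 sketch of that is below; the helper names tag_pattern, build_search and matched_groups are made up for this example.

```python
import re

def tag_pattern(tag, group):
    """Fill in the template for one tag, capturing its value as `group`."""
    return "%s>(?P<%s>[^<]+)<" % (re.escape(tag), group)

def build_search(tags):
    """tags maps a group name to the tag text it should capture."""
    pattern = "|".join(
        "(%s)" % tag_pattern(tag, group) for group, tag in tags.items())
    return re.compile(pattern).search

search = build_search({"credit": "total_credit", "os": "os_name"})

def matched_groups(line):
    """Return only the named groups that actually matched on this line."""
    match = search(line)
    if not match:
        return {}
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

Filtering out the None entries is a small convenience over the raw groupdict(): each line then yields only the key that was present.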
xen live migration without shared storage
Posted Sat Feb 16 17:02:25 2008
The problem
The Xen documentation on live migration states:
Currently, there is no support for providing automatic remote access to filesystems stored on local disk when a domain is migrated. Administrators should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure that domain filesystems are also available on their destination node. GNBD is a good method for exporting a volume from one machine to another. iSCSI can do a similar job, but is more complex to set up.
This does not mean that it is impossible, though. Live migration is just a more efficient form of migration, and a migration can be seen as a save on one node followed by a restore on another. Normally, if you save a VM on one machine and try to restore it on another machine, it will fail when it is unable to read its filesystems. But what would happen if you copied the filesystem to the other node between the save and the restore? If done right, it works pretty well.
The solution?
The solution is simple:
- Save running image
- Sync disks
- copy image to other node, restore
This can be somewhat sped up by syncing the disks twice:
- Sync disks
- Save running image
- Sync disks - only having to save any changes in the last few seconds
- copy image to other node, restore
Synchronizing block devices
File backed
If you are using plain files as vbds, you can sync the disks using rsync.
Raw devices
If you are using raw devices, rsync cannot be used. I wrote a small utility called blocksync which can synchronize two block devices over the network. In my testing it was easily able to max out the network on an initial sync, and max out the disk read speed on a resync.
$ blocksync.py /dev/xen/vm-root 1.2.3.4
This will sync /dev/xen/vm-root onto 1.2.3.4. The device should already exist on the destination and be the same size.
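For readers who want a feel for the idea without the network part, here is a local, in-process Python 3 sketch. It is not blocksync.py. It assumes the general approach of comparing a digest of each block and rewriting only the blocks that differ, and it returns counts in the spirit of the "same" and "diff" figures a sync reports. The name sync_blocks and the choice of SHA-1 are my own for this example.

```python
import hashlib

def sync_blocks(src, dst, blocksize=1024 * 1024):
    """Make dst match src block by block; return (same, diff) block counts.

    src and dst are seekable binary file objects of the same size.
    """
    same = diff = 0
    offset = 0
    while True:
        src.seek(offset)
        block = src.read(blocksize)
        if not block:
            break
        dst.seek(offset)
        old = dst.read(len(block))
        # Over a network only the digests would need to travel for
        # blocks that turn out to be unchanged.
        if hashlib.sha1(block).digest() == hashlib.sha1(old).digest():
            same += 1
        else:
            dst.seek(offset)
            dst.write(block)
            diff += 1
        offset += len(block)
    return same, diff
```

On a resync almost every block matches, so the cost is dominated by reading both sides, which fits the observation that a resync is limited by disk read speed rather than by the network.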
Solaris ZFS
If you are using ZFS, it should be possible to use zfs
send to sync the block devices before migration. This would
give an almost instantaneous sync time.
Automation
A simple script, xen_migrate.sh, and its helper, xen_vbds.py, will migrate a domain to another host. File and raw vbds are supported. ZFS send support is not yet implemented.
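The sync, save, sync, copy, restore sequence can be written down as a list of commands before anything is executed. The Python 3 sketch below only builds that list (a dry run) for raw devices; the commands themselves are the ones that appear in the example transcript, while the function name migration_plan and the default dump file name are assumptions made for this example.

```python
def migration_plan(domain, dst_host, devices, dump=None):
    """Return the commands for a two-pass-sync migration, in order."""
    dump = dump or domain + ".dump"

    def sync():
        # One blocksync run per raw device backing the domain.
        return [["blocksync.py", dev, dst_host] for dev in devices]

    plan = []
    plan += sync()                               # first pass, domain running
    plan.append(["xm", "save", domain, dump])    # downtime starts here
    plan += sync()                               # second pass, recent changes
    plan.append(["scp", dump, dst_host + ":"])
    plan.append(["ssh", dst_host,
                 "xm restore %s && rm %s" % (dump, dump)])
    plan.append(["rm", dump])
    return plan
```

Printing the plan first makes it easy to see that the downtime window covers only the second sync, the image copy and the restore.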
Example migration
#migrating a 1G / + 128M swap over the network
#physical machines are 350mhz with 64M of ram,
#total downtime is about 3 minutes
xen1:~# time ./migrate.sh test 192.168.1.2
+ '[' 2 -ne 2 ']'
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root /dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name Id Mem(MB) CPU State Time(s) Console
test 87 15 0 -b--- 0.0 9687
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump 100% 16MB 3.2MB/s 00:05
+ restore_image
+ ssh 192.168.1.2 'xm restore test.dump && rm test.dump'
(domain
(id 89)
[domain info stuff cut out]
)
+ rm test.dump
real 6m6.272s
user 1m29.610s
sys 0m30.930s
download file "/ramblings/files/example_migration.txt"
dynamic ikiwiki pages
Posted Fri Feb 15 20:57:58 2008
The static pages that ikiwiki generates are great, but I want to have some dynamic content here as well.
If this works, this page should include the server's uptime.
13:49:09 up 88 days, 19:05, 0 users, load average: 0.00, 0.00, 0.00
yay
So how does that work?
First, configure nginx as follows:
server {
    listen 80;
    server_name bouncybouncy.net *.bouncybouncy.net web;
    location / {
        root /home/justin/bbdotnet/static/;
        index index.html index.htm;
        ssi on;
    }
    location /dyn {
        # All POST requests go to pylons directly
        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect default;
        if ($request_method = POST) {
            proxy_pass http://127.0.0.1:5000;
            break;
        }
        default_type text/html;
        set $memcached_key "$uri";
        memcached_pass localhost:11211;
        proxy_intercept_errors on;
        # If nothing is found in memcache, or memcache is dead, go to the real dynamic location
        error_page 404 502 = @dynamic_request;
    }
    location @dynamic_request {
        # This means that we can't get to this location from outside, only by internal redirect
        internal;
        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect default;
        proxy_pass http://127.0.0.1:5000;
    }
}
Pylons is set up to run on port 5000 as usual; nothing fancy there.
Then anywhere we want some dynamic content we can simply do
<!--# include virtual="/dyn/demo/uptime" -->
For now, you have to disable the htmlscrubber plugin for this to work. There is probably a better solution. I think this would simply involve a plugin that could run after htmlscrubber to insert the include, then you would only need to have something like [[include virtual="/dyn/demo/uptime"]] in your pages.
If you do not mind requiring JavaScript, you could use HInclude instead of SSI.
To keep things running fast, we enable caching on the pylons controller, using a modified version of the beaker_cache decorator. The following lines are inserted at the end of the create_func method, which causes the page result to be cached in memcache as well as in beaker.
url = pylons.request.path_info
if pylons.request.params:
    url += "?" + pylons.request.environ['QUERY_STRING']
mc = memcache.Client(['localhost'])
mc.set(url, result, cache_expire)
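The key written from Python has to be byte-for-byte the key that nginx looks up. Pulling the key construction into a tiny pure function makes that easy to check. This is only a Python 3 sketch, and the name cache_key is made up for the example.

```python
def cache_key(path_info, query_string=""):
    """Build the memcache key for a request, mirroring the snippet above."""
    if query_string:
        return path_info + "?" + query_string
    return path_info
```

One thing to watch: as far as I know, nginx's $uri variable does not include the query string, so with the configuration shown a page requested with parameters would be stored under "path?query" but looked up under "path" alone. Setting $memcached_key to "$uri$is_args$args" should make the two sides agree, though I have not tested that here.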
The only remaining problem I see is a small race condition. If
the cache expires, and 20 concurrent requests all come in for the
page, most of them will end up hitting python instead of waiting
for the memcache key to appear. This might actually work better
using varnish or apache2 with mod_disk_cache, but the
last time I tried I could not get varnish to work at all, and
apache2 (I think) still does not support PURGE.
ikiwiki problem solved
Posted Thu Feb 14 21:36:03 2008
I figured out the problem I was having with linking. I had to move ramblings/index.mdwn to ramblings.mdwn and change the pagespecs around a bit, but now everything seems to work.