BB.Net / ramblings / tags / tech - Justin's Ramblings

This feed contains pages in the "tech" category.

odd nmap timings

Posted Fri Aug 22 22:02:33 2008

Back story

A section of a web application I maintain pings a list of IP addresses using a background AJAX request. Most of the time all of these addresses are up; sometimes one or two of them are down. One day I noticed that if all of them were down, nmap would take much longer to ping them all.

The odd part

Let's ping 19 addresses on my home network, none of which exist.

justin@latitude:~$ time nmap  -sP 192.168.5.2-20 

Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:56 EDT
Nmap done: 19 IP addresses (0 hosts up) scanned in 4.072 seconds

real    0m4.081s
user    0m0.068s
sys     0m0.004s

OK, now let's add the router's address, which is pingable.

justin@latitude:~$ time nmap  -sP 192.168.5.1-19

Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:58 EDT
Host router (192.168.5.1) appears to be up.
Nmap done: 19 IP addresses (1 host up) scanned in 2.258 seconds

real    0m2.259s
user    0m0.048s
sys     0m0.008s

Notice anything odd?

I have experimented with the usual host timeout and max RTT options, but I am not sure what the problem is. As soon as I get a chance I will look into the code. I am not sure if it is a bug or just user error. A simple strace of the two commands shows very different 'select' behaviour.

Tags: tech

Python Evolution: From Script To Program

Posted Sat Jun 21 23:18:12 2008

The Evolution of a Python Programmer is funny, but it only covers one aspect of programming. Many times I will see code that is fine from a CS point of view, but absolutely horrible when it comes to program structure and module organization.

You often see people saying things like "Hello World in python is just 'print "Hello World"'", and that is true. It is very easy to get started writing python, but if you don't structure your modules correctly, you will be in a world of pain later on. It is something that can be hard to explain, since the results in the short term are the same, and it may not be clear at first why one way of doing things is better than the other.

Instead of Hello World, let's take the example of a program to get stock quotes. The actual implementation here is not relevant; pretend it contacts a web service or database or something.

A common case is the "python script". I HATE python scripts. A "script" almost always ends up being a single file with no entry points and no main function, and it mixes IO with logic.

s = raw_input("symbol:")
if s == 'MSFT':
    print 'price=', 28.23
elif s == 'GOOG':
    print 'price=', 546.43

The first step in fixing this is to define an actual function. Now you can import the module and run get_price().

def get_price():
    s = raw_input("symbol:")
    if s == 'MSFT':
        print 'price=', 28.23
    elif s == 'GOOG':
        print 'price=', 546.43

The (hopefully) obvious problem with this is that the IO is mixed in with the logic. What if you wanted to get the stock price for 1000 stocks and output a nice summary? This next version is slightly better: the input is a proper parameter, but you still have no control over the output. You could get your 1000 quotes, but you would have no way to report on the results, because the prices are printed rather than returned. Again, this should be obvious, but I come across code that does this way too often.

def get_price(s):
    if s == 'MSFT':
        print 'price=', 28.23
    elif s == 'GOOG':
        print 'price=', 546.43
###
if __name__ == "__main__":
    s = raw_input("symbol:")
    get_price(s)

The first respectable version adds a main() function that handles the input and output. The main function should also get the stock from the command line arguments rather than interactively (the version below still uses raw_input). I think you tend to see things like this more often from Windows users, who like to double click on things rather than run them from a shell. You could probably write a whole book on this subject though :-)

def get_price(s):
    if s == 'MSFT':
        return 28.23
    elif s == 'GOOG':
        return 546.43
###
def main():
    s = raw_input("symbol:")
    print 'price=', get_price(s)

if __name__ == "__main__":
    main()
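
Taking the symbol from the command line, as suggested above, might look like the following sketch. It is written for Python 3 (the listings above are Python 2), and the PRICES table, the quote.py name in the usage message, and the exit status values are all placeholders of mine rather than part of the original program.

```python
import sys

# Placeholder data standing in for a web service or database lookup.
PRICES = {'MSFT': 28.23, 'GOOG': 546.43}


def get_price(symbol):
    """Pure logic: return the price for symbol, or None if it is unknown."""
    return PRICES.get(symbol)


def main(argv=None):
    """All input and output lives here; returns a status code (0 = success)."""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 1:
        print('usage: quote.py SYMBOL', file=sys.stderr)
        return 2
    price = get_price(args[0])
    if price is None:
        print('unknown symbol:', args[0], file=sys.stderr)
        return 1
    print('price=', price)
    return 0


if __name__ == '__main__':
    main()
```

Because get_price() only returns a value, fetching 1000 quotes and formatting a summary becomes a loop in the caller. Wrapping the final call as sys.exit(main()) would also pass the status code back to the shell.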

The final steps are to make a proper python package out of this module, but I'll save that for a later post.

Tags: tech

erlang basic distributed application

Posted Mon Jun 9 18:15:31 2008

Erlang with OTP is a fairly powerful framework for creating distributed redundant applications. The basic gen_server behavior can easily be extended to create a redundant server with built-in failover. With Mnesia you also get a replicated database.

I've been trying to figure out how exactly this is supposed to work, so I've been working on a quick application to demonstrate this. It's nothing fancy, just a simple set(k,v) and get(k) API.

The files are available as a tarball: ddict.tgz

-module(ddict).
-behaviour(gen_server).
-export([start/0,stop/0,terminate/2]).
-export([init/1, handle_call/3, handle_cast/2,handle_info/2]).
-export([create_schema/0]).
-export([get/1,set/2]).


-define(GD,{global, ddict}).

-include_lib("stdlib/include/qlc.hrl").
-record(rec, {key, value}).


init_mnesia() ->
    mnesia:start(),
    ok = mnesia:wait_for_tables([rec], 2000).

init(_Arg) ->
    process_flag(trap_exit, true),
    io:format("dict server starting~n"),
    init_mnesia(),
    {ok, []}.

start() ->
    gen_server:start_link(?GD, ddict, [], []).

stop() ->
    gen_server:cast(?GD, stop).

%underscore prefixes avoid "unused variable" compiler warnings
terminate(_Reason, _State) ->
   io:format("dict server terminating~n").

%"model" methods
do_get(Key) ->
    Res = mnesia:dirty_read({rec, Key}),
    case Res of 
        [] -> undefined;
        [Rec] -> Rec#rec.value
    end.

do_set(Key, Value) ->
    F = fun() ->
            Row = #rec{key=Key, value=Value},
            mnesia:write(Row)
        end,
    {atomic, ok} = mnesia:transaction(F),
    ok.

%"controller" methods
handle_call({get, Key}, _From, State) ->
    Rec = do_get(Key),
    {reply, Rec, State};

handle_call({set, Key, Value}, _From, State) ->
    Rec = do_set(Key, Value),
    {reply, Rec, State}.

handle_cast(stop, State) ->
    io:format("ddict server stopping~n"),
    {stop, normal, State}.

handle_info(_Info, State) ->
    {noreply, State}.


%"client api" methods
get(Key) ->
   gen_server:call(?GD, {get, Key}).

set(Key,Value) ->
   gen_server:call(?GD, {set, Key, Value}).


create_schema() ->
    mnesia:create_schema([node()|nodes()]),
    mnesia:start(),
    %this is definitely wrong
    lists:foreach(fun(N) ->
        io:format("starting mnesia on ~w~n", [N]),
        rpc:call(N, mnesia, start, [])
    end, nodes()),
    mnesia:create_table(rec, [
        {disc_copies, [node()|nodes()]},
        {attributes, record_info(fields, rec)}
    ]).

download file "main gen_server file"

The key to this server being distributed is the use of {global, ddict} as the server name, instead of {local, ddict}. This enables other nodes in the cluster to see this server.

do_get() and do_set() are the "model"-like functions that deal with mnesia. handle_call defines the gen_server API. get() and set() are helper functions that call the remote gen_server. If there was more to this module, it would be a good idea to put these functions in separate modules.

The one thing I am not sure about is the create_schema() function. I'm sure there is a proper way to initialize mnesia on a cluster, I just have no idea what it is yet :-)

To make this into a proper OTP application, the supervisor and application need to be defined with the following three files:

-module(ddict_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(ddict_sup, []).

init(_Args) ->
    {ok, {{one_for_one, 10, 60},
          [{ddict, {ddict, start, []},
            permanent, brutal_kill, worker, [ddict]}]}}.

download file "/ramblings/files/erlang/ddict/ddict_sup.erl"

-module(ddict_app).
-behaviour(application).

-export([start/2, stop/1,go/0]).

start(_Type, _Args) ->
    ddict_sup:start_link().

stop(_State) ->
    io:format("ddict server terminating~n"),
    ok.

go() ->
    application:start(ddict).

download file "/ramblings/files/erlang/ddict/ddict_app.erl"

{application, ddict,
[
    {mod, {ddict_app,[]}}
]}.

download file "/ramblings/files/erlang/ddict/ddict.app"

To get erlang to start this application on boot, a config file for each node needs to be written:

[{kernel,
  [{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
   {sync_nodes_optional, [two@media]},
   {sync_nodes_timeout, 5000}
  ]
 }
].

download file "/ramblings/files/erlang/ddict/one.config"

[{kernel,
  [{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
   {sync_nodes_optional, [one@media]},
   {sync_nodes_timeout, 5000}
  ]
 }
].

download file "/ramblings/files/erlang/ddict/two.config"

To create the initial database I started both nodes and ran the ddict:create_schema function on the first one, which I'm sure is completely incorrect, but it works:

erl -sname one -config one.config
erl -sname two -config two.config

(one@media)1> ddict:create_schema().
starting mnesia on two@media
{atomic,ok}
(one@media)2> mnesia:info().
...
running db nodes   = [two@media,one@media]
disc_copies        = [rec,schema]
[{one@media,disc_copies},{two@media,disc_copies}] = [schema,rec]
...
ok

download file "/ramblings/files/erlang/ddict/create_db.txt"

Once that is done, the application can be started with

erl  -pa . -sname one -config one.config -s ddict_app go
erl  -pa . -sname two -config two.config -s mnesia start -s ddict_app go

I have to start mnesia separately on the second VM because I haven't yet figured out how mnesia should be started when dealing with distributed applications. mnesia needs to be running on both nodes, but not the ddict application itself.

Once it is running, you can call ddict:set("Foo","bar") and ddict:get("Foo"). You can also kill either VM, and the server will be restarted on the other node after 3 seconds.

Tags: tech

using nmap for network monitoring

Posted Fri May 30 22:34:40 2008

The problem

You need to know if any of 900 IP addresses are unreachable. You also need to know this within about a minute of any outage. Nmap is primarily a security tool, but it can be very helpful when it comes to monitoring as well.

fping

For years I used fping for this. Here is an example of what it can do:

$ wc -l ips.txt 
900 ips.txt
$ time fping < ips.txt 
...
real    0m41.347s
user    0m0.028s
sys     0m0.248s

Not too bad: 41 seconds to poll 900 devices. It actually seems to finish at around 35 seconds, and then sits there for a bit before exiting.

nmap

Now let's try with nmap. Nmap needs to be run as root to allow it to send ICMP packets; otherwise it will use connect(). In my tests it is actually faster when running in TCP mode, but some devices only respond to ICMP. (It would be best for security to put this into an nmap_ping helper script and put that in sudoers instead of allowing all nmap commands to be run as root. It is probably also possible to use the capabilities system to just allow a normal user to send ICMP packets.)

$ time sudo nmap -n -sP -PE -iL ips.txt
...
real    0m3.961s
user    0m1.072s
sys     0m1.780s

Not bad at all, about 10 times faster than using fping!

Note in these examples, all of the addresses are pingable, so timeouts and retry times do not come into play. My monitoring system maintains separate lists of the reachable and unreachable devices, and pings them from different processes. This prevents unreachable devices from slowing down the normal process of making sure everything else is working. Currently the time between pings to a single device is about 8 seconds.
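
Splitting the targets into those two lists could be sketched as below (Python 3). This is not the code from my monitoring system: the function name is made up for illustration, and the regular expression assumes the "Host ... appears to be up." line format that Nmap 4.53 prints in the "odd nmap timings" post, with the hostname part missing when -n is used.

```python
import re

# Assumed line formats (Nmap 4.53 era):
#   Host router (192.168.5.1) appears to be up.
#   Host 192.168.5.7 appears to be up.        <- assumed form when run with -n
UP_LINE = re.compile(
    r'^Host (?:\S+ \()?(?P<ip>\d+\.\d+\.\d+\.\d+)\)? appears to be up',
    re.MULTILINE,
)


def split_by_reachability(targets, nmap_output):
    """Return (reachable, unreachable), preserving the order of targets."""
    up = set(m.group('ip') for m in UP_LINE.finditer(nmap_output))
    reachable = [ip for ip in targets if ip in up]
    unreachable = [ip for ip in targets if ip not in up]
    return reachable, unreachable
```

Each list can then be written to its own file and handed to a separate nmap process with -iL, so a batch of dead hosts only slows down the process that polls the unreachable list.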

Tags: tech

building pig on debian

Posted Thu May 15 23:52:12 2008

I've been playing with Hadoop and Pig. It is some really neat technology.

I had a bunch of trouble getting Pig to build, though. It seems that this error from ant:

Could not create task or type of type: jjtree

is caused by a missing 'ant-optional' package.

Tags: tech

how my dupe finding program works

Posted Thu Feb 21 23:41:03 2008

finding duplicate files

This post is about my duplicate finding program available under Programs. The program is a little bare, and needs a nicer API, but the method it uses is the most efficient one that I am aware of.

There are a couple of different ways you can find duplicate files:

Compute the hash of all the files, and look for duplicates

This method works well if the files on disk are mostly static, and files are added infrequently. In this case you can compute the hashes once, and keep them around for later scans. However, if you are only running the scan once, this method is not ideal since it requires you to read the full contents of every file.

Compute the hash of files with the same size

This is the method that I think fdupes still uses. It first builds a candidate list of files that are the same size, and computes the checksum of each. This method works well if most of the files that are the same size are really duplicates, but otherwise triggers too much unneeded IO.

Compare all files with the same size in parallel

This is the method that my program uses. Like fdupes, it first builds up a candidate list of files with the same size. Instead of hashing the files, it simply reads all of the candidate files at the same time, comparing block by block. This is just like what the cmp(1) program does, but for multiple files at the same time. The benefit of this over calculating each file's hash is that as soon as the files differ, you can stop reading.

Implementation

There are a couple of things you need to keep in mind to implement this method.

Don't open too many files.

You have to be careful not to try and open too many files at once. If the user has 5,000 files that all have the same size, the program shouldn't try and open all 5,000 at once. My program uses a simple helper class to handle opening and closing files. The default blocksize in my program would probably waste a bit of memory in this case, but that is easily changed.

Correctly handle diverging sets.

Imagine the filesystem contains 4 files of the same size, 'a', 'b','c', and 'd', where a==c, and b==d. While reading through the files, it will become clear that a!=b, a==c, and a!=d. It is important that at this step the program continues searching using (a,c) and (b,d) as possible duplicates. This is implemented using recursion, the sets (a,c) and (b,d) are fed back into the duplicate finding function.
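
The parallel comparison and the recursion on diverging sets can be sketched roughly as follows (Python 3). This is a simplified illustration, not the code from dupes.py: the function names and the 64K block size are assumptions, and it opens every file in a candidate set at once, so it leaves out the helper class that limits the number of open files.

```python
import os
from collections import defaultdict

BLOCK_SIZE = 64 * 1024  # assumed value, not the default used by dupes.py


def group_by_size(paths):
    """Candidate sets: lists of two or more paths that share a file size."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]


def find_identical(paths, offset=0, block_size=BLOCK_SIZE):
    """Compare same-sized files block by block, starting at offset.

    Returns a list of lists; each inner list holds identical files.
    """
    if len(paths) < 2:
        return []
    handles = {}
    try:
        for path in paths:
            handles[path] = open(path, 'rb')
            handles[path].seek(offset)
        while True:
            by_block = defaultdict(list)
            for path, fh in handles.items():
                by_block[fh.read(block_size)].append(path)
            if len(by_block) > 1:
                break                    # the set diverged at this block
            if b'' in by_block:
                return [sorted(paths)]   # all hit EOF together: identical
            offset += block_size
    finally:
        for fh in handles.values():
            fh.close()
    # Each subset agreed on the block just read, so feed it back in
    # starting from the next block.
    results = []
    for subset in by_block.values():
        results.extend(find_identical(subset, offset + block_size, block_size))
    return results


def find_duplicates(paths):
    duplicates = []
    for group in group_by_size(paths):
        duplicates.extend(find_identical(group))
    return duplicates
```

With the four files from the example, the first differing block splits the set into (a,c) and (b,d), and each pair is then read to the end independently. A file that matches nothing drops out as soon as it is alone in its subset, without being read any further.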

Example run, compared to fdupes.

Here is dupes.py running against fdupes on a modestly sized directory. Notice how dupes.py only needs to read about 630K of file data (not counting metadata).

According to iofileb.d from the DTrace Toolkit, dupes.py reads 10M of data (which I think includes python), and fdupes reads 517M. This alone explains the speedup seen in dupes.py, roughly 15x to 34x in the runs below.

justin@pip:~$ du -hs $DIR
15G   $DIR

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m1.224s
user    0m0.234s
sys     0m0.494s

justin@pip:~$ time fdupes -r $DIR
real    0m41.694s
user    0m13.612s
sys     0m7.491s

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m3.662s
user    0m0.256s
sys     0m0.568s

justin@pip:~$ time fdupes -r $DIR
real    0m55.473s
user    0m11.383s
sys     0m6.433s

Tags: tech

regex with named groups

Posted Wed Feb 20 11:42:21 2008

As I mentioned in a comment at Some more tweaks to my Python script, there are a lot of ways you can use the re module. If you need to match multiple expressions against each line, you can build up a single regular expression that includes all the patterns, and use named groups to tell them apart.


import re
#if you were matching many of these it would be a good idea
#to make a function that simply fills in '%s>(?P<%s>[^<]+)<'
cpattern    = 'total_credit>(?P<credit>[^<]+)<'
opattern    = 'os_name>(?P<os>[^<]+)<'
pattern     = '(%s)|(%s)' % (cpattern, opattern)

search = re.compile(pattern).search

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'blah blah blah total_credit>20< blah blah',
]

for line in lines:
    r = search(line)
    if r:
        print r.groupdict()

Running this gives

{'credit': '10', 'os': None}
{'credit': None, 'os': 'win'}
{'credit': '20', 'os': None}
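
The helper function suggested in the code comment above might look like this sketch (Python 3). The names tag_pattern and build_search are made up for illustration; the pattern template is the one from the comment.

```python
import re


def tag_pattern(tag, group):
    """Fill in the '%s>(?P<%s>[^<]+)<' template for one tag."""
    return '%s>(?P<%s>[^<]+)<' % (re.escape(tag), group)


def build_search(tags):
    """tags is a list of (tag, group_name) pairs; returns a search function."""
    pattern = '|'.join('(%s)' % tag_pattern(tag, group) for tag, group in tags)
    return re.compile(pattern).search


search = build_search([('total_credit', 'credit'), ('os_name', 'os')])

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'blah blah blah total_credit>20< blah blah',
]

matches = [m.groupdict() for m in map(search, lines) if m]
for match in matches:
    print(match)
```

Adding another field is then one more (tag, group_name) pair in the list passed to build_search().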

In this case you could even generalize the regular expression further, like so:

pattern     = r'\s(?P<key>[^\s>]+)>(?P<value>[^<]+)<'  # raw string keeps \s intact

Running that (probably less than optimal) regular expression over the input gives

{'key': 'total_credit', 'value': '10'}
{'key': 'os_name', 'value': 'win'}
{'key': 'total_credit', 'value': '20'}

Tags: tech

xen live migration without shared storage

Posted Sat Feb 16 17:02:25 2008

The problem

The Xen documentation on live migration states:

Currently, there is no support for providing automatic remote access to filesystems stored on local disk when a domain is migrated. Administrators should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure that domain filesystems are also available on their destination node. GNBD is a good method for exporting a volume from one machine to another. iSCSI can do a similar job, but is more complex to set up.

This does not mean that it is impossible, though. Live migration is just a more efficient form of migration, and migration can be seen as a save on one node and a restore on another. Normally, if you save a VM on one machine and try to restore it on another machine, it will fail when it is unable to read its filesystems. But what would happen if you copied the filesystem to the other node between the save and restore? If done right, it works pretty well.

The solution?

The solution is simple:

  • Save running image
  • Sync disks
  • copy image to other node, restore

This can be somewhat sped up by syncing the disks twice:

  • Sync disks
  • Save running image
  • Sync disks again, which only has to transfer the changes from the last few seconds
  • copy image to other node, restore
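
For raw devices, the four steps above can be written down as an ordered list of commands. The sketch below (Python 3) only builds that list and does not run anything; the function name is made up, the commands are copied from the example migration trace later in this post, and error handling is left out entirely.

```python
def migration_plan(domain, dst_host, devices):
    """Return the commands for a two-pass-sync migration, in order."""
    dump = domain + '.dump'
    sync = [['blocksync.py', dev, dst_host] for dev in devices]

    plan = []
    plan.extend(list(cmd) for cmd in sync)    # 1. sync disks while the VM runs
    plan.append(['xm', 'save', domain, dump]) # 2. save the running image
    plan.extend(list(cmd) for cmd in sync)    # 3. resync the last few changes
    plan.append(['scp', dump, dst_host + ':'])  # 4. copy the image over...
    plan.append(['ssh', dst_host,
                 'xm restore %s && rm %s' % (dump, dump)])  # ...and restore
    plan.append(['rm', dump])                 # clean up the local dump
    return plan


for command in migration_plan('test', '192.168.1.2',
                              ['/dev/xen/test-root', '/dev/xen/test-swap']):
    print(' '.join(command))
```

The domain is down from the xm save until the restore finishes, so the second sync pass and the image copy are the parts worth keeping short.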

Synchronizing block devices

File backed

If you are using plain files as vbds, you can sync the disks using rsync.

Raw devices

If you are using raw devices, rsync cannot be used. I wrote a small utility called blocksync which can synchronize two block devices over the network. In my testing it was easily able to max out the network on an initial sync, and max out the disk read speed on a resync.

$ blocksync.py /dev/xen/vm-root 1.2.3.4

This will sync /dev/xen/vm-root onto 1.2.3.4. The device should already exist on the destination and be the same size.

Solaris ZFS

If you are using ZFS, it should be possible to use zfs send to sync the block devices before migration. This would give an almost instantaneous sync time.

Automation

A simple script xen migrate.sh and its helper xen_vbds.py will migrate a domain to another host. File and raw vbds are supported. ZFS send support is not yet implemented.

Example migration

#migrating a 1G / + 128M swap over the network
#physical machines are 350mhz with 64M of ram,
#total downtime is about 3 minutes

xen1:~# time ./migrate.sh test 192.168.1.2
+ '[' 2 -ne 2 ']'
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
test              87       15    0  -b---      0.0    9687
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump                                       100%   16MB   3.2MB/s   00:05
+ restore_image
+ ssh 192.168.1.2 'xm restore test.dump && rm test.dump'
(domain
    (id 89)
    [domain info stuff cut out]
)
+ rm test.dump

real    6m6.272s
user    1m29.610s
sys     0m30.930s


download file "/ramblings/files/example_migration.txt"

Tags: tech

dynamic ikiwiki pages

Posted Fri Feb 15 20:57:58 2008

The static pages that ikiwiki generates are great, but I want to have some dynamic content here as well.

If this works, this page should include the server's uptime.

13:49:09 up 88 days, 19:05, 0 users, load average: 0.00, 0.00, 0.00

yay :-)

So how does that work?

First, configure nginx as follows:

server {
    listen       80;
    server_name  bouncybouncy.net  *.bouncybouncy.net web;

    location / {
        root   /home/justin/bbdotnet/static/;
        index  index.html index.htm;
        ssi on;
    }
    location /dyn {
        # All POST requests go to pylons directly
        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        if ($request_method = POST) {
            proxy_pass  http://127.0.0.1:5000;
            break;
        }
        default_type text/html; 

        set $memcached_key "$uri";
        memcached_pass localhost:11211;

        proxy_intercept_errors  on;

        # If the key is not found in memcached, or memcached is down, go to the real dynamic location
        error_page 404 502 = @dynamic_request;
    }
    location @dynamic_request{
        # This means, that we can't get to this location from outside - only by internal redirect
        internal;

        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        proxy_pass  http://127.0.0.1:5000;
    }

}

Pylons is set up to run on port 5000 as usual, nothing fancy there.

Then anywhere we want some dynamic content we can simply add an SSI include:

<!--# include virtual="/dyn/demo/uptime" -->

For now, you have to disable the htmlscrubber plugin for this to work. There is probably a better solution. I think this would simply involve a plugin that could run after htmlscrubber to insert the include, then you would only need to have something like [[include virtual="/dyn/demo/uptime"]] in your pages.

If you did not mind requiring JavaScript, you could use HInclude instead of SSI.

To keep things running fast, we enable caching on the Pylons controller using a modified version of the beaker_cache decorator. The following lines are inserted at the end of the create_func function, which causes the page result to be cached in memcached as well as in beaker.

url = pylons.request.path_info
if pylons.request.params:
    url += "?" + pylons.request.environ['QUERY_STRING']

mc = memcache.Client(['localhost'])
mc.set(url, result, cache_expire)

The only remaining problem I see is a small race condition. If the cache expires and 20 concurrent requests all come in for the page, most of them will end up hitting Python instead of waiting for the memcached key to appear. This might actually work better using varnish or apache2 with mod_disk_cache, but the last time I tried I could not get varnish to work at all, and apache2 (I think) still does not support PURGE.

Tags: tech

ikiwiki problem solved

Posted Thu Feb 14 21:36:03 2008

I figured out the problem I was having with linking. I had to move ramblings/index.mdwn to ramblings.mdwn and change the pagespecs around a bit, but now everything seems to work.

Tags: tech