<?xml version="1.0"?>
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/" >
<channel>
<title>Justin&#x27;s Ramblings</title>
<link>http://bouncybouncy.net//ramblings/</link>
<description>BB.Net</description>
<item>
	
	<title>odd nmap timings</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/odd_nmap_timings/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/odd_nmap_timings/</link>
	
	
	<category>tags/tech</category>
	
	
	<pubDate>Fri, 22 Aug 2008 22:02:33 -0400</pubDate>
	<dcterms:modified>2008-09-09T00:45:49Z</dcterms:modified>
	
	<description><![CDATA[<h2>Back story</h2>
<p>A section of a web application I have pings (using a background
AJAX request) a list of IP addresses. Most of the time all of these
addresses are up; sometimes one or two of them are down. One day I
noticed that if all of them were down, nmap would take much longer
to ping them all.</p>
<h2>The odd part</h2>
<p>Let's ping 19 addresses on my home network, none of which
exist.</p>
<div class="syntax">
<pre>
justin@latitude:~$ time nmap  -sP 192.168.5.2-20 

Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:56 EDT
Nmap done: 19 IP addresses (0 hosts up) scanned in 4.072 seconds

real    0m4.081s
user    0m0.068s
sys     0m0.004s

</pre></div>
<p>Ok... now let's add the router's address, which is pingable.</p>
<div class="syntax">
<pre>
justin@latitude:~$ time nmap  -sP 192.168.5.1-19

Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:58 EDT
Host router (192.168.5.1) appears to be up.
Nmap done: 19 IP addresses (1 host up) scanned in 2.258 seconds

real    0m2.259s
user    0m0.048s
sys     0m0.008s

</pre></div>
<p>Notice anything odd? The scan that includes one live host finishes
in about 2.3 seconds, while the scan of 19 dead addresses takes about
4.1 seconds.</p>
<p>I have experimented with the usual host timeout and max RTT time
options, but I am not sure what the problem is. As soon as I get a
chance I will look into the code. I am not sure if it is a bug or
just user error. A simple strace of the two commands shows very
different 'select' behaviour.</p>

]]></description>
	
</item>
<item>
	
	<title>Python Evolution: From Script To Program</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/python_evolution_from_script_to_program/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/python_evolution_from_script_to_program/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Sat, 21 Jun 2008 23:18:12 -0400</pubDate>
	<dcterms:modified>2008-06-22T15:12:26Z</dcterms:modified>
	
	<description><![CDATA[<p><a href=
"http://forums.thedailywtf.com/forums/p/6978/132159.aspx">The
Evolution of a Python Programmer</a> is funny, but it only covers
one aspect of programming. Many times I will see code that is fine
from a CS point of view, but absolutely horrible when it comes to
program structure and module organization.</p>
<p>You often see people saying things like "Hello World in python
is just 'print "Hello World"'", and that is true. It is very easy
to get started writing python, but if you don't structure your
modules correctly, you will be in a world of pain later on. It is
something that can be hard to explain, since the results in the
short term are the same, and it may not be clear at first why one
way of doing things is better than the other.</p>
<p>Instead of Hello World, let's take the example of a program to
get stock quotes. The actual implementation here is not relevant;
pretend it contacts a web service or database or something.</p>
<p>A common case is the "python script". I HATE python scripts. A
"script" almost always ends up being a single file with no entry
points and no main function, and it mixes IO with logic.</p>
<div class="syntax">
<pre>
s = raw_input("<span class="synConstant">symbol:</span>")
<span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
<span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43

</pre></div>
<p>The first step in fixing this is to define an actual function.
Now you can import the module and run get_price().</p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>():
    s = raw_input("<span class="synConstant">symbol:</span>")
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43

</pre></div>
<p>The (hopefully) obvious problem with this is that the IO is
mixed in with the logic. What if you wanted to get the stock price
for 1000 stocks and output a nice summary? This next version is
slightly better: the input is a proper parameter, but you
still have no control over the output. You could get your 1000
quotes, but you would have no way to report on them. Again,
this should be obvious, but I come across code that does this way
too often.</p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>(s):
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43
<span class="synComment">###</span>
<span class="synStatement">if</span> __name__ == "<span class=
"synConstant">__main__</span>":
    s = raw_input("<span class="synConstant">symbol:</span>")
    get_price(s)

</pre></div>
<p>The first respectable version adds a main() function that
handles the input and output. The main function should also get the
stock from the command line arguments, rather than interactively. I
think you tend to see things like this more often from Windows
users, who like to double click on things rather than run them from
a shell. You could probably write a whole book on this subject
though <img src="http://bouncybouncy.net//ramblings/../smileys/smile.png" alt=":-)" /></p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>(s):
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">return</span> 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">return</span> 546.43
<span class="synComment">###</span>
<span class="synStatement">def</span> <span class=
"synIdentifier">main</span>():
    s = raw_input("<span class="synConstant">symbol:</span>")
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', get_price(s)

<span class="synStatement">if</span> __name__ == "<span class=
"synConstant">__main__</span>":
    main()

</pre></div>
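<p>As a rough sketch of where this leads, here is a Python 3 variant
in which main() reads the symbols from the command line and the
reporting is kept separate from the lookup. The PRICES table, the
format_summary() helper, and the usage text are assumptions added for
illustration; the prices are the same placeholder values used
above.</p>

```python
import sys

# Placeholder data taken from the examples above; a real program would
# contact a web service or database here.
PRICES = {"MSFT": 28.23, "GOOG": 546.43}


def get_price(symbol):
    """Return the price for symbol, or None if the symbol is unknown."""
    return PRICES.get(symbol)


def format_summary(symbols):
    """Build a report for many symbols without doing any IO."""
    lines = []
    for symbol in symbols:
        price = get_price(symbol)
        if price is None:
            lines.append("%s: unknown" % symbol)
        else:
            lines.append("%s: %.2f" % (symbol, price))
    return "\n".join(lines)


def main(argv=None):
    """Handle input and output; return a shell-style exit status."""
    symbols = sys.argv[1:] if argv is None else argv
    if not symbols:
        sys.stderr.write("usage: quotes.py SYMBOL [SYMBOL ...]\n")
        return 2
    print(format_summary(symbols))
    return 0


if __name__ == "__main__":
    main()
```

<p>Because get_price() and format_summary() only return values, the
same module can be imported to look up 1000 symbols and report on
them in whatever way the caller likes.</p>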
<p>The final steps are to make a proper python package out of this
module, but I'll save that for a later post.</p>

]]></description>
	
</item>
<item>
	
	<title>erlang basic distributed application</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/erlang_basic_distributed_application/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/erlang_basic_distributed_application/</link>
	
	
	<category>tags/erlang</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Mon, 09 Jun 2008 18:15:31 -0400</pubDate>
	<dcterms:modified>2008-06-10T11:56:22Z</dcterms:modified>
	
	<description><![CDATA[<p><a href=
"http://erlang.org/doc/design_principles/part_frame.html">Erlang
with OTP</a> is a fairly powerful framework for creating
distributed redundant applications. The basic
<code>gen_server</code> behavior can easily be extended to create a
redundant server with built-in failover. With <a href=
"http://www.erlang.org/doc/apps/mnesia/index.html">Mnesia</a> you
also get a replicated database.</p>
<p>I've been trying to figure out how exactly this is supposed to
work, so I've been working on a quick application to demonstrate
this. It's nothing fancy, just a simple set(k,v) and get(k)
API.</p>
<p>The files are available as a tarball from here <a href="http://bouncybouncy.net//ramblings/files/erlang/ddict.tgz">ddict.tgz</a></p>
<div class="syntax">
<pre>
<span class="synType">-module</span>(ddict)<span class=
"synSpecial">.</span>
<span class=
"synStatement">-</span>behaviour(gen_server)<span class="synSpecial">.</span>
<span class="synType">-export</span>([start<span class=
"synStatement">/</span><span class=
"synConstant">0</span>,stop<span class=
"synStatement">/</span><span class=
"synConstant">0</span>,terminate<span class=
"synStatement">/</span><span class=
"synConstant">2</span>])<span class="synSpecial">.</span>
<span class="synType">-export</span>([init<span class=
"synStatement">/</span><span class=
"synConstant">1</span>, handle_call<span class=
"synStatement">/</span><span class=
"synConstant">3</span>, handle_cast<span class=
"synStatement">/</span><span class=
"synConstant">2</span>,handle_info<span class=
"synStatement">/</span><span class=
"synConstant">2</span>])<span class="synSpecial">.</span>
<span class="synType">-export</span>([create_schema<span class=
"synStatement">/</span><span class=
"synConstant">0</span>])<span class="synSpecial">.</span>
<span class="synType">-export</span>([<span class=
"synIdentifier">get</span><span class=
"synStatement">/</span><span class=
"synConstant">1</span>,set<span class=
"synStatement">/</span><span class=
"synConstant">2</span>])<span class="synSpecial">.</span>


<span class=
"synType">-define</span>(GD,{global, ddict})<span class="synSpecial">.</span>

<span class="synType">-include</span><span class=
"synSpecial">_</span>lib(<span class=
"synConstant">"stdlib/include/qlc.hrl"</span>)<span class=
"synSpecial">.</span>
<span class="synType">-record</span>(rec, {key, value})<span class=
"synSpecial">.</span>


init_mnesia() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class="synIdentifier">start</span>(),
    ok <span class="synStatement">=</span> <span class=
"synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">wait</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">for</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">tables</span>([rec], 2000)<span class=
"synSpecial">.</span>

init(<span class="synSpecial">_</span>Arg) <span class=
"synStatement">-&gt;</span>
    <span class="synIdentifier">process_flag</span>(<span class=
"synSpecial">trap_exit</span>, <span class=
"synStatement">true</span>),
    <span class="synIdentifier">io</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">format</span>(<span class=
"synConstant">"dict server starting</span><span class=
"synSpecial">~n</span><span class="synConstant">"</span>),
    init_mnesia(),
    {ok, []}<span class="synSpecial">.</span>

start() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">gen</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">server</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">start</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">link</span>(?GD, ddict, [], [])<span class=
"synSpecial">.</span>

stop() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">gen</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">server</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">cast</span>(?GD, stop)<span class=
"synSpecial">.</span>

terminate(Reason, State) <span class="synStatement">-&gt;</span>
   <span class="synIdentifier">io</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">format</span>(<span class=
"synConstant">"dict server terminating</span><span class=
"synSpecial">~n</span><span class=
"synConstant">"</span>)<span class="synSpecial">.</span>

<span class="synComment">%"model" methods</span>
do_get(Key) <span class="synStatement">-&gt;</span>
    Res <span class="synStatement">=</span> <span class=
"synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">dirty</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">read</span>({rec, Key}),
    <span class="synStatement">case</span> Res <span class=
"synStatement">of</span> 
        [] <span class="synStatement">-&gt;</span> undefined;
        [Rec] <span class=
"synStatement">-&gt;</span> Rec#rec<span class=
"synSpecial">.</span>value
    <span class="synStatement">end</span><span class=
"synSpecial">.</span>

do_set(Key, Value) <span class="synStatement">-&gt;</span>
    F <span class="synStatement">=</span> <span class=
"synStatement">fun</span>() <span class="synStatement">-&gt;</span>
            Row <span class=
"synStatement">=</span> #rec{key<span class=
"synStatement">=</span>Key, value<span class=
"synStatement">=</span>Value},
            <span class="synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class="synIdentifier">write</span>(Row)
        <span class="synStatement">end</span>,
    {atomic, ok} <span class="synStatement">=</span> <span class=
"synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">transaction</span>(F),
    ok<span class="synSpecial">.</span>

<span class="synComment">%"controller" methods</span>
handle_call({<span class=
"synIdentifier">get</span>, Key}, From, State) <span class=
"synStatement">-&gt;</span>
    Rec <span class="synStatement">=</span> do_get(Key),
    {reply, Rec, State};

handle_call({set, Key, Value}, From, State) <span class=
"synStatement">-&gt;</span>
    Rec <span class="synStatement">=</span> do_set(Key, Value),
    {reply, Rec, State}<span class="synSpecial">.</span>

handle_cast(stop, State) <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">io</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">format</span>(<span class=
"synConstant">"ddict server stopping</span><span class=
"synSpecial">~n</span><span class="synConstant">"</span>),
    {stop, <span class=
"synStatement">normal</span>, State}<span class=
"synSpecial">.</span>

handle_info(Info, State) <span class="synStatement">-&gt;</span>
    {noreply, State}<span class="synSpecial">.</span>


<span class="synComment">%"client api" methods</span>
<span class="synIdentifier">get</span>(Key) <span class=
"synStatement">-&gt;</span>
   <span class="synIdentifier">gen</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">server</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">call</span>(?GD, {<span class=
"synIdentifier">get</span>, Key})<span class="synSpecial">.</span>

set(Key,Value) <span class="synStatement">-&gt;</span>
   <span class="synIdentifier">gen</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">server</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">call</span>(?GD, {set, Key, Value})<span class=
"synSpecial">.</span>


create_schema() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">create</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">schema</span>([<span class=
"synSpecial">node</span>()|<span class=
"synIdentifier">nodes</span>()]),
    <span class="synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class="synIdentifier">start</span>(),
    <span class="synComment">%this is definitely wrong</span>
    <span class="synIdentifier">lists</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">foreach</span>(<span class=
"synStatement">fun</span>(N) <span class=
"synStatement">-&gt;</span>
        <span class="synIdentifier">io</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">format</span>(<span class=
"synConstant">"starting mnesia on </span><span class=
"synSpecial">~w~n</span><span class="synConstant">"</span>, [N]),
        <span class="synIdentifier">rpc</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">call</span>(N, mnesia, start, [])
    <span class="synStatement">end</span>, <span class=
"synIdentifier">nodes</span>()),
    <span class="synIdentifier">mnesia</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">create</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">table</span>(rec, [
        {disc_copies, [<span class=
"synSpecial">node</span>()|<span class=
"synIdentifier">nodes</span>()]},
        {attributes, <span class=
"synStatement">record_info</span>(fields, rec)}
    ])<span class="synSpecial">.</span>

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/ddict.erl">download file "main gen_server
file"</a></span></div>
<p>The key to this server being distributed is the use of {global,
ddict} as the server name, instead of {local, ddict}. This enables
other nodes in the cluster to see this server.</p>
<p><code>do_get()</code> and <code>do_set()</code> are the "model"
like methods that deal with mnesia. <code>handle_call</code>
defines the <code>gen_server</code> api. get() and set() are helper
functions that call the remote gen server. If there was more to
this module, it would be a good idea to put these methods in
separate modules.</p>
<p>The one thing I am not sure about is the
<code>create_schema()</code> method. I'm sure there is a proper
way to initialize mnesia on a cluster, I just have no idea what it
is yet <img src="http://bouncybouncy.net//ramblings/../smileys/smile.png" alt=":-)" /></p>
<p>To make this into a proper gen server, the supervisor and
application need to be defined with the following three files:</p>
<div class="syntax">
<pre>
<span class="synType">-module</span>(ddict_sup)<span class=
"synSpecial">.</span>
<span class=
"synStatement">-</span>behaviour(supervisor)<span class="synSpecial">.</span>

<span class="synType">-export</span>([start_link<span class=
"synStatement">/</span><span class=
"synConstant">0</span>])<span class="synSpecial">.</span>
<span class="synType">-export</span>([init<span class=
"synStatement">/</span><span class=
"synConstant">1</span>])<span class="synSpecial">.</span>

start_link() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">supervisor</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">start</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">link</span>(ddict_sup, [])<span class=
"synSpecial">.</span>

init(<span class="synSpecial">_</span>Args) <span class=
"synStatement">-&gt;</span>
    {ok, {{one_for_one, 10, 60},
          [{ddict, {ddict, start, []},
            permanent, brutal_kill, worker, [ddict]}]}}<span class=
"synSpecial">.</span>

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/ddict_sup.erl">download file
"/ramblings/files/erlang/ddict/ddict_sup.erl"</a></span></div>
<div class="syntax">
<pre>
<span class="synType">-module</span>(ddict_app)<span class=
"synSpecial">.</span>
<span class=
"synStatement">-</span>behaviour(application)<span class=
"synSpecial">.</span>

<span class="synType">-export</span>([start<span class=
"synStatement">/</span><span class=
"synConstant">2</span>, stop<span class=
"synStatement">/</span><span class=
"synConstant">1</span>,go<span class=
"synStatement">/</span><span class=
"synConstant">0</span>])<span class="synSpecial">.</span>

start(<span class="synSpecial">_</span>Type, <span class=
"synSpecial">_</span>Args) <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">ddict</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">sup</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">start</span><span class=
"synSpecial">_</span><span class=
"synIdentifier">link</span>()<span class="synSpecial">.</span>

stop(<span class="synSpecial">_</span>State) <span class=
"synStatement">-&gt;</span>
    <span class="synIdentifier">io</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">format</span>(<span class=
"synConstant">"ddict server terminating</span><span class=
"synSpecial">~n</span><span class="synConstant">"</span>),
    ok<span class="synSpecial">.</span>

go() <span class="synStatement">-&gt;</span>
    <span class="synIdentifier">application</span><span class=
"synSpecial">:</span><span class=
"synIdentifier">start</span>(ddict)<span class=
"synSpecial">.</span>

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/ddict_app.erl">download file
"/ramblings/files/erlang/ddict/ddict_app.erl"</a></span></div>
<div class="syntax">
<pre>
{application, ddict,
[
    {mod, {ddict_app,[]}}
]}.

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/ddict.app">download file
"/ramblings/files/erlang/ddict/ddict.app"</a></span></div>
<p>To get erlang to start this application on boot, a config file
for each node needs to be written:</p>
<div class="syntax">
<pre>
[{kernel,
  [{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
   {sync_nodes_optional, [two@media]},
   {sync_nodes_timeout, 5000}
  ]
 }
].

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/one.config">download file
"/ramblings/files/erlang/ddict/one.config"</a></span></div>
<div class="syntax">
<pre>
[{kernel,
  [{distributed, [{ddict, 3000, [one@media, {two@media}]}]},
   {sync_nodes_optional, [one@media]},
   {sync_nodes_timeout, 5000}
  ]
 }
].

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/two.config">download file
"/ramblings/files/erlang/ddict/two.config"</a></span></div>
<p>To create the initial database I ran the
<code>ddict:create_schema</code> method, which I'm sure is
completely incorrect, but it works:</p>
<div class="syntax">
<pre>
erl -sname one -config one.config
erl -sname two -config two.config

(one@media)1&gt; ddict:create_schema().
starting mnesia on two@media
{atomic,ok}
(one@media)2&gt; mnesia:info().
...
running db nodes   = [two@media,one@media]
disc_copies        = [rec,schema]
[{one@media,disc_copies},{two@media,disc_copies}] = [schema,rec]
...
ok

</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/erlang/ddict/create_db.txt">download file
"/ramblings/files/erlang/ddict/create_db.txt"</a></span></div>
<p>Once that is done, the application can be started with</p>
<div class="syntax">
<pre>
erl  -pa . -sname one -config one.config -s ddict_app go
erl  -pa . -sname two -config two.config -s mnesia start -s ddict_app go

</pre></div>
<p>I have to start mnesia separately on the second VM because I
haven't yet figured out how mnesia should be started when dealing
with distributed applications. mnesia needs to be running on both
nodes, but not the ddict application itself.</p>
<p>Once it is running, you can call ddict:set("Foo","bar") and
ddict:get("Foo"). You can also kill either VM, and it will restart
the server after 3 seconds on the other node.</p>
<p><a href="http://www.reddit.com/info/6mois/comments/">Comments
here</a></p>

]]></description>
	
</item>
<item>
	
	<title>using nmap for network monitoring</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/using_nmap_for_network_monitoring/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/using_nmap_for_network_monitoring/</link>
	
	
	<category>tags/tech</category>
	
	
	<pubDate>Fri, 30 May 2008 22:34:40 -0400</pubDate>
	<dcterms:modified>2008-05-31T02:50:02Z</dcterms:modified>
	
	<description><![CDATA[<h2>The problem</h2>
<p>You need to know if any of 900 IP addresses are unreachable. You
also need to know this within about a minute of any outages. Nmap
is primarily a security tool, but it can be very helpful when it
comes to monitoring as well.</p>
<h3>fping</h3>
<p>For years I used fping for this. Here is an example of what it
can do:</p>
<div class="syntax">
<pre>
$ wc -l ips.txt 
900 ips.txt
$ time fping &lt; ips.txt 
...
real    0m41.347s
user    0m0.028s
sys     0m0.248s

</pre></div>
<p>Not too bad.. 41 seconds to poll 900 devices. It actually seems
to finish at around 35 seconds, and then sits there for a bit
before exiting.</p>
<h3>nmap</h3>
<p>Now let's try with nmap. Nmap needs to be run as root to allow it
to send ICMP packets; otherwise it will use connect(). In my tests
it is actually faster when running in TCP mode, but some devices
only respond to ICMP. (It would be best for security to put this
into a nmap_ping helper script and put that in sudoers instead of
allowing all nmap commands to be run as root. It is probably also
possible to use the capabilities system to just allow a normal user
to send ICMP packets.)</p>
<div class="syntax">
<pre>
$ time sudo nmap -n -sP -PE -iL ips.txt
...
real    0m3.961s
user    0m1.072s
sys     0m1.780s

</pre></div>
<p>Not bad at all, about 10 times faster than using fping!</p>
<p>Note that in these examples, all of the addresses are pingable, so
timeouts and retry times do not come into play. My monitoring
system maintains separate lists of the reachable and unreachable
devices, and pings them from different processes. This prevents
unreachable devices from slowing down the normal process of making
sure everything else is working. Currently the time between pings
to a single device is about 8 seconds.</p>
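<p>The split between reachable and unreachable lists can be sketched
roughly as follows in Python 3. This is not the code from my
monitoring system: the function names are made up for illustration,
and <code>probe</code> is a hypothetical callable that takes a
collection of addresses and returns the set that answered (for
example, a wrapper around the nmap command above). Each check function
is meant to run in its own process, so timeouts on dead hosts never
delay the check of the live ones.</p>

```python
def check_reachable(reachable, probe):
    """Probe hosts believed to be up; return (still_up, went_down)."""
    answered = probe(reachable)
    still_up = set(reachable) & set(answered)
    return still_up, set(reachable) - still_up


def check_unreachable(unreachable, probe):
    """Probe hosts believed to be down; return (came_back, still_down)."""
    answered = probe(unreachable)
    came_back = set(unreachable) & set(answered)
    return came_back, set(unreachable) - came_back


def merge(still_up, went_down, came_back, still_down):
    """Combine both results into the new (reachable, unreachable) lists."""
    return still_up | came_back, went_down | still_down
```

<p>Hosts in <code>went_down</code> are the ones to alert on; they move
to the slow list until a later probe sees them again.</p>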

]]></description>
	
</item>
<item>
	
	<title>building pig on debian</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/building_pig_on_debian/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/building_pig_on_debian/</link>
	
	
	<category>tags/hadoop</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Thu, 15 May 2008 23:52:12 -0400</pubDate>
	<dcterms:modified>2008-05-16T03:52:12Z</dcterms:modified>
	
	<description><![CDATA[<p>I've been playing with <a href=
"http://hadoop.apache.org/">Hadoop</a> and <a href=
"http://incubator.apache.org/pig/">Pig</a>. It is some really neat
technology.</p>
<p>I had a bunch of trouble getting pig to build, though. It seems
that this error from ant:</p>
<p>Could not create task or type of type: jjtree</p>
<p>is caused by missing the 'ant-optional' package.</p>

]]></description>
	
</item>
<item>
	
	<title>nin theslip</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/nin_theslip/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/nin_theslip/</link>
	
	<pubDate>Fri, 09 May 2008 01:34:51 -0400</pubDate>
	<dcterms:modified>2008-05-09T05:34:51Z</dcterms:modified>
	
	<description><![CDATA[<p><a href="http://dl.nin.com/theslip/signup">This is so beyond
awesome.</a></p>
<p>01:31:48 (1.13 MB/s) - `nin_theslip_mp3.zip' saved
[90209737/90209737]</p>

]]></description>
	
</item>
<item>
	
	<title>how my dupe finding program works</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/how_my_dupe_finding_program_works/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/how_my_dupe_finding_program_works/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Thu, 21 Feb 2008 23:41:03 -0500</pubDate>
	<dcterms:modified>2008-02-22T04:59:18Z</dcterms:modified>
	
	<description><![CDATA[<h2>finding duplicate files</h2>
<p>This post is about my duplicate finding program available under
<a href="http://bouncybouncy.net//ramblings/../programs/">Programs</a>. The program is a little bare,
and needs a nicer API, but the method it uses is the most efficient
one that I am aware of.</p>
<p>There are a couple of different ways you can find duplicate
files:</p>
<h3>Compute the hash of all the files, and look for duplicates</h3>
<p>This method works well if the files on disk are mostly static,
and files are added infrequently. In this case you can compute the
hashes once, and keep them around for later scans. However, if you
are only running the scan once, this method is not ideal since it
requires you to read the full contents of every file.</p>
<h3>Compute the hash of files with the same size</h3>
<p>This is the method that I think fdupes still uses. It first
builds a candidate list of files that are the same size, and
computes the checksum of each. This method works well if most of
the files that are the same size are really duplicates, but
otherwise triggers too much unneeded IO.</p>
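<p>For comparison, a minimal Python 3 sketch of this size-then-hash
approach might look like the following. It is not how fdupes is
actually implemented; the function names, the choice of SHA-256, and
the block size are all assumptions made for illustration.</p>

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, blocksize=65536):
    """Hash the full contents of one file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def duplicates_by_hash(paths):
    """Return lists of paths that share both a size and a digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, candidates in by_size.items():
        if len(candidates) == 1:
            continue  # a unique size cannot have a duplicate
        for path in candidates:
            # Every candidate is read in full, even if it differs from
            # the others in its first block.
            by_digest[(size, file_digest(path))].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```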
<h3>Compare all files with the same size in parallel</h3>
<p>This is the method that my program uses. Like fdupes, it first
builds up a candidate list of files with the same size. Instead of
hashing the files, it simply reads all of the files at the same time,
comparing block by block. This is just like what the
<em>cmp(1)</em> program does, but for multiple files at the same
time. The benefit of this over calculating each file's hash is that
as soon as the files differ, you can stop reading.</p>
<h2>Implementation</h2>
<p>There are a couple of things you need to keep in mind to
implement this method.</p>
<h3>Don't open too many files.</h3>
<p>You have to be careful not to try and open too many files at
once. If the user has 5,000 files that all have the same size, the
program shouldn't try and open all 5,000 at once. My program uses a
simple helper class to handle opening and closing files. The
default blocksize in my program would probably waste a bit of
memory in this case, but that is easily changed.</p>
<h3>Correctly handle diverging sets.</h3>
<p>Imagine the filesystem contains 4 files of the same size, 'a',
'b', 'c', and 'd', where a==c, and b==d. While reading through the
files, it will become clear that a!=b, a==c, and a!=d. It is
important that at this step the program continues searching using
(a,c) and (b,d) as possible duplicates. This is implemented using
recursion: the sets (a,c) and (b,d) are fed back into the duplicate
finding function.</p>
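<p>The recursion can be sketched roughly like this in Python 3. This
is not the code from dupes.py: the names <code>split_identical</code>
and <code>_compare</code> and the default block size are assumptions
for illustration. To stay short, the sketch opens every file in a
group at once, so it ignores the limit on open files discussed
above.</p>

```python
from collections import defaultdict


def split_identical(paths, blocksize=65536):
    """Split same-sized files into groups with identical contents.

    Returns a list of lists; only groups of two or more files survive.
    """
    handles = {path: open(path, "rb") for path in paths}
    try:
        return _compare(list(paths), handles, blocksize)
    finally:
        for handle in handles.values():
            handle.close()


def _compare(paths, handles, blocksize):
    while True:
        # Read the next block of every file and group files by it.
        by_block = defaultdict(list)
        for path in paths:
            by_block[handles[path].read(blocksize)].append(path)

        if len(by_block) == 1:
            if next(iter(by_block)) == b"":
                return [paths]  # hit EOF together: all identical
            continue  # still identical so far, keep reading

        # The set diverged: feed each subset of two or more files back
        # in. Files left alone in a subset are never read again.
        groups = []
        for subset in by_block.values():
            if len(subset) > 1:
                groups.extend(_compare(subset, handles, blocksize))
        return groups
```

<p>With the four files from the example, the first differing block
splits the set into (a,c) and (b,d), and each pair is then read to the
end on its own.</p>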
<h2>Example run, compared to fdupes.</h2>
<p>Here is dupes.py running against fdupes on a modestly sized
directory. Notice how dupes.py only needs to read about 600K (not
counting metadata).</p>
<p>According to iofileb.d from the dtrace toolkit, dupes.py reads
10M of data (which I think includes python), and fdupes reads 517M.
This alone explains the 20x speedup seen in dupes.py.</p>
<div class="syntax">
<pre>
justin@pip:~$ du -hs $DIR
15G   $DIR

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m1.224s
user    0m0.234s
sys     0m0.494s

justin@pip:~$ time fdupes -r $DIR
real    0m41.694s
user    0m13.612s
sys     0m7.491s

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m3.662s
user    0m0.256s
sys     0m0.568s

justin@pip:~$ time fdupes -r $DIR
real    0m55.473s
user    0m11.383s
sys     0m6.433s

</pre></div>

]]></description>
	
</item>
<item>
	
	<title>regex with named groups</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/regex_with_named_groups/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/regex_with_named_groups/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Wed, 20 Feb 2008 11:42:21 -0500</pubDate>
	<dcterms:modified>2008-02-20T17:00:38Z</dcterms:modified>
	
	<description><![CDATA[<p>As I mentioned in a comment at <a href=
"http://handyfloss.wordpress.com/2008/02/19/some-more-tweaks-to-my-python-script/">
Some more tweaks to my Python script</a>, there are a lot of ways
you can use the re module. If you need to match multiple
expressions against each line, you can build up a single regular
expression that includes all the patterns, and use named groups to
tell them apart.</p>
<div class="syntax">
<pre>

<span class="synPreProc">import</span> re
<span class=
"synComment">#if you were matching many of these it would be a good idea</span>
<span class=
"synComment">#to make a function that simply fills in '%s&gt;(?P&lt;%s&gt;[^&lt;]+)&lt;'</span>
cpattern    = '<span class=
"synConstant">total_credit&gt;(?P&lt;credit&gt;[^&lt;]+)&lt;</span>'
opattern    = '<span class=
"synConstant">os_name&gt;(?P&lt;os&gt;[^&lt;]+)&lt;</span>'
pattern     = '<span class=
"synConstant">(%s)|(%s)</span>' % (cpattern, opattern)

search = re.compile(pattern).search

lines = [
    '<span class=
"synConstant">blah blah blah total_credit&gt;10&lt; blah blah</span>',
    '<span class=
"synConstant">hkfhsd klfjhs dfkljsdfsl fds</span>',
    '<span class=
"synConstant">hkashflksd os_name&gt;win&lt; hhkjhdflksj d</span>',
    '<span class=
"synConstant">hkfhsd klfjhs dfkljsdfsl fds</span>',
    '<span class=
"synConstant">blah blah blah total_credit&gt;20&lt; blah blah</span>',
]

<span class="synStatement">for</span> line <span class=
"synStatement">in</span> lines:
    r = search(line)
    <span class="synStatement">if</span> r:
        <span class="synStatement">print</span>(r.groupdict())

</pre></div>
<p>Running this gives</p>
<div class="syntax">
<pre>
{'<span class="synConstant">credit</span>': '<span class=
"synConstant">10</span>', '<span class=
"synConstant">os</span>': None}
{'<span class="synConstant">credit</span>': None, '<span class=
"synConstant">os</span>': '<span class="synConstant">win</span>'}
{'<span class="synConstant">credit</span>': '<span class=
"synConstant">20</span>', '<span class=
"synConstant">os</span>': None}

</pre></div>
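<p>The helper function suggested in the comment above might look
something like this. It is only a sketch: the names
<code>tag_pattern</code>, <code>build_search</code> and
<code>first_match</code> are made up for the example, and the
template is the one from the comment.</p>

```python
import re


def tag_pattern(tag, group):
    """Fill in the template from the comment for one tag/group pair."""
    return '%s>(?P<%s>[^<]+)<' % (re.escape(tag), group)


def build_search(pairs):
    """Join one alternative per (tag, group) pair into a single regex."""
    pattern = '|'.join('(%s)' % tag_pattern(tag, group) for tag, group in pairs)
    return re.compile(pattern).search


search = build_search([('total_credit', 'credit'), ('os_name', 'os')])


def first_match(line):
    """Return (group name, value) for the alternative that matched, or None."""
    m = search(line)
    if m is None:
        return None
    # Only the alternative that matched has a value other than None.
    return next((k, v) for k, v in m.groupdict().items() if v is not None)


print(first_match('blah blah blah total_credit>10< blah blah'))
print(first_match('hkashflksd os_name>win< hhkjhdflksj d'))
print(first_match('hkfhsd klfjhs dfkljsdfsl fds'))
```

<p>Adding another field is then just one more pair in the list
passed to <code>build_search</code>.</p>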
<p>In this case you could even generalize the regular expression
further, like so:</p>
<div class="syntax">
<pre>
pattern     = r'<span class=
"synConstant">\s(?P&lt;key&gt;[^\s&gt;]+)&gt;(?P&lt;value&gt;[^&lt;]+)&lt;</span>'

</pre></div>
<p>Running that (probably less than optimal) regular expression
over the input gives</p>
<div class="syntax">
<pre>
{'<span class="synConstant">key</span>': '<span class=
"synConstant">total_credit</span>', '<span class=
"synConstant">value</span>': '<span class="synConstant">10</span>'}
{'<span class="synConstant">key</span>': '<span class=
"synConstant">os_name</span>', '<span class=
"synConstant">value</span>': '<span class=
"synConstant">win</span>'}
{'<span class="synConstant">key</span>': '<span class=
"synConstant">total_credit</span>', '<span class=
"synConstant">value</span>': '<span class="synConstant">20</span>'}

</pre></div>

]]></description>
	
</item>
<item>
	
	<title>xen live migration without shared storage</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/xen_live_migration_without_shared_storage/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/xen_live_migration_without_shared_storage/</link>
	
	
	<category>tags/tech</category>
	
	<category>tags/xen</category>
	
	
	<pubDate>Sat, 16 Feb 2008 17:02:25 -0500</pubDate>
	<dcterms:modified>2008-02-19T03:51:14Z</dcterms:modified>
	
	<description><![CDATA[<h2>The problem</h2>
<p>The <a href=
"http://www.cl.cam.ac.uk/research/srg/netos/xen/readmes/user/user.html#SECTION03520000000000000000">
Xen documentation on live migration states</a>:</p>
<blockquote>
<p>Currently, there is no support for providing automatic remote
access to filesystems stored on local disk when a domain is
migrated. Administrators should choose an appropriate storage
solution (i.e. SAN, NAS, etc.) to ensure that domain filesystems
are also available on their destination node. GNBD is a good method
for exporting a volume from one machine to another. iSCSI can do a
similar job, but is more complex to set up.</p>
</blockquote>
<p>This does not mean that it is impossible, though. Live migration
is just a more efficient form of migration, and a migration can be
seen as a save on one node followed by a restore on another.
Normally, if you save a VM on one machine and try to restore it on
another, it will fail when it is unable to read its filesystems. But
what would happen if you copied the filesystem to the other node
between the save and the restore? If done right, it works pretty
well.</p>
<h2>The solution?</h2>
<p>The solution is simple:</p>
<ul>
<li>Save the running image</li>
<li>Sync the disks</li>
<li>Copy the image to the other node and restore it</li>
</ul>
<p>This can be sped up somewhat by syncing the disks twice:</p>
<ul>
<li>Sync the disks</li>
<li>Save the running image</li>
<li>Sync the disks again, which only has to copy the changes from
the last few seconds</li>
<li>Copy the image to the other node and restore it</li>
</ul>
<h3>Synchronizing block devices</h3>
<h4>File backed</h4>
<p>If you are using plain files as vbds, you can sync the disks
using rsync.</p>
<h4>Raw devices</h4>
<p>If you are using raw devices, rsync cannot be used. I wrote a
small utility called <a href="http://bouncybouncy.net//ramblings/../programs/blocksync.py">blocksync</a> which can synchronize two
block devices over the network. In my testing it was easily able to
max out the network on an initial sync, and max out the disk read
speed on a resync.</p>
<p>$ blocksync.py /dev/xen/vm-root 1.2.3.4</p>
<p>This will sync /dev/xen/vm-root onto 1.2.3.4. The device should
already exist on the destination and be the same size.</p>
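<p>The general idea of a block-wise sync can be sketched like this.
It is not the blocksync.py source: the function name
<code>sync_blocks</code>, the use of SHA-1 and the in-memory
"devices" are assumptions for illustration. Over a network only the
digests would need to travel for blocks that match, which is why a
resync is limited by disk read speed rather than by the network.</p>

```python
import hashlib
import io


def sync_blocks(src, dst, block_size=1048576):
    """Copy only the blocks of src that differ from dst.

    Both arguments are seekable binary file objects of the same size.
    Returns (same, diff) block counts.
    """
    same = diff = 0
    while True:
        offset = src.tell()
        block = src.read(block_size)
        if not block:
            break  # end of the source device
        dst.seek(offset)
        remote = dst.read(len(block))
        # Comparing digests stands in for what would cross the network.
        if hashlib.sha1(block).digest() == hashlib.sha1(remote).digest():
            same += 1
        else:
            dst.seek(offset)
            dst.write(block)
            diff += 1
    return same, diff


# Tiny stand-ins for two block devices, using 8 byte blocks.
src = io.BytesIO(b"a" * 8 + b"b" * 8 + b"c" * 8)
dst = io.BytesIO(b"a" * 8 + b"X" * 8 + b"c" * 8)
counts = sync_blocks(src, dst, block_size=8)
print("same: %d, diff: %d" % counts)
```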
<h4>Solaris ZFS</h4>
<p>If you are using ZFS, it should be possible to use <code>zfs
send</code> to sync the block devices before migration. This would
give an almost instantaneous sync time.</p>
<h2>Automation</h2>
<p>A simple script, <a href="http://bouncybouncy.net//ramblings/../programs/xen_migrate.sh">xen_migrate.sh</a>, and its helper, <a href="http://bouncybouncy.net//ramblings/../programs/xen_vbds.py">xen_vbds.py</a>, will migrate a domain to another host. File and raw
vbds are supported. <code>zfs send</code> support is not yet
implemented.</p>
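<p>The order of operations for the sync-twice approach can be
written down as a plain list of commands. This is a sketch rather
than the script itself: the function name
<code>migration_plan</code> and its parameters are made up for the
example, the commands are the ones that appear in the example
migration in this post, and nothing is actually executed.</p>

```python
def migration_plan(domain, dest, disks, presync=True):
    """Return the commands, in order, for a save/sync/restore migration."""
    dump = domain + ".dump"
    sync = [["blocksync.py", disk, dest] for disk in disks]
    plan = []
    if presync:
        plan += sync  # bulk copy while the domain is still running
    plan.append(["xm", "save", domain, dump])  # downtime starts here
    plan += sync  # only the blocks changed since the first pass differ
    plan.append(["scp", dump, dest + ":"])
    plan.append(["ssh", dest, "xm restore " + dump + " && rm " + dump])
    plan.append(["rm", dump])  # remove the local copy of the dump
    return plan


plan = migration_plan(
    "test", "192.168.1.2", ["/dev/xen/test-root", "/dev/xen/test-swap"]
)
for command in plan:
    print(" ".join(command))
```

<p>Passing <code>presync=False</code> gives the simpler three-step
variant, at the cost of copying every changed block while the
domain is down.</p>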
<h3>Example migration</h3>
<div class="syntax">
<pre>
<span class=
"synComment">#migrating a 1G / + 128M swap over the network</span>
<span class=
"synComment">#physical machines are 350MHz with 64M of RAM,</span>
<span class="synComment">#total downtime is about 3 minutes</span>

xen1:~# time ./migrate.sh test 192.168.1.2
+ <span class="synConstant">'['</span> 2 -ne 2 <span class=
"synConstant">']'</span>
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
test              87       15    0  -b---      0.0    9687
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump                                       100%   16MB   3.2MB/s   00:05
+ restore_image
+ ssh 192.168.1.2 <span class=
"synConstant">'xm restore test.dump &amp;&amp; rm test.dump'</span>
(domain
    (id 89)
    [domain info stuff cut out]
)
+ rm test.dump

real    6m6.272s
user    1m29.610s
sys     0m30.930s


</pre>
<span class="synTitle"><a href="http://bouncybouncy.net//ramblings/../../files/example_migration.txt">download file
"/ramblings/files/example_migration.txt"</a></span></div>

]]></description>
	
</item>
<item>
	
	<title>dynamic ikiwiki pages</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/dynamic_ikiwiki_pages/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/dynamic_ikiwiki_pages/</link>
	
	
	<category>tags/ikiwiki</category>
	
	<category>tags/meta</category>
	
	<category>tags/pylons</category>
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Fri, 15 Feb 2008 20:57:58 -0500</pubDate>
	<dcterms:modified>2008-02-16T18:36:26Z</dcterms:modified>
	
	<description><![CDATA[<p>The static pages that <a href="http://ikiwiki.info">ikiwiki</a>
generates are great, but I want to have some dynamic content here
as well.</p>
<p>If this works, this page should include the server's uptime.</p>
<!--# include virtual="/dyn/demo/uptime" -->
<p>yay <img src="http://bouncybouncy.net//ramblings/../smileys/smile.png" alt=":-)" /></p>
<p>So how does that work?</p>
<p>First, configure nginx as follows:</p>
<div class="syntax">
<pre>
server {
    listen       80;
    server_name  bouncybouncy.net  *.bouncybouncy.net web;

    location / {
        root   /home/justin/bbdotnet/static/;
        index  index.html index.htm;
        ssi on;
    }
    location /dyn {
        # All POST requests go to pylons directly
        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        if ($request_method = POST) {
            proxy_pass  http://127.0.0.1:5000;
            break;
        }
        default_type text/html; 

        set $memcached_key "$uri";
        memcached_pass localhost:11211;

        proxy_intercept_errors  on;

        # If nothing is found in memcached, or memcached is down, go to the real dynamic location
        error_page 404 502 = @dynamic_request;
    }
    location @dynamic_request{
        # This means, that we can't get to this location from outside - only by internal redirect
        internal;

        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        proxy_pass  http://127.0.0.1:5000;
    }

}

</pre></div>
<p>Pylons is set up to run on port 5000 as usual, nothing fancy
there.</p>
<p>Then anywhere we want some dynamic content we can simply do:</p>
<div class="syntax">
<pre>
&lt;!--# include virtual="/dyn/demo/uptime" --&gt;

</pre></div>
<p>For now, you have to disable the htmlscrubber plugin for this to
work. There is probably a better solution. I think this would
simply involve a plugin that could run after htmlscrubber to insert
the include, then you would only need to have something like
[[include virtual="/dyn/demo/uptime"]] in your pages.</p>
<p>If you did not mind requiring JavaScript, you could use <a href=
"http://www.mnot.net/javascript/hinclude/">HInclude</a> instead of
SSI.</p>
<p>To keep things running fast, we enable caching on the Pylons
controller, using a modified version of the <code>beaker_cache</code>
decorator. The following lines are inserted at the end of the
<code>create_func</code> method, which causes the page result to be
cached in memcache as well as in beaker.</p>
<div class="syntax">
<pre>
url = pylons.request.path_info
<span class="synStatement">if</span> pylons.request.params:
    url += "<span class=
"synConstant">?</span>" + pylons.request.environ['<span class=
"synConstant">QUERY_STRING</span>']

mc = memcache.Client(['<span class=
"synConstant">localhost</span>'])
mc.set(url, result, cache_expire)

</pre></div>
<p>The only remaining problem I see is a small race condition. If
the cache expires, and 20 concurrent requests all come in for the
page, most of them will end up hitting Python instead of waiting
for the memcache key to appear. This might actually work better
using varnish or apache2 with <code>mod_disk_cache</code>, but the
last time I tried I could not get varnish to work at all, and
apache2 (I think) still does not support PURGE.</p>
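<p>One way the race could be narrowed, sketched here purely as an
illustration and not something this site does, is to let only the
request that wins a lock key render the page while the others wait
for the cached copy. The <code>FakeCache</code> class is a
hypothetical in-memory stand-in whose <code>add</code> stores only
when the key is absent, which is the behaviour of the memcached
<code>add</code> command. The names <code>cached_page</code> and
<code>render</code> are likewise made up for the example.</p>

```python
import threading
import time


class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def add(self, key, value):
        # Like memcached "add": store only if the key is absent.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def cached_page(cache, url, render, poll=0.01):
    """Return the page for url, letting only one caller run render()."""
    while True:
        page = cache.get(url)
        if page is not None:
            return page
        if cache.add("lock:" + url, 1):
            try:
                page = cache.get(url)  # re-check: it may have just been filled
                if page is None:
                    page = render()
                    cache.set(url, page)
                return page
            finally:
                cache.delete("lock:" + url)
        time.sleep(poll)  # another request is rendering, wait for the key


calls = []
results = []
cache = FakeCache()


def render():
    calls.append(1)
    time.sleep(0.05)  # pretend the page is slow to build
    return "up 3 days"


def worker():
    results.append(cached_page(cache, "/dyn/demo/uptime", render))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))
```

<p>With nginx answering cache hits directly, the waiting would have
to happen inside the Pylons app, so this only limits the work done
per expiry rather than removing the extra requests.</p>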

]]></description>
	
</item>

</channel>
</rss>
