Regex with named groups

As I mentioned in a comment on "Some more tweaks to my Python script", there are many ways to use the re module. If you need to match multiple expressions against each line, you can build a single regular expression that includes all the patterns and use named groups to tell them apart.


import re
# If you were matching many of these, it would be a good idea
# to write a function that simply fills in '%s>(?P<%s>[^<]+)<'.
cpattern    = 'total_credit>(?P<credit>[^<]+)<'
opattern    = 'os_name>(?P<os>[^<]+)<'
pattern     = '(%s)|(%s)' % (cpattern, opattern)

search = re.compile(pattern).search

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'blah blah blah total_credit>20< blah blah',
]

for line in lines:
    r = search(line)
    if r:
        # groupdict() maps every named group to its text (None if unmatched)
        print(r.groupdict())

Running this gives

{'credit': '10', 'os': None}
{'credit': None, 'os': 'win'}
{'credit': '20', 'os': None}
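
The comment in the script suggests a function that fills in the '%s>(?P<%s>[^<]+)<' template. Here is a minimal sketch of that idea. The names FIELDS, tag_pattern, build_search and matched_groups are made up for illustration and are not part of the original script. It also filters the None entries out of groupdict(), so only the group that actually matched is reported.

```python
import re

# Hypothetical list of (tag name, group name) pairs; extend as needed.
FIELDS = [
    ('total_credit', 'credit'),
    ('os_name', 'os'),
]

def tag_pattern(tag, group):
    # Fill in the template from the comment in the original script.
    # re.escape guards against tag names containing regex metacharacters.
    return '%s>(?P<%s>[^<]+)<' % (re.escape(tag), group)

def build_search(fields):
    # Join the per-tag patterns into one alternation and compile it once.
    pattern = '|'.join(tag_pattern(tag, group) for tag, group in fields)
    return re.compile(pattern).search

def matched_groups(match):
    # Keep only the named groups that took part in the match.
    return {name: value
            for name, value in match.groupdict().items()
            if value is not None}

search = build_search(FIELDS)

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    'blah blah blah total_credit>20< blah blah',
]

for line in lines:
    r = search(line)
    if r:
        print(matched_groups(r))
```

With the same sample lines, this should print {'credit': '10'}, {'os': 'win'} and {'credit': '20'}, one per line. Adding another tag is then a one-line change to FIELDS.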

In this case you could even generalize the regular expression further, like so:

# Raw string, so that \s reaches the regex engine as written
pattern     = r'\s(?P<key>[^\s>]+)>(?P<value>[^<]+)<'

Running that (probably less than optimal) regular expression over the input gives

{'key': 'total_credit', 'value': '10'}
{'key': 'os_name', 'value': 'win'}
{'key': 'total_credit', 'value': '20'}
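
For completeness, here is a sketch of how the generalized pattern might be run on its own. It uses finditer rather than search, so a line containing several key>value< pairs yields each of them. The last sample line, with two pairs, and the name pairs are made up for illustration. Note that the pattern requires whitespace before the key, so a pair at the very start of a line is not matched, which is one way it is less than optimal.

```python
import re

# Raw string so the \s escapes are passed to the regex engine unchanged.
pattern = re.compile(r'\s(?P<key>[^\s>]+)>(?P<value>[^<]+)<')

lines = [
    'blah blah blah total_credit>10< blah blah',
    'hkfhsd klfjhs dfkljsdfsl fds',
    'hkashflksd os_name>win< hhkjhdflksj d',
    # Made-up line with two pairs, to show why finditer is used here.
    'x total_credit>20< y os_name>linux< z',
]

pairs = []
for line in lines:
    # finditer yields every non-overlapping match in the line.
    for m in pattern.finditer(line):
        pairs.append((m.group('key'), m.group('value')))

print(pairs)
```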