Regular expression matching anything greater than eight letters in length, in Python


Question

Despite attempts to master grep and related GNU software, I haven't come close to mastering regular expressions. I do like them, but I find them a bit of an eyesore all the same.

I suppose this question isn't difficult for some, but I've spent hours trying to figure out how to search through my favorite book for words greater than a certain length, and in the end, came up with some really ugly code:

twentyfours = [w for w in vocab if re.search('^........................$', w)]
twentyfives = [w for w in vocab if re.search('^.........................$', w)]
twentysixes = [w for w in vocab if re.search('^..........................$', w)]
twentysevens = [w for w in vocab if re.search('^...........................$', w)]
twentyeights = [w for w in vocab if re.search('^............................$', w)]

... a line for each length, all the way from a certain length to another one.
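The per-length lists above can be built in one pass by grouping each word on `len()`. This is a minimal sketch. It assumes `vocab` is an iterable of strings. The sample words and the name `by_length` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample vocabulary standing in for the book's word list.
vocab = ["cat", "extraordinary", "encyclopedia", "dog", "encyclopedic"]

# Map each word length to the list of words having exactly that length.
by_length = defaultdict(list)
for w in vocab:
    by_length[len(w)].append(w)

# by_length[12] plays the role of one hand-written per-length list.
print(by_length[12])  # ['encyclopedia', 'encyclopedic']
```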

What I want instead is to be able to say 'give me every word in vocab that's greater than eight letters in length.' How would I do that?

8/30/2010 8:45:21 PM

Accepted Answer

You don't need regex for this.

result = [w for w in vocab if len(w) > 8]
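A quick check on a small sample shows where the boundary falls when "greater than eight" is read as a strict `> 8`. An eight-character word is excluded, and a nine-character word is kept. The word list is made up for illustration.

```python
# Hypothetical sample vocabulary; in the question, vocab comes from a book.
vocab = ["python", "regular", "expression", "greater", "character", "abcdefgh"]

# "Greater than eight letters" means a length of at least nine.
result = [w for w in vocab if len(w) > 8]
print(result)  # ['expression', 'character']
```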

But if you must use a regex:

import re

rx = re.compile('^.{9,}$')
#                  ^^^^ {9,} means 9 or more, i.e. more than eight.
result = [w for w in vocab if rx.match(w)]
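The anchored pattern has one subtlety. In Python, `$` also matches just before a trailing newline, so a word that still ends in `\n` (for example, a line read from a file) passes the check. Below is a sketch that assumes Python 3.4+ for `fullmatch()`. That method requires the whole string to match, so the `^` and `$` anchors are unnecessary. The inputs are hypothetical.

```python
import re

# Hypothetical inputs: the second word still carries a trailing newline.
words = ["abcdefghi", "abcdefghi\n", "abcdefgh"]

anchored = re.compile(r'^.{9,}$')
full = re.compile(r'.{9,}')

# '$' also matches just before a final newline, so match() accepts "abcdefghi\n".
anchored_hits = [bool(anchored.match(w)) for w in words]
# fullmatch() must consume the whole string, and '.' does not match '\n'.
full_hits = [bool(full.fullmatch(w)) for w in words]

print(anchored_hits)  # [True, True, False]
print(full_hits)      # [True, False, False]
```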

See http://www.regular-expressions.info/repeat.html for details on the {min,max} syntax.
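The bounded form `{min,max}` can also replace the question's separate `twentyfours` ... `twentyeights` lines with a single pattern. This is a sketch that uses synthetic words of known length. The sample data is made up.

```python
import re

# Hypothetical vocabulary with word lengths 5, 24, 26 and 30.
vocab = ["short", "a" * 24, "b" * 26, "c" * 30]

# {24,28} means at least 24 and at most 28 repetitions of '.',
# covering every length from the five hand-written patterns at once.
rx = re.compile(r'.{24,28}')
in_range = [w for w in vocab if rx.fullmatch(w)]

print([len(w) for w in in_range])  # [24, 26]
```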

8/30/2010 8:55:47 PM

\w matches word characters (letters, digits and the underscore), and {min,max} sets the allowed number of repetitions, with max optional. An expression like

\w{9,}

will match any run of 9 or more word characters.
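Because `\w` also accepts digits and underscores, it can pick up tokens that are not words. The sketch below applies `re.findall()` directly to running text. It compares `\w{9,}` with the letters-only class `[^\W\d_]{9,}`, which means "word characters minus digits and underscore". The sample sentence is made up.

```python
import re

# Hypothetical snippet of book text.
text = "The extraordinary variable_name held 1234567890."

# \w{9,} counts digits and underscores as "letters".
word_runs = re.findall(r'\w{9,}', text)
# [^\W\d_] keeps only letters (Unicode-aware for str patterns).
letter_runs = re.findall(r'[^\W\d_]{9,}', text)

print(word_runs)    # ['extraordinary', 'variable_name', '1234567890']
print(letter_runs)  # ['extraordinary']
```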


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow