How to extract the substring between two markers?


Question

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know what will be the few characters directly before AAA, and after ZZZ the part I am interested in 1234.

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How to do the same thing in Python?

1
258
10/10/2018 10:41:52 PM

Accepted Answer

Using regular expressions - documentation for further reference

import re

text = 'gfgfdAAA1234ZZZuijjk'

m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234

or:

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling

# found: 1234
460
10/8/2013 3:50:59 PM

>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'

Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon