Python: How to use RegEx in an if statement?


Question

I have the following code which looks through the files in one directory and copies files that contain a certain string into another directory, but I am trying to use Regular Expressions as the string could be upper and lowercase or a mix of both.

Here is the code that works, before I tried to use RegEx's

import os
import re
import shutil

def test():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    for x in (files):
        inputFile = open((x), "r")
        content = inputFile.read()
        inputFile.close()
        if ("Hello World" in content)
            shutil.copy(x, "C:/Users/David/Desktop/Test/MyFiles2")

Here is my code when I have tried to use RegEx's

import os
import re
import shutil

def test2():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    regex_txt = "facebook.com"
    for x in (files):
        inputFile = open((x), "r")
        content = inputFile.read()
        inputFile.close()
        regex = re.compile(regex_txt, re.IGNORECASE)

Im guessing that I need a line of code that is something like

if regex = re.compile(regex_txt, re.IGNORECASE) == True

But I cant seem to get anything to work, if someone could point me in the right direction it would be appreciated.

1
39
2/28/2013 3:31:35 PM

Accepted Answer

if re.match(regex, content) is not None:
  blah..

You could also use re.search depending on how you want it to match.

81
1/8/2013 11:11:31 PM

if re.search(r'pattern', string):

Simple if-test:

if re.search(r'ing\b', "seeking a great perhaps"):     # any words end with ing?
    print("yes")

Pattern check, extract a substring, case insensitive:

match_object = re.search(r'^OUGHT (.*) BE$', "ought to be", flags=re.IGNORECASE)
if match_object:
    assert "to" == match_object.group(1)     # what's between ought and be?

Notes:

  • Use re.search() not re.match. Match restricts to the start of strings, a confusing convention if you ask me. If you do want a string-starting match, use caret or \A instead, re.search(r'^...', ...)

  • Use raw string syntax r'pattern' for the first parameter. Otherwise you would need to double up backslashes, as in re.search('ing\\b', ...)

  • In this example, \b is a special sequence meaning word-boundary in regex. Not to be confused with backspace.

  • re.search() returns None if it doesn't find anything, which is always falsy.

  • re.search() returns a Match object if it finds anything, which is always truthy.

  • a group is what matched inside parentheses

  • group numbering starts at 1

  • Specs

  • Tutorial


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon