I'm writing a program to automate generating some C code (specifically, code that parses strings into enumerations with the same name). C's handling of strings is not that great, so some people have been nagging me to try Python.
I made a function that is supposed to remove C-style comments, both /* COMMENT */ and // COMMENT, from a string.
Here is the code:
    import re

    def removeComments(string):
        re.sub(re.compile(r"/\*.*?\*/", re.DOTALL), "", string)  # remove all occurrences of streamed comments (/*COMMENT */) from string
        re.sub(re.compile(r"//.*?\n"), "", string)  # remove all occurrences of single-line comments (//COMMENT\n ) from string
So I tried this code out.
    str = "/* spam * spam */ eggs"
    removeComments(str)
    print(str)
And it apparently did nothing.
Any suggestions as to what I've done wrong?
There's a saying I've heard a couple of times:
If you have a problem and you try to solve it with regex, you end up with two problems.
EDIT: Looking back at this years later (after a fair bit more parsing experience):
I think regex may have been the right solution after all, and the simple regex used here is "good enough". I may not have emphasized this enough in the question: this was for a single, specific file that had no tricky situations. I think it would be a lot less maintenance to keep the parsed file simple enough for the regex than to complicate the regex into an unreadable symbol soup.
re.sub doesn't modify the string you pass in (Python strings are immutable); it returns a new string. Your function discards that result, so assign it back and return it:
    import re

    def removeComments(string):
        string = re.sub(re.compile(r"/\*.*?\*/", re.DOTALL), "", string)  # remove all occurrences of streamed comments (/*COMMENT */) from string
        string = re.sub(re.compile(r"//.*?\n"), "", string)  # remove all occurrences of single-line comments (//COMMENT\n ) from string
        return string
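One detail of the single-line pattern: //.*?\n only matches when a newline follows the comment, and it removes that newline too. The sketch below compares it with a variant, //[^\n]*, on a made-up input whose last line has no trailing newline. The function names and the variant pattern are illustrative assumptions, not part of the answer.

```python
import re

def remove_line_comments_original(string):
    # single-line pattern from the answer: consumes the newline and needs one to match
    return re.sub(r"//.*?\n", "", string)

def remove_line_comments_variant(string):
    # hypothetical variant: stops before the newline, so it also matches at end of input
    return re.sub(r"//[^\n]*", "", string)

source = "int a; // first\nint b; // last line, no trailing newline"
print(repr(remove_line_comments_original(source)))  # lines joined, last comment kept
print(repr(remove_line_comments_variant(source)))   # newline kept, both comments removed
```

Stopping before the newline keeps the line structure intact, which is usually what you want when the output is C code again.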
Many answers have been given already, but what about "//comment-like strings inside quotes"?
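To make the problem concrete, here is a small sketch that applies the same two simple patterns to lines whose string literals merely contain comment markers. The helper name naive_remove and the sample lines are hypothetical.

```python
import re

BLOCK = r"/\*.*?\*/"  # /* block */ comments
LINE = r"//.*?\n"     # // line comments, including the newline

def naive_remove(string):
    # hypothetical helper applying both patterns with no awareness of quotes
    string = re.sub(BLOCK, "", string, flags=re.DOTALL)
    return re.sub(LINE, "", string)

url_line = "url = 'http://not.comment.com';\n"
block_line = 'char *s = "/* not a comment */";\n'
print(repr(naive_remove(url_line)))    # the rest of the line after "http:" is lost
print(repr(naive_remove(block_line)))  # the string literal is emptied
```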
The OP is asking how to do it using regular expressions, so:
    import re

    def remove_comments(string):
        pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
        # first group captures quoted strings (double or single)
        # second group captures comments (//single-line or /* multi-line */)
        regex = re.compile(pattern, re.MULTILINE | re.DOTALL)

        def _replacer(match):
            # if the 2nd group (capturing comments) is not None,
            # it means we have captured a non-quoted (real) comment string.
            if match.group(2) is not None:
                return ""  # so we will return empty to remove the comment
            else:  # otherwise, we will return the 1st group
                return match.group(1)  # captured quoted string

        return regex.sub(_replacer, string)
This WILL remove:
/* multi-line comments */
// single-line comments
Will NOT remove:
String var1 = "this is /* not a comment. */";
char *var2 = "this is // not a comment, either.";
url = 'http://not.comment.com';
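As a quick check of the two lists above, here is a sketch that runs the same pattern and replacer on a small sample. The sample input is made up for illustration.

```python
import re

def remove_comments(string):
    # group 1: quoted strings (kept); group 2: real comments (removed)
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)

    def _replacer(match):
        if match.group(2) is not None:
            return ""          # a comment outside any string: drop it
        return match.group(1)  # a quoted string: keep it verbatim

    return regex.sub(_replacer, string)

sample = (
    "/* multi-line\n"
    "   comment */\n"
    'String var1 = "this is /* not a comment. */"; // single-line comment\n'
    "url = 'http://not.comment.com';\n"
)
cleaned = remove_comments(sample)
print(cleaned)  # the real comments are gone; both string literals survive intact
```

One caveat: the quote patterns stop at the first closing quote, so a literal containing an escaped quote (such as "a \" b") is not handled correctly.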