What is the difference between the search() and match() functions in the Python re module?
I've read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return with my question and it will take less time to re-learn it.
re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.
As the re.match documentation says:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note: If you want to locate a match anywhere in string, use search() instead.
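The "zero-length match" remark in that quote is easy to skip over, so here is a minimal sketch of the difference. The patterns and the sample string are made up purely for illustration:

```python
import re

# A pattern that can match the empty string still matches at position 0.
empty = re.match(r"x*", "abc")
print(empty)          # a match object with span (0, 0)
print(empty.group())  # the empty string
print(bool(empty))    # True: a match object is truthy even when it is empty

# A pattern that needs at least one character fails outright.
missing = re.match(r"x+", "abc")
print(missing)        # None
```

So testing the result with `if m:` or `if m is not None:` tells you whether the pattern matched, not whether it consumed any characters.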
re.search searches the entire string, as the documentation says:
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.
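One caveat on "match the entire string": match only anchors the start, so the pattern has to carry its own end anchor, or you can use re.fullmatch (available since Python 3.4). A small sketch, with a made-up sample string:

```python
import re

text = "something else"

# match() only anchors the start; trailing text is allowed.
prefix = re.match(r"some", text)
print(prefix)      # matches "some"

# An explicit end anchor forces the pattern to run to the end of the string.
anchored = re.match(r"some\Z", text)
print(anchored)    # None

# fullmatch() requires the whole string to match the pattern.
whole_fail = re.fullmatch(r"some", text)
whole_ok = re.fullmatch(r"some\w+ \w+", text)
print(whole_fail)  # None
print(whole_ok)    # matches the entire string
```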
The documentation has a specific section for match vs. search that also covers multiline strings:
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
Note that match may differ from search even when using a regular expression beginning with "^": "^" matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.
Now, enough talk. Time to see some example code:
    # example code:
    import re

    string_with_newlines = """something
    someotherthing"""

    print(re.match("some", string_with_newlines))  # matches
    print(re.match("someother", string_with_newlines))  # won't match
    print(re.match("^someother", string_with_newlines, re.MULTILINE))  # also won't match
    print(re.search("someother", string_with_newlines))  # finds something
    print(re.search("^someother", string_with_newlines, re.MULTILINE))  # also finds something

    m = re.compile("thing$", re.MULTILINE)

    print(m.match(string_with_newlines))  # no match
    print(m.match(string_with_newlines, pos=4))  # matches
    # Flags belong to re.compile(); the second argument of a compiled pattern's search() is pos.
    print(m.search(string_with_newlines))  # also matches
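One more detail about the pos argument of compiled patterns, as a rough sketch with a made-up string: match() treats pos as the place where the match has to begin, but "^" does not. It still refers to the real start of the string (or, in MULTILINE mode, to the position right after a newline). Slicing the string behaves differently because it creates a new string.

```python
import re

text = "something"

# match() must begin exactly at pos.
at_pos = re.compile(r"thing").match(text, 4)
print(at_pos)          # matches, span (4, 9)

# "^" is not satisfied at pos, because index 4 is not the real start of the string.
caret_at_pos = re.compile(r"^thing").search(text, 4)
print(caret_at_pos)    # None

# Slicing produces a new string, so "^" matches at its beginning.
caret_on_slice = re.compile(r"^thing").search(text[4:])
print(caret_on_slice)  # matches, span (0, 5)
```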
match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.
This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.
I prepared the following test suite:
    import random
    import re
    import string
    import time

    LENGTH = 10
    LIST_SIZE = 1000000

    def generate_word():
        word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
        word = "".join(word)
        return word

    wordlist = [generate_word() for _ in range(LIST_SIZE)]

    start = time.time()
    [re.search("python", word) for word in wordlist]
    print("search:", time.time() - start)

    start = time.time()
    [re.match("(.*?)python(.*?)", word) for word in wordlist]
    print("match:", time.time() - start)
I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:
The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.
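If you want to repeat the comparison on your own machine, here is a smaller sketch using timeit. It is not the suite that produced the numbers discussed above. The list size, the fixed seed, the repeat counts and the precompiled patterns are my own assumptions to keep the run short, so absolute timings will differ.

```python
import random
import re
import string
import timeit

random.seed(0)  # assumption: fixed seed so repeated runs use the same words
words = ["".join(random.choices(string.ascii_lowercase, k=10)) for _ in range(10_000)]

# Precompiled here, unlike the module-level re.search/re.match calls in the suite above.
search_re = re.compile("python")
match_re = re.compile("(.*?)python(.*?)")

def run_search():
    return [search_re.search(w) for w in words]

def run_match():
    return [match_re.match(w) for w in words]

# Take the best of a few repeats to reduce noise from other processes.
search_time = min(timeit.repeat(run_search, number=5, repeat=3))
match_time = min(timeit.repeat(run_match, number=5, repeat=3))
print("search:", search_time)
print("match: ", match_time)
```

Both functions accept exactly the same words, so any timing difference comes from how the regex engine gets there, not from what it finds.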