Backtracking in regular expressions

Hi.

In this video, we are going to learn more about backtracking.

Whenever a regular expression contains a quantifier such as plus or asterisk, or a limiting quantifier without the maximum argument (for example, "{2,}"), backtracking might come into play.

Whenever there is an opportunity for a regex engine to match a string differently, it will explore as many ways as possible in order to deliver a complete match.

Whenever a regex engine finds such an opportunity, it saves its state so that it can come back to it later if it fails to find a complete match.

This process of returning to a previous saved state to find a match is known as backtracking.

Let's have a look at an example.

Let's match the last occurrence of "blah" in this string.

We will start with a greedy dot pattern followed by the word "blah".

We can see that we matched the whole string up to the last occurrence of "blah" here. Why? The greedy dot pattern first matches the string all the way to its end, and then the engine looks for "b".

As the regex engine doesn't find any "b" at the end of the string, it backtracks further back into the string until it reaches this location, where it finds "b" followed by the letters "lah", and delivers a complete match.
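Here is a minimal sketch of this greedy behavior in Python's `re` module. The sample string is an assumption, since the string shown on screen is not part of the transcript.

```python
import re

# Hypothetical sample text; the string used in the video is not shown here.
text = "blah one blah two blah three"

# Greedy: `.*` first runs to the end of the string, then gives characters
# back one at a time until "blah" can match.
match = re.search(r".*blah", text)
print(match.group(0))

# The match therefore ends right after the last "blah" in the text.
last_start = text.rfind("blah")
print(match.end() == last_start + len("blah"))
```

The match covers everything from the start of the string through the last "blah", which is why this pattern is a common way to reach the final occurrence.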

Let's switch off the global modifier.

Now, let's replace "blah" with "123".

This pattern matches this string as a whole because "123" is at the end of it.

What if we make this pattern lazy? We only match up to the first occurrence of "123".

This is expected, but what's going on here? The lazy dot pattern is skipped at first, and only "123" is searched for.

So when the regex engine is at the position in front of "7", it tries to find "1", but it can't.

There is "7".

So the match fails at this position, and "7" is matched (or consumed) by the lazy dot pattern.

It's expanded.

Next, there is "8", but "123" can't match at this position.

Again, this "8" is consumed by the lazy dot pattern.

It's expanded once again.

This subpattern expansion goes on until the regex engine comes to this location where it matches "123".
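The expansion described above can be imitated by hand. The function below is only an illustrative sketch, not how a real engine is implemented; its name, `lazy_dot_then_literal`, and the sample string are made up for this example, and it ignores the fact that a dot normally does not match a newline.

```python
def lazy_dot_then_literal(text, literal):
    """Mimic a lazy dot pattern followed by a literal, anchored at position 0.

    Returns (what the lazy dot consumed, the whole match), or None.
    """
    consumed = 0  # the lazy dot starts by matching nothing
    while consumed <= len(text):
        # Try the literal at the current position first.
        if text.startswith(literal, consumed):
            return text[:consumed], text[:consumed + len(literal)]
        # The literal failed here, so the lazy dot expands by one character.
        consumed += 1
    return None


# Hypothetical sample text that starts with "7", "8" and ends with "123".
print(lazy_dot_then_literal("789 123 abc 123", "123"))
```

Each pass through the loop corresponds to one "it's expanded" step: the literal is tried, it fails, and the lazy dot takes one more character.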

You can easily see what the lazy dot matched if you use capturing parentheses.
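As a quick check with capturing parentheses in Python, the sketch below compares the greedy and lazy versions. The sample string is an assumption chosen to start with "7" and "8" and to end with "123".

```python
import re

# Hypothetical sample text; the string from the video is not in the transcript.
text = "789 123 abc 123"

greedy = re.match(r"(.*)123", text)   # dot pattern takes as much as it can
lazy = re.match(r"(.*?)123", text)    # dot pattern takes as little as it can

print(repr(greedy.group(1)))  # everything before the last "123"
print(repr(lazy.group(1)))    # everything before the first "123"
```

Group 1 shows exactly which characters the dot pattern ended up consuming in each case.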

This subpattern expansion is also backtracking because the regex engine is actually going back and forth to deliver a complete match.

Now, let's try to match the last number in the string.

We can use "\d+" to match one or more digits.

Let's try the greedy dot pattern and capture the digits.

This is a very common mistake.

Now you understand why it happens.

The greedy dot pattern matches the whole string first.

Then it sees that it must match at least one digit.

It backtracks, and at this position it can match "3" with the "\d+" pattern, so only "3" is captured into the group.

In order to capture "123" into the capturing group, you must make sure that the character before this group is not a numeric character.

And now you can see that we've got "123" in this group.
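The mistake and the fix can be reproduced in Python as follows. The sample string is an assumption, and `\D` (any non-digit character) is one possible way to require a non-numeric character before the group; the transcript does not show the exact pattern used on screen.

```python
import re

# Hypothetical sample text ending with the number we want.
text = "abc 456 def 123"

# Common mistake: the greedy dot pattern gives back only one character,
# which is enough for `\d+` to succeed, so the group holds just "3".
wrong = re.match(r".*(\d+)", text)
print(wrong.group(1))

# Fix: require a non-digit right before the group, so the dot pattern
# has to give back the whole number.
fixed = re.match(r".*\D(\d+)", text)
print(fixed.group(1))
```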

What if the number is at the beginning of the string? In this case, we should make this pattern optional.
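One way to make the leading part optional is to wrap it in an optional non-capturing group. This is a sketch under that assumption; the video may use a different construct, and the sample strings are made up.

```python
import re

# The part before the number (any text ending in a non-digit) is optional,
# so the pattern also works when the string starts with the number.
pattern = re.compile(r"(?:.*\D)?(\d+)")

# Hypothetical sample strings.
for text in ["abc 456 def 123", "123 abc", "123"]:
    print(text, "->", pattern.match(text).group(1))
```

When the optional group cannot match, the engine backtracks out of it and tries `\d+` directly at the start of the string.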

Actually, there are other ways to match the first or last occurrence of strings, but we'll cover that in our further videos. If you liked my video, please click "Like" and subscribe to my channel if you haven't done it yet.

Happy regexing.
