Basic regex constructs overview (Part 1)

Hello. In this video I would like to speak about regular expressions in general.

A lot of people asked me to record a video where I explain the basic concepts of regular expressions and I decided to record one.

Actually, there are a lot of regular expression resources on the Web.

In my opinion, one of the most important regular expression resources on the Web is the http://www.regular-expressions.info Web site.

It provides explanations for all basic regular expression constructs.

Let's have a look at those basic regex constructs now.

A regular expression is some text pattern that can be used to search for specific strings.

It can either validate a string, or it can be used to extract a part of a string from a longer string.

It can be used to find and replace parts of a string in longer strings.

It can also be used to split longer strings into smaller chunks.
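To make these four uses concrete, here is a minimal sketch in Python. The date pattern and the sample strings are my own assumptions, chosen only for illustration; they are not the ones shown in the video.

```python
import re

# Hypothetical pattern and sample text, used only for illustration.
date_pattern = r"\d{4}-\d{2}-\d{2}"
text = "Released 2021-03-15, updated 2022-11-02."

# 1. Validate: the whole string must match the pattern.
is_valid = re.fullmatch(date_pattern, "2021-03-15") is not None

# 2. Extract: pull the first matching part out of a longer string.
first_date = re.search(date_pattern, text).group()

# 3. Find and replace: substitute every match in the longer string.
masked = re.sub(date_pattern, "<date>", text)

# 4. Split: break a longer string into chunks on a delimiter pattern.
chunks = re.split(r"\s*,\s*", "a , b,c ,d")

print(is_valid, first_date, masked, chunks)
```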

A regular expression can contain a lot of things inside.

Regular expressions can contain regex escapes, quantifiers, character classes, groups, and lookarounds.

Regex escapes are special sequences that consist of a literal backslash followed by a character.

For example, zero-width assertions like word boundaries or anchors.

Backreferences are also an example of regex escapes.

They can be numbered or named.

Escaped special characters, hexadecimal escapes, shorthand character classes, Unicode code point escapes, etc. are also regex escapes.

These topics are very broad, and we can devote a separate video to each of them.

At this point, I would like to briefly show the use of these constructs.

For example, word boundaries can be used to match a whole word like "REGEX".

So only whole words are matched.
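A minimal sketch of the whole-word match in Python, assuming the pattern on screen is the word wrapped in two `\b` word boundaries; the sample sentences are hypothetical.

```python
import re

# \b is a zero-width assertion: it matches a position, not a character.
pattern = r"\bREGEX\b"

# Matches the standalone word.
whole_word = re.findall(pattern, "I like REGEX.")

# "REGEXP" has no word boundary between "X" and "P", so there is no match.
longer_word = re.search(pattern, "REGEXP")

print(whole_word, longer_word)
```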

So if you use "REGEXP", it won't be matched with this regular expression.

Anchors match the line or string boundaries.

In this case, they match the line boundaries because of this "m" flag here.

So for example, I want to match any line that contains a "META" substring in it.

So this is it.

Well, we don't need any anchors actually, but they don't do any harm here.

Next come special characters.

For example, if we want to match a literal dot, we must use a backslash and a dot.
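Here is a small sketch of the anchor example in Python, where the "m" flag corresponds to `re.M` (`re.MULTILINE`). The exact pattern and text from the video are not in the transcript, so `^.*META.*$` and the sample lines are assumptions.

```python
import re

# Hypothetical multi-line input.
text = "title: home\nMETA description\nno match here\nsome META tag"

# With re.M, ^ and $ match at the start and end of every line.
with_anchors = re.findall(r"^.*META.*$", text, flags=re.M)

# The anchors are optional here: "." never crosses a line break,
# so ".*META.*" already stays within a single line.
without_anchors = re.findall(r".*META.*", text)

# Without re.M, ^ only matches at the very start of the string.
no_flag = re.findall(r"^.*META.*$", text)

print(with_anchors, without_anchors, no_flag)
```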

Otherwise a dot matches any character other than a line break character.

So, yes, like this.
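The difference between an escaped and an unescaped dot can be sketched like this in Python; the sample strings are hypothetical.

```python
import re

text = "a.b axb"

# Escaped dot: matches only a literal ".".
literal_dot = re.findall(r"a\.b", text)

# Unescaped dot: matches any character except a line break.
any_char = re.findall(r"a.b", text)

# By default the dot does not match a line break character.
across_newline = re.search(r"a.b", "a\nb")

print(literal_dot, any_char, across_newline)
```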

Now if we talk about hexadecimal escapes, we can use them like this.

For example, this "\x20" matches a space, and this "\x0A" matches line breaks.

You can see here.
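The same two hexadecimal escapes work in Python's `re` module. A brief sketch with made-up input strings:

```python
import re

# \x20 is the space character (hexadecimal 20).
spaces = re.findall(r"\x20", "a b c")

# \x0A is the line feed character (hexadecimal 0A).
lines = re.split(r"\x0A", "line1\nline2")

print(spaces, lines)
```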

Shorthand character classes are very well-known constructs like "\d" to match a digit, "\w" to match any word character, or "\s" to match a whitespace character.
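A short Python sketch of the three shorthand classes, using a hypothetical sample string:

```python
import re

text = "Room 42, floor_3"

# \d matches a single digit.
digits = re.findall(r"\d", text)

# \w matches a word character: a letter, a digit, or an underscore.
words = re.findall(r"\w+", text)

# \s matches a whitespace character.
whitespace = re.findall(r"\s", text)

print(digits, words, whitespace)
```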

The usage of Unicode code points in regular expressions depends on the regex flavor, and if we select PCRE here, we might use this pattern to match a Polish letter "ł".

This regex escape will match an emoji in the Python regex flavor. Here.
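The exact patterns shown on screen are not part of the transcript, so the following is only a sketch of how code point escapes look in Python. In PCRE, a code point is usually written as `\x{142}`; Python's `re` uses `\uXXXX` and `\UXXXXXXXX` instead. The sample words and the choice of emoji (U+1F600) are my assumptions.

```python
import re

# \u0142 is the code point of the Polish letter "ł" (four hex digits).
polish = re.findall(r"\u0142", "z\u0142oty")

# \U0001F600 is an emoji code point (eight hex digits are required).
emoji = re.findall(r"\U0001F600", "hi \U0001F600!")

print(polish, emoji)
```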

The backreferences are special constructs that let us refer to the part of the regular expression that was captured.

Capturing is done with capturing groups.

For example, we can use a backreference inside a pattern.

For example, when we capture "abc" in a group and then use "\1", the backreference matches the same "abc" text once more.

Actually, this is not a good example because "abc" is a literal.

If we do not use a literal it becomes a lot more interesting.

For example, this pattern matches any digit and then the same digit right after it. Like here, we match a zero and another zero.

Here, we do not match zero and one because one is not zero.
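The backreference examples can be sketched in Python as follows. I assume the pattern on screen is `(\d)\1`; the named variant with `(?P<digit>...)` and `(?P=digit)` is Python's own syntax for named groups and named backreferences, added here only for comparison.

```python
import re

# Literal group: \1 simply repeats "abc", which is not very interesting.
literal = re.search(r"(abc)\1", "abcabc").group()

# (\d) captures one digit; \1 must match that same digit again.
same_digit = re.search(r"(\d)\1", "100").group()

# "0" followed by "1" is not a repeated digit, so there is no match.
different_digits = re.search(r"(\d)\1", "01")

# Named backreference: the same idea, with a name instead of a number.
named = re.search(r"(?P<digit>\d)(?P=digit)", "a77b").group()

print(literal, same_digit, different_digits, named)
```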

Groups can be capturing (those that can be referred to using backreferences) or non-capturing (those that are used to group several patterns with an OR "|" operator, or when a whole sequence of patterns needs to be quantified, or repeated).

Capturing groups can be numbered when they are used like this or named when they are named this way.

Capturing groups are defined with a pair of unescaped parentheses. Non-capturing groups are also defined with a pair of parentheses, but the open parenthesis is followed directly by a question mark and a colon.

We'll discuss more basic regex constructs in the next video.

If you liked my video, please click "Like" and subscribe to my channel if you haven't done it yet.
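The three kinds of groups might look like this in Python. The patterns and inputs are hypothetical, and the named group syntax `(?P<name>...)` is the Python form; other flavors may also accept `(?<name>...)`.

```python
import re

# Numbered capturing groups: referenced by position, starting at 1.
numbered = re.search(r"(\d+)-(\d+)", "10-20")
first, second = numbered.group(1), numbered.group(2)

# Named capturing groups: referenced by name.
named = re.search(r"(?P<start>\d+)-(?P<end>\d+)", "10-20")
start, end = named.group("start"), named.group("end")

# Non-capturing group: groups an alternation so it can be quantified,
# but stores nothing that a backreference could use.
non_capturing = re.search(r"(?:ab|cd)+", "xabcdab")
matched, captured = non_capturing.group(), non_capturing.groups()

print(first, second, start, end, matched, captured)
```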

Thank you for watching and happy regexing.
