
Regular Expressions in Python – Part 2 | What is RegEx ?


Welcome back. In this lecture we'll continue where we left off and go over a couple more regular expression techniques.

Let's head back over to the notebook.

Here we are in the same notebook where we left off last time. Previously we showed you how to reduce this pattern, using regular expressions, into this pattern with quantifiers.

One other thing I want to mention is the ability to grab separate groups.

Currently this entire pattern is one solid group, but you can compile separate groups using parentheses.

So I can use parentheses to separate the groups in between the dashes.

So I'm adding parentheses, and now when I search for this pattern inside the original text, which was my telephone number, calling group with no argument still returns everything grouped together. But I can also call individual groups, starting at index position 1, and that's going to grab the first set that was inside the parentheses.

So if I say group 1, it just grabs 888. You can see here that if I were to include the dash inside the parentheses and rerun this, group 1 would have included the dash.

So this way you can use the same character identifiers and quantifiers as before, except now you can easily group things. Maybe you only wanted the area code, so you could just say group 1.

Or if you wanted the last digits, well, that would be in group 3, and those would be the last four digits.

So that's the ability to use groups, and if you want everything, all the groups together, you just don't provide a group number.
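
The grouping steps described above can be sketched as follows. This is a minimal illustration, not the lecture's notebook: only the 888 area code is mentioned in the lecture, so the rest of the phone number and all variable names here are assumptions.

```python
import re

# Hypothetical sample text; only the 888 area code comes from the lecture.
text = "My telephone number is 888-555-1234"

# Parentheses split the pattern into three capture groups.
phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
result = phone_pattern.search(text)

whole = result.group()       # no argument: the entire match
area_code = result.group(1)  # groups are numbered starting at 1
last_four = result.group(3)

# Moving the dash inside the parentheses makes it part of group 1.
with_dash = re.search(r"(\d{3}-)(\d{3})-(\d{4})", text).group(1)

print(whole, area_code, last_four, with_dash)
```

Note that group(0) is equivalent to calling group() with no argument, which is why the individual groups start at 1.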

Some additional regular expression syntax that I want to show you is the pipe operator. You can use the pipe operator to have an OR statement.

For example, I can say re.search, and the pattern I search for is man, pipe operator, woman. I can say the text is 'This man was here', and we end up getting a match on man.

Or I could say 'This woman was here', and we end up getting a match on woman. So you can see that it'll work for either man or woman with the pipe operator.
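
A short sketch of the pipe example, assuming the two sentences are written as below (capitalization and the trailing period are guesses; the variable names are illustrative):

```python
import re

# The pipe means OR: match either "man" or "woman".
pattern = r"man|woman"

first = re.search(pattern, "This man was here.")
second = re.search(pattern, "This woman was here.")

# .group() returns the text that actually matched.
print(first.group())
print(second.group())
```

In the second sentence the result is the full word "woman" rather than the "man" inside it, because the search reports the leftmost position where any alternative matches.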

You can also use a wildcard character. A wildcard is a placeholder that will match any character placed there.

For example, I can say re.findall, and I'll put a dot there for my wildcard and then say 'at', and then I'll create some Dr. Seuss sentence: 'The cat in the hat sat'.

And if you run this, you can see it found cat, hat, and sat. Basically a wildcard, and then 'at'.

So c-a-t, h-a-t, and s-a-t. Keep in mind that, the way this is currently working, you're only matching on one letter.

So if you were to add 'splat' at the end of this and run it, you would only get out 'lat'.

You could then continue on, messing around with quantifiers or identifiers or even more wildcards.

So you can place multiple wildcards here, and it'll end up grabbing the space and then the letter.

Or in this case it grabs two letters before 'at', and so on.
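
The wildcard behavior above can be checked with a sketch like this one. The sentence follows the lecture, but appending "splat" to the same sentence and using exactly two dots in the last pattern are assumptions made for illustration.

```python
import re

sentence = "The cat in the hat sat"

# One dot matches exactly one character before "at".
one_wildcard = re.findall(r".at", sentence)

# With "splat" appended, a single dot only reaches back one letter.
with_splat = re.findall(r".at", sentence + " splat")

# Two dots grab two characters before "at", including spaces.
two_wildcards = re.findall(r"..at", sentence + " splat")

print(one_wildcard)
print(with_splat)
print(two_wildcards)
```

By default the dot matches any character except a newline.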

So again, a dot is a wildcard character, and you can combine it with the same sort of grouping, quantifiers, or character identifiers.

Next, maybe you're just interested in things that start or end with a particular type of character.

Well, you can use the caret, which is above the 6 on your keyboard, to say 'starts with', and a dollar sign signals 'ends with'.

So for example I can say re.findall, and I'll say backslash d for a digit and then a dollar sign, and the dollar sign again signals 'ends with'.

So I can say 'This ends with a number 2', run that, and it's going to find where it ends with a number.

And there we see 2.

I could then also put a caret here instead of a dollar sign, and that indicates 'starts with'.

So I could say '1 is the loneliest number', run that, and I get back 1, since it's starting with a digit.

Now I should note that this is for the entire string not just for individual words.
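
A minimal sketch of the two anchors, using the lecture's sentences. The third sentence is a made-up example added to show that the anchors apply to the whole string rather than to each word.

```python
import re

# "$" anchors the digit to the end of the string.
ends_with = re.findall(r"\d$", "This ends with a number 2")

# "^" anchors the digit to the start of the string.
starts_with = re.findall(r"^\d", "1 is the loneliest number.")

# Hypothetical counterexample: a digit in the middle of the string
# is not at the start, so nothing is returned.
in_the_middle = re.findall(r"^\d", "The number 1 is lonely")

print(ends_with, starts_with, in_the_middle)
```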

To exclude characters, you can use the caret symbol in conjunction with a set of brackets.

Let me show what I mean by that.

Let's create a much longer phrase and say 'there are 3 numbers 34 inside 5 this sentence'.

Now let's imagine I wanted to get rid of all the numbers inside this sentence.

What I can do is say re.findall, and then I can pass in a pattern.

So I'll say r, then quotes, and then inside square brackets a caret followed by backslash d.

So there's my backslash d and the caret, and when the caret is inside these square brackets, that's going to indicate exclusion.

So I'm basically saying exclude any digits.

So let's pass in our phrase, and you'll see 'there are', and then we just get spaces, and then 'numbers inside this sentence'.

So it's no longer grabbing any of these numbers. We're excluding them with the use of the square brackets and this caret symbol.

Then, if you want to get the words back together, you can add a plus sign here, and that's going to return basically anything that isn't a digit.

So we can use this to remove punctuation from a sentence, which is a common thing you have to do when working with natural language processing.
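
The digit-exclusion example above might look like this in code. The phrase follows the lecture; the trailing period and the variable names are assumptions.

```python
import re

phrase = "there are 3 numbers 34 inside 5 this sentence."

# Without "+", every non-digit character comes back on its own.
single_chars = re.findall(r"[^\d]", phrase)

# With "+", consecutive non-digit characters are kept together.
chunks = re.findall(r"[^\d]+", phrase)

print(single_chars[:5])
print(chunks)
```

The caret only means exclusion when it is the first character inside the square brackets; outside them it is the 'starts with' anchor.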

Let's create another test phrase and say 'This is a string! But it has punctuation. How to remove it?'

So there's a bunch of punctuation here.

Again, all you have to do is say re.findall and then start indicating what we want to remove.

So inside the square brackets I'll have my caret, and then an exclamation point, a period, a question mark, and a space, and then let's say plus.

And then I pass in my test phrase and run that.

And then I get to see this nice list.

This is a string but it has punctuation how to remove it.

So you notice right now it's a list of everything that isn't punctuation.

So I'm using exclusion here.

And if I wanted to, I could say my list is equal to this, and then I have my list here, and I can use the join method off of a string in order to join this list together.

So I could say join.

Let's actually just say it right here: a string with a single space, dot join, and that's going to join every item in this list with a space in between.

So now I see 'This is a string But it has punctuation How to remove it', which has now successfully removed any punctuation from this test phrase.

The main thing to note here is that you have the square brackets and the caret, and then this plus sign.

Anything inside these brackets after the caret, we're looking to exclude.
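
A hedged sketch of the punctuation-removal steps. The transcript does not fully spell out the phrase's punctuation or the bracket contents, so the exact placement of "!", "." and "?" and the inclusion of a space in the excluded set are assumptions.

```python
import re

# Assumed punctuation placement; the wording follows the lecture.
test_phrase = "This is a string! But it has punctuation. How to remove it?"

# Keep runs of anything that is not "!", ".", "?" or a space.
words = re.findall(r"[^!.? ]+", test_phrase)

# Join the pieces back together with single spaces.
clean = " ".join(words)

print(words)
print(clean)
```

Excluding the space as well is what makes findall return individual words, so the join can put exactly one space between them.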

Now the last regular expression feature I want to talk about is something we actually just saw, which is the plus sign along with using brackets for grouping.

So this plus sign along with brackets allows you to use grouping. Let me show you what I mean by that by saying text, and the text string we're going to work with is 'Only find the', and we're going to put hyphenated words here.

So 'hyphen-words. Where are the long-ish dash words?'

So let's imagine a situation where we want to find any sort of words that have this hyphen in them.

What we can do here is say re.findall. So we're saying find some set of letters, a dash, and then some other set of letters, and that's the way we're going to word this here.

Let's see if it will grab that period as well.

So the way we do this is we say, in square brackets, backslash w, which, if you refer back to your regular expressions notebook and start scrolling up, you can find in the table.

Here we have the identifier backslash lowercase w.

That stands for alphanumerics.

So both letters and numbers will be picked up here.

So this goes inside square brackets with a plus sign.

This basically indicates grab any number of alphanumerics.

So some group of alphanumerics.

So alphanumerics are indicated by backslash w, and the square brackets with the plus sign indicate any number. Then what we want is a dash, and then backslash w and then plus for, again, any number of alphanumerics, and then we're going to search that text.

We run that and it went ahead and found those words for us, and it actually didn't pick up this period, because we didn't ask for punctuation. We just asked for alphanumerics, and a period doesn't count as that.

So that allows you to grab anything with this sort of pattern.

So you get the idea here: this is a group of alphanumerics, a dash, and then another group of alphanumerics.

Notice that we don't have to specify how many characters we expect on either side of the dash.

OK. So I know that's pretty complicated stuff, but remember, you should always be able to break down a regular expression into its basic steps using the tables that we provide here.

You should also be able to use the example matches and example pattern codes with identifiers and quantifiers to build up what you want.

And the main one that we were looking at here was this plus, which just means 'occurs one or more times'.
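
The hyphenated-word search can be sketched as below. The sample text is reconstructed from the transcript, so its exact wording and punctuation are assumptions.

```python
import re

# Reconstructed sample text; exact wording is an assumption.
text = "Only find the hyphen-words. Where are the long-ish dash words?"

# One or more word characters, a dash, then one or more word characters.
hyphenated = re.findall(r"[\w]+-[\w]+", text)

# The brackets are optional around a single identifier like \w.
same_result = re.findall(r"\w+-\w+", text)

print(hyphenated)
```

In Python, \w matches letters, digits, and the underscore, so a period or question mark next to a word is left out of the match.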


So coming up next is going to be your assessment to test your new skills.

We'll see you at the next lecture.

