
Regular Expression in Python with Examples | Set 1


A regular expression (RE) specifies a set of strings (a pattern) that it matches; Python's re module provides the functions for working with such patterns.
To read an RE you need to know its metacharacters, which are used throughout the functions of the re module.
There are 14 metacharacters in total; they are discussed below as they come up in the functions:

 \    Drops the special meaning of the character following it (discussed below)
 []   Represents a character class
 ^    Matches the beginning
 $    Matches the end
 .    Matches any character except newline
 ?    Matches zero or one occurrence
 |    Means OR (matches any of the alternatives separated by it)
 *    Any number of occurrences (including 0 occurrences)
 +    One or more occurrences
 {}   Indicates the number of occurrences of the preceding RE to match
 ()   Encloses a group of REs
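As a quick illustration of several of the metacharacters listed above, here is a minimal sketch. The sample strings and variable names are made up for this example and are not part of the original article.

```python
import re

line = "cat hat at"  # hypothetical sample text

starts = re.findall(r'^cat', line)        # ^ anchors the match at the beginning
ends = re.findall(r'at$', line)           # $ anchors the match at the end
any_at = re.findall(r'.at', line)         # . is any character except newline (a space counts)
either = re.findall(r'cat|hat', line)     # | means OR
grouped = re.findall(r'(c|h)at', line)    # () groups; findall returns the group's text
literal_dot = re.findall(r'\.', "a.b.c")  # \ drops the special meaning of '.'

print(starts)       # ['cat']
print(ends)         # ['at']
print(any_at)       # ['cat', 'hat', ' at']
print(either)       # ['cat', 'hat']
print(grouped)      # ['c', 'h']
print(literal_dot)  # ['.', '.']
```

Note that '.at' also picks up ' at', because a space is a character like any other.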
  • compile() function
    Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

    # Import the re (regular expression) module.
    import re

    # compile() creates a pattern object for the character class [a-e],
    # which is equivalent to [abcde].
    # The class [abcde] matches any of 'a', 'b', 'c', 'd', 'e'.
    p = re.compile('[a-e]')

    # findall() searches for the regular expression and returns a list of all matches
    print(p.findall("Aye, said Mr. Gibenson Stark"))

    Output:

     ['e', 'a', 'd', 'b', 'e', 'a']

    Understanding the output:
    The first match is 'e' in "Aye", not 'A', because matching is case sensitive.
    The next match is 'a' in "said", then 'd' in "said", followed by 'b' and 'e' in "Gibenson"; the last 'a' comes from "Stark".
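A compiled pattern object can be reused across several operations. The sketch below is only an illustration: the vowel class and the variable names are assumptions chosen for this example, and the sentence is the one used above.

```python
import re

# Compile once, then reuse the pattern object's methods.
vowels = re.compile('[aeiou]')  # illustrative pattern, not from the article

text = "Aye, said Mr. Gibenson Stark"

all_matches = vowels.findall(text)  # every lowercase vowel, in order
first = vowels.search(text)         # first match object, or None if nothing matches
replaced = vowels.sub('*', text)    # substitute every match with '*'

print(all_matches)                   # ['e', 'a', 'i', 'i', 'e', 'o', 'a']
print(first.group(), first.start())  # e 2
print(replaced)                      # Ay*, s**d Mr. G*b*ns*n St*rk
```

Compiling is mostly a convenience when the same pattern is applied many times; the module-level functions such as re.findall() accept the pattern string directly.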


    The backslash metacharacter '\' plays a very important role, as it signals various special sequences. To use a backslash without its special meaning as a metacharacter, write '\\'.

     \d   Matches any decimal digit; for ASCII text this is equivalent to the set class [0-9]
     \D   Matches any non-digit character
     \s   Matches any whitespace character
     \S   Matches any non-whitespace character
     \w   Matches any alphanumeric character; for ASCII text this is equivalent to the class [a-zA-Z0-9_]
     \W   Matches any non-alphanumeric character

    The set class [\s,.] matches any whitespace character, ',' or '.'.

    import re

    # \d is equivalent to [0-9].
    p = re.compile(r'\d')
    print(p.findall("I went to him at 11 AM on 4th July 1886"))

    # \d+ matches a group of one or more characters from [0-9]
    p = re.compile(r'\d+')
    print(p.findall("I went to him at 11 AM on 4th July 1886"))

    Output:

     ['1', '1', '4', '1', '8', '8', '6']
     ['11', '4', '1886']

    import re

    # \w is equivalent to [a-zA-Z0-9_].
    p = re.compile(r'\w')
    print(p.findall("He said * in some_lang."))

    # \w+ matches a group of alphanumeric characters.
    p = re.compile(r'\w+')
    print(p.findall("I went to him at 11 A.M., he said *** in some_language."))

    # \W matches non-alphanumeric characters.
    p = re.compile(r'\W')
    print(p.findall("he said *** in some_language."))

    Output:

     ['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_', 'l', 'a', 'n', 'g']
     ['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in', 'some_language']
     [' ', ' ', '*', '*', '*', ' ', ' ', '.']
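The examples above exercise \d, \w and \W. The following sketch, using a made-up sample string, shows the remaining sequences (\D, \s, \S) and the set class [\s,.] in the same way. The variable names are illustrative only.

```python
import re

text = "Call 911, now. OK?"  # hypothetical sample text

non_digits = re.findall(r'\D+', text)     # runs of non-digit characters
whitespace = re.findall(r'\s', text)      # each whitespace character
chunks = re.findall(r'\S+', text)         # runs of non-whitespace characters
separators = re.findall(r'[\s,.]', text)  # whitespace, ',' or '.'

print(non_digits)  # ['Call ', ', now. OK?']
print(whitespace)  # [' ', ' ', ' ']
print(chunks)      # ['Call', '911,', 'now.', 'OK?']
print(separators)  # [' ', ',', ' ', '.', ' ']
```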

    import re

    # '*' matches zero or more occurrences of the preceding character.
    p = re.compile('ab*')
    print(p.findall("ababbaabbb"))

    Output:

     ['ab', 'abb', 'a', 'abbb']

    Understanding the output:
    Our RE is ab*, which matches 'a' followed by any number of 'b's, starting from 0.
    Output 'ab' is valid because of a single 'a' followed by a single 'b'.
    Output 'abb' is valid because of a single 'a' followed by 2 'b's.
    Output 'a' is valid because of a single 'a' followed by 0 'b's.
    Output 'abbb' is valid because of a single 'a' followed by 3 'b's.
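For comparison, the other quantifiers from the metacharacter list can be run against the same string. This is a small sketch; the variable names are chosen only for this illustration.

```python
import re

text = "ababbaabbb"

star = re.findall('ab*', text)      # 'a' then zero or more 'b'
plus = re.findall('ab+', text)      # 'a' then one or more 'b'
optional = re.findall('ab?', text)  # 'a' then zero or one 'b'
exact = re.findall('ab{2}', text)   # 'a' then exactly two 'b'

print(star)      # ['ab', 'abb', 'a', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['ab', 'ab', 'a', 'ab']
print(exact)     # ['abb', 'abb']
```

With '+', the lone 'a' at index 5 no longer matches, because at least one 'b' must follow it.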

  • split() function
    Splits the string at each occurrence of a character or pattern; the pieces between the matches, including whatever remains after the last match, are returned as the resulting list.
    Syntax:
     re.split(pattern, string, maxsplit=0, flags=0)

    The first parameter, pattern, is the regular expression; string is the text that is searched for the pattern and split. maxsplit defaults to 0, which means no limit; if a non-zero value is given, at most that many splits occur. With maxsplit=1 the string is split only once, giving a list of length 2. flags are optional and can shorten the code; for example, with flags=re.IGNORECASE the split ignores case.

    from re import split

    # \W+ stands for one or more non-alphanumeric characters
    # On finding ',' or a space ' ', split() splits the string at that point
    print(split(r'\W+', 'Words, words, Words'))
    print(split(r'\W+', "Word's words Words"))

    # Here ':', ' ' and ',' are not alphanumeric, so splitting occurs at those points
    print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))

    # \d+ denotes one or more digits
    # Splitting occurs only at '12', '2016', '11', '02'
    print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))

    Output:

     ['Words', 'words', 'Words']
     ['Word', 's', 'words', 'Words']
     ['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
     ['On ', 'th Jan ', ', at ', ':', ' AM']

    import re

    # Splitting occurs only once, at '12'; the returned list has length 2
    print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))

    # 'Boy' and 'boy' are treated the same when flags=re.IGNORECASE
    print(re.split('[a-f]+', 'Aey, Boy oh boy, come here', flags=re.IGNORECASE))
    print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))

    Output:

     ['On ', 'th Jan 2016, at 11:02 AM']
     ['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
     ['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']
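To see how maxsplit bounds the length of the result, here is a minimal sketch. The input line and the separator class are assumptions made up for this illustration.

```python
import re

line = "alpha, beta;gamma  delta"  # hypothetical input
sep = r'[\s,;]+'                   # one or more of whitespace, ',' or ';'

unlimited = re.split(sep, line)          # maxsplit=0 (default): no limit
once = re.split(sep, line, maxsplit=1)   # one split, so 2 items
twice = re.split(sep, line, maxsplit=2)  # two splits, so 3 items

print(unlimited)  # ['alpha', 'beta', 'gamma', 'delta']
print(once)       # ['alpha', 'beta;gamma  delta']
print(twice)      # ['alpha', 'beta', 'gamma  delta']
```

In each case the last item holds the unsplit remainder of the string, so the list has at most maxsplit + 1 elements.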
  • sub() function
    Syntax:
     re.sub(pattern, repl, string, count=0, flags=0)

    sub() performs substitution: the given string (3rd parameter) is searched for the regular expression pattern, and each matching substring is replaced by repl (2nd parameter). count limits the number of replacements; the default, 0, replaces every match.

    import re

    # The pattern 'ub' matches in "Subject" and in "Uber".
    # Because case is ignored via the flag, 'ub' matches twice:
    # 'ub' is replaced by '~*' in "Subject", and 'Ub' is replaced in "Uber".
    print(re.sub('ub', '~*', 'Subject has Uber booked already', flags=re.IGNORECASE))

    # With case sensitivity, 'Ub' in "Uber" is not replaced.
    print(re.sub('ub', '~*', 'Subject has Uber booked already'))

    # Since count is 1, at most one replacement is made
    print(re.sub('ub', '~*', 'Subject has Uber booked already', count=1, flags=re.IGNORECASE))

    # The r prefix makes the pattern a raw string; \s matches the whitespace before and after 'AND'.
    print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE))

    Output:

     S~*ject has ~*er booked already
     S~*ject has Uber booked already
     S~*ject has Uber booked already
     Baked Beans & Spam
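A common use of sub() is normalising whitespace. The sketch below uses a made-up input string and illustrative variable names to show the effect of count on the same pattern.

```python
import re

messy = "one   two\tthree \n four"  # hypothetical input with mixed whitespace

collapsed = re.sub(r'\s+', ' ', messy)            # every whitespace run becomes one space
first_only = re.sub(r'\s+', ' ', messy, count=1)  # only the first run is replaced

print(collapsed)         # one two three four
print(repr(first_only))  # 'one two\tthree \n four'
```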
  • subn() function
    Syntax:
     re.subn(pattern, repl, string, count=0, flags=0)

    subn() is similar to sub() in every respect except for what it returns: instead of just the new string, it returns a tuple containing the new string and the total number of replacements made.

    import re

    print(re.subn('ub', '~*', 'Subject has Uber booked already'))
    t = re.subn('ub', '~*', 'Subject has Uber booked already', flags=re.IGNORECASE)
    print(t)
    print(len(t))

    # This gives the same output as sub()
    print(t[0])

    Output:

     ('S~*ject has Uber booked already', 1)
     ('S~*ject has ~*er booked already', 2)
     2
     S~*ject has ~*er booked already
  • escape() function
    Syntax:
     re.escape(string)

    Returns the string with a backslash added before each character that has a special meaning in a regular expression (before Python 3.7, before every non-alphanumeric character). This is useful if you want to match an arbitrary literal string that may contain regex metacharacters.

    import re

    # escape() puts a backslash '\' before every character that is special in a regex
    # In the first case only the space ' ' is escaped
    # In the second case the space ' ', the caret '^', '-', '[' and ']' are escaped
    print(re.escape("This is Awesome even 1 AM"))
    print(re.escape("I Asked what is this [a-9], he said ^ WoW"))

    Output (Python 3.7 and later):

     This\ is\ Awesome\ even\ 1\ AM
     I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \^\ WoW
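To show why escaping matters when a literal string contains metacharacters, here is a small sketch that also uses subn() to count replacements. The sample text, the literals and the variable names are assumptions made up for this illustration.

```python
import re

text = "Price: $5.00 (was $7.50). Use a.b or a+b?"  # hypothetical sample text

literal = "$5.00"
pattern = re.escape(literal)  # backslashes added before '$' and '.'

# Unescaped, '$' means "end of string", so the literal is never found.
naive_hits = re.findall(literal, text)
safe_hits = re.findall(pattern, text)

# subn() returns the new string together with the number of replacements.
new_text, n = re.subn(re.escape("a+b"), "SUM", text)

print(naive_hits)  # []
print(safe_hits)   # ['$5.00']
print(new_text)    # Price: $5.00 (was $7.50). Use a.b or SUM?
print(n)           # 1
```

Without re.escape(), the pattern 'a+b' would mean "one or more 'a' followed by 'b'" and would not match the literal text "a+b" at all.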

Related article:
http://espressocode.top/regular-expressions-python-set-1-search-match-find/

Link:
https://docs.python.org/2/library/re.html

This article is provided by Lena Piyush Doorvar.