How to input a regex in string.replace?

StackOverflow

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I've tried this:

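#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), "*.txt")):
    # Open each matching file; "reader" iterates over its lines
    with open(infile) as reader:
        for line in reader:
            # Only handles tag number 1, with and without a trailing space
            line2 = line.replace("<[1> ", "")
            line = line2.replace("</[1> ", "")
            line2 = line.replace("<[1>", "")
            line = line2.replace("</[1>", "")

            print(line)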

I've also tried this (but it seems like I'm using the wrong regex syntax):

        line2 = line.replace("<[*> ", "")
        line = line2.replace("</[*> ", "")
        line2 = line.replace("<[*>", "")
        line = line2.replace("</[*>", "")

I don't want to hard-code a separate replace for every number from 1 to 99.

Answer rating: 691

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)

Here's a commented version explaining how it works:

line = re.sub(r"""
  <    # Match a literal "<"
  /?   # Optionally match a "/"
  \[   # Match a literal "["
  \d+  # Match one or more digits
  >    # Match a literal ">"
  """, "", line, flags=re.VERBOSE)  # re.VERBOSE enables free-spacing mode

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: the "metacharacters", which must be escaped (i.e. preceded by a backslash) to match literally. Note that the rules are different inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
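
As a rough illustration of the escaping rules mentioned above, the sketch below contrasts a metacharacter outside a character class with the same kind of character inside one. The sample strings and variable names are made up; re.escape is the standard-library helper that adds the backslashes for you when you want to match a string literally.

```python
import re

# Outside a character class, "[" opens a class, so a literal "[" needs a backslash.
bracket_match = re.search(r"\[", "a[b")

# Inside a character class, "." and "+" are literal and need no backslash.
class_matches = re.findall(r"[.+]", "1.5+2")

# re.escape() inserts whatever backslashes are needed to match a string literally.
literal_tag = re.compile(re.escape("<[1>"))
cleaned = literal_tag.sub("", "with<[1> in between")
print(cleaned)  # with in between
```

re.escape is handy when the text to match comes from a variable, but it only matches one fixed string. A pattern like \d+ is still needed to cover every tag number.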




