Method #1:
In this method we will use re.search(pattern, string, flags=0). Here, pattern refers to the pattern we want to find. It is a string that can contain special sequences such as the following:
- \w matches alphanumeric characters and the underscore
- \d matches digits, which means 0-9
- \s matches whitespace
- \S matches non-whitespace characters
- . matches any character other than the newline character \n
- * matches 0 or more repetitions of the pattern that precedes it
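
To make these sequences concrete, here is a minimal sketch that runs re.search() on a made-up two-line string. The sample text and variable names are assumptions for illustration only and do not come from the original data. It uses + (one or more) next to *, so that each word must contain at least one character.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Name: Forrest Gump 1994\nRating: 8.8"

# \w+ is a run of word characters, \s is one whitespace character.
word_pair = re.search(r"\w+\s\w+", text).group()

# \d+ is a run of digits.
digits = re.search(r"\d+", text).group()

# \S+ is a run of non-whitespace characters.
non_space = re.search(r"\S+", text).group()

# . never matches \n, so .* stops at the end of the first line.
first_line = re.search(r".*", text).group()

# When nothing matches, re.search() returns None instead of a match object.
missing = re.search(r"Year:", text)

print(word_pair, digits, non_space, first_line, missing, sep=" | ")
```

Note that re.search() returns only the first match it finds while scanning the string from left to right.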
- We use a for loop to iterate over the movie data so that we can work with each movie in turn. For each one we create a movie dictionary that stores its details, such as the title and rating.
- We then find the complete Name field using re.search(). The . means any character except \n, and * extends the match to the end of the line. We assign the result to the variable name_field.
- But the data isn’t always straightforward and may contain surprises. For example, what if there is no Name: field? re.search() then returns None, and treating that result as a match raises an error that stops the script. We anticipate this and check for None before going further.
- We again use the re.search() function to extract the required string from name_field. For the name, we use \w* to represent the first word, \s to represent the space between the words, and \w* for the second word.
- We do the same for the year and rating to get the final required dictionary.
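
The steps above can be sketched as follows. This is not the original script: the record layout (Name:, Year: and Rating: lines), the sample data, and the helper first_match are all assumptions made for illustration. The sketch also uses \w+ instead of \w* so that each word must be non-empty, which keeps the match from starting at the space after the colon.

```python
import re

# Hypothetical records; the real movie data is not shown in this section.
raw_movies = [
    "Name: Forrest Gump\nYear: 1994\nRating: 8.8",
    "Year: 1999\nRating: 8.7",  # deliberately has no Name: field
]


def first_match(pattern, text):
    """Return the matched text, or None when re.search() finds nothing."""
    match = re.search(pattern, text)
    return match.group() if match is not None else None


movies = []
for record in raw_movies:
    movie = {}

    # Grab the whole "Name: ..." line; .* stops at the newline.
    name_field = first_match(r"Name:.*", record)
    if name_field is None:
        # No Name: field, so record that instead of crashing.
        movie["name"] = None
    else:
        # First word, one whitespace character, second word.
        movie["name"] = first_match(r"\w+\s\w+", name_field)

    # Same two-step approach for the year and the rating.
    year_field = first_match(r"Year:.*", record)
    movie["year"] = first_match(r"\d+", year_field) if year_field else None

    rating_field = first_match(r"Rating:.*", record)
    movie["rating"] = first_match(r"\d+\.\d+", rating_field) if rating_field else None

    movies.append(movie)

print(movies)
```

A two-word pattern only captures titles made of exactly two words, so longer titles would need a different pattern, for example a capture group after the Name: prefix.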
Method #2:
To split a string, we will use Series.str.extract(pat, flags=0, expand=True). Here pat refers to the pattern we want to find.
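
As a minimal sketch of how this can look, the example below splits a pandas Series of strings into two columns. The sample titles and the column names title and year are assumptions for illustration. The pattern must contain at least one capture group, and each group becomes one column of the result.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
titles = pd.Series(["Forrest Gump (1994)", "The Matrix (1999)", "Untitled"])

# Named groups (?P<name>...) become the column names of the result.
parts = titles.str.extract(r"(?P<title>.*)\s\((?P<year>\d+)\)", expand=True)

print(parts)
```

Rows that do not match the pattern, such as the last one here, are filled with missing values rather than raising an error.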