Extracting text from Wikipedia infobox in Python


A Wikipedia infobox is a fixed-format table, usually placed in the upper-right corner of an article, that summarizes the key facts of that page and sometimes improves navigation to related articles.

Requests is an Apache2-licensed HTTP library written in Python. It lets you send HTTP/1.1 requests from Python code. With it, you can add content such as headers, form data, multipart files, and parameters using simple Python data structures, and you can access the response data in the same way.
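As a rough illustration of adding headers and parameters, the sketch below builds a request and prepares it without sending it, so it needs no network access. The endpoint, the parameter values, and the User-Agent string are illustrative assumptions and do not come from this article.

```python
import requests

# Build (but do not send) a GET request; all values here are illustrative.
request = requests.Request(
    "GET",
    "https://en.wikipedia.org/w/index.php",
    params={"title": "Delhi_Public_School_Society", "action": "raw"},
    headers={"User-Agent": "infobox-example/0.1"},
)
prepared = request.prepare()

# The prepared request exposes the final URL and headers that would be sent.
print(prepared.url)
print(prepared.headers["User-Agent"])
```

Preparing a request this way is handy for checking how Requests encodes parameters before any traffic leaves the machine.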

Python 2.7 is used here.

Make sure the requests and lxml modules are installed on your machine.
If they are not, you can install them from the console or command prompt using pip (for example, pip install requests lxml).
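One way to check whether the modules are available before running the scraper is sketched below. The helper name missing_modules is hypothetical and is introduced only for this example.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means both modules are available.
print(missing_modules(["requests", "lxml"]))
```

Any name that appears in the printed list still needs to be installed with pip.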

# importing modules
import requests
from lxml import etree

# the URL of the Wikipedia page to scrape
url = 'https://en.wikipedia.org/wiki/Delhi_Public_School_Society'

# fetch the page through the requests module
req = requests.get(url)

# parse the HTML; etree.HTML tolerates markup that is not well-formed XML,
# which etree.fromstring would reject
store = etree.HTML(req.content)

# select the Motto row of the infobox on the page above;
# "//tr" also matches rows wrapped in a tbody element
output = store.xpath('//table[@class="infobox vcard"]//tr[th/text()="Motto"]/td/i')

# print the text of the first match
print(output[0].text)

# Run this program from cmd or any IDE
# on a local system with Python installed.
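The XPath step can be tried offline as well. The sketch below runs a similar query against a small hand-written table that only imitates an infobox; the sample markup and the helper name first_text are assumptions made for illustration, and real Wikipedia markup may differ. It also returns None instead of raising an IndexError when nothing matches.

```python
from lxml import etree

# A hand-written stand-in for a Wikipedia infobox (not real page markup).
SAMPLE_HTML = """
<html><body>
<table class="infobox vcard">
  <tbody>
    <tr><th>Motto</th><td><i>Example motto</i></td></tr>
    <tr><th>Type</th><td>Example type</td></tr>
  </tbody>
</table>
</body></html>
"""


def first_text(tree, xpath_expr):
    """Return the text of the first matched element, or None if no match."""
    matches = tree.xpath(xpath_expr)
    return matches[0].text if matches else None


tree = etree.HTML(SAMPLE_HTML)
motto_xpath = '//table[@class="infobox vcard"]//tr[th/text()="Motto"]/td/i'

print(first_text(tree, motto_xpath))                      # Example motto
print(first_text(tree, '//tr[th/text()="Founded"]/td'))   # None
```

Checking for an empty result matters in practice, because an infobox row may be missing or renamed on a given page.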

This program prints the Motto section of the infobox on the Wikipedia page above.

Write the code first.

Then run the program; it prints the text of the Motto entry to the console.

You can also change the URL and the XPath expression passed to store.xpath to get different sections of the infobox.
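To read several sections at once, one possible approach is to collect every header and value pair into a dictionary. This is only a sketch: the function name infobox_to_dict and the sample markup are hypothetical, and the broader contains(@class, "infobox") test is an assumption about how infobox tables are marked up.

```python
from lxml import etree


def infobox_to_dict(html_text):
    """Map each infobox header label to the full text of its value cell."""
    tree = etree.HTML(html_text)
    # Keep only rows that have both a header cell and a value cell.
    rows = tree.xpath('//table[contains(@class, "infobox")]//tr[th and td]')
    result = {}
    for row in rows:
        label = "".join(row.xpath("./th//text()")).strip()
        value = "".join(row.xpath("./td//text()")).strip()
        result[label] = value
    return result


# Hand-written sample markup, not taken from a real page.
sample = """
<table class="infobox vcard"><tbody>
  <tr><th>Motto</th><td><i>Example motto</i></td></tr>
  <tr><th>Type</th><td>Example <b>type</b></td></tr>
</tbody></table>
"""
print(infobox_to_dict(sample))
```

Joining all text nodes in a cell, rather than reading a single child element, keeps values that mix plain text with nested tags.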
If you would like to learn more about web scraping, follow these links:
1) Web Scraping 1
2) Web Scraping 2




