Downloading files from the Internet using Python

Requests is a versatile Python HTTP library with many applications. One of them is downloading a file from the Internet using the file's URL.
Installation: First of all, you need to install the requests library. You can install it directly with pip by entering the following command:

 pip install requests 

Or download it directly from here and install it manually.
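To confirm that the installation worked, you can print the installed version from a Python shell (a minimal check; the version number shown will depend on your machine):

import requests

# print the installed version of the requests library
print(requests.__version__)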

Downloading files

# import the requests library
import requests

# URL of the image to download
image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"

# send an HTTP GET request to the server and save
# the HTTP response in a response object named r
r = requests.get(image_url)

# open a new file in binary write mode and write the
# response content (r.content) to it, saving it as a png file
with open("python_logo.png", "wb") as f:
    f.write(r.content)

This small piece of code will download the Python logo image from the Internet. Now check your local directory (the folder where this script is located) and you will find the downloaded image.

All we need is the URL of the image source. (You can get the URL of the image source by right-clicking on the image and selecting the View Image option.)
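Before writing the file to disk, it can also be useful to verify that the request actually succeeded. The following is an optional sketch (not part of the original example) that uses raise_for_status() and the Content-Type header to make sure the server really returned an image:

import requests

image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.png"
r = requests.get(image_url)

# raise an exception for 4xx/5xx responses
r.raise_for_status()

# check that the server actually returned an image before saving it
if r.headers.get("Content-Type", "").startswith("image/"):
    with open("python_logo.png", "wb") as f:
        f.write(r.content)
else:
    print("Unexpected content type:", r.headers.get("Content-Type"))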

Downloading large files

The HTTP response content (r.content) is nothing more than a byte string in which the file data is stored. For large files it is not practical to hold all of that data in memory at once. To overcome this problem, we make a few changes to our program:

  • Since all the file data cannot be held in memory at once, we use the r.iter_content method to load the data in chunks, specifying the chunk size.
  • r = requests.get(URL, stream=True)

    Setting the stream parameter to True downloads only the response headers and keeps the connection open. This avoids reading the whole content into memory at once for large responses. A fixed-size chunk is loaded on each iteration of r.iter_content.

Here's an example:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

# send a GET request, streaming the response body
r = requests.get(file_url, stream=True)

with open("python.pdf", "wb") as pdf:
    # write the response content to the pdf file
    # one chunk at a time
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            pdf.write(chunk)
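If you also want to see how far the download has progressed, one option (a sketch that assumes the server sends a Content-Length header, which not every server does) is to compare the bytes written so far with the reported total size:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

r = requests.get(file_url, stream=True)
r.raise_for_status()

# total size in bytes as reported by the server (may be absent)
total_size = int(r.headers.get("Content-Length", 0))
downloaded = 0

with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            pdf.write(chunk)
            downloaded += len(chunk)
            if total_size:
                print("Downloaded %d of %d bytes" % (downloaded, total_size))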

Downloading videos

In this example, we are interested in downloading all the video lectures available on this web page. All the archives for this lecture are available here. So, first we scrape the web page to extract all the video links, and then we download the videos one by one.

import requests
from bs4 import BeautifulSoup

"""
URL of the archive web page with links to
all the video lectures. It would be tedious to
download each video manually.
In this example, we first scrape the web page to extract
all the links and then download the videos one by one.
"""

# enter the archive address here
archive_url = "http://www-personal.umich.edu/~csev/books/py4inf/media/"


def get_video_links():

    # create response object
    r = requests.get(archive_url)

    # create a beautiful soup object
    soup = BeautifulSoup(r.content, "html5lib")

    # find all links on the web page
    links = soup.findAll("a")

    # keep only the links ending with .mp4
    video_links = [archive_url + link["href"] for link in links if link["href"].endswith("mp4")]

    return video_links


def download_video_series(video_links):

    # iterate over all links in video_links
    # and download them one by one
    for link in video_links:

        # get the filename by splitting the URL and
        # taking the last part
        file_name = link.split("/")[-1]

        print("Downloading file: %s" % file_name)

        # create response object
        r = requests.get(link, stream=True)

        # download started
        with open(file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)

        print("%s downloaded!" % file_name)

    print("All videos downloaded!")
    return


if __name__ == "__main__":

    # get all video links
    video_links = get_video_links()

    # download all videos
    download_video_series(video_links)

     

Advantages of using the requests library to download web files:

• You can easily download entire web directories by recursively crawling the website (see the sketch below).
• It is a browser-independent method, and much faster!
• You can simply scrape a web page to get all the file URLs on it, and hence download all the files with a single command.
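The first point can be illustrated with a rough sketch. The function below is a hypothetical example (the depth limit, the function name and the way links are filtered are assumptions, not part of the original article) that follows sub-directory links in a simple directory listing and streams every file it finds to disk:

import requests
from bs4 import BeautifulSoup


def download_directory(url, depth=2):
    """Recursively download files from a simple directory listing."""
    if depth < 0:
        return
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html5lib")
    for link in soup.findAll("a"):
        href = link.get("href", "")
        # skip empty, absolute, query and parent-directory links
        if not href or href.startswith(("/", "?", "#", "..")):
            continue
        full_url = url + href
        if href.endswith("/"):
            # sub-directory: recurse into it
            download_directory(full_url, depth - 1)
        else:
            # regular file: stream it to disk
            file_name = href.split("/")[-1]
            with requests.get(full_url, stream=True) as resp:
                with open(file_name, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=1024 * 1024):
                        if chunk:
                            f.write(chunk)


download_directory("http://www-personal.umich.edu/~csev/books/py4inf/media/")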