How to download a file over HTTP?

StackOverflow

I have a small utility that I use to download an MP3 file from a website on a schedule and then builds/updates a podcast XML file which I"ve added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.

I struggled to find a way to actually download the file in Python, thus why I resorted to using wget.

So, how do I download the file using Python?

Answer rating: 1172

One more, using urlretrieve:

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(for Python 3+ use import urllib.request and urllib.request.urlretrieve)

Yet another one, with a "progressbar"

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split("/")[-1]
u = urllib2.urlopen(url)
f = open(file_name, "wb")
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

Answer rating: 511

Use urllib.request.urlopen():

import urllib.request
with urllib.request.urlopen("http://www.example.com/") as f:
    html = f.read().decode("utf-8")

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.

On Python 2, the method is in urllib2:

import urllib2
response = urllib2.urlopen("http://www.example.com/")
html = response.read()

Answer rating: 375

In 2012, use the python requests library

>>> import requests
>>> 
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print len(r.content)
10485760

You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.


2015-12-30

People have expressed admiration for the progress bar. It"s cool, sure. There are several off-the-shelf solutions now, including tqdm:

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)

This is essentially the implementation @kvance described 30 months ago.

Answer rating: 164

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open("test.mp3","wb") as output:
  output.write(mp3file.read())

The wb in open("test.mp3","wb") opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.

Answer rating: 148

Python 3

  • urllib.request.urlopen

    import urllib.request
    response = urllib.request.urlopen("http://www.example.com/")
    html = response.read()
    
  • urllib.request.urlretrieve

    import urllib.request
    urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
    

    Note: According to the documentation, urllib.request.urlretrieve is a "legacy interface" and "might become deprecated in the future" (thanks gerrit)

Python 2

  • urllib2.urlopen (thanks Corey)

    import urllib2
    response = urllib2.urlopen("http://www.example.com/")
    html = response.read()
    
  • urllib.urlretrieve (thanks PabloG)

    import urllib
    urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
    




Get Solution for free from DataCamp guru