What is a zip file?
ZIP — is an archive file format that supports lossless data compression. By lossless compression, we mean that the compression algorithm allows you to completely recover the original data from the compressed data. Thus, the ZIP file — it is a single file containing one or more compressed files, offering an ideal way to reduce the size of large files and keep related files together.
Why do we need ZIP files?
- To reduce storage requirements.
- To improve transfer speed over standard connections.
To work with zip files using python, we we will use a built-in python module called zipfile .
1. Unzip the zip file
|
The above program extracts a zip file named "my_python_files.zip" in the same directory as this Python script.
The output of the above program might look like this:
Let’s try to understand the above code piece by piece:
-
from zipfile import ZipFile
ZipFile — it is a zipfile module class for reading and writing zip files. Here we import only the ZipFile class from the zipfile module.
-
with ZipFile (file_name, ’r’) as zip:
Here the ZipFile object is created by calling the constructor ZipFile, which accepts a zip file name and mode parameters. We create a ZipFile object in READ mode and name it zip code.
-
zip.printdir ()
The printdir () method prints the table of contents for the archive.
-
zip.extractall ()
The extractall method () will extract the entire contents of the zip file to the current working directory. You can also call the extract () method to extract any file by specifying its path in the zip file.
For example:zip.extract (’python_files / python_wiki.txt’)
This will extract only the specified file.
If you want to read some specific file, you can go like this:
data = zip.read (name_of_file_to_read)
2. Writing to a zip file
Consider a directory (folder) with the following format:
Here we will need to scan the entire directory and its subdirectories to get a list of all file paths before writing them to the zip file.
The following program does this by scanning the directory to be archived:
|
Output the above program looks like this:
Let’s try to understand the above code, breaking it down into fragments:
-
def get_all_file_paths (directory): file_paths = [] for root, directories, files in os.walk (directory): for filename in files: filepath = os.path.join ( (root, filename) file_paths.append (filepath ) return file_paths
First of all, to get all the file paths in our directory, we created this function that uses the os.walk () method. At each iteration, all files present in this directory are added to a list named file_paths .
Finally, we return all the file paths. -
file_paths = get_all_file_paths (directory)
Here we pass the directory to archived to function get_all_file_paths () and get a list containing all file paths.
-
with ZipFile (’my_python_files.zip’,’ w ’) as zip:
Here we create a ZipFile object in WRITE mode this time.
-
for file in file_paths: zip.write (file)
Here we write all files to a zip file one by one using the write method.
3. Get all information about a zip file
|
The output of the above program might look like this:
for info in zip.infolist ():
Here the method is infolist () creates an instance of the ZipInfo class which contains all the information about the zip file.
This article contributed by Nikhil Kumar . If you like Python.Engineering and would like to contribute, you can also write an article using contrib.python.engineering, or email your article to [email protected] See my article appearing on the Python.Engineering homepage and help other geeks.
Please post comments if you find anything wrong or if you would like to share more information on the topic discussed above.