I have a text file saved on S3 which is a tab-delimited table. I want to load it into pandas but cannot save it to disk first because I am running on a Heroku server. Here is what I have so far.
import io
import os

import boto3
import pandas as pd

os.environ["AWS_ACCESS_KEY_ID"] = "xxxxxxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxxxxxx"

s3_client = boto3.client("s3")
response = s3_client.get_object(Bucket="my_bucket", Key="filename.txt")
file = response["Body"]

pd.read_csv(file, header=14, delimiter="\t", low_memory=False)
The error is:
OSError: Expected file path name or file-like object, got <class "bytes"> type
How do I convert the response body into a format pandas will accept?
pd.read_csv(io.StringIO(file), header=14, delimiter="\t", low_memory=False)
returns
TypeError: initial_value must be str or None, not StreamingBody

pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)
returns
TypeError: "StreamingBody" does not support the buffer interface
UPDATE - Using the following worked
file = response["Body"].read()
pd.read_csv(io.BytesIO(file), header=14, delimiter="\t", low_memory=False)
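The two failed attempts differ only in which buffer type wraps the data: io.BytesIO expects bytes and io.StringIO expects str, and the raw body object is neither until it has been read. A minimal sketch of both working variants follows. The sample table is made up, `raw` stands in for what `response["Body"].read()` returns, and UTF-8 is an assumption about the file's encoding.

```python
import io

import pandas as pd

# Stand-in for the bytes returned by response["Body"].read().
# The table contents here are invented for illustration only.
raw = b"name\tvalue\nalpha\t1\nbeta\t2\n"

# Option 1: wrap the bytes directly in a binary buffer.
df_bytes = pd.read_csv(io.BytesIO(raw), delimiter="\t")

# Option 2: decode to str first (UTF-8 is an assumption), then use a text buffer.
df_text = pd.read_csv(io.StringIO(raw.decode("utf-8")), delimiter="\t")

# Both routes should parse to the same table.
same = df_bytes.equals(df_text)
```

Option 1 avoids guessing the encoding yourself, since pandas decodes the bytes (optionally guided by the `encoding` argument of `read_csv`).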
pandas uses boto for read_csv, so you should be able to:
import boto
import pandas as pd

data = pd.read_csv("s3://bucket....csv")
If you need boto3 because you are on python3.4+, you can
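import io

import boto3
import pandas as pd

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket", Key="key")
# Read the whole object into memory as bytes, then hand pandas a file-like buffer.
df = pd.read_csv(io.BytesIO(obj["Body"].read()))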
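If this pattern is needed in more than one place, it can be wrapped in a small helper that takes the client as an argument, which also makes it possible to try the logic without touching S3. The sketch below is illustrative only: `read_csv_from_s3` and `FakeS3Client` are hypothetical names, and the fake client mimics nothing beyond the `get_object(Bucket=..., Key=...)` call used above.

```python
import io

import pandas as pd


def read_csv_from_s3(client, bucket, key, **read_csv_kwargs):
    """Fetch an object via client.get_object and parse it with pandas."""
    obj = client.get_object(Bucket=bucket, Key=key)
    raw = obj["Body"].read()  # bytes
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)


class FakeS3Client:
    """Hypothetical stub for offline testing; not part of boto3."""

    def __init__(self, objects):
        self._objects = objects

    def get_object(self, Bucket, Key):
        # The real response holds a streaming body; any object with
        # a read() method that returns bytes is enough for this sketch.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


fake = FakeS3Client({("my_bucket", "filename.txt"): b"a\tb\n1\t2\n3\t4\n"})
df = read_csv_from_s3(fake, "my_bucket", "filename.txt", delimiter="\t")
```

With real credentials in place, the same helper would be called as `read_csv_from_s3(boto3.client("s3"), "bucket", "key")`.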
Since version 0.20.1 pandas uses s3fs, see answer below.
Now pandas can handle S3 URLs. You could simply do:
import pandas as pd
import s3fs

df = pd.read_csv("s3://bucket-name/file.csv")
You need to install s3fs if you don't have it.
pip install s3fs
If your S3 bucket is private and requires authentication, you have two options:
1- Add access credentials to your ~/.aws/credentials config file
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
2- Set the following environment variables with their proper values:
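Assuming the variables meant here are AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the names used in the question, which the AWS SDKs read from the environment), a minimal sketch for checking that they are set before attempting a read might look like this. The helper name `missing_aws_vars` is hypothetical, and temporary credentials may need further variables that are not covered here.

```python
import os

# Assumed variable names, taken from the question above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"AWS_ACCESS_KEY_ID": "xxxxxxxx"}
missing = missing_aws_vars(example_env)
```

On Heroku these values would normally be set as config vars rather than hard-coded in the script.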