Python: Ignore “Incorrect padding” error when base64 decoding

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

base64.decodestring(b64_string)

it raises an "Incorrect padding" error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data
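
For readers who would rather stay in Python, a common workaround is to restore the missing `=` padding before decoding. The following is a minimal sketch rather than a guaranteed fix: the helper name `decode_base64_lenient` is hypothetical, and it assumes the only defect in the input is stripped trailing padding.

```python
import base64


def decode_base64_lenient(data):
    """Decode base64 input whose trailing '=' padding may be missing."""
    if isinstance(data, str):
        data = data.encode("ascii")
    # Drop whitespace and newlines so the length check below is meaningful.
    data = b"".join(data.split())
    # A valid base64 string has a length that is a multiple of 4;
    # add back however many '=' characters are needed to get there.
    data += b"=" * (-len(data) % 4)
    return base64.b64decode(data)


print(decode_base64_lenient("aGVsbG8"))  # prints b'hello'
```

Input that is damaged in other ways (for example, a length that is one more than a multiple of 4) will still raise `binascii.Error`, so this only covers the missing-padding case.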

Python: Ignore "Incorrect padding" error when base64 decoding ones: Questions

Is there a list of Pytz Timezones?

3 answers

I would like to know what are all the possible values for the timezone argument in the Python library pytz. How to do it?

834

Answer #1

You can list all the available timezones with pytz.all_timezones:

In [40]: import pytz
In [41]: pytz.all_timezones
Out[42]: 
["Africa/Abidjan",
 "Africa/Accra",
 "Africa/Addis_Ababa",
 ...]

There is also pytz.common_timezones:

In [45]: len(pytz.common_timezones)
Out[45]: 403

In [46]: len(pytz.all_timezones)
Out[46]: 563

Python strptime() and timezones?

3 answers

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump. The date/time strings in here look something like this (where EST is an Australian time-zone):

Tue Jun 22 07:46:22 EST 2010

I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datetime.

>>> datetime.datetime.strptime("Tue Jun 22 12:10:20 2010 EST", "%a %b %d %H:%M:%S %Y %Z")

However, for some reason, the datetime object that comes back doesn't seem to have any tzinfo associated with it.

I did read on this page that apparently datetime.strptime silently discards tzinfo, however, I checked the documentation, and I can't find anything to that effect documented here.

I have been able to get the date parsed using a third-party Python library, dateutil, however I'm still curious as to how I was using the in-built strptime() incorrectly? Is there any way to get strptime() to play nicely with timezones?

198

Answer #1

I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.

>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)

and so on. No dealing with strptime() format nonsense... just throw a date at it and it Does The Right Thing.

Update: Oops. I missed in your original question that you mentioned that you used dateutil, sorry about that. But I hope this answer is still useful to other people who stumble across this question when they have date parsing questions and see the utility of that module.
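
For readers who want to stay with the standard library, the sketch below shows two options under Python 3. Numeric offsets can be parsed directly with the `%z` directive, which returns an aware datetime. Abbreviations such as EST are ambiguous, so the sketch maps them explicitly; the `TZ_ABBREVIATIONS` table and the `parse_with_abbreviation` helper are hypothetical names, and the UTC+10 value assumes the Australian EST mentioned in the question.

```python
from datetime import datetime, timedelta, timezone

# Numeric UTC offsets are handled by %z and yield an aware datetime.
aware = datetime.strptime("Fri, 11 Nov 2011 03:18:09 -0400",
                          "%a, %d %b %Y %H:%M:%S %z")

# Assumption: "EST" here means Australian Eastern Standard Time (UTC+10).
TZ_ABBREVIATIONS = {"EST": timezone(timedelta(hours=10), "EST")}


def parse_with_abbreviation(text, abbreviations=TZ_ABBREVIATIONS):
    """Parse strings shaped like 'Tue Jun 22 07:46:22 EST 2010'."""
    parts = text.split()
    abbreviation = parts[4]
    # Parse everything except the abbreviation as a naive datetime.
    naive = datetime.strptime(" ".join(parts[:4] + parts[5:]),
                              "%a %b %d %H:%M:%S %Y")
    # Attach the fixed offset chosen for this abbreviation.
    return naive.replace(tzinfo=abbreviations[abbreviation])


print(aware.utcoffset())
print(parse_with_abbreviation("Tue Jun 22 07:46:22 EST 2010").isoformat())
```

A fixed offset ignores daylight saving time, so a full timezone database is the safer choice when that matters.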

Fitting empirical distribution to theoretical ones with Scipy (Python)?

3 answers

INTRODUCTION: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g. [0,0,0,0,...,1,1,1,1,...,2,2,2,2,...,47,47,47,...], sampled from some continuous distribution. The values in the list are not necessarily in order, but order doesn't matter for this problem.

PROBLEM: Based on my distribution, I would like to calculate the p-value (the probability of seeing greater values) for any given value. For example, the p-value for 0 would approach 1 and the p-value for higher numbers would tend to 0.

I don't know if I am right, but to determine probabilities I think I need to fit my data to the theoretical distribution that best describes it. I assume that some kind of goodness-of-fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (Scipy or Numpy)? Could you present any examples?

Thank you!

159

Answer #1

Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo's answer that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution's histogram and the data's histogram.

Example Fitting

Using the El Niño dataset from statsmodels, the distributions are fit and error is determined. The distribution with the least error is returned.

[Figure: All Fitted Distributions]

[Figure: Best Fit Distribution]

Example Code

%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if not d in ["levy_stable", "studentized_range"]]):

        print("{:>3} / {:<3}: {}".format( ii+1, len(_distn_names), distribution ))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore")
                
                # fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]
                
                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))
                
                # If an axis was passed in, add this PDF to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # identify if this distribution is better
                best_distributions.append((distribution, params, sse))
        
        except Exception:
            pass

    
    return sorted(best_distributions, key=lambda x:x[2])

def make_pdf(dist, params, size=10000):
    """Generate the distribution's probability density function."""

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)

    return pdf

# Load data from statsmodels datasets
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())

# Plot for comparison
plt.figure(figsize=(12,8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]

# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u"El Niño sea temp.\n All Fitted Distributions")
ax.set_xlabel(u"Temp (°C)")
ax.set_ylabel("Frequency")

# Make PDF with best params 
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12,8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k,v) for k,v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title(u"El Niño sea temp. with best fit distribution \n" + dist_str)
ax.set_xlabel(u"Temp. (°C)")
ax.set_ylabel("Frequency")
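
The question asks for the probability of seeing a value greater than a given one. As a cross-check on any fitted distribution, that probability can also be estimated directly from the data without fitting anything. The sketch below uses this empirical approach, which is a different technique from the SSE fitting above; the function name `empirical_p_value` is hypothetical.

```python
from bisect import bisect_right


def empirical_p_value(samples, value):
    """Return the fraction of samples strictly greater than value."""
    ordered = sorted(samples)
    # bisect_right gives the count of samples <= value.
    greater = len(ordered) - bisect_right(ordered, value)
    return greater / len(ordered)


samples = [0, 0, 1, 1, 2, 3]
print(empirical_p_value(samples, 1))
print(empirical_p_value(samples, 3))
```

With more than 30,000 observations, this estimate is usually stable in the bulk of the data; a fitted distribution mainly helps in the tails, where few observations exist.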

How can I open multiple files using "with open" in Python?

5 answers

I want to change a couple of files at one time, iff I can write to all of them. I'm wondering if I somehow can combine the multiple open calls with the with statement:

try:
  with open("a", "w") as a and open("b", "w") as b:
    do_something()
except IOError as e:
  print "Operation failed: %s" % e.strerror

If that's not possible, what would an elegant solution to this problem look like?

788

Answer #1

As of Python 2.7 (or 3.1 respectively) you can write

with open("a", "w") as a, open("b", "w") as b:
    do_something()

In earlier versions of Python, you can sometimes use contextlib.nested() to nest context managers. This won't work as expected for opening multiple files, though -- see the linked documentation for details.


In the rare case that you want to open a variable number of files all at the same time, you can use contextlib.ExitStack, starting from Python version 3.3:

from contextlib import ExitStack

with ExitStack() as stack:
    # filenames is an iterable of paths supplied by the caller
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

Most of the time when you have a variable set of files, though, you will likely want to open them one after the other.
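
As a runnable illustration of the ExitStack approach, the sketch below writes the same text to several files inside a temporary directory. The helper name `write_to_all` and the file names are hypothetical.

```python
import os
import tempfile
from contextlib import ExitStack


def write_to_all(paths, text):
    """Open every path for writing, then write the same text to each."""
    with ExitStack() as stack:
        # If any open() fails, the files already opened are closed on exit.
        files = [stack.enter_context(open(path, "w")) for path in paths]
        for handle in files:
            handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    targets = [os.path.join(tmp, name) for name in ("a.txt", "b.txt", "c.txt")]
    write_to_all(targets, "hello\n")
    for target in targets:
        with open(target) as handle:
            print(os.path.basename(target), repr(handle.read()))
```

Note that mode "w" truncates each file as soon as it is opened, so if a later open() fails, the earlier files have already been emptied. When that matters, writing to temporary files and renaming them afterwards is a safer pattern.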

open() in Python does not create a file if it doesn't exist

5 answers

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, file = open("myfile.dat", "rw") should do this, right?

It is not working for me (Python 2.6.2) and I'm wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I'm on a Linux system... so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.

778

Answer #1

You should use open with the w+ mode:

file = open("myfile.dat", "w+")
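
Be aware that w+ empties the file if it already exists. If the existing contents must be kept, one alternative is to create the file at the operating-system level and then wrap the descriptor. This is a minimal sketch; the helper name `open_read_write` and the 0o644 permission value are assumptions for illustration.

```python
import os
import tempfile


def open_read_write(path):
    """Open path for reading and writing, creating it if missing, without truncating."""
    # O_CREAT only creates the file when it does not exist yet;
    # without O_TRUNC, existing contents are left untouched.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    return os.fdopen(fd, "r+")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.dat")
    with open_read_write(path) as f:   # file is created here
        f.write("abc")
    with open_read_write(path) as f:   # contents survive the second open
        print(repr(f.read()))
```

Mode "a+" also creates the file without truncating, but every write then goes to the end of the file, which may not be what you want for in-place updates.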

Difference between modes a, a+, w, w+, and r+ in built-in open function?

5 answers

In the Python built-in open function, what is the exact difference between the modes w, a, w+, a+, and r+?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.

721

Answer #1

The opening modes are exactly the same as those for the C standard library function fopen().

The BSD fopen manpage defines them as follows:

 The argument mode points to a string beginning with one of the following
 sequences (Additional characters may follow these sequences.):

 ``r''   Open text file for reading.  The stream is positioned at the
         beginning of the file.

 ``r+''  Open for reading and writing.  The stream is positioned at the
         beginning of the file.

 ``w''   Truncate file to zero length or create text file for writing.
         The stream is positioned at the beginning of the file.

 ``w+''  Open for reading and writing.  The file is created if it does not
         exist, otherwise it is truncated.  The stream is positioned at
         the beginning of the file.

 ``a''   Open for writing.  The file is created if it does not exist.  The
         stream is positioned at the end of the file.  Subsequent writes
         to the file will always end up at the then current end of file,
         irrespective of any intervening fseek(3) or similar.

 ``a+''  Open for reading and writing.  The file is created if it does not
         exist.  The stream is positioned at the end of the file.
         Subsequent writes to the file will always end up at the then
         current end of file, irrespective of any intervening fseek(3) or
         similar.
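
The differences are easy to observe on a scratch file. The sketch below applies several modes in turn and records what the file contains after each step; the function name `demonstrate_modes` is hypothetical.

```python
import os
import tempfile


def demonstrate_modes(path):
    """Apply several open() modes to path and return the contents seen after each."""
    with open(path, "w") as f:     # create the file, or truncate it
        f.write("one\n")
    with open(path, "a") as f:     # writes go to the end of the file
        f.write("two\n")
    with open(path, "r+") as f:    # no truncation, positioned at the start
        f.write("ONE")             # overwrites the first three characters
    with open(path) as f:
        after_r_plus = f.read()
    with open(path, "a+") as f:    # append, but reading is also allowed
        f.write("three\n")
        f.seek(0)
        after_a_plus = f.read()
    with open(path, "w+") as f:    # read/write, but the file is emptied first
        after_w_plus = f.read()
    return after_r_plus, after_a_plus, after_w_plus


with tempfile.TemporaryDirectory() as tmp:
    for contents in demonstrate_modes(os.path.join(tmp, "demo.txt")):
        print(repr(contents))
```

The last result is an empty string, which shows why w+ is a poor choice when existing data must be preserved.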
