The older module, optparse, simply ignores all unrecognised arguments and carries on. In most situations this isn't ideal, and the behaviour was changed in argparse. But there are a few situations where you want to ignore any unrecognised arguments and parse only the ones you've specified.

For example:

```
# myscript.py
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--foo", dest="foo")
parser.parse_args()

$ python myscript.py --foo 1 --bar 2
error: unrecognized arguments: --bar 2
```

Is there any way to override this?
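One way to get this behaviour is `ArgumentParser.parse_known_args()`, which returns the parsed namespace together with a list of the arguments it did not recognise instead of exiting with an error. The sketch below passes the command line explicitly as a list so it can be run anywhere; in a real script you would normally call it without arguments so that it reads `sys.argv`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--foo", dest="foo")

# parse_known_args() returns (namespace, leftovers) instead of raising an error
args, unknown = parser.parse_known_args(["--foo", "1", "--bar", "2"])

print(args.foo)   # 1
print(unknown)    # ['--bar', '2']
```

The leftover list keeps the unrecognised tokens in order, so they can be logged or handed to another parser.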

## Python argparse ignore unrecognised arguments: Questions

Is there a list of Pytz Timezones?

3 answers

I would like to know all the possible values for the timezone argument in the Python library pytz. How can I list them?

Answer #1

You can list all the available timezones with `pytz.all_timezones`:

```
In [40]: import pytz
In [41]: pytz.all_timezones
Out[41]:
["Africa/Abidjan",
"Africa/Accra",
"Africa/Addis_Ababa",
...]
```

There is also `pytz.common_timezones`:

```
In [45]: len(pytz.common_timezones)
Out[45]: 403
In [46]: len(pytz.all_timezones)
Out[46]: 563
```

Python strptime() and timezones?

3 answers

I have a CSV dumpfile from a Blackberry IPD backup, created using IPDDump.
The date/time strings in here look something like this
(where `EST` is an Australian time-zone):

```
Tue Jun 22 07:46:22 EST 2010
```

I need to be able to parse this date in Python. At first, I tried to use the `strptime()` function from datetime.

```
>>> datetime.datetime.strptime("Tue Jun 22 12:10:20 2010 EST", "%a %b %d %H:%M:%S %Y %Z")
```

However, for some reason, the `datetime` object that comes back doesn't seem to have any `tzinfo` associated with it.

I did read on this page that apparently `datetime.strptime` silently discards `tzinfo`; however, I checked the documentation, and I can't find anything to that effect documented here.

I have been able to get the date parsed using a third-party Python library, dateutil; however, I'm still curious as to how I was using the built-in `strptime()` incorrectly. Is there any way to get `strptime()` to play nicely with timezones?

Answer #1

I recommend using python-dateutil. Its parser has been able to parse every date format I've thrown at it so far.

```
>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)
```

and so on. No dealing with `strptime()` format nonsense... just throw a date at it and it Does The Right Thing.

**Update**: Oops. I missed in your original question that you mentioned that you used `dateutil`, sorry about that. But I hope this answer is still useful to other people who stumble across this question when they have date parsing questions and see the utility of that module.
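For readers who want to stay with the built-in `strptime()`, here is a minimal standard-library sketch. It rests on one loudly stated assumption taken from the question: that `EST` in this dump means Australian Eastern Standard Time, i.e. a fixed offset of UTC+10. The helper name `parse_with_abbreviation` and the `TZ_ABBREVIATIONS` table are hypothetical names introduced only for this illustration. The idea is to pull the ambiguous abbreviation out of the string, parse the rest as a naive datetime, and attach the offset explicitly.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "EST" here is Australian Eastern Standard Time (UTC+10),
# as stated in the question. Abbreviations are ambiguous in general.
TZ_ABBREVIATIONS = {"EST": timezone(timedelta(hours=10), "EST")}

def parse_with_abbreviation(text):
    """Parse a string like 'Tue Jun 22 07:46:22 EST 2010' into an aware datetime."""
    parts = text.split()
    abbreviation = parts.pop(4)  # the zone name sits just before the year
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=TZ_ABBREVIATIONS[abbreviation])

aware = parse_with_abbreviation("Tue Jun 22 07:46:22 EST 2010")
print(aware.isoformat())

# When the input carries a numeric offset, %z produces an aware datetime directly.
offset_aware = datetime.strptime(
    "Tue Jun 22 07:46:22 +1000 2010", "%a %b %d %H:%M:%S %z %Y"
)
print(offset_aware == aware)
```

A fixed offset ignores daylight saving time, so this sketch is only appropriate when the abbreviation really does map to a single offset for all of the data.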

Fitting empirical distribution to theoretical ones with Scipy (Python)?

3 answers

**INTRODUCTION**: I have a list of more than 30,000 integer values ranging from 0 to 47, inclusive, e.g. `[0,0,0,0,..,1,1,1,1,...,2,2,2,2,...,47,47,47,...]` sampled from some continuous distribution. The values in the list are not necessarily in order, but order doesn't matter for this problem.

**PROBLEM**: Based on my distribution I would like to calculate the p-value (the probability of seeing greater values) for any given value. For example, as you can see, the p-value for 0 would approach 1 and the p-value for higher numbers would tend to 0.

I don't know if I am right, but to determine probabilities I think I need to fit my data to the theoretical distribution that best describes it. I assume that some kind of goodness-of-fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (`Scipy` or `Numpy`)?
Could you present any examples?

Thank you!

Answer #1

# Distribution Fitting with Sum of Square Error (SSE)

This is an update and modification to Saullo's answer that uses the full list of the current `scipy.stats` distributions and returns the distribution with the least SSE between the distribution's histogram and the data's histogram.

## Example Fitting

Using the El Niño dataset from `statsmodels`, the distributions are fit and error is determined. The distribution with the least error is returned.

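### All Distributions

(Figure: histogram of the El Niño sea temperature data with all fitted distributions overlaid.)

### Best Fit Distribution

(Figure: histogram of the El Niño sea temperature data with the best-fit distribution's PDF.)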

### Example Code

```
# %matplotlib inline  (uncomment when running in a Jupyter notebook)

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams["figure.figsize"] = (16.0, 12.0)
matplotlib.style.use("ggplot")


# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data; x becomes the bin centers
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    candidates = [d for d in _distn_names if d not in ["levy_stable", "studentized_range"]]
    for ii, distribution in enumerate(candidates):

        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings("ignore")

                # Fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, *arg, loc=loc, scale=scale)
                sse = np.sum(np.power(y - pdf, 2.0))

                # If an axis was passed in, add the fitted PDF to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # Record this distribution; the list is sorted by SSE on return
                best_distributions.append((distribution, params, sse))

        except Exception:
            pass

    return sorted(best_distributions, key=lambda x: x[2])


def make_pdf(dist, params, size=10000):
    """Generate distribution's Probability Distribution Function"""

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, *arg, loc=loc, scale=scale)
    pdf = pd.Series(y, x)

    return pdf


# Load data from statsmodels datasets
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index("YEAR").values.ravel())

# Plot for comparison
plt.figure(figsize=(12, 8))
ax = data.plot(kind="hist", bins=50, density=True, alpha=0.5, color=list(matplotlib.rcParams["axes.prop_cycle"])[1]["color"])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]

# Update plots
ax.set_ylim(dataYLim)
ax.set_title("El Niño sea temp.\nAll Fitted Distributions")
ax.set_xlabel("Temp (°C)")
ax.set_ylabel("Frequency")

# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12, 8))
ax = pdf.plot(lw=2, label="PDF", legend=True)
data.plot(kind="hist", bins=50, density=True, alpha=0.5, label="Data", legend=True, ax=ax)

param_names = (best_dist[0].shapes + ", loc, scale").split(", ") if best_dist[0].shapes else ["loc", "scale"]
param_str = ", ".join(["{}={:0.2f}".format(k, v) for k, v in zip(param_names, best_dist[1])])
dist_str = "{}({})".format(best_dist[0].name, param_str)

ax.set_title("El Niño sea temp. with best fit distribution\n" + dist_str)
ax.set_xlabel("Temp. (°C)")
ax.set_ylabel("Frequency")
```
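To make the SSE criterion and the p-value from the question concrete, here is a small standard-library sketch for a single candidate distribution. It is not the answer's method in full: it swaps the `scipy.stats` distribution list for one normal fit via `statistics.NormalDist`, uses synthetic data generated with a fixed seed, and the helper names `histogram_density` and `sse_for_normal` are hypothetical. The p-value is taken as the survival probability, 1 minus the fitted CDF.

```python
import random
from statistics import NormalDist

def histogram_density(data, bins):
    """Return bin centers and densities normalised so the histogram integrates to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in data:
        # Clamp so the maximum value lands in the last bin
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities

def sse_for_normal(data, bins=20):
    """Fit a normal distribution and return it with its SSE against the histogram."""
    fitted = NormalDist.from_samples(data)
    centers, densities = histogram_density(data, bins)
    sse = sum((d - fitted.pdf(c)) ** 2 for c, d in zip(centers, densities))
    return fitted, sse

# Synthetic stand-in data (assumption: not the El Niño dataset)
random.seed(0)
sample = [random.gauss(20.0, 3.0) for _ in range(5000)]

fitted, sse = sse_for_normal(sample)
p_value = 1.0 - fitted.cdf(25.0)  # probability of seeing a value greater than 25

print("SSE:", sse)
print("P(X > 25):", p_value)
```

With a fitted `scipy.stats` distribution, the analogous quantity is the survival function `sf`, called with the fitted shape, `loc` and `scale` parameters.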

Why use argparse rather than optparse?

1 answer

I noticed that the Python 2.7 documentation includes yet another command-line parsing module. In addition to `getopt` and `optparse` we now have `argparse`.

Why has yet another command-line parsing module been created? Why should I use it instead of `optparse`? Are there new features that I should know about?

Answer #1

As of Python `2.7`, `optparse` is deprecated, and will hopefully go away in the future.

`argparse` is better for all the reasons listed on its original page (https://code.google.com/archive/p/argparse/):

- handling positional arguments
- supporting sub-commands
- allowing alternative option prefixes like `+` and `/`
- handling zero-or-more and one-or-more style arguments
- producing more informative usage messages
- providing a much simpler interface for custom types and actions

More information is also in PEP 389, which is the vehicle by which `argparse` made it into the standard library.
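Several of the features listed above can be seen together in a short sketch. The program name `tool`, the `copy` sub-command and the option names are hypothetical, chosen only for illustration; the argument list is passed explicitly so the example runs without a real command line.

```python
import argparse

# prefix_chars lets the top-level parser accept "+" options as well as "-"
parser = argparse.ArgumentParser(prog="tool", prefix_chars="-+")
parser.add_argument("+v", dest="verbose", action="store_true")

# Sub-commands: each one gets its own parser and its own arguments
subparsers = parser.add_subparsers(dest="command")
copy = subparsers.add_parser("copy")
copy.add_argument("sources", nargs="+")    # positional, one or more values
copy.add_argument("--dest", default=".")

args = parser.parse_args(["+v", "copy", "a.txt", "b.txt", "--dest", "out"])

print(args.verbose)   # True
print(args.command)   # copy
print(args.sources)   # ['a.txt', 'b.txt']
print(args.dest)      # out
```

Running the same parser with `-h` prints a generated usage message that lists the sub-command and its options.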