
Pandas | Parsing a JSON dataset

A JSON parser that converts JSON text to another representation must accept all texts that conform to the JSON grammar. It may also accept non-JSON forms or extensions. An implementation may set:

  • limits on the size of texts it accepts,
  • limits on the maximum nesting depth,
  • limits on the range and precision of numbers,
  • limits on the length and character content of strings.
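Python's standard json module shows both points in practice. By default it accepts the non-standard tokens NaN, Infinity and -Infinity as an extension, and the hooks parse_constant and parse_float let a caller reject those forms or control number precision. A minimal sketch; the helper name reject_constant is our own, not part of the json module:

```python
import decimal
import json

# By default json.loads accepts NaN and Infinity, which are not valid JSON
value = json.loads('[1.5, NaN, Infinity]')
print(value)  # [1.5, nan, inf]


def reject_constant(name):
    """Hypothetical helper: refuse the non-standard constants."""
    raise ValueError(f"non-standard JSON constant: {name}")


# parse_constant is called for NaN, Infinity and -Infinity
try:
    json.loads('[NaN]', parse_constant=reject_constant)
except ValueError as exc:
    print(exc)  # non-standard JSON constant: NaN

# parse_float controls number precision: Decimal keeps the exact digits
exact = json.loads('{"price": 0.1}', parse_float=decimal.Decimal)
print(exact["price"])  # 0.1
```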

Working with large JSON datasets can be slow and cumbersome, especially when they are too large to fit in memory. In such cases, combining command-line tools with Python can be an efficient way to explore and analyze the data.
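One pandas-only way to keep memory use bounded is to store the data as JSON Lines (one JSON object per line) and read it in chunks with read_json(lines=True, chunksize=...), which returns an iterator of DataFrames instead of one large frame. The sketch below uses a small in-memory buffer with made-up city records; in practice you would pass the path of the large file instead.

```python
import io

import pandas as pd

# Made-up JSON Lines data standing in for a large file on disk
jsonl = io.StringIO(
    '{"city": "A", "population": 100}\n'
    '{"city": "B", "population": 250}\n'
    '{"city": "C", "population": 75}\n'
)

# chunksize requires lines=True; each chunk is a DataFrame of up to 2 rows
reader = pd.read_json(jsonl, lines=True, chunksize=2)

total = 0
for chunk in reader:
    # Aggregate per chunk so the whole dataset never has to be in memory
    total += int(chunk["population"].sum())

print(total)  # 425
```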

Importing JSON files:

JSON data is manipulated with pandas, a Python data analysis library:

import pandas as pd

You can now read JSON and store it in a pandas data structure (a DataFrame or a Series) with the read_json function. In older pandas releases its signature was:

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')

The numpy argument was deprecated in pandas 1.0 and removed in pandas 2.0.
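A minimal example of read_json on an in-memory string (the records are made up). Wrapping the text in io.StringIO avoids the deprecation warning that recent pandas versions emit when a literal JSON string is passed directly, and orient='records' tells pandas to expect a list of row objects.

```python
import io

import pandas as pd

# Made-up JSON in the 'records' orientation: a list of row objects
text = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]'

# StringIO makes the string behave like a file for read_json
df = pd.read_json(io.StringIO(text), orient="records")
print(df)
print(df.dtypes)
```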

import pandas as pd

# Create a DataFrame
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# Specify the expected JSON string format with orient
print(df.to_json(orient='split'))
print(df.to_json(orient='index'))

Output:

{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}
{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}
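Because each orient produces a different layout, read_json has to be told which one it is parsing. A short round-trip sketch with the same DataFrame: serialize with orient='split', then parse the string back with the same orient so the index and column labels are recovered.

```python
import io

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# Serialize with one orient ...
as_split = df.to_json(orient='split')

# ... and parse it back with the same orient to recover index and columns
restored = pd.read_json(io.StringIO(as_split), orient='split')
print(restored)
```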

Convert a DataFrame to a JSON string with DataFrame.to_json:

DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True)
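For large outputs, orient='records' combined with lines=True writes JSON Lines, one object per row, which can later be read back in chunks. A minimal sketch with the same two-column data:

```python
import pandas as pd

df = pd.DataFrame({'col 1': ['a', 'c'], 'col 2': ['b', 'd']})

# lines=True is only valid with orient='records'
text = df.to_json(orient='records', lines=True)
print(text)
```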

Read JSON directly from a URL:

import pandas as pd

data = pd.read_json('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json')

print(data)

Output:

                                   total_population
0  {'date': '2019-03-18', 'population': 1369169250}
1  {'date': '2019-03-19', 'population': 1369211502}
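Each cell of total_population holds a dictionary. One way to turn those dictionaries into ordinary columns is pd.json_normalize. The sketch rebuilds the two rows from the output above instead of calling the API, so it runs offline.

```python
import pandas as pd

# The two rows shown in the output above, rebuilt locally
data = pd.DataFrame({'total_population': [
    {'date': '2019-03-18', 'population': 1369169250},
    {'date': '2019-03-19', 'population': 1369211502},
]})

# Expand each dict into its own 'date' and 'population' columns
flat = pd.json_normalize(data['total_population'].tolist())

# Parse the ISO date strings into datetime64 values
flat['date'] = pd.to_datetime(flat['date'])
print(flat)
```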

Nested JSON parsing with pandas:

Nested JSON can be time-consuming and difficult to process and load into pandas.
Here we use the nested file raw_nyc_phil.json: first we create a flattened pandas DataFrame from one nested array, then we unpack a more deeply nested array.

Code #1:
Let's first flatten the records under "programs" into a DataFrame, which picks up the flat columns. At this stage the works column still holds nested lists.

import json
import pandas as pd

# open() cannot read a URL: first download raw_nyc_phil.json from
# https://github.com/a9k00r/python-test/blob/master/raw_nyc_phil.json
# and save it in the working directory
with open('raw_nyc_phil.json') as f:
    d = json.load(f)

# The parent node of the records is "programs";
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize
nycphil = pd.json_normalize(d['programs'])

nycphil.head(3)

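To see what json_normalize does to a record without downloading the file, here is a small made-up record. Nested dictionaries are expanded into dotted column names, while a nested list such as works is left as a single column holding the list; that is why Code #2 needs record_path.

```python
import pandas as pd

# Made-up program record: 'venue' is a nested dict, 'works' a nested list
records = [{'id': 'p1',
            'venue': {'name': 'Hall', 'city': 'NYC'},
            'works': [{'title': 'W1'}, {'title': 'W2'}]}]

flat = pd.json_normalize(records)

# 'venue' becomes 'venue.name' and 'venue.city'; 'works' keeps the raw list
print(flat.columns.tolist())
print(flat.loc[0, 'works'])
```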

Code #2:
Now unpack the works column into a separate DataFrame with json_normalize: record_path points at the nested array, and meta lists the program-level fields to repeat on every row.

works_data = pd.json_normalize(data=d['programs'],
                               record_path='works',
                               meta=['id', 'orchestra', 'programID', 'season'])

works_data.head(3)

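meta columns are added next to the record columns, so a field name that appears at both levels clashes. In that case json_normalize raises a ValueError asking for a distinguishing prefix, and record_prefix (or meta_prefix) resolves it. A sketch with made-up data in which both a program and its works carry an id field:

```python
import pandas as pd

# Made-up data: programs and their works both carry an 'id' field
programs = [{'id': 'p1',
             'works': [{'id': 'w1', 'title': 'A'},
                       {'id': 'w2', 'title': 'B'}]}]

# Without record_prefix this call raises ValueError (conflicting 'id')
works = pd.json_normalize(programs, record_path='works', meta=['id'],
                          record_prefix='work.')
print(works)
```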

Code #3:

Now flatten the "soloists" by passing a list to record_path, because the soloists are nested inside each work.

soloist_data = pd.json_normalize(data=d['programs'],
                                 record_path=['works', 'soloists'],
                                 meta=['id'])

soloist_data.head(3)

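Real records are not always complete. If a meta field is missing from some programs, json_normalize raises a KeyError by default, while errors='ignore' fills the gap with NaN (a key missing along record_path still raises). A sketch with made-up data in which the second program has no season:

```python
import pandas as pd

# Made-up data: the second program lacks the 'season' field
programs = [
    {'id': 'p1', 'season': '1842-43',
     'works': [{'soloists': [{'soloistName': 'X'}]}]},
    {'id': 'p2',
     'works': [{'soloists': [{'soloistName': 'Y'}]}]},
]

# errors='ignore' turns the missing meta value into NaN instead of a KeyError
soloists = pd.json_normalize(programs, record_path=['works', 'soloists'],
                             meta=['id', 'season'], errors='ignore')
print(soloists)
```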
