Pandas | Parsing a JSON dataset

A JSON parser that converts JSON text to another representation must accept all texts that conform to the JSON grammar. It may also accept non-JSON forms or extensions. An implementation may set:

  • limits on the size of texts it accepts,
  • limits on the maximum nesting depth,
  • limits on the range and precision of numbers,
  • limits on the length and character content of strings.
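Python's standard json module illustrates these points. By default json.loads accepts the non-standard tokens NaN, Infinity and -Infinity (an extension), and its parse_float and parse_constant hooks let a caller control number precision and reject those tokens. The sketch below wraps it in a hypothetical helper, loads_limited, that also enforces a size limit and a nesting-depth limit. The limit values are arbitrary assumptions, and a real parser would enforce the depth limit while parsing rather than afterwards, as this simplified version does.

```python
import json
from decimal import Decimal


def reject_constant(name):
    # json.loads calls this for the non-standard tokens NaN, Infinity and -Infinity
    raise ValueError(f"non-standard JSON constant: {name}")


def nesting_depth(node):
    # Depth of nested arrays/objects; scalars count as 0
    if isinstance(node, dict):
        return 1 + max((nesting_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((nesting_depth(v) for v in node), default=0)
    return 0


def loads_limited(text, max_size=1_000_000, max_depth=32):
    """Hypothetical helper: parse JSON text under size and depth limits."""
    if len(text) > max_size:
        raise ValueError("text exceeds the size limit")
    # parse_float=Decimal keeps every digit of non-integer numbers instead of rounding to float
    value = json.loads(text, parse_float=Decimal, parse_constant=reject_constant)
    if nesting_depth(value) > max_depth:
        raise ValueError("nesting exceeds the depth limit")
    return value


print(json.loads("NaN"))                 # default behaviour: accepted as float('nan')
print(loads_limited('{"price": 1.10}'))  # {'price': Decimal('1.10')}
```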

Working with large JSON datasets can be difficult, especially when they are too large to fit in memory. In such cases, combining command-line tools with Python can be an efficient way to explore and analyze the data.
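When a large file is stored as JSON Lines (one JSON object per line), pandas can read it in pieces instead of all at once by combining lines=True with chunksize. A command-line tool such as jq can produce that layout from a top-level array, for example with jq -c '.[]' big.json > big.jsonl. The sketch below assumes such a file and stands in a tiny in-memory sample with made-up country and population fields, summing one column chunk by chunk.

```python
import io

import pandas as pd

# Tiny stand-in for a JSON Lines file too large to load at once (made-up data)
sample = io.StringIO(
    '{"country": "A", "population": 10}\n'
    '{"country": "B", "population": 20}\n'
    '{"country": "C", "population": 30}\n'
)

total = 0
rows = 0
# lines=True reads one record per line; chunksize=2 yields DataFrames of at most 2 rows
with pd.read_json(sample, lines=True, chunksize=2) as reader:
    for chunk in reader:
        total += chunk["population"].sum()
        rows += len(chunk)

print(rows, total)  # 3 60
```

Only one chunk is held in memory at a time, so the same loop works on files far larger than RAM as long as the per-chunk work is done incrementally.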

Importing JSON files:

JSON data is manipulated here with pandas, a Python library for data analysis:

 import pandas as pd 

You can now read the JSON and load it into a pandas data structure using the read_json function:

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')

(This signature comes from an older pandas release; the numpy parameter was removed in pandas 2.0.)
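The orient argument tells read_json how the text is laid out. As a small sketch, the "split" layout shown in the to_json example below can be read back into an equivalent DataFrame. Wrapping the text in io.StringIO is a choice made here because passing a literal JSON string to read_json is deprecated in recent pandas versions.

```python
import io

import pandas as pd

# JSON in the "split" layout: column labels, index labels and row data stored separately
split_json = (
    '{"columns":["col 1","col 2"],'
    '"index":["row 1","row 2"],'
    '"data":[["a","b"],["c","d"]]}'
)

# orient must match the layout of the text being parsed
df = pd.read_json(io.StringIO(split_json), orient="split")
print(df)
```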

import pandas as pd

# Create a DataFrame
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# Specify the expected JSON string format with orient
print(df.to_json(orient='split'))
print(df.to_json(orient='index'))

Output:

{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}
{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}

DataFrame.to_json converts the object to a JSON string:

DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True)
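Two of these parameters change the text itself: date_format="iso" writes datetimes as ISO 8601 strings instead of epoch milliseconds, and double_precision sets how many decimal places floats keep. A small sketch with made-up values:

```python
import json

import pandas as pd

# Made-up data: one datetime column and one float column
df = pd.DataFrame({"date": pd.to_datetime(["2019-03-18"]), "ratio": [1 / 3]})

# ISO dates and three decimal places for floats, one JSON object per row
out = df.to_json(orient="records", date_format="iso", double_precision=3)
print(out)

record = json.loads(out)[0]
print(record["ratio"])  # 0.333
```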

Read JSON directly from a URL:

import pandas as pd

data = pd.read_json('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json')

print(data)

Output:

                                   total_population
0  {'date': '2019-03-18', 'population': 1369169250}
1  {'date': '2019-03-19', 'population': 1369211502}
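The total_population column above holds one dict per row. Assuming the response has the shape that output suggests (a "total_population" list of date/population records, reconstructed here from the printed values rather than fetched from the API), json_normalize turns each record into a row with proper columns:

```python
import pandas as pd

# Hypothetical payload rebuilt from the printed output above
payload = {
    "total_population": [
        {"date": "2019-03-18", "population": 1369169250},
        {"date": "2019-03-19", "population": 1369211502},
    ]
}

# Each record becomes a row and each key a column
flat = pd.json_normalize(payload["total_population"])
flat["date"] = pd.to_datetime(flat["date"])

# Day-over-day change in population
growth = flat["population"].diff().iloc[1]
print(flat)
print(growth)  # 42252.0
```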

Nested JSON parsing with pandas:

Nested JSON files can be time-consuming and difficult to flatten and load into pandas.
Here the nested file raw_nyc_phil.json is used to create a flattened pandas DataFrame from one nested array and then to unpack a more deeply nested array.
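The core idea is that pandas.json_normalize flattens nested objects into columns whose names join the keys with a separator ("." by default, configurable with sep). A minimal sketch with hypothetical venue records, unrelated to the actual contents of raw_nyc_phil.json:

```python
import pandas as pd

# Hypothetical records with a two-level nested "venue" object
records = [
    {"id": 1, "venue": {"name": "Hall A", "location": {"city": "New York"}}},
    {"id": 2, "venue": {"name": "Hall B", "location": {"city": "Boston"}}},
]

flat = pd.json_normalize(records)
print(list(flat.columns))  # ['id', 'venue.name', 'venue.location.city']

# sep changes the separator used to build the column names
flat_underscore = pd.json_normalize(records, sep="_")
print(list(flat_underscore.columns))  # ['id', 'venue_name', 'venue_location_city']
```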

Code # 1:
Let's load the file and flatten the top-level "programs" records into a DataFrame.

import json

# json_normalize is exposed at the top level of pandas since 1.0
# (pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

# open() reads local files, not URLs, so first download the raw file behind
# https://github.com/a9k00r/python-test/blob/master/raw_nyc_phil.json
with open('raw_nyc_phil.json') as f:
    d = json.load(f)

# Inspecting raw_nyc_phil.json shows that the parent node is "programs";
# json_normalize flattens that list of records into a pandas DataFrame
nycphil = json_normalize(d['programs'])

nycphil.head(3)

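A minimal sketch on a hypothetical two-program sample (invented values, using the keys that appear in this article's code plus a workTitle field assumed for illustration) shows what this step produces: top-level scalar fields become columns, while the works list stays packed in a single column of Python lists, which is why the next step unpacks it.

```python
import pandas as pd

# Hypothetical sample shaped like the "programs" list (values are invented)
programs = [
    {"id": "p1", "orchestra": "Orchestra A", "programID": "1", "season": "1900-01",
     "works": [{"workTitle": "Work 1", "soloists": []},
               {"workTitle": "Work 2", "soloists": [{"soloistName": "Soloist X"}]}]},
    {"id": "p2", "orchestra": "Orchestra A", "programID": "2", "season": "1900-01",
     "works": [{"workTitle": "Work 3", "soloists": [{"soloistName": "Soloist Y"},
                                                   {"soloistName": "Soloist Z"}]}]},
]

flat = pd.json_normalize(programs)
print(list(flat.columns))  # ['id', 'orchestra', 'programID', 'season', 'works']

# Each "works" cell is still a Python list of dicts
print(flat["works"].apply(len).tolist())  # [2, 1]
```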

Code # 2:
Let's unpack the works column into a separate DataFrame using json_normalize, keeping some flat columns from each program through meta.

# record_path selects the nested list to expand into rows;
# meta copies program-level fields onto each of those rows
works_data = json_normalize(data=d['programs'],
                            record_path='works',
                            meta=['id', 'orchestra', 'programID', 'season'])

works_data.head(3)

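On the same kind of hypothetical sample, record_path="works" yields one row per work and meta repeats the chosen program fields on each of those rows. record_prefix (an illustrative choice here) prefixes the columns that come from the records, which keeps them distinct from the meta columns.

```python
import pandas as pd

# Same kind of hypothetical "programs" sample (values are invented)
programs = [
    {"id": "p1", "season": "1900-01",
     "works": [{"workTitle": "Work 1", "soloists": []},
               {"workTitle": "Work 2", "soloists": [{"soloistName": "Soloist X"}]}]},
    {"id": "p2", "season": "1900-01",
     "works": [{"workTitle": "Work 3", "soloists": [{"soloistName": "Soloist Y"}]}]},
]

# One row per work; program-level id and season are repeated on each row
works = pd.json_normalize(programs, record_path="works",
                          meta=["id", "season"], record_prefix="work.")
print(works[["work.workTitle", "id", "season"]])
```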

Code # 3:

Let's flatten the "soloists" records by passing a list to record_path, since the soloists are nested inside each work.

# The list ['works', 'soloists'] walks two levels down: programs -> works -> soloists
soloist_data = json_normalize(data=d['programs'],
                              record_path=['works', 'soloists'],
                              meta=['id'])

soloist_data.head(3)

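meta can also reach the intermediate level of a nested record_path. In this sketch on the same kind of hypothetical sample, the path ["works", "workTitle"] attaches the title of the enclosing work to every soloist row (the column is named works.workTitle), and a work with an empty soloists list contributes no rows.

```python
import pandas as pd

# Hypothetical "programs" sample (values are invented)
programs = [
    {"id": "p1",
     "works": [{"workTitle": "Work 1", "soloists": []},
               {"workTitle": "Work 2", "soloists": [{"soloistName": "Soloist X"}]}]},
    {"id": "p2",
     "works": [{"workTitle": "Work 3", "soloists": [{"soloistName": "Soloist Y"},
                                                   {"soloistName": "Soloist Z"}]}]},
]

# One row per soloist, tagged with its program id and the title of its work
soloists = pd.json_normalize(programs, record_path=["works", "soloists"],
                             meta=["id", ["works", "workTitle"]])
print(soloists)
```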