
Pandas | Parsing a JSON dataset


A JSON parser that converts JSON text to another representation must accept all texts that conform to the JSON grammar. It may also accept non-JSON forms or extensions. An implementation may set:

  • limits on the size of texts it accepts,
  • limits on the maximum nesting depth,
  • limits on the range and precision of numbers,
  • limits on the length and character content of strings.
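Python's standard json module shows how real parsers make these choices. The sketch below uses only documented json.loads hooks (parse_float, parse_constant). The sample values are made up for illustration. It shows that number precision depends on the target type, and that the module accepts the non-JSON constant NaN unless the caller rejects it:

```python
import json
import math
from decimal import Decimal

# A number with more significant digits than a 64-bit float can hold.
text = '{"price": 0.30000000000000000001}'

# By default, numbers with a fraction part become Python floats,
# so digits beyond float precision are lost.
as_float = json.loads(text)["price"]

# parse_float lets the caller pick a representation that keeps every digit.
as_decimal = json.loads(text, parse_float=Decimal)["price"]

# Extension: json.loads accepts NaN, Infinity and -Infinity,
# which are not part of the JSON grammar.
nan_value = json.loads('{"x": NaN}')["x"]


def reject_constant(name):
    # Called by json.loads for each NaN / Infinity / -Infinity literal.
    raise ValueError(f"non-standard JSON constant: {name}")


# A stricter parser can be obtained by rejecting those extensions.
try:
    json.loads('{"x": NaN}', parse_constant=reject_constant)
    rejected = False
except ValueError:
    rejected = True

print(as_float)               # 0.3
print(as_decimal)             # 0.30000000000000000001
print(math.isnan(nan_value))  # True
print(rejected)               # True
```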

Working with large JSON datasets can be difficult, especially when they are too large to fit in memory. In such cases, combining command-line tools with Python can be an efficient way to explore and analyze the data.
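On the Python side, one way to keep memory use bounded is to store the data as JSON Lines (one object per line) and let read_json return it in chunks. This is a minimal sketch that assumes pandas 1.2 or later, which is needed for the with-statement form. The in-memory buffer and its "city"/"population" fields are hypothetical stand-ins for a large file:

```python
import io

import pandas as pd

# Hypothetical stand-in for a large newline-delimited JSON (JSON Lines) file;
# with a real file you would pass its path instead of this in-memory buffer.
jsonl = io.StringIO(
    '{"city": "A", "population": 100}\n'
    '{"city": "B", "population": 250}\n'
    '{"city": "C", "population": 50}\n'
)

total = 0
# lines=True parses one JSON object per line; chunksize makes read_json return
# an iterator of DataFrames, so only `chunksize` rows are held at a time.
with pd.read_json(jsonl, lines=True, chunksize=2) as reader:
    for chunk in reader:
        total += int(chunk["population"].sum())

print(total)  # 400
```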

Importing JSON files:

JSON data is handled here with pandas, a Python library for data analysis. Import it first:

 import pandas as pd 

You can then read the JSON and store it in a pandas data structure (a DataFrame or a Series) using the read_json function:

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')

(This signature is from an older pandas release; the numpy parameter was deprecated in pandas 1.0 and removed in pandas 2.0.)
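The sketch below exercises a few of these parameters and assumes pandas 2.x. Literal JSON text is wrapped in io.StringIO because pandas 2.1 and later warn when a raw string is passed to read_json. The first input is a small two-row table in the "split" layout. The second uses typ="series" to get a Series instead of a DataFrame:

```python
import io

import pandas as pd

# JSON in the "split" layout: column labels, index labels and row data kept apart.
split_json = ('{"columns":["col 1","col 2"],"index":["row 1","row 2"],'
              '"data":[["a","b"],["c","d"]]}')

# orient must match the layout of the incoming JSON text.
df = pd.read_json(io.StringIO(split_json), orient="split")
print(df)

# typ="series" returns a Series instead of a DataFrame.
s = pd.read_json(io.StringIO('{"a": 1, "b": 2}'), typ="series")
print(s["b"])  # 2
```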

import pandas as pd

# Create a DataFrame
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# Serialize it with two different JSON layouts (orient values)
print(df.to_json(orient='split'))

print(df.to_json(orient='index'))

Output:

{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}
{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}
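Other orient values produce other layouts for the same table. This sketch rebuilds the DataFrame from the example above and prints three more layouts. The expected strings follow the examples in the pandas to_json documentation:

```python
import pandas as pd

df = pd.DataFrame([["a", "b"], ["c", "d"]],
                  index=["row 1", "row 2"],
                  columns=["col 1", "col 2"])

# "records": a list with one object per row (the index is dropped).
records = df.to_json(orient="records")
# "columns" (the default for DataFrames): keyed by column, then by index.
columns = df.to_json(orient="columns")
# "values": only the raw 2-D array of cell values.
values = df.to_json(orient="values")

print(records)  # [{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]
print(columns)  # {"col 1":{"row 1":"a","row 2":"c"},"col 2":{"row 1":"b","row 2":"d"}}
print(values)   # [["a","b"],["c","d"]]
```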

DataFrame.to_json converts the object to a JSON string, or writes it to a file when path_or_buf is given:

DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True)
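Two of these parameters are easy to miss. double_precision controls how many decimal places are written for floats. lines=True, which is valid only with orient="records", writes JSON Lines. This minimal sketch uses a made-up one-column frame:

```python
import pandas as pd

# Made-up data with repeating decimals.
df = pd.DataFrame({"x": [1 / 3, 2 / 3]})

# double_precision limits the number of decimal places written for floats.
precise = df.to_json(double_precision=3)
print(precise)  # {"x":{"0":0.333,"1":0.667}}

# lines=True (only valid with orient="records") writes JSON Lines:
# one JSON object per row, separated by newlines.
jsonl = df.to_json(orient="records", lines=True, double_precision=2)
print(jsonl.splitlines())  # ['{"x":0.33}', '{"x":0.67}']
```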

read_json also accepts a URL, so JSON can be read directly from a web API:

import pandas as pd

# Note: this endpoint may no longer be online
data = pd.read_json('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json')

print(data)

Output:

                                   total_population
0  {'date': '2019-03-18', 'population': 1369169250}
1  {'date': '2019-03-19', 'population': 1369211502}
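Each cell of total_population holds a dict. json_normalize can expand those dicts into ordinary columns. This sketch rebuilds the two rows shown above locally, so it runs even if the web API is unreachable. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize:

```python
import pandas as pd

# The two rows from the output above, rebuilt locally.
data = pd.DataFrame({"total_population": [
    {"date": "2019-03-18", "population": 1369169250},
    {"date": "2019-03-19", "population": 1369211502},
]})

# Pass the dicts as a list; json_normalize turns their keys into columns.
flat = pd.json_normalize(data["total_population"].tolist())
print(flat)
```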

Nested JSON parsing with pandas:

Nested JSON files can be time-consuming and difficult to process and load into pandas.
The examples below use the nested file "raw_nyc_phil.json". First we create a flattened pandas DataFrame from one nested array, and then we unpack a more deeply nested array.

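Code #1:
Load the file and flatten the records under the top-level "programs" key into a DataFrame. Flat fields become ordinary columns, while nested lists such as "works" are kept whole in a single column.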

import json

import pandas as pd
from pandas import json_normalize

# open() cannot read URLs, so first download raw_nyc_phil.json from
# https://github.com/a9k00r/python-test/blob/master/raw_nyc_phil.json
# and save it next to this script
with open('raw_nyc_phil.json') as f:
    d = json.load(f)

# the parent node of the data is "programs";
# json_normalize flattens its records into a DataFrame
nycphil = json_normalize(d['programs'])

nycphil.head(3)

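Because the original output table is not reproduced here, this sketch runs the same call on a hypothetical miniature input. The "venue" and "works" fields and their values are invented for illustration and are not taken from raw_nyc_phil.json. It shows how json_normalize names flattened columns and what the sep parameter changes:

```python
import pandas as pd

# Hypothetical miniature records: one nested dict ("venue") and one nested
# list ("works"); the values are invented for illustration.
programs = [
    {"id": "p1", "season": "1842-43",
     "venue": {"name": "Hall A", "city": "New York"},
     "works": [{"workTitle": "Work A"}]},
    {"id": "p2", "season": "1843-44",
     "venue": {"name": "Hall B", "city": "New York"},
     "works": [{"workTitle": "Work B"}, {"workTitle": "Work C"}]},
]

# Nested dicts are flattened into dotted column names such as "venue.name";
# the list under "works" is left untouched in a single column.
flat = pd.json_normalize(programs)
print(flat.columns.tolist())

# sep changes the separator used to build the flattened names.
flat_underscore = pd.json_normalize(programs, sep="_")
print(flat_underscore.columns.tolist())
```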

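Code #2:
Now unpack the "works" column into a separate DataFrame using json_normalize. record_path names the nested list to expand, and meta lists the program-level fields to repeat on each row.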

works_data = json_normalize(data=d['programs'],
                            record_path='works',
                            meta=['id', 'orchestra', 'programID', 'season'])

works_data.head(3)

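The effect of record_path and meta is easier to see on a tiny input. This sketch uses a hypothetical two-program sample whose titles and ids are invented. Each work becomes one row, and the parent's id and season are repeated on every row derived from that parent:

```python
import pandas as pd

# Hypothetical two-program sample; each program holds a list of works.
programs = [
    {"id": "p1", "season": "1842-43",
     "works": [{"workTitle": "Work A"}, {"workTitle": "Work B"}]},
    {"id": "p2", "season": "1843-44",
     "works": [{"workTitle": "Work C"}]},
]

# record_path selects the list to turn into rows; meta copies parent fields
# onto every row produced from that parent.
works = pd.json_normalize(programs, record_path="works", meta=["id", "season"])
print(works)
```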

Code #3:

Flatten the "soloists" data by passing a list to record_path, since the soloists are nested inside each work.

soloist_data = json_normalize(data=d['programs'],
                              record_path=['works', 'soloists'],
                              meta=['id'])

soloist_data.head(3)

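A list-valued record_path walks down one nesting level per element. This sketch uses a hypothetical sample whose names are invented. It shows the path programs, then works, then soloists, and that a work with an empty soloist list contributes no rows:

```python
import pandas as pd

# Hypothetical sample: soloists are nested inside works, which are nested
# inside programs.
programs = [
    {"id": "p1",
     "works": [
         {"workTitle": "Work A",
          "soloists": [{"soloistName": "Soloist X"},
                       {"soloistName": "Soloist Y"}]},
         {"workTitle": "Work B", "soloists": []},
     ]},
]

# ["works", "soloists"]: descend into each program's works, then into each
# work's soloists; meta=["id"] repeats the program id on every soloist row.
soloists = pd.json_normalize(programs, record_path=["works", "soloists"],
                             meta=["id"])
print(soloists)
```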
