Analyzing Mobile Data Rate from TRAI with Pandas


Let’s use a real dataset from TRAI to analyze mobile data rates and see the average speeds for a particular operator or state in a given month. It also shows how easily Pandas can be applied to real-world data to produce interesting results.

About the dataset —
The Telecom Regulatory Authority of India (TRAI) releases a monthly dataset of internet speeds measured through its MySpeed (TRAI) app. It includes both user-initiated speed tests and periodic background tests run by the app. We will analyze this dataset and find the average speeds for a specific operator or state for that month.

Checking the raw data structure:

  • Go to the TRAI MySpeed portal and download the latest month’s CSV file from the Download section. You can also download the CSV file used in this article: sept18_publish.csv or sept18_publish_drive.csv

  • Open the CSV file in a spreadsheet program.
    NOTE: Since the dataset is huge, the software may warn you that it cannot load all the rows. That is perfectly fine. Also, if you are using Microsoft Excel, you may see a warning about opening a SYLK file. This warning can be ignored; it is a common Excel quirk.
    Now let’s look at how the data is laid out:

    Column names in the dataset

    1st column: Network Operator (JIO, Airtel, etc.)
    2nd column: Network Technology (3G or 4G)
    3rd column: Type of Test initiated (upload or download)
    4th column: Speed measured in Kilobytes per second
    5th column: Signal Strength during the measurement
    6th column: Local Service Area (LSA), i.e. the circle where the test was done (Delhi, Orissa, etc.). We will refer to this simply as 'states'.

  • NOTE: Signal strength can be na (Not Available) because some devices cannot read the signal strength. To keep things simple, we will not use this parameter in our calculations. However, it can easily be added as an extra condition when filtering.
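If you do want to take signal strength into account, the sketch below shows one way to add it as a filter condition. It assumes the column names used in this article and that missing readings are stored as the text na; pd.to_numeric with errors='coerce' turns those into NaN whether or not read_csv already did. The cut-off MIN_SIGNAL = -100 and the toy rows are hypothetical, chosen only for illustration; check the real range of values in the file before picking a threshold.

```python
import pandas as pd

# Toy rows in the same column layout as the TRAI file (made-up values)
sample = pd.DataFrame({
    'Service Provider': ['JIO', 'JIO', 'AIRTEL'],
    'Technology': ['4G', '4G', '4G'],
    'Test Type': ['download', 'download', 'upload'],
    'Data Speed': [20000, 15000, 5000],
    'Signal Strength': ['-85', 'na', '-105'],
    'State': ['Delhi', 'Delhi', 'Delhi'],
})

# Hypothetical cut-off, for illustration only
MIN_SIGNAL = -100

# 'na' cannot be parsed as a number, so errors='coerce' turns it into NaN
signal = pd.to_numeric(sample['Signal Strength'], errors='coerce')

# Keep rows that have a reading and whose reading meets the cut-off
# (a NaN reading would compare as False anyway; notna() makes it explicit)
strong = sample[signal.notna() & (signal >= MIN_SIGNAL)]
print(strong)
```

The same boolean mask can be AND-ed into the operator/technology condition used in Step 5.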

    Packages required -

    Pandas - a popular data analysis toolkit, very powerful for crunching large sets of data.
    Numpy - provides fast and efficient operations on arrays of homogeneous data. We will use it along with pandas and matplotlib.
    Matplotlib - a plotting library. We will use its bar plotting function to make bar graphs.

    Let’s start analyzing the data.

    Step # 1: Import packages and define some constants.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # define some constants

    # CSV dataset name
    DATASET_FILENAME = 'sept18_publish.csv'

    # define an operator to filter
    # (example value; any name from the OPERATORS list in Step 4 works)
    CONST_OPERATOR = 'JIO'

    # define the state to filter
    CONST_STATE = 'Delhi'

    # define technology for filtering ('3G' or '4G')
    CONST_TECHNOLOGY = '4G'

    Step # 2: Define several lists that will store the final computed results so they can easily be passed to the bar-chart plotting function. The state (or operator), download speed, and upload speed are appended in the same order, so a single index gives a state (or operator) together with its corresponding download and upload speeds.

    For example, final_states[2], final_download_speeds[2] and final_upload_speeds[2] give the corresponding values for the 3rd state.
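The parallel-list layout can be illustrated with a few toy numbers (made up, not taken from the dataset): position i in every list refers to the same state, so indexing with one number, or walking the lists together with zip(), keeps each state matched with its own speeds.

```python
# Toy values, for illustration only
final_states = ['Kerala', 'Delhi', 'Assam']
final_download_speeds = [300.0, 200.0, 100.0]
final_upload_speeds = [30.0, 20.0, 10.0]

# Index 2 in every list belongs to the 3rd state
third = (final_states[2], final_download_speeds[2], final_upload_speeds[2])
print(third)

# zip() pairs up the entries that share an index
rows = list(zip(final_states, final_download_speeds, final_upload_speeds))
for state, down, up in rows:
    print(state, down, up)
```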

    # define lists
    final_download_speeds = []
    final_upload_speeds = []
    final_states = []
    final_operators = []

    Step # 3: Read the file with the Pandas read_csv() function and store the result in df. This creates a DataFrame of the CSV content that we will work on.

    # assign a name to each column based on the data;
    # this lets us access the columns easily by name.
    # header=None assumes the file has no header row (its first line
    # is already a data record, as the Excel SYLK warning suggests)
    COLUMNS = ['Service Provider', 'Technology', 'Test Type',
               'Data Speed', 'Signal Strength', 'State']

    df = pd.read_csv(DATASET_FILENAME, header=None, names=COLUMNS)
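One detail worth checking at this step: by default read_csv treats the first line of the file as the header. If the TRAI file has no header row (the Excel SYLK warning mentioned earlier hints at this, since Excel shows it for text files that start with the letters ID, as a first record beginning with IDEA would), that first record silently becomes the column names and is lost from the data. The sketch below shows the difference on a made-up two-row CSV held in memory; passing header=None together with names= keeps every row.

```python
import io
import pandas as pd

COLUMNS = ['Service Provider', 'Technology', 'Test Type',
           'Data Speed', 'Signal Strength', 'State']

# Two made-up records with no header line, mimicking the file layout
raw = "IDEA,4G,download,18000,-90,Kerala\nJIO,4G,upload,5000,na,Delhi\n"

# Default behaviour: the first record is consumed as the header row
with_header = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data; names= labels the columns
no_header = pd.read_csv(io.StringIO(raw), header=None, names=COLUMNS)

print(len(with_header), len(no_header))
print(no_header)
```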

    Step # 4: First, let’s find all the unique states and operators in this dataset and store them in their respective lists, states and operators.

    We will use the Pandas unique() method, which returns the distinct values of a column.

    # find and display unique states
    states = df['State'].unique()
    print('STATES Found:', states)

    # find and display unique operators
    operators = df['Service Provider'].unique()
    print('OPERATORS Found:', operators)


    STATES Found: ['Kerala' 'Rajasthan' 'Maharashtra' 'UP East' 'Karnataka' nan 'Madhya Pradesh' 'Kolkata' 'Bihar' 'Gujarat' 'UP West' 'Orissa' 'Tamil Nadu' 'Delhi' 'Assam' 'Andhra Pradesh' 'Haryana' 'Punjab' 'North East' 'Mumbai' 'Chennai' 'Himachal Pradesh' 'Jammu & Kashmir' 'West Bengal']
    OPERATORS Found: ['IDEA' 'JIO' 'AIRTEL' 'VODAFONE' 'CELLONE']
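The nan in the state list comes from rows without a state value: unique() keeps missing values, so the loop in Step 5 also visits nan (comparing a column against nan matches no rows, so it is harmless there, just wasted work). If you prefer a clean list, one option is to drop missing values before calling unique(), shown here on a toy column:

```python
import numpy as np
import pandas as pd

# Toy column with one missing state, for illustration only
state_col = pd.Series(['Kerala', 'Delhi', np.nan, 'Kerala'])

# unique() keeps the missing value as one of the distinct entries
print(state_col.unique())

# dropna() removes it first, leaving only real state names
clean_states = state_col.dropna().unique()
print(clean_states)
```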

Step # 5: Define the fixed_operator function, which keeps the operator (and technology) fixed and iterates over all the available states for that operator. A similar function can be built for a fixed state.

def fixed_operator():
    # filter by operator and technology first;
    # this filter is the same for every state
    filtered = df[(df['Service Provider'] == CONST_OPERATOR)
                  & (df['Technology'] == CONST_TECHNOLOGY)]

    # iterate over each state
    for state in states:
        # create a new DataFrame that contains
        # the current state's data only
        base = filtered[filtered['State'] == state]

        # keep only download tests, based on test type
        down = base[base['Test Type'] == 'download']

        # keep only upload tests, based on test type
        up = base[base['Test Type'] == 'upload']

        # calculate the average speed in the Data Speed
        # column using the Pandas mean() method
        avg_down = down['Data Speed'].mean()
        avg_up = up['Data Speed'].mean()

        # skip states where either average is not a number (nan)
        # and store only valid ones
        if pd.isnull(avg_down) or pd.isnull(avg_up):
            continue

        final_states.append(state)
        final_download_speeds.append(avg_down)
        final_upload_speeds.append(avg_up)

        # print up to 2 decimal places
        print(str(state) + ' - Avg. Download: ' + '%.2f' % avg_down +
              ' Avg. Upload: ' + '%.2f' % avg_up)


fixed_operator()

Output:

Kerala - Avg. Download: 26129.27 Avg. Upload: 5193.46
Rajasthan - Avg. Download: 27784.86 Avg. Upload: 5736.18
Maharashtra - Avg. Download: 20707.88 Avg. Upload: 4130.46
UP East - Avg. Download: 22451.35 Avg. Upload: 5727.95
Karnataka - Avg. Download: 16950.36 Avg. Upload: 4720.68
Madhya Pradesh - Avg. Download: 23594.85 Avg. Upload: 4802.89
Kolkata - Avg. Download: 26747.80 Avg. Upload: 5655.55
Bihar - Avg. Download: 31730.54 Avg. Upload: 6599.45
Gujarat - Avg. Download: 16377.43 Avg. Upload: 3642.89
UP West - Avg. Download: 23720.82 Avg. Upload: 5280.46
Orissa - Avg. Download: 31502.05 Avg. Upload: 6895.46
Tamil Nadu - Avg. Download: 16689.28 Avg. Upload: 4107.44
Delhi - Avg. Download: 20308.30 Avg. Upload: 4877.40
Assam - Avg. Download: 5653.49 Avg. Upload: 2864.47
Andhra Pradesh - Avg. Download: 32444.07 Avg. Upload: 5755.95
Haryana - Avg. Download: 7170.63 Avg. Upload: 2680.02
Punjab - Avg. Download: 14454.45 Avg. Upload: 4981.15
North East - Avg. Download: 6702.29 Avg. Upload: 2966.84
Mumbai - Avg. Download: 14070.97 Avg. Upload: 4118.21
Chennai - Avg. Download: 20054.47 Avg. Upload: 4602.35
Himachal Pradesh - Avg. Download: 7436.99 Avg. Upload: 4020.09
Jammu & Kashmir - Avg. Download: 8759.20 Avg. Upload: 4418.21
West Bengal - Avg. Download: 16821.17 Avg. Upload: 3628.78
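Step 5 notes that a similar function can be built for a fixed state. A minimal sketch of that counterpart is below, assuming the column names used in this article. Instead of an explicit loop over operators it swaps in a groupby, which averages every (operator, test type) pair in one pass; fixed_state is a hypothetical name and the toy DataFrame (made-up numbers) stands in for df.

```python
import pandas as pd

# Toy stand-in for df (made-up numbers), same columns as in Step 3
df = pd.DataFrame({
    'Service Provider': ['JIO', 'JIO', 'AIRTEL', 'AIRTEL', 'JIO'],
    'Technology': ['4G'] * 5,
    'Test Type': ['download', 'upload', 'download', 'upload', 'download'],
    'Data Speed': [200.0, 50.0, 100.0, 40.0, 300.0],
    'Signal Strength': ['na'] * 5,
    'State': ['Delhi'] * 5,
})


def fixed_state(data, state, technology):
    """Hypothetical helper: average download/upload speed per operator
    for one state and one technology."""
    subset = data[(data['State'] == state) & (data['Technology'] == technology)]
    # mean speed for every (operator, test type) pair,
    # then one column per test type
    averages = (subset.groupby(['Service Provider', 'Test Type'])['Data Speed']
                .mean()
                .unstack('Test Type')
                .reindex(columns=['download', 'upload']))
    # drop operators that lack either a download or an upload average
    return averages.dropna(subset=['download', 'upload'])


result = fixed_state(df, 'Delhi', '4G')
for operator, row in result.iterrows():
    print(str(operator) + ' - Avg. Download: ' + '%.2f' % row['download'] +
          ' Avg. Upload: ' + '%.2f' % row['upload'])
```

The reindex call guarantees both columns exist even if a state has no upload (or no download) tests at all, so dropna never fails on a missing column.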

Plotting the data —

Use the NumPy arange() method, which returns evenly spaced values over a given interval. Passing it the length of the list final_states gives values from 0 up to (but not including) the number of states in the list, for example [0, 1, 2, 3, ...].
Then we use these indices as the positions of the download bars. Each upload bar is placed next to its download bar by offsetting its position by the width of one bar.
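The bar positions can be checked numerically before drawing anything; the sketch below uses four states as an example. It also computes where a label centered under each download/upload pair would go, given that plt.bar centers each bar on its x position by default in current Matplotlib (align='center').

```python
import numpy as np

bar_width = 0.25
num_states = 4  # e.g. four states kept in final_states

# x positions for the download bars: 0, 1, 2, ...
index = np.arange(num_states)

# upload bars are shifted right by one bar width
upload_positions = index + bar_width

# with centered bars, the middle of each pair is half a width right of index
tick_positions = index + bar_width / 2

print(index, upload_positions, tick_positions)
```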

fig, ax = plt.subplots()

# width of each bar
bar_width = 0.25

# opacity of each bar
opacity = 0.8

# store bar positions
index = np.arange(len(final_states))

# plt.bar() takes the bar positions, the data to plot,
# the width of each bar and optional parameters
# such as opacity and color

# build the download bars
bar_download = plt.bar(index, final_download_speeds,
                       bar_width, alpha=opacity,
                       color='b', label='Download')

# build the upload bars, shifted right by one bar width
bar_upload = plt.bar(index + bar_width, final_upload_speeds,
                     bar_width, alpha=opacity, color='g',
                     label='Upload')

# chart title
plt.title('Avg. Download / Upload speed for ' + str(CONST_OPERATOR))

# X-axis label
plt.xlabel('States')

# Y-axis label
plt.ylabel('Average Speeds in Kbps')

# a label under each pair of bars, matching the states;
# bars are centered on their x positions, so the middle
# of each pair is half a bar width to the right of index
plt.xticks(index + bar_width / 2, final_states, rotation=90)

# draw legend
plt.legend()

# make the chart layout tight
plt.tight_layout()

# show the graph
plt.show()

Bar chart of average download and upload speeds

Comparing data from two months —

Let’s also take data from another month and plot it together to see the difference in data rates.

In this example, the earlier month’s dataset is sept18_publish.csv and the later month’s dataset is oct18_publish.csv.

We just repeat the same steps: read the data for the second month, filter it the same way, and plot it with a slightly modified method. When drawing the bars, we offset the 3rd and 4th bars (the download and upload speeds from the second file) by 2 and 3 times the bar width so that they land in their correct positions.

Offset logic when plotting 4 bars
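The four-bar offset logic can be sketched numerically, using three states as an example. With four bars per state, a narrower width such as 0.2 (an assumed value, not from the original) keeps each group at 0.8 units wide so neighboring groups do not touch, and the center of a group of four centered bars sits 1.5 bar widths to the right of its index.

```python
import numpy as np

bar_width = 0.2        # narrower bars so four fit in each unit slot
index = np.arange(3)   # e.g. three states

# offsets of 0, 1, 2, 3 bar widths:
# older download, older upload, newer download, newer upload
positions = [index + k * bar_width for k in range(4)]

# center of each group (bars are centered on their x positions by default)
group_center = index + 1.5 * bar_width

# total width of one group; below 1 means groups do not overlap
group_width = 4 * bar_width

for k, pos in enumerate(positions):
    print(k, pos)
```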

Below is the implementation for comparing data from two months:

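import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# older month
DATASET_FILENAME = 'sept18_publish.csv'

# newer month
DATASET_FILENAME2 = 'oct18_publish.csv'

# operator and technology to compare (example values)
CONST_OPERATOR = 'JIO'
CONST_TECHNOLOGY = '4G'

# column names; header=None assumes the files have no header row
COLUMNS = ['Service Provider', 'Technology', 'Test Type',
           'Data Speed', 'Signal Strength', 'State']

# read both CSV files and save them as DataFrames
df = pd.read_csv(DATASET_FILENAME, header=None, names=COLUMNS)
df2 = pd.read_csv(DATASET_FILENAME2, header=None, names=COLUMNS)

# find and display unique states
states = df['State'].unique()
print('STATES Found:', states)

# find and display unique operators
operators = df['Service Provider'].unique()
print('OPERATORS Found:', operators)

# define lists
final_download_speeds = []
final_upload_speeds = []
final_download_speeds_second = []
final_upload_speeds_second = []
final_states = []
final_operators = []

print('Comparing data for ' + str(CONST_OPERATOR))

# the operator/technology filter is the same for both months
filtered = df[(df['Service Provider'] == CONST_OPERATOR)
              & (df['Technology'] == CONST_TECHNOLOGY)]
filtered2 = df2[(df2['Service Provider'] == CONST_OPERATOR)
                & (df2['Technology'] == CONST_TECHNOLOGY)]


def average_speeds(frame, state):
    # mean download and upload speed for one state of a filtered frame
    base = frame[frame['State'] == state]
    avg_down = base[base['Test Type'] == 'download']['Data Speed'].mean()
    avg_up = base[base['Test Type'] == 'upload']['Data Speed'].mean()
    return avg_down, avg_up


for state in states:
    avg_down, avg_up = average_speeds(filtered, state)
    avg_down2, avg_up2 = average_speeds(filtered2, state)

    # keep a state only if both months have valid averages
    if pd.isnull([avg_down, avg_up, avg_down2, avg_up2]).any():
        continue

    final_states.append(state)
    final_download_speeds.append(avg_down)
    final_upload_speeds.append(avg_up)
    final_download_speeds_second.append(avg_down2)
    final_upload_speeds_second.append(avg_up2)

fig, ax = plt.subplots()
bar_width = 0.2   # four bars per state, so narrower than before
opacity = 0.8
index = np.arange(len(final_states))

# offsets of 0, 1, 2 and 3 bar widths place the four bars side by side
plt.bar(index, final_download_speeds, bar_width,
        alpha=opacity, color='b', label='Download (older month)')
plt.bar(index + bar_width, final_upload_speeds, bar_width,
        alpha=opacity, color='g', label='Upload (older month)')
plt.bar(index + 2 * bar_width, final_download_speeds_second, bar_width,
        alpha=opacity, color='c', label='Download (newer month)')
plt.bar(index + 3 * bar_width, final_upload_speeds_second, bar_width,
        alpha=opacity, color='m', label='Upload (newer month)')

plt.title('Avg. Download / Upload speed for ' + str(CONST_OPERATOR))
plt.xlabel('States')
plt.ylabel('Average Speeds in Kbps')
# the middle of each group of four centered bars is 1.5 bar widths right of index
plt.xticks(index + 1.5 * bar_width, final_states, rotation=90)
plt.legend()
plt.tight_layout()
plt.show()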
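If you also want the two months side by side as numbers rather than only in the chart, one option is to compute a per-state table for each month and join them with pd.concat, using the month as the outer column key. This is a sketch on toy stand-ins for the two files (made-up numbers, only the columns the helper needs); state_averages is a hypothetical helper name.

```python
import pandas as pd


def state_averages(data, operator, technology):
    """Hypothetical helper: mean download/upload speed per state."""
    subset = data[(data['Service Provider'] == operator)
                  & (data['Technology'] == technology)]
    return (subset.groupby(['State', 'Test Type'])['Data Speed']
            .mean()
            .unstack('Test Type')
            .reindex(columns=['download', 'upload']))


# Toy stand-ins for the two monthly files (made-up numbers)
sept_df = pd.DataFrame({
    'Service Provider': ['JIO'] * 4,
    'Technology': ['4G'] * 4,
    'Test Type': ['download', 'upload', 'download', 'upload'],
    'Data Speed': [200.0, 50.0, 100.0, 30.0],
    'State': ['Delhi', 'Delhi', 'Assam', 'Assam'],
})
oct_df = sept_df.copy()
oct_df['Data Speed'] = [220.0, 55.0, 90.0, 30.0]

# One block of columns per month; states present in only one month get NaN
comparison = pd.concat(
    [state_averages(sept_df, 'JIO', '4G'),
     state_averages(oct_df, 'JIO', '4G')],
    axis=1, keys=['Sept', 'Oct'])
print(comparison)
```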
