Python Readability Index (NLP)

This article describes several traditional readability formulas that can be used to estimate a readability score. Natural language processing sometimes requires analyzing words and sentences to determine how complex a text is. Readability scores are usually grade levels on a specific scale that rate a text by its difficulty. They help an author revise the text so that a wider audience can understand it, which makes the content more engaging.

Various formulas are available for computing a readability score:

1) Dale-Chall formula
2) Gunning fog formula
3) Fry readability graph
4) McLaughlin's SMOG formula
5) FORECAST formula
6) Readability and newspaper readership
7) Flesch scores

The implementation of the readability formulas is shown below.
Dale-Chall Formula

To apply the formula:

Select several 100-word samples throughout the text.
Calculate the average sentence length in words (divide the number of words by the number of sentences).
Calculate the percentage of words NOT on the Dale-Chall list of 3000 easy words.
Compute this equation:

 Raw score = 0.1579 * (PDW) + 0.0496 * (ASL) + 3.6365
 Here,
 PDW = percentage of difficult words, that is, words not on the Dale-Chall word list.
 ASL = average sentence length.
 The constant 3.6365 is added only when PDW exceeds 5%; otherwise the score is the sum of the first two terms.
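As a rough illustration of the arithmetic, the sketch below computes the score from precomputed values of PDW and ASL. The function name `dale_chall_score` and its parameters are hypothetical and not part of any library. The 5% threshold for adding the constant follows the adjustment rule noted in the comments of the implementation further down.

```python
def dale_chall_score(pdw, asl):
    """Dale-Chall score from PDW (percent, 0-100) and ASL (words per sentence)."""
    raw = 0.1579 * pdw + 0.0496 * asl
    # Assumption: the 3.6365 constant is applied only above 5% difficult words.
    if pdw > 5:
        raw += 3.6365
    return raw


# Hypothetical counts from one 100-word sample: 12 difficult words, 5 sentences.
words, sentences, difficult = 100, 5, 12
pdw = difficult / words * 100   # percentage of difficult words
asl = words / sentences         # average sentence length
print(round(dale_chall_score(pdw, asl), 4))
```

Working from counts keeps the formula separate from tokenization, so the same function can be reused with any word list or sentence splitter.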

Gunning Fog Formula

 Grade level = 0.4 * ((average sentence length) + (percentage of Hard Words))
 Here,
 Hard Words = words with more than two syllables.
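A minimal sketch of this calculation from raw counts might look as follows. The function name `gunning_fog_grade` and the sample numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def gunning_fog_grade(total_words, total_sentences, hard_words):
    """Gunning fog grade level from raw counts."""
    avg_sentence_length = total_words / total_sentences
    percent_hard = hard_words / total_words * 100
    return 0.4 * (avg_sentence_length + percent_hard)


# Hypothetical sample: 100 words, 5 sentences, 10 words over two syllables.
grade = gunning_fog_grade(100, 5, 10)
print(grade)
```

Here the average sentence length is 20 and the share of hard words is 10%, so the grade is 0.4 * 30.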

SMOG Formula

 SMOG grading = 3 + √ (polysyllable count)
 Here,
 polysyllable count = number of words of more than two syllables in a sample of 30 sentences.
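The sketch below shows the simplified formula next to the fuller form with the constants 1.043 and 3.1291, which is the form used by the `smog_index` implementation later in this article. Both function names are hypothetical. The second function scales the polysyllable count to a 30-sentence sample, so it can be applied to texts of other lengths.

```python
import math


def smog_simplified(polysyllable_count):
    """Simplified SMOG: 3 + sqrt(polysyllables in a 30-sentence sample)."""
    return 3 + math.sqrt(polysyllable_count)


def smog_normalized(polysyllable_count, sentence_count):
    """Fuller form: scale the polysyllable count to 30 sentences first."""
    scaled = polysyllable_count * 30 / sentence_count
    return 1.043 * math.sqrt(scaled) + 3.1291


# Hypothetical sample: 25 polysyllabic words found in 30 sentences.
print(smog_simplified(25))
print(round(smog_normalized(25, 30), 2))
```

The two forms give close but not identical grades, which is one reason tools can disagree on the same text.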

Flesch Formula

 Reading Ease score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
 Here,
 ASL = average sentence length (number of words divided by number of sentences)
 ASW = average word length in syllables (number of syllables divided by number of words)
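To make the two averages concrete, here is a small sketch that takes raw counts and applies the formula. The function name `flesch_reading_ease_from_counts` and the sample counts are hypothetical.

```python
def flesch_reading_ease_from_counts(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from word, sentence and syllable counts."""
    asl = total_words / total_sentences    # average sentence length
    asw = total_syllables / total_words    # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease_from_counts(100, 5, 150)
print(round(score, 3))
```

Unlike the grade-level formulas, a higher Reading Ease score indicates text that is easier to read.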

Benefits of readability formulas:

1. Readability formulas measure the grade level a reader must have reached in order to read a given text. This gives the author the information needed to reach the target audience.

2. They let you know in advance whether the target audience can understand your content.

3. They are easy to use.

4. Readable text attracts a larger audience.

Disadvantages of readability formulas:

1. Because there are many readability formulas, the same text can receive widely varying results.

2. They apply mathematics to literature, which is not always a good idea.

3. They cannot pinpoint the complex words or phrases that need fixing.

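import spacy

# Note: easy_word_set and legacy_round may not be importable from every
# Textstat release; check the version you have installed.
from textstat.textstat import textstatistics, easy_word_set, legacy_round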

 
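# Splits the text into sentences using spaCy's sentence segmentation,
# which is described at https://spacy.io/usage/spacy-101
# The model name below is an assumption: spaCy 3 no longer accepts the
# 'en' shortcut, so a full model name such as 'en_core_web_sm' is needed.
nlp = spacy.load('en_core_web_sm')

def break_sentences(text):
    doc = nlp(text)
    # doc.sents is a generator; return a list so callers can use len()
    return list(doc.sents)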

 
# Returns the number of words in the text
# (every spaCy token is counted, including punctuation tokens)
def word_count(text):
    sentences = break_sentences(text)
    words = 0
    for sentence in sentences:
        words += len([token for token in sentence])
    return words

 
# Returns the number of sentences in the text
def sentence_count(text):
    sentences = break_sentences(text)
    # list() makes this work whether a list or a generator is returned
    return len(list(sentences))

 
# Returns the average sentence length
def avg_sentence_length(text):
    words = word_count(text)
    sentences = sentence_count(text)
    average_sentence_length = float(words / sentences)
    return average_sentence_length

 
# Textstat is a Python package for calculating text statistics
# that help determine the readability, complexity and grade level
# of a particular corpus.
# The package can be found at https://pypi.python.org/pypi/textstat
def syllables_count(word):
    return textstatistics().syllable_count(word)

 
# Returns the average number of syllables per
# word in the text
def avg_syllables_per_word(text):
    syllable = syllables_count(text)
    words = word_count(text)
    ASPW = float(syllable) / float(words)
    return legacy_round(ASPW, 1)

  
# Returns the total number of difficult words in the text
def difficult_words(text):
    # Find all words in the text
    words = []
    sentences = break_sentences(text)
    for sentence in sentences:
        words += [str(token) for token in sentence]

    # Difficult words are those that have syllables >= 2
    # and are not in easy_word_set, which Textstat provides as
    # a list of common words
    diff_words_set = set()

    for word in words:
        syllable_count = syllables_count(word)
        if word not in easy_word_set and syllable_count >= 2:
            diff_words_set.add(word)

    return len(diff_words_set)

 
# A word is polysyllabic if it has 3 or more syllables;
# this function returns the count of all such words
# present in the text
def poly_syllable_count(text):
    count = 0
    words = []
    sentences = break_sentences(text)
    for sentence in sentences:
        # Convert tokens to strings before counting syllables
        words += [str(token) for token in sentence]

    for word in words:
        syllable_count = syllables_count(word)
        if syllable_count >= 3:
            count += 1
    return count

 

 

def flesch_reading_ease(text):
    """
    Implements the Flesch formula:
    Reading Ease score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
    Here,
      ASL = average sentence length (number of words
            divided by the number of sentences)
      ASW = average word length in syllables (number of syllables
            divided by the number of words)
    """
    FRE = 206.835 - float(1.015 * avg_sentence_length(text)) - \
          float(84.6 * avg_syllables_per_word(text))
    return legacy_round(FRE, 2)

 

 

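def gunning_fog(text):
    # Note: the "+ 5" offset is not part of the Gunning fog formula as
    # stated earlier in this article; drop it to match that formula exactly.
    per_diff_words = (difficult_words(text) / word_count(text) * 100) + 5
    grade = 0.4 * (avg_sentence_length(text) + per_diff_words)
    return grade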

 

 

def smog_index(text):
    """
    Implements the SMOG formula / grading:
    SMOG grading = 3 + √(polysyllable count)
    Here,
      polysyllable count = number of words of more
      than two syllables in a sample of 30 sentences.
    """
    # The code uses the fuller form of the formula, with the polysyllable
    # count scaled to a 30-sentence sample
    if sentence_count(text) >= 3:
        poly_syllab = poly_syllable_count(text)
        SMOG = (1.043 * (30 * (poly_syllab / sentence_count(text))) ** 0.5) \
               + 3.1291
        return legacy_round(SMOG, 1)
    else:
        return 0

  

 

def dale_chall_readability_score(text):
    """
    Implements the Dale-Chall formula:
    Raw score = 0.1579 * (PDW) + 0.0496 * (ASL) + 3.6365
    Here,
      PDW = percentage of difficult words.
      ASL = average sentence length
    """
    words = word_count(text)
    # An empty text has no score; return early to avoid dividing by zero
    if words == 0:
        return 0.0

    # Number of words that are not counted as difficult
    count = words - difficult_words(text)

    # Percentage of words that are not difficult
    per = float(count) / float(words) * 100

    # diff_words stores the percentage of difficult words
    diff_words = 100 - per

    raw_score = (0.1579 * diff_words) + \
                (0.0496 * avg_sentence_length(text))

    # If the percentage of difficult words is more than 5%, then
    # adjusted score = raw score + 3.6365,
    # otherwise adjusted score = raw score
    if diff_words > 5:
        raw_score += 3.6365

    return legacy_round(raw_score, 2)
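The functions above depend on spaCy and Textstat. For a quick experiment without either library, a rough dependency-free sketch is possible. Every name below is hypothetical. The regex sentence splitter and the vowel-group syllable counter are crude approximations swapped in for spaCy's segmentation and Textstat's syllable counting, so the scores will differ from those of the implementation above.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Return lowercase alphabetic tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def estimate_syllables(word):
    """Approximate syllables as runs of vowels; at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Collect the raw counts that the formulas need."""
    words = split_words(text)
    syllables = [estimate_syllables(w) for w in words]
    return {
        "sentences": len(split_sentences(text)),
        "words": len(words),
        "syllables": sum(syllables),
        "hard_words": sum(1 for n in syllables if n > 2),
    }


def flesch_from_counts(c):
    """Flesch Reading Ease from a counts dictionary."""
    asl = c["words"] / c["sentences"]
    asw = c["syllables"] / c["words"]
    return 206.835 - 1.015 * asl - 84.6 * asw


def fog_from_counts(c):
    """Gunning fog grade from a counts dictionary (no extra offset)."""
    asl = c["words"] / c["sentences"]
    percent_hard = c["hard_words"] / c["words"] * 100
    return 0.4 * (asl + percent_hard)


sample = "The cat sat on the mat. Readability formulas estimate difficulty."
counts = text_counts(sample)
print(counts)
print(round(flesch_from_counts(counts), 2), round(fog_from_counts(counts), 2))
```

The vowel-group heuristic counts "estimate" as four syllables because of its silent final "e", which illustrates why a dedicated syllable counter such as Textstat's is preferable for real use.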


Source:

https://en.wikipedia.org/wiki/Readability
