Python Readability Index (NLP)


This article illustrates several traditional readability formulas used to estimate a readability score. Natural language processing sometimes requires analyzing words and sentences to determine the complexity of a text. Readability scores are usually grade levels on a specific scale that rate a text according to its complexity. They help the author revise the text so that it is understandable to a wider audience, which makes the content more engaging.

Various formulas are available for determining a readability score:

1) Dale-Chall formula
2) Gunning fog formula
3) Fry readability graph
4) McLaughlin's SMOG formula
5) FORCAST formula
6) Readability and newspaper readership
7) Flesch scores

The implementation of the readability formulas is shown below.
Dale-Chall formula

To apply the formula:

Select several 100-word samples throughout the text.
Compute the average sentence length in words (divide the number of words by the number of sentences).
Compute the percentage of words NOT on the Dale-Chall list of 3,000 easy words.
Compute this equation:

Raw score = 0.1579 * (PDW) + 0.0496 * (ASL) + 3.6365
Here,
PDW = Percentage of difficult words (words not on the Dale-Chall word list)
ASL = Average sentence length
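As a quick worked example with assumed sample values (not taken from any particular text): suppose 6% of the words are difficult and the average sentence length is 15 words.

# Hypothetical sample values: PDW = 6, ASL = 15
raw_score = 0.1579 * 6 + 0.0496 * 15 + 3.6365
print(round(raw_score, 2))  # 5.33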

Gunning fog formula

Grade level = 0.4 * (average sentence length + percentage of hard words)
Here,
Hard words = words with more than two syllables.
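For instance, with an assumed average sentence length of 14 words and 8% hard words (sample values only):

# Hypothetical sample values: ASL = 14, hard-word percentage = 8
grade_level = 0.4 * (14 + 8)
print(round(grade_level, 1))  # 8.8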

SMOG formula

SMOG grading = 3 + √(polysyllable count)
Here,
Polysyllable count = number of words of more than two syllables in a sample of 30 sentences.
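For example, if a 30-sentence sample were assumed to contain 25 polysyllabic words:

# Hypothetical sample value: 25 polysyllabic words
smog_grade = 3 + 25 ** 0.5
print(smog_grade)  # 8.0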

Flesch formula

Reading Ease score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
Here,
ASL = average sentence length (number of words divided by number of sentences)
ASW = average word length in syllables (number of syllables divided by number of words)
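With assumed sample values of ASL = 15 and ASW = 1.5:

# Hypothetical sample values: ASL = 15, ASW = 1.5
reading_ease = 206.835 - (1.015 * 15) - (84.6 * 1.5)
print(round(reading_ease, 2))  # 64.71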

Benefits of readability formulas:

1. Readability formulas measure the grade level a reader must have in order to read a given text. The author of the text therefore receives much-needed information to reach the target audience.

2. They let you know in advance whether the target audience can understand your content.

3. They are easy to use.

4. Readable text attracts a larger audience.

Disadvantages of readability formulas:

1. Because there are many readability formulas, there is an increasing likelihood of getting wide variations in the results for the same text.

2. They apply mathematics to literature, which is not always a good idea.

3. They cannot measure the complexity of a word or phrase to pinpoint exactly where you need to fix it.

import spacy
from textstat.textstat import textstatistics, easy_word_set, legacy_round


# Splits the text into sentences, using spaCy's sentence segmentation,
# which is described at https://spacy.io/usage/spacy-101
def break_sentences(text):
    # Note: newer spaCy versions load the English model as 'en_core_web_sm'
    nlp = spacy.load('en')
    doc = nlp(text)
    return list(doc.sents)


# Returns the number of words in the text
def word_count(text):
    sentences = break_sentences(text)
    words = 0
    for sentence in sentences:
        words += len([token for token in sentence])
    return words


# Returns the number of sentences in the text
def sentence_count(text):
    sentences = break_sentences(text)
    return len(sentences)


# Returns the average sentence length
def avg_sentence_length(text):
    words = word_count(text)
    sentences = sentence_count(text)
    average_sentence_length = float(words / sentences)
    return average_sentence_length

 
# Textstat is a Python package for calculating statistics from text to
# determine readability, complexity and grade level of a particular corpus.
# The package can be found at https://pypi.python.org/pypi/textstat
def syllables_count(word):
    return textstatistics().syllable_count(word)


# Returns the average number of syllables per word in the text
def avg_syllables_per_word(text):
    syllable = syllables_count(text)
    words = word_count(text)
    ASPW = float(syllable) / float(words)
    return legacy_round(ASPW, 1)

  
# Returns the total number of difficult words in the text
def difficult_words(text):

    # Find all words in the text
    words = []
    sentences = break_sentences(text)
    for sentence in sentences:
        words += [str(token) for token in sentence]

    # Difficult words are those with two or more syllables that are not
    # in easy_word_set, the list of common words provided by Textstat
    diff_words_set = set()

    for word in words:
        syllable_count = syllables_count(word)
        if word not in easy_word_set and syllable_count >= 2:
            diff_words_set.add(word)

    return len(diff_words_set)

 
# A word is polysyllabic if it has three or more syllables;
# this function returns the count of all such words in the text
def poly_syllable_count(text):
    count = 0
    words = []
    sentences = break_sentences(text)
    for sentence in sentences:
        words += [token for token in sentence]

    for word in words:
        syllable_count = syllables_count(word)
        if syllable_count >= 3:
            count += 1
    return count

 

 

def flesch_reading_ease(text):
    """
    Implements the Flesch formula:
    Reading Ease score = 206.835 - (1.015 x ASL) - (84.6 x ASW)
    Here,
    ASL = average sentence length (number of words
          divided by number of sentences)
    ASW = average word length in syllables (number of syllables
          divided by number of words)
    """
    FRE = 206.835 - float(1.015 * avg_sentence_length(text)) - \
          float(84.6 * avg_syllables_per_word(text))
    return legacy_round(FRE, 2)

 

 

def gunning_fog(text):
    per_diff_words = (difficult_words(text) / word_count(text) * 100) + 5
    grade = 0.4 * (avg_sentence_length(text) + per_diff_words)
    return grade

 

 

def smog_index(text):
    """
    Implements the SMOG formula / grading:
    SMOG grading = 3 + sqrt(polysyllable count)
    Here,
    polysyllable count = number of words of more than
    two syllables in a sample of 30 sentences.
    """

    if sentence_count(text) >= 3:
        poly_syllab = poly_syllable_count(text)
        SMOG = (1.043 * (30 * (poly_syllab / sentence_count(text))) ** 0.5) \
               + 3.1291
        return legacy_round(SMOG, 1)
    else:
        return 0

  

 

def dale_chall_readability_score(text):
    """
    Implements the Dale-Chall formula:
    Raw score = 0.1579 * (PDW) + 0.0496 * (ASL) + 3.6365
    Here,
    PDW = percentage of difficult words
    ASL = average sentence length
    """
    words = word_count(text)
    # Number of words not considered difficult
    count = words - difficult_words(text)
    if words > 0:

        # Percentage of words not on the difficult word list
        per = float(count) / float(words) * 100

        # diff_words stores the percentage of difficult words
        diff_words = 100 - per

        raw_score = (0.1579 * diff_words) + \
                    (0.0496 * avg_sentence_length(text))

        # If the percentage of difficult words is above 5%, then
        # adjusted score = raw score + 3.6365;
        # otherwise adjusted score = raw score
        if diff_words > 5:
            raw_score += 3.6365

        return legacy_round(raw_score, 2)
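A minimal usage sketch (assuming the functions above have been defined and an English spaCy model is installed; the sample text is made up):

text = ("This is a short sample paragraph. It exists only to demonstrate "
        "the readability functions defined above. Longer and more complex "
        "sentences generally lower the Flesch score and raise the others.")

print(flesch_reading_ease(text))
print(gunning_fog(text))
print(smog_index(text))
print(dale_chall_readability_score(text))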


Source:

https://en.wikipedia.org/wiki/Readability





Python Readability Index (NLP): StackOverflow Questions

Answer #1

I would suggest reading PEP 483 and PEP 484 and watching this presentation by Guido on type hinting.

In a nutshell: Type hinting is literally what the words mean. You hint the type of the object(s) you're using.

Due to the dynamic nature of Python, inferring or checking the type of an object being used is especially hard. This fact makes it hard for developers to understand what exactly is going on in code they haven't written and, most importantly, for type checking tools found in many IDEs (PyCharm and PyDev come to mind) that are limited due to the fact that they don't have any indicator of what type the objects are. As a result they resort to trying to infer the type with (as mentioned in the presentation) around 50% success rate.


To take two important slides from the type hinting presentation:

Why type hints?

  1. Helps type checkers: By hinting at what type you want the object to be, the type checker can easily detect if, for instance, you're passing an object with a type that isn't expected.
  2. Helps with documentation: A third person viewing your code will know what is expected where, ergo, how to use it without getting TypeErrors.
  3. Helps IDEs develop more accurate and robust tools: Development Environments will be better suited to suggesting appropriate methods when they know what type your object is. You have probably experienced this with some IDE at some point, hitting the . and having methods/attributes pop up which aren't defined for an object.

Why use static type checkers?

  • Find bugs sooner: This is self-evident, I believe.
  • The larger your project the more you need it: Again, makes sense. Static languages offer a robustness and control that dynamic languages lack. The bigger and more complex your application becomes the more control and predictability (from a behavioral aspect) you require.
  • Large teams are already running static analysis: I"m guessing this verifies the first two points.

As a closing note for this small introduction: This is an optional feature and, from what I understand, it has been introduced in order to reap some of the benefits of static typing.

You generally do not need to worry about it and definitely don"t need to use it (especially in cases where you use Python as an auxiliary scripting language). It should be helpful when developing large projects as it offers much needed robustness, control and additional debugging capabilities.


Type hinting with mypy:

In order to make this answer more complete, I think a little demonstration would be suitable. I'll be using mypy, the library which inspired Type Hints as they are presented in the PEP. This is mainly written for anybody bumping into this question and wondering where to begin.

Before I do that let me reiterate the following: PEP 484 doesn't enforce anything; it is simply setting a direction for function annotations and proposing guidelines for how type checking can/should be performed. You can annotate your functions and hint as many things as you want; your scripts will still run regardless of the presence of annotations because Python itself doesn't use them.

Anyways, as noted in the PEP, hinting types should generally take three forms:

  • Function annotations (PEP 3107).
  • Stub files for built-in/user modules.
  • Special # type: type comments that complement the first two forms. (See: What are variable annotations? for a Python 3.6 update for # type: type comments)

Additionally, you"ll want to use type hints in conjunction with the new typing module introduced in Py3.5. In it, many (additional) ABCs (abstract base classes) are defined along with helper functions and decorators for use in static checking. Most ABCs in collections.abc are included, but in a generic form in order to allow subscription (by defining a __getitem__() method).
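For instance (a small sketch of my own, not from the PEP), the generic ABCs from typing can be subscripted to constrain a container's contents in an annotation:

from typing import Mapping, Sequence

def total_score(scores: Mapping[str, int], names: Sequence[str]) -> int:
    # The subscripted generics tell a static checker that keys are str
    # and values are int; a bare dict annotation could not express this.
    return sum(scores[name] for name in names)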

For anyone interested in a more in-depth explanation of these, the mypy documentation is written very nicely and has a lot of code samples demonstrating/describing the functionality of their checker; it is definitely worth a read.

Function annotations and special comments:

First, it"s interesting to observe some of the behavior we can get when using special comments. Special # type: type comments can be added during variable assignments to indicate the type of an object if one cannot be directly inferred. Simple assignments are generally easily inferred but others, like lists (with regard to their contents), cannot.

Note: If we want to use any derivative of containers and need to specify the contents for that container we must use the generic types from the typing module. These support indexing.

# Generic List, supports indexing.
from typing import List

# In this case, the type is easily inferred as type: int.
i = 0

# Even though the type can be inferred as of type list
# there is no way to know the contents of this list.
# By using type: List[str] we indicate we want to use a list of strings.
a = []  # type: List[str]

# Appending an int to our list
# is statically not correct.
a.append(i)

# Appending a string is fine.
a.append("i")

print(a)  # [0, "i"]

If we add these commands to a file and execute them with our interpreter, everything works just fine and print(a) just prints the contents of list a. The # type comments have been discarded, treated as plain comments which have no additional semantic meaning.

By running this with mypy, on the other hand, we get the following response:

$ mypy typeHintsCode.py
typesInline.py:14: error: Argument 1 to "append" of "list" has incompatible type "int"; expected "str"

Indicating that a list of str objects cannot contain an int, which, statically speaking, is sound. This can be fixed by either abiding to the type of a and only appending str objects or by changing the type of the contents of a to indicate that any value is acceptable (Intuitively performed with List[Any] after Any has been imported from typing).
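As a brief illustration of that second fix (my own addition, not part of the original snippet), loosening the annotation makes both appends acceptable to the checker:

from typing import Any, List

a = []  # type: List[Any]
a.append(0)    # fine: any contents are allowed now
a.append("i")  # also fine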

Function annotations are added in the form param_name : type after each parameter in your function signature and a return type is specified using the -> type notation before the ending function colon; all annotations are stored in the __annotations__ attribute for that function in a handy dictionary form. Using a trivial example (which doesn"t require extra types from the typing module):

def annotated(x: int, y: str) -> bool:
    return x < y

The annotated.__annotations__ attribute now has the following values:

{"y": <class "str">, "return": <class "bool">, "x": <class "int">}

If we"re a complete newbie, or we are familiar with Python 2.7 concepts and are consequently unaware of the TypeError lurking in the comparison of annotated, we can perform another static check, catch the error and save us some trouble:

$ mypy typeHintsCode.py
typeFunction.py: note: In function "annotated":
typeFunction.py:2: error: Unsupported operand types for > ("str" and "int")

Among other things, calling the function with invalid arguments will also get caught:

annotated(20, 20)

# mypy complains:
typeHintsCode.py:4: error: Argument 2 to "annotated" has incompatible type "int"; expected "str"

These can be extended to basically any use case and the errors caught extend further than basic calls and operations. The types you can check for are really flexible and I have merely given a small sneak peak of its potential. A look in the typing module, the PEPs or the mypy documentation will give you a more comprehensive idea of the capabilities offered.

Stub files:

Stub files can be used in two different non mutually exclusive cases:

  • You need to type check a module for which you do not want to directly alter the function signatures
  • You want to write modules and have type-checking but additionally want to separate annotations from content.

What stub files (with an extension of .pyi) are is an annotated interface of the module you are making/want to use. They contain the signatures of the functions you want to type-check with the body of the functions discarded. To get a feel of this, given a set of three random functions in a module named randfunc.py:

def message(s):
    print(s)

def alterContents(myIterable):
    return [i for i in myIterable if i % 2 == 0]

def combine(messageFunc, itFunc):
    messageFunc("Printing the Iterable")
    a = alterContents(range(1, 20))
    return set(a)

We can create a stub file randfunc.pyi, in which we can place some restrictions if we wish to do so. The downside is that somebody viewing the source without the stub won"t really get that annotation assistance when trying to understand what is supposed to be passed where.

Anyway, the structure of a stub file is pretty simplistic: Add all function definitions with empty bodies (pass filled) and supply the annotations based on your requirements. Here, let"s assume we only want to work with int types for our Containers.

# Stub for randfunc.py
from typing import Any, Callable, Iterable, List, Set

def message(s: str) -> None: pass

def alterContents(myIterable: Iterable[int])-> List[int]: pass

def combine(
    messageFunc: Callable[[str], Any],
    itFunc: Callable[[Iterable[int]], List[int]]
)-> Set[int]: pass

The combine function gives an indication of why you might want to use annotations in a different file: they sometimes clutter up the code and reduce readability (a big no-no for Python). You could of course use type aliases, but that sometimes confuses more than it helps (so use them wisely).
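For completeness, a small sketch of the type-alias approach mentioned above (the alias names are my own, purely illustrative):

from typing import Any, Callable, Iterable, List, Set

MessageFunc = Callable[[str], Any]
ItFunc = Callable[[Iterable[int]], List[int]]

def combine(messageFunc: MessageFunc, itFunc: ItFunc) -> Set[int]: pass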


This should get you familiarized with the basic concepts of type hints in Python. Even though the type checker used here has been mypy, you should gradually start to see more of them pop up, some built into IDEs (PyCharm) and others as standard Python modules.

I'll try and add additional checkers/related packages in the following list when and if I find them (or if suggested).

Checkers I know of:

  • Mypy: as described here.
  • PyType: By Google, uses different notation from what I gather, probably worth a look.

Related Packages/Projects:

  • typeshed: Official Python repository housing an assortment of stub files for the standard library.

The typeshed project is actually one of the best places you can look to see how type hinting might be used in a project of your own. Let"s take as an example the __init__ dunders of the Counter class in the corresponding .pyi file:

class Counter(Dict[_T, int], Generic[_T]):
        @overload
        def __init__(self) -> None: ...
        @overload
        def __init__(self, Mapping: Mapping[_T, int]) -> None: ...
        @overload
        def __init__(self, iterable: Iterable[_T]) -> None: ...

Where _T = TypeVar("_T") is used to define generic classes. For the Counter class we can see that it can either take no arguments in its initializer, get a single Mapping from any type to an int or take an Iterable of any type.
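To make that concrete, here is a minimal generic class of my own built with TypeVar (illustrative only, not taken from typeshed):

from typing import Generic, List, TypeVar

_T = TypeVar("_T")

class Stack(Generic[_T]):
    def __init__(self) -> None:
        self._items = []  # type: List[_T]

    def push(self, item: _T) -> None:
        self._items.append(item)

    def pop(self) -> _T:
        return self._items.pop()

# A checker now flags Stack[int]().push("x") as a type error.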


Notice: One thing I forgot to mention was that the typing module has been introduced on a provisional basis. From PEP 411:

A provisional package may have its API modified prior to "graduating" into a "stable" state. On one hand, this state provides the package with the benefits of being formally part of the Python distribution. On the other hand, the core development team explicitly states that no promises are made with regards to the stability of the package's API, which may change for the next release. While it is considered an unlikely outcome, such packages may even be removed from the standard library without a deprecation period if the concerns regarding their API or maintenance prove well-founded.

So take things here with a pinch of salt; I"m doubtful it will be removed or altered in significant ways, but one can never know.


** Another topic altogether, but valid in the scope of type-hints: PEP 526: Syntax for Variable Annotations is an effort to replace # type comments by introducing new syntax which allows users to annotate the type of variables in simple varname: type statements.

See What are variable annotations?, as previously mentioned, for a small introduction to these.

Answer #2

I think you"re almost there, try removing the extra square brackets around the lst"s (Also you don"t need to specify the column names when you"re creating a dataframe from a dict like this):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {"lst1Title": lst1,
     "lst2Title": lst2,
     "lst3Title": lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...

If you need a more performant solution you can use np.column_stack rather than zip as in your first attempt; this has around a 2x speedup on the example here, but comes at a bit of a cost to readability in my opinion:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=["lst1Title", "lst2Title", "lst3Title"])

Answer #3

Python 3 - UPDATED 18th November 2015

Found the accepted answer useful, yet wished to expand on several points for the benefit of others based on my own experiences.

Module: A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.

Module Example: Assume we have a single python script in the current directory, here I am calling it mymodule.py

The file mymodule.py contains the following code:

def myfunc():
    print("Hello!")

If we run the python3 interpreter from the current directory, we can import and run the function myfunc in the following different ways (you would typically just choose one of the following):

>>> import mymodule
>>> mymodule.myfunc()
Hello!
>>> from mymodule import myfunc
>>> myfunc()
Hello!
>>> from mymodule import *
>>> myfunc()
Hello!

Ok, so that was easy enough.

Now assume you have the need to put this module into its own dedicated folder to provide a module namespace, instead of just running it ad-hoc from the current working directory. This is where it is worth explaining the concept of a package.

Package: Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or the Python Imaging Library from having to worry about each other’s module names.

Package Example: Let"s now assume we have the following folder and files. Here, mymodule.py is identical to before, and __init__.py is an empty file:

.
└── mypackage
    ├── __init__.py
    └── mymodule.py

The __init__.py files are required to make Python treat the directories as containing packages. For further information, please see the Modules documentation link provided later on.

Our current working directory is one level above the ordinary folder called mypackage

$ ls
mypackage

If we run the python3 interpreter now, we can import and run the module mymodule.py containing the required function myfunc in the following different ways (you would typically just choose one of the following):

>>> import mypackage
>>> from mypackage import mymodule
>>> mymodule.myfunc()
Hello!
>>> import mypackage.mymodule
>>> mypackage.mymodule.myfunc()
Hello!
>>> from mypackage import mymodule
>>> mymodule.myfunc()
Hello!
>>> from mypackage.mymodule import myfunc
>>> myfunc()
Hello!
>>> from mypackage.mymodule import *
>>> myfunc()
Hello!

Assuming Python 3, there is excellent documentation at: Modules

In terms of naming conventions for packages and modules, the general guidelines are given in PEP-0008 - please see Package and Module Names

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Answer #4

What is the best way of implementing a reverse function for strings?

My own experience with this question is academic. However, if you"re a pro looking for the quick answer, use a slice that steps by -1:

>>> "a string"[::-1]
"gnirts a"

or more readably (but slower due to the method name lookups and the fact that join forms a list when given an iterator), str.join:

>>> "".join(reversed("a string"))
"gnirts a"

or for readability and reusability, put the slice in a function

def reversed_string(a_string):
    return a_string[::-1]

and then:

>>> reversed_string("a_string")
"gnirts_a"

Longer explanation

If you"re interested in the academic exposition, please keep reading.

There is no built-in reverse function in Python"s str object.

Here is a couple of things about Python"s strings you should know:

  1. In Python, strings are immutable. Changing a string does not modify the string. It creates a new one.

  2. Strings are sliceable. Slicing a string gives you a new string from one point in the string, backwards or forwards, to another point, by given increments. They take slice notation or a slice object in a subscript:

    string[subscript]
    

The subscript creates a slice by including a colon within the braces:

    string[start:stop:step]

To create a slice outside of the braces, you"ll need to create a slice object:

    slice_obj = slice(start, stop, step)
    string[slice_obj]

A readable approach:

While "".join(reversed("foo")) is readable, it requires calling a string method, str.join, on another called function, which can be rather relatively slow. Let"s put this in a function - we"ll come back to it:

def reverse_string_readable_answer(string):
    return "".join(reversed(string))

Most performant approach:

Much faster is using a reverse slice:

"foo"[::-1]

But how can we make this more readable and understandable to someone less familiar with slices or the intent of the original author? Let"s create a slice object outside of the subscript notation, give it a descriptive name, and pass it to the subscript notation.

start = stop = None
step = -1
reverse_slice = slice(start, stop, step)
"foo"[reverse_slice]

Implement as Function

To actually implement this as a function, I think it is semantically clear enough to simply use a descriptive name:

def reversed_string(a_string):
    return a_string[::-1]

And usage is simply:

reversed_string("foo")

What your teacher probably wants:

If you have an instructor, they probably want you to start with an empty string, and build up a new string from the old one. You can do this with pure syntax and literals using a while loop:

def reverse_a_string_slowly(a_string):
    new_string = ""
    index = len(a_string)
    while index:
        index -= 1                    # index = index - 1
        new_string += a_string[index] # new_string = new_string + character
    return new_string

This is theoretically bad because, remember, strings are immutable - so every time it looks like you're appending a character onto new_string, a new string is theoretically being created! However, CPython knows how to optimize this in certain cases, of which this trivial case is one.

Best Practice

Theoretically better is to collect your substrings in a list, and join them later:

def reverse_a_string_more_slowly(a_string):
    new_strings = []
    index = len(a_string)
    while index:
        index -= 1                       
        new_strings.append(a_string[index])
    return "".join(new_strings)

However, as we will see in the timings below for CPython, this actually takes longer, because CPython can optimize the string concatenation.

Timings

Here are the timings:

>>> a_string = "amanaplanacanalpanama" * 10
>>> min(timeit.repeat(lambda: reverse_string_readable_answer(a_string)))
10.38789987564087
>>> min(timeit.repeat(lambda: reversed_string(a_string)))
0.6622700691223145
>>> min(timeit.repeat(lambda: reverse_a_string_slowly(a_string)))
25.756799936294556
>>> min(timeit.repeat(lambda: reverse_a_string_more_slowly(a_string)))
38.73570013046265

CPython optimizes string concatenation, whereas other implementations may not:

... do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the "".join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Answer #5

I"ve compared performance (space and time) for a number of ways to store numpy arrays. Few of them support multiple arrays per file, but perhaps it"s useful anyway.

benchmark for numpy array storage

Npy and binary files are both really fast and small for dense data. If the data is sparse or very structured, you might want to use npz with compression, which"ll save a lot of space but cost some load time.

If portability is an issue, binary is better than npy. If human readability is important, then you"ll have to sacrifice a lot of performance, but it can be achieved fairly well using csv (which is also very portable of course).

More details and the code are available at the github repo.

Answer #6

For pandas >= 0.25

The functionality to name returned aggregate columns has been reintroduced in the master branch and is targeted for pandas 0.25. The new syntax is .agg(new_col_name=("col_name", "agg_func")). Detailed example from the PR linked above:

In [2]: df = pd.DataFrame({"kind": ["cat", "dog", "cat", "dog"],
   ...:                    "height": [9.1, 6.0, 9.5, 34.0],
   ...:                    "weight": [7.9, 7.5, 9.9, 198.0]})
   ...:

In [3]: df
Out[3]:
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [4]: df.groupby("kind").agg(min_height=("height", "min"), 
                               max_weight=("weight", "max"))
Out[4]:
      min_height  max_weight
kind
cat          9.1         9.9
dog          6.0       198.0

It will also be possible to use multiple lambda expressions with this syntax and the two-step rename syntax I suggested earlier (below) as per this PR. Again, copying from the example in the PR:

In [2]: df = pd.DataFrame({"A": ["a", "a"], "B": [1, 2], "C": [3, 4]})

In [3]: df.groupby("A").agg({"B": [lambda x: 0, lambda x: 1]})
Out[3]:
         B
  <lambda> <lambda 1>
A
a        0          1

and then .rename(), or in one go:

In [4]: df.groupby("A").agg(b=("B", lambda x: 0), c=("B", lambda x: 1))
Out[4]:
   b  c
A
a  0  0

For pandas < 0.25

The currently accepted answer by unutbu describes a great way of doing this in pandas versions <= 0.20. However, as of pandas 0.20, using this method raises a warning indicating that the syntax will not be available in future versions of pandas.

Series:

FutureWarning: using a dict on a Series for aggregation is deprecated and will be removed in a future version

DataFrames:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version

According to the pandas 0.20 changelog, the recommended way of renaming columns while aggregating is as follows.

# Create a sample data frame
df = pd.DataFrame({"A": [1, 1, 1, 2, 2],
                   "B": range(5),
                   "C": range(5)})

# ==== SINGLE COLUMN (SERIES) ====
# Syntax soon to be deprecated
df.groupby("A").B.agg({"foo": "count"})
# Recommended replacement syntax
df.groupby("A").B.agg(["count"]).rename(columns={"count": "foo"})

# ==== MULTI COLUMN ====
# Syntax soon to be deprecated
df.groupby("A").agg({"B": {"foo": "sum"}, "C": {"bar": "min"}})
# Recommended replacement syntax
df.groupby("A").agg({"B": "sum", "C": "min"}).rename(columns={"B": "foo", "C": "bar"})
# As the recommended syntax is more verbose, parentheses can
# be used to introduce line breaks and increase readability
(df.groupby("A")
    .agg({"B": "sum", "C": "min"})
    .rename(columns={"B": "foo", "C": "bar"})
)

Please see the 0.20 changelog for additional details.

Update 2017-01-03 in response to @JunkMechanic"s comment.

With the old style dictionary syntax, it was possible to pass multiple lambda functions to .agg, since these would be renamed with the key in the passed dictionary:

>>> df.groupby("A").agg({"B": {"min": lambda x: x.min(), "max": lambda x: x.max()}})

    B    
  max min
A        
1   2   0
2   4   3

Multiple functions can also be passed to a single column as a list:

>>> df.groupby("A").agg({"B": [np.min, np.max]})

     B     
  amin amax
A          
1    0    2
2    3    4

However, this does not work with lambda functions, since they are anonymous and all return <lambda>, which causes a name collision:

>>> df.groupby("A").agg({"B": [lambda x: x.min(), lambda x: x.max]})
SpecificationError: Function names must be unique, found multiple named <lambda>

To avoid the SpecificationError, named functions can be defined a priori instead of using lambda. Suitable function names also avoid calling .rename on the data frame afterwards. These functions can be passed with the same list syntax as above:

>>> def my_min(x):
>>>     return x.min()

>>> def my_max(x):
>>>     return x.max()

>>> df.groupby("A").agg({"B": [my_min, my_max]})

       B       
  my_min my_max
A              
1      0      2
2      3      4

Answer #7

If we look at the Zen of Python, emphasis mine:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren"t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you"re Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it"s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let"s do more of those!

The most Pythonic solution is the one that is clearest, simplest, and easiest to explain:

a + b == c or a + c == b or b + c == a

Even better, you don"t even need to know Python to understand this code! It"s that easy. This is, without reservation, the best solution. Anything else is intellectual masturbation.

Furthermore, this is likely the best performing solution as well, as it is the only one out of all the proposals that short circuits. If a + b == c, only a single addition and comparison is done.

Answer #8

You do not need to call d.keys(), so

if key not in d:
    d[key] = value

is enough. There is no clearer, more readable method.

You could update again with dict.get(), which would return an existing value if the key is already present:

d[key] = d.get(key, value)

but I strongly recommend against this; this is code golfing, hindering maintenance and readability.

Answer #9

TLDR; No, for loops are not blanket "bad", at least, not always. It is probably more accurate to say that some vectorized operations are slower than iterating, versus saying that iteration is faster than some vectorized operations. Knowing when and why is key to getting the most performance out of your code. In a nutshell, these are the situations where it is worth considering an alternative to vectorized pandas functions:

  1. When your data is small (...depending on what you"re doing),
  2. When dealing with object/mixed dtypes
  3. When using the str/regex accessor functions

Let"s examine these situations individually.


Iteration v/s Vectorization on Small Data

Pandas follows a "Convention Over Configuration" approach in its API design. This means that the same API has been fitted to cater to a broad range of data and use cases.

When a pandas function is called, the following things (among others) must internally be handled by the function, to ensure working

  1. Index/axis alignment
  2. Handling mixed datatypes
  3. Handling missing data

Almost every function will have to deal with these to varying extents, and this presents an overhead. The overhead is less for numeric functions (for example, Series.add), while it is more pronounced for string functions (for example, Series.str.replace).

for loops, on the other hand, are faster than you think. What's even better is that list comprehensions (which create lists through for loops) are faster still, as they are optimized iterative mechanisms for list creation.

List comprehensions follow the pattern

[f(x) for x in seq]

Where seq is a pandas series or DataFrame column. Or, when operating over multiple columns,

[f(x, y) for x, y in zip(seq1, seq2)]

Where seq1 and seq2 are columns.

Numeric Comparison
Consider a simple boolean indexing operation. The list comprehension method has been timed against Series.ne (!=) and query. Here are the functions:

# Boolean indexing with Numeric value comparison.
df[df.A != df.B]                            # vectorized !=
df.query("A != B")                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

For simplicity, I have used the perfplot package to run all the timeit tests in this post. The timings for the operations above are below:

[timing plot]

The list comprehension outperforms query for moderately sized N, and even outperforms the vectorized not equals comparison for tiny N. Unfortunately, the list comprehension scales linearly, so it does not offer much performance gain for larger N.

Note
It is worth mentioning that much of the benefit of list comprehension come from not having to worry about the index alignment, but this means that if your code is dependent on indexing alignment, this will break. In some cases, vectorised operations over the underlying NumPy arrays can be considered as bringing in the "best of both worlds", allowing for vectorisation without all the unneeded overhead of the pandas functions. This means that you can rewrite the operation above as

df[df.A.values != df.B.values]

Which outperforms both the pandas and list comprehension equivalents:

NumPy vectorization is out of the scope of this post, but it is definitely worth considering, if performance matters.

Value Counts
Taking another example - this time, with another vanilla python construct that is faster than a for loop - collections.Counter. A common requirement is to compute the value counts and return the result as a dictionary. This is done with value_counts, np.unique, and Counter:

# Value Counts comparison.
ser.value_counts(sort=False).to_dict()           # value_counts
dict(zip(*np.unique(ser, return_counts=True)))   # np.unique
Counter(ser)                                     # Counter

[timing plot]

The results are more pronounced, Counter wins out over both vectorized methods for a larger range of small N (~3500).

Note
More trivia (courtesy @user2357112). The Counter is implemented with a C accelerator, so while it still has to work with python objects instead of the underlying C datatypes, it is still faster than a for loop. Python power!

Of course, the take away from here is that the performance depends on your data and use case. The point of these examples is to convince you not to rule out these solutions as legitimate options. If these still don"t give you the performance you need, there is always cython and numba. Let"s add this test into the mix.

from numba import njit, prange

@njit(parallel=True)
def get_mask(x, y):
    result = [False] * len(x)
    for i in prange(len(x)):
        result[i] = x[i] != y[i]

    return np.array(result)

df[get_mask(df.A.values, df.B.values)] # numba

[timing plot]

Numba offers JIT compilation of loopy python code to very powerful vectorized code. Understanding how to make numba work involves a learning curve.


Operations with Mixed/object dtypes

String-based Comparison
Revisiting the filtering example from the first section, what if the columns being compared are strings? Consider the same 3 functions above, but with the input DataFrame cast to string.

# Boolean indexing with string value comparison.
df[df.A != df.B]                            # vectorized !=
df.query("A != B")                          # query (numexpr)
df[[x != y for x, y in zip(df.A, df.B)]]    # list comp

[timing plot]

So, what changed? The thing to note here is that string operations are inherently difficult to vectorize. Pandas treats strings as objects, and all operations on objects fall back to a slow, loopy implementation.

Now, because this loopy implementation is surrounded by all the overhead mentioned above, there is a constant magnitude difference between these solutions, even though they scale the same.

When it comes to operations on mutable/complex objects, there is no comparison. List comprehension outperforms all operations involving dicts and lists.

Accessing Dictionary Value(s) by Key
Here are timings for two operations that extract a value from a column of dictionaries: map and the list comprehension. The setup is in the Appendix, under the heading "Code Snippets".

# Dictionary value extraction.
ser.map(operator.itemgetter("value"))     # map
pd.Series([x.get("value") for x in ser])  # list comprehension

[timing plot]

Positional List Indexing
Timings for 3 operations that extract the 0th element from a list of columns (handling exceptions), map, str.get accessor method, and the list comprehension:

# List positional indexing. 
def get_0th(lst):
    try:
        return lst[0]
    # Handle empty lists and NaNs gracefully.
    except (IndexError, TypeError):
        return np.nan

ser.map(get_0th)                                          # map
ser.str[0]                                                # str accessor
pd.Series([x[0] if len(x) > 0 else np.nan for x in ser])  # list comp
pd.Series([get_0th(x) for x in ser])                      # list comp safe

Note
If the index matters, you would want to do:

pd.Series([...], index=ser.index)

When reconstructing the series.

[timing plot]

List Flattening
A final example is flattening lists. This is another common problem, and demonstrates just how powerful pure python is here.

# Nested list flattening.
pd.DataFrame(ser.tolist()).stack().reset_index(drop=True)  # stack
pd.Series(list(chain.from_iterable(ser.tolist())))         # itertools.chain
pd.Series([y for x in ser for y in x])                     # nested list comp

[timing plot]

Both itertools.chain.from_iterable and the nested list comprehension are pure python constructs, and scale much better than the stack solution.

These timings are a strong indication of the fact that pandas is not equipped to work with mixed dtypes, and that you should probably refrain from using it to do so. Wherever possible, data should be present as scalar values (ints/floats/strings) in separate columns.

Lastly, the applicability of these solutions depend widely on your data. So, the best thing to do would be to test these operations on your data before deciding what to go with. Notice how I have not timed apply on these solutions, because it would skew the graph (yes, it"s that slow).


Regex Operations, and .str Accessor Methods

Pandas can apply regex operations such as str.contains, str.extract, and str.extractall, as well as other "vectorized" string operations (such as str.split, str.find, str.translate, and so on) on string columns. These functions are slower than list comprehensions, and are meant to be more convenience functions than anything else.

It is usually much faster to pre-compile a regex pattern and iterate over your data with re.compile (also see Is it worth using Python's re.compile?). The list comp equivalent to str.contains looks something like this:

p = re.compile(...)
ser2 = pd.Series([x for x in ser if p.search(x)])

Or,

ser2 = ser[[bool(p.search(x)) for x in ser]]

If you need to handle NaNs, you can do something like

ser[[bool(p.search(x)) if pd.notnull(x) else False for x in ser]]

The list comp equivalent to str.extract (without groups) will look something like:

df["col2"] = [p.search(x).group(0) for x in df["col"]]

If you need to handle no-matches and NaNs, you can use a custom function (still faster!):

def matcher(x):
    m = p.search(str(x))
    if m:
        return m.group(0)
    return np.nan

df["col2"] = [matcher(x) for x in df["col"]]

The matcher function is very extensible. It can be fitted to return a list for each capture group, as needed. Just have it query the group or groups attribute of the match object.
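For example (an assumed variation of my own, not from the original post), a matcher that returns every capture group as a list might look like:

def matcher_groups(x):
    m = p.search(str(x))
    if m:
        return list(m.groups())  # one entry per capture group
    return np.nan

df["col2"] = [matcher_groups(x) for x in df["col"]]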

For str.extractall, change p.search to p.findall.

String Extraction
Consider a simple filtering operation. The idea is to extract 4 digits if it is preceded by an upper case letter.

# Extracting strings.
p = re.compile(r"(?<=[A-Z])(d{4})")
def matcher(x):
    m = p.search(x)
    if m:
        return m.group(0)
    return np.nan

ser.str.extract(r"(?<=[A-Z])(d{4})", expand=False)   #  str.extract
pd.Series([matcher(x) for x in ser])                  #  list comprehension

[timing plot]

More Examples
Full disclosure - I am the author (in part or whole) of these posts listed below.


Conclusion

As shown from the examples above, iteration shines when working with small rows of DataFrames, mixed datatypes, and regular expressions.

The speedup you get depends on your data and your problem, so your mileage may vary. The best thing to do is to carefully run tests and see if the payout is worth the effort.

The "vectorized" functions shine in their simplicity and readability, so if performance is not critical, you should definitely prefer those.

Another side note, certain string operations deal with constraints that favour the use of NumPy. Here are two examples where careful NumPy vectorization outperforms python:

Additionally, sometimes just operating on the underlying arrays via .values as opposed to on the Series or DataFrames can offer a healthy enough speedup for most usual scenarios (see the Note in the Numeric Comparison section above). So, for example df[df.A.values != df.B.values] would show instant performance boosts over df[df.A != df.B]. Using .values may not be appropriate in every situation, but it is a useful hack to know.

As mentioned above, it"s up to you to decide whether these solutions are worth the trouble of implementing.


Appendix: Code Snippets

import perfplot  
import operator 
import pandas as pd
import numpy as np
import re

from collections import Counter
from itertools import chain

# Boolean indexing with Numeric value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=["A","B"]),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query("A != B"),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
        lambda df: df[get_mask(df.A.values, df.B.values)]
    ],
    labels=["vectorized !=", "query (numexpr)", "list comp", "numba"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N"
)

# Value Counts comparison.
perfplot.show(
    setup=lambda n: pd.Series(np.random.choice(1000, n)),
    kernels=[
        lambda ser: ser.value_counts(sort=False).to_dict(),
        lambda ser: dict(zip(*np.unique(ser, return_counts=True))),
        lambda ser: Counter(ser),
    ],
    labels=["value_counts", "np.unique", "Counter"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=lambda x, y: dict(x) == dict(y)
)

# Boolean indexing with string value comparison.
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.choice(1000, (n, 2)), columns=["A","B"], dtype=str),
    kernels=[
        lambda df: df[df.A != df.B],
        lambda df: df.query("A != B"),
        lambda df: df[[x != y for x, y in zip(df.A, df.B)]],
    ],
    labels=["vectorized !=", "query (numexpr)", "list comp"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)

# Dictionary value extraction.
ser1 = pd.Series([{"key": "abc", "value": 123}, {"key": "xyz", "value": 456}])
perfplot.show(
    setup=lambda n: pd.concat([ser1] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(operator.itemgetter("value")),
        lambda ser: pd.Series([x.get("value") for x in ser]),
    ],
    labels=["map", "list comprehension"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)

# List positional indexing. 
ser2 = pd.Series([["a", "b", "c"], [1, 2], []])        
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.map(get_0th),
        lambda ser: ser.str[0],
        lambda ser: pd.Series([x[0] if len(x) > 0 else np.nan for x in ser]),
        lambda ser: pd.Series([get_0th(x) for x in ser]),
    ],
    labels=["map", "str accessor", "list comprehension", "list comp safe"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)

# Nested list flattening.
ser3 = pd.Series([["a", "b", "c"], ["d", "e"], ["f", "g"]])
perfplot.show(
    setup=lambda n: pd.concat([ser2] * n, ignore_index=True),
    kernels=[
        lambda ser: pd.DataFrame(ser.tolist()).stack().reset_index(drop=True),
        lambda ser: pd.Series(list(chain.from_iterable(ser.tolist()))),
        lambda ser: pd.Series([y for x in ser for y in x]),
    ],
    labels=["stack", "itertools.chain", "nested list comp"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",    
    equality_check=None

)

# Extracting strings.
ser4 = pd.Series(["foo xyz", "test A1234", "D3345 xtz"])
perfplot.show(
    setup=lambda n: pd.concat([ser4] * n, ignore_index=True),
    kernels=[
        lambda ser: ser.str.extract(r"(?<=[A-Z])(\d{4})", expand=False),
        lambda ser: pd.Series([matcher(x) for x in ser])
    ],
    labels=["str.extract", "list comprehension"],
    n_range=[2**k for k in range(0, 15)],
    xlabel="N",
    equality_check=None
)

Answer #10

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here"s an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ["foo", "bar", "bar", "foo", "baz", "foo"]

>>> list(dict.fromkeys(keywords))
["foo", "bar", "baz"]
