Analyzing text in Python3


In this tutorial we work with files. Files are an integral part of any computer system; the operating system itself consists of many files.

Python distinguishes two types of files: text files and binary files.
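The difference shows up in the mode string passed to the built-in open(). A minimal sketch (the file name demo.txt is an assumption for illustration):

```python
# Text mode (the default) works with str; binary mode ("b") works with bytes.
with open("demo.txt", "w") as f:
    f.write("hello")

with open("demo.txt", "r") as f:
    text = f.read()      # str

with open("demo.txt", "rb") as f:
    data = f.read()      # bytes

print(type(text).__name__)
print(type(data).__name__)
```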

Text Analysis

We are discussing text files here.

Here we will focus on some important text-analysis features of a file:

  • Word count
  • Character count
  • Average word length
  • Number of stop words
  • Number of special characters
  • Number of numerics
  • Number of uppercase words

Word count

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        # Split on whitespace and count the resulting tokens
        words = contents.split()
        number_words = len(words)
        print("Total words of " + filename, "is", str(number_words))

    Total words of C:/Users/TP/Desktop/css3.txt is 3574

We have a test file «css3.txt»; we are working on this file.
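The word count above relies on str.split(), which splits on whitespace only, so punctuation stays attached to words. A hedged sketch of an alternative count using a regular expression (the sample string is an assumption):

```python
import re

sample = "Hello, world - hello again!"

# split() keeps punctuation glued to tokens: ['Hello,', 'world', '-', 'hello', 'again!']
split_count = len(sample.split())

# \w+ matches runs of letters/digits/underscores, so punctuation-only tokens disappear
regex_count = len(re.findall(r"\w+", sample))

print(split_count, regex_count)
```

Which count is "right" depends on whether you consider a lone "-" a word.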




Number of characters

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        wordslist = contents.split()
        # Sum the length of every word (whitespace is not counted)
        characters = sum(len(word) for word in wordslist)
        print("TOTAL CHARACTERS IN A TEXT FILE =", characters)

    TOTAL CHARACTERS IN A TEXT FILE = 17783
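Note that summing word lengths excludes whitespace; len(contents) would count spaces and newlines as well. A small sketch of the difference (the sample string is an assumption):

```python
sample = "one two three"

# Characters excluding whitespace (the approach used above)
no_ws = sum(len(word) for word in sample.split())

# All characters, whitespace included
with_ws = len(sample)

print(no_ws, with_ws)
```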
Average word length

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        wordslist = contents.split()
        words = len(wordslist)
        # Average = total characters in words / number of words
        average = sum(len(word) for word in wordslist) / words
        print("Average =", round(average, 2))

    Average = 4.97
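The average here is the total length of all words divided by the word count. A tiny worked example (the sample string is an assumption):

```python
sample = "a bb ccc"

wordslist = sample.split()
# (1 + 2 + 3) / 3
average = sum(len(w) for w in wordslist) / len(wordslist)

print(average)
```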



Number of stop words

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    my_example_sent = "This is a sample sentence"
    mystop_words = set(stopwords.words('english'))
    my_word_tokens = word_tokenize(my_example_sent)

    # Keep only the tokens that are not stop words
    my_filtered_sentence = []
    for w in my_word_tokens:
        if w not in mystop_words:
            my_filtered_sentence.append(w)

    print(my_word_tokens)
    print(my_filtered_sentence)

This example tokenizes a sample sentence; the number of stop words is the difference between the lengths of the two lists. The same filtering can be applied to the contents of the file.

Number of special characters

    import collections as ct

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        words = contents.split()
        special_chars = "#"
        # Count the tokens that match a special character
        new = sum(v for k, v in ct.Counter(words).items() if k in special_chars)
        print("Total Special Characters", new)

    Total Special Characters 0
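The Counter approach only matches tokens that consist of the special character itself (a lone "#"), which is why the file reports 0 even though it contains "#" inside tokens. A hedged sketch that counts special characters wherever they occur (the special-character set and sample string are assumptions):

```python
sample = "#header { color: #fff; }"
special_chars = "#{};"

# Count every occurrence of any special character, even inside tokens
count = sum(1 for ch in sample if ch in special_chars)

print(count)
```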



Number of numerics

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        # Count the tokens made up entirely of digits
        words = sum(map(str.isdigit, contents.split()))
        print("TOTAL NUMERIC IN A TEXT FILE =", words)

    TOTAL NUMERIC IN A TEXT FILE = 2
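str.isdigit is True only for tokens made up entirely of digits, so a token like "css3" is not counted. A sketch that also counts digits embedded inside words (the sample string is an assumption):

```python
sample = "css3 has 2 new modules in 2023"

tokens = sample.split()

# Whole-token numbers only: "2" and "2023"
numeric_tokens = sum(map(str.isdigit, tokens))

# Every digit character, including the 3 in "css3"
digit_chars = sum(ch.isdigit() for ch in sample)

print(numeric_tokens, digit_chars)
```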



Number of uppercase words

    filename = "C:/Users/TP/Desktop/css3.txt"
    try:
        with open(filename) as file_object:
            contents = file_object.read()
    except FileNotFoundError:
        message = "sorry " + filename
        print(message)
    else:
        # Count the tokens that are entirely uppercase
        words = sum(map(str.isupper, contents.split()))
        print("TOTAL UPPERCASE WORDS IN A TEXT FILE =", words)

    TOTAL UPPERCASE WORDS IN A TEXT FILE = 121
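The steps above can be combined into a single pass over the file's contents. A minimal sketch (the summarize function name and the sample text are assumptions; in the article the text would come from css3.txt):

```python
def summarize(contents, special_chars="#"):
    """Return the text metrics discussed above as a dict."""
    words = contents.split()
    return {
        "words": len(words),
        "characters": sum(len(w) for w in words),
        "average_word_length": sum(len(w) for w in words) / len(words) if words else 0,
        "numeric": sum(map(str.isdigit, words)),          # whole-digit tokens
        "uppercase": sum(map(str.isupper, words)),        # all-uppercase tokens
        "special": sum(1 for w in words if w in special_chars),
    }

stats = summarize("CSS GRID has 2 new MODULES")
print(stats)
```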






