
NLP | Storing a frequency distribution in Redis


The nltk.probability.FreqDist class is used by many NLTK classes to store and manage frequency distributions. It is useful, but it lives entirely in memory and offers no way to persist its data. A single FreqDist also cannot be shared between processes. Both limitations can be lifted by building a FreqDist on top of Redis.
What is Redis?

  • Redis is a data structure server and one of the most popular NoSQL databases.
  • Among other things, it provides a network-accessible database for storing dictionaries (also called hash maps).
  • Building a FreqDist interface on top of a Redis hash map gives us a persistent FreqDist that multiple local and remote processes can access at the same time.
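To make the idea of a shared hash map of counts concrete, here is a minimal sketch. It uses the Redis HINCRBY command (exposed by redis-py as hincrby), which this article does not otherwise rely on, and a hypothetical in-memory FakeRedis stand-in so the example runs without a server. The count_words helper and the "demo:freq" key name are also assumptions made for illustration.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a redis-py client (two hash commands only)."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        # Increment one field of the hash and return the new value.
        h = self._hashes.setdefault(name, {})
        h[key] = h.get(key, 0) + amount
        return h[key]

    def hgetall(self, name):
        # Return a copy of the whole hash as a dict.
        return dict(self._hashes.get(name, {}))


def count_words(client, name, words):
    """Add one to the count of each word in the hash called `name`."""
    for word in words:
        client.hincrby(name, word, 1)


# With a real server you would use: from redis import Redis; client = Redis()
client = FakeRedis()
count_words(client, "demo:freq", ["a", "b", "a"])
count_words(client, "demo:freq", ["a"])  # e.g. a second process updating the same hash
print(client.hgetall("demo:freq"))
```

Against a real Redis server the counts live in the server rather than in the Python process, which is what allows several processes to update the same distribution. Note that a real client returns keys and values as bytes unless it is configured to decode responses.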

Installation:

  • Install both Redis and redis-py. The Redis website is at http://redis.io/ and includes extensive documentation.
  • To use hash maps, install the latest version, which at the time of this writing is 2.8.9.
  • The Python Redis driver, redis-py, can be installed with pip install redis or easy_install redis. The latest version at the time of this writing is 2.9.1.
  • The redis-py home page is at http://github.com/andymccurdy/redis-py/.
  • Once both are installed and the Redis server process is running, you are ready to go. We will assume the Redis server is running on localhost on port 6379 (the default host and port).

How does it work?

  • The FreqDist class extends the standard library's collections.Counter, which makes FreqDist a small wrapper with a few additional methods such as N().
  • The N() method returns the number of sample outcomes, which is the sum of all values in the frequency distribution.
  • An API-compatible class is built on top of Redis by extending RedisHashMap and then implementing the N() method.
  • RedisHashFreqDist (defined in redisprob.py) sums all the values in the hash map for the N() method.
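The relationship between a counter and N() can be sketched without NLTK or Redis. MiniFreqDist below is a hypothetical stand-in for nltk.probability.FreqDist, not its actual implementation; it only shows that N() is the sum of all counts.

```python
from collections import Counter


class MiniFreqDist(Counter):
    """Hypothetical stand-in for FreqDist: a Counter plus an N() method."""

    def N(self):
        # Total number of sample outcomes = sum of all counts.
        return sum(self.values())


fd = MiniFreqDist("the cat sat on the mat".split())
print(fd["the"])  # count of a single sample
print(fd.N())     # total number of outcomes
print(fd["dog"])  # Counter reports 0 for samples it has never seen
```

The Redis-backed class has to reproduce the same two behaviours: integer counts that default to 0, and an N() that adds them all up.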

Code: explaining how it works

from rediscollections import RedisHashMap


class RedisHashFreqDist(RedisHashMap):
    def N(self):
        # Total number of sample outcomes: the sum of all stored counts.
        return int(sum(self.values()))

    def __missing__(self, key):
        return 0

    def __getitem__(self, key):
        # A missing field comes back as None; treat it as a count of 0.
        return int(RedisHashMap.__getitem__(self, key) or 0)

    def values(self):
        # Redis stores values as strings, so convert each one back to int.
        return [int(v) for v in RedisHashMap.values(self)]

    def items(self):
        return [(k, int(v)) for (k, v) in RedisHashMap.items(self)]

This class can be used in the same way as FreqDist. To create it, pass a Redis connection and the name of the hash map. The name must uniquely identify that particular FreqDist so that it does not conflict with other keys in Redis.

Code:

from redis import Redis
from redisprob import RedisHashFreqDist

r = Redis()
rhfd = RedisHashFreqDist(r, 'test')
print(len(rhfd))

rhfd['foo'] += 1
print(rhfd['foo'])

rhfd.items()
print(len(rhfd))

Output:

0
1
1

Most of the work is done in the RedisHashMap class, which extends collections.abc.MutableMapping (collections.MutableMapping on older Python versions) and overrides every method that needs a Redis-specific command. Here is an outline of each method and the Redis command it uses:

  • __len__(): uses the hlen command to get the number of elements in the hash map.
  • __contains__(): uses the hexists command to check whether an element is in the hash map.
  • __getitem__(): uses the hget command to get a value from the hash map.
  • __setitem__(): uses the hset command to set a value in the hash map.
  • __delitem__(): uses the hdel command to remove a value from the hash map.
  • keys(): uses the hkeys command to get all the keys in the hash map.
  • values(): uses the hvals command to get all the values in the hash map.
  • items(): uses the hgetall command to get a dictionary containing all the keys and values in the hash map.
  • clear(): uses the delete command to remove the entire hash map from Redis.
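The outline above can be sketched as follows. This is a reconstruction from the list of methods and commands, not the actual contents of rediscollections.py; the attribute names _r and _name are assumptions. FakeRedis is a hypothetical in-memory stand-in that implements only the hash commands named above, so the sketch runs without a server.

```python
from collections.abc import MutableMapping


class FakeRedis:
    """Hypothetical in-memory stand-in for a redis-py client (hash commands only)."""

    def __init__(self):
        self._data = {}

    def hlen(self, name):
        return len(self._data.get(name, {}))

    def hexists(self, name, key):
        return key in self._data.get(name, {})

    def hget(self, name, key):
        return self._data.get(name, {}).get(key)  # None if the field is missing

    def hset(self, name, key, value):
        self._data.setdefault(name, {})[key] = str(value)  # Redis stores strings

    def hdel(self, name, key):
        self._data.get(name, {}).pop(key, None)

    def hkeys(self, name):
        return list(self._data.get(name, {}).keys())

    def hvals(self, name):
        return list(self._data.get(name, {}).values())

    def hgetall(self, name):
        return dict(self._data.get(name, {}))

    def delete(self, name):
        self._data.pop(name, None)


class RedisHashMap(MutableMapping):
    """Sketch of a dict-like view over one Redis hash."""

    def __init__(self, r, name):
        self._r = r        # Redis client (assumed attribute name)
        self._name = name  # key of the hash in Redis (assumed attribute name)

    def __len__(self):
        return self._r.hlen(self._name)

    def __contains__(self, key):
        return self._r.hexists(self._name, key)

    def __getitem__(self, key):
        return self._r.hget(self._name, key)

    def __setitem__(self, key, value):
        self._r.hset(self._name, key, value)

    def __delitem__(self, key):
        self._r.hdel(self._name, key)

    def __iter__(self):
        return iter(self.keys())

    def keys(self):
        return self._r.hkeys(self._name)

    def values(self):
        return self._r.hvals(self._name)

    def items(self):
        return list(self._r.hgetall(self._name).items())

    def clear(self):
        self._r.delete(self._name)


m = RedisHashMap(FakeRedis(), "demo")
m["foo"] = 1
m["bar"] = 2
print(len(m), "foo" in m, m["foo"])
del m["foo"]
print(m.keys())
m.clear()
print(len(m))
```

Values come back as strings here (and as bytes from a real server unless responses are decoded), which is why RedisHashFreqDist converts them to int. A missing key yields None rather than raising KeyError, which is what the `or 0` in its __getitem__ relies on.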
