NLP | Storing Frequency Distribution in Redis

The nltk.probability.FreqDist class is used by many NLTK classes to store and manage frequency distributions. It is very useful, but it lives entirely in memory and provides no way to persist the data. A single FreqDist also cannot be shared between multiple processes. Both limitations can be removed by building a FreqDist on top of Redis.
What is Redis?

  • Redis is a data structure server and one of the most popular NoSQL databases.
  • Among other things, it provides a network-accessible database for storing dictionaries (also called hash maps).
  • Building a FreqDist interface to a Redis hash map allows us to create a persistent FreqDist that is available to multiple local and remote processes at the same time.

Installation:

  • Install both Redis and redis-py. The Redis website is located at http://redis.io/ and includes many documentation resources.
  • To use hash maps, install the latest version of Redis, which at the time of this writing is 2.8.9.
  • The Python Redis driver, redis-py, can be installed with pip install redis or easy_install redis. The latest version at the time of this writing is 2.9.1.
  • The redis-py home page is at http://github.com/andymccurdy/redis-py/.
  • Once both are installed and the Redis server process is running, you are ready to go. Let's assume the Redis server is running on localhost on port 6379 (the default host and port).
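Before going further, it can help to confirm that the server is actually reachable. The following is a minimal sketch, assuming redis-py is installed; the helper names server_is_up and check_default_server are hypothetical and not part of any library. It relies only on the ping() method of the redis-py client.

```python
def server_is_up(client):
    """Return True if the Redis server behind `client` answers PING."""
    try:
        return bool(client.ping())
    except Exception:
        # redis-py raises redis.exceptions.ConnectionError when the
        # server cannot be reached; any failure is treated as "down" here.
        return False


def check_default_server():
    # Imported lazily so this file still loads where redis-py is missing.
    from redis import Redis
    # Default host and port, as assumed in the text above.
    return server_is_up(Redis(host='localhost', port=6379))
```

Calling check_default_server() should return True once the Redis server process is running locally.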

How does it work?

  • The FreqDist class extends collections.Counter from the standard library, which makes FreqDist a thin wrapper with a few additional methods such as N().
  • The N() method returns the number of sample outcomes, which is the sum of all values in the frequency distribution.
  • An API-compatible class is built on top of Redis by extending RedisHashMap and then implementing the N() method.
  • RedisHashFreqDist (defined in redisprob.py) sums all the values in the hash map to implement N().
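To make the role of N() concrete, here is a small sketch of the same idea using only the standard library. TinyFreqDist is a hypothetical stand-in, not the NLTK class; it only illustrates that a Counter subclass needs little more than a sum over its values to provide N().

```python
from collections import Counter


class TinyFreqDist(Counter):
    """Hypothetical minimal stand-in for nltk.probability.FreqDist."""

    def N(self):
        # Number of sample outcomes = sum of all counts.
        return sum(self.values())


fd = TinyFreqDist(['the', 'cat', 'the', 'hat'])
print(fd['the'])  # 2
print(fd.N())     # 4
print(fd['dog'])  # 0, since Counter returns 0 for unseen samples
```

The Redis-backed class has to reproduce both behaviours: the total count and the zero default for unseen samples.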

Code: explaining how it works

from rediscollections import RedisHashMap


class RedisHashFreqDist(RedisHashMap):

    def N(self):
        # Total number of sample outcomes: the sum of all counts.
        return int(sum(self.values()))

    def __missing__(self, key):
        # Samples that were never seen have a count of zero.
        return 0

    def __getitem__(self, key):
        # Redis hands values back as strings (or None), so coerce to int.
        return int(RedisHashMap.__getitem__(self, key) or 0)

    def values(self):
        return [int(v) for v in RedisHashMap.values(self)]

    def items(self):
        return [(k, int(v)) for (k, v) in RedisHashMap.items(self)]

This class can be used in the same way as FreqDist. To create it, pass the Redis connection and the name of the hash map. The name must be unique to that particular FreqDist so that it does not clash with any other keys in Redis.

Code:

from redis import Redis
from redisprob import RedisHashFreqDist

r = Redis()
rhfd = RedisHashFreqDist(r, 'test')
print(len(rhfd))

rhfd['foo'] += 1
print(rhfd['foo'])

rhfd.items()
print(len(rhfd))

Output:

0
1
1
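The line rhfd['foo'] += 1 is worth a closer look: Python expands it into a read through __getitem__ followed by a write through __setitem__, which is why __getitem__ must return an integer 0 for a missing key. The sketch below illustrates this with a hypothetical dict-backed class, TracingCounts, that logs each call; it stands in for the Redis hash and is not part of redisprob.py.

```python
class TracingCounts:
    """Hypothetical dict-backed stand-in that logs item access."""

    def __init__(self):
        self._data = {}   # plays the role of the Redis hash (string values)
        self.calls = []   # record of every get/set, in order

    def __getitem__(self, key):
        self.calls.append(('get', key))
        # Same coercion as RedisHashFreqDist: missing or string -> int.
        return int(self._data.get(key) or 0)

    def __setitem__(self, key, value):
        self.calls.append(('set', key, value))
        self._data[key] = str(value)


counts = TracingCounts()
counts['foo'] += 1
counts['foo'] += 1
print(counts.calls)
print(counts['foo'])
```

Each increment shows up in the log as one get followed by one set, so against a real server every += costs two round trips.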

Most of the work is done in the RedisHashMap class, which extends collections.MutableMapping (collections.abc.MutableMapping in Python 3) and overrides every method that needs a Redis-specific command. Here is an outline of each method that uses a specific Redis command:

  • __len__(): uses the hlen command to get the number of elements in the hash map.
  • __contains__(): uses the hexists command to check whether an element exists in the hash map.
  • __getitem__(): uses the hget command to get a value from the hash map.
  • __setitem__(): uses the hset command to set a value in the hash map.
  • __delitem__(): uses the hdel command to remove a value from the hash map.
  • keys(): uses the hkeys command to get all the keys in the hash map.
  • values(): uses the hvals command to get all the values in the hash map.
  • items(): uses the hgetall command to get a dictionary containing all the keys and values in the hash map.
  • clear(): uses the delete command to remove the entire hash map from Redis.
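The outline above can be sketched as follows. This is only an illustration of how such a class could map mapping methods onto redis-py hash commands, not the actual contents of rediscollections, which may differ (for example in how it encodes keys). InMemoryHashClient is a hypothetical stand-in that mimics just the client methods used here, so the sketch runs without a server; with a real server you would pass a redis.Redis() instance instead.

```python
from collections.abc import MutableMapping


class InMemoryHashClient:
    """Hypothetical stand-in for the redis-py hash methods used below."""

    def __init__(self):
        self._hashes = {}

    def hlen(self, name):
        return len(self._hashes.get(name, {}))

    def hexists(self, name, key):
        return key in self._hashes.get(name, {})

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        # Values are stored as strings, much as Redis would return them.
        self._hashes.setdefault(name, {})[key] = str(value)

    def hdel(self, name, *keys):
        for key in keys:
            self._hashes.get(name, {}).pop(key, None)

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))

    def hvals(self, name):
        return list(self._hashes.get(name, {}).values())

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))

    def delete(self, *names):
        for name in names:
            self._hashes.pop(name, None)


class RedisHashMap(MutableMapping):
    """Sketch of a dict-like view over one Redis hash."""

    def __init__(self, r, name):
        self._r = r        # client object exposing the hash commands
        self._name = name  # key of the hash inside Redis

    def __iter__(self):
        return iter(self.keys())

    def __len__(self):
        return self._r.hlen(self._name)

    def __contains__(self, key):
        return self._r.hexists(self._name, key)

    def __getitem__(self, key):
        return self._r.hget(self._name, key)

    def __setitem__(self, key, value):
        self._r.hset(self._name, key, value)

    def __delitem__(self, key):
        self._r.hdel(self._name, key)

    def keys(self):
        return self._r.hkeys(self._name)

    def values(self):
        return self._r.hvals(self._name)

    def items(self):
        return list(self._r.hgetall(self._name).items())

    def clear(self):
        self._r.delete(self._name)


hm = RedisHashMap(InMemoryHashClient(), 'test')
hm['foo'] = 1
print(len(hm), 'foo' in hm, hm['foo'])
```

Because values come back as strings in this sketch, a subclass such as RedisHashFreqDist still has to convert them to integers, which is exactly what its __getitem__(), values(), and items() overrides do.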