The nltk.probability.FreqDist class is used by many NLTK classes to store and manage frequency distributions. It is useful, but it lives entirely in memory and offers no way to persist its data. A single FreqDist also cannot be shared between processes. Both limitations can be removed by building a FreqDist on top of Redis.
What is Redis?
- Redis is a data structure server and one of the most popular NoSQL databases.
- Among other things, it provides a network-accessible database for storing dictionaries (also called hash maps).
- Creating a FreqDist interface for the Redis hashmap will allow us to create a persistent FreqDist that will be available to multiple local and remote processes at the same time.
- Install both Redis and redis-py. The Redis website is located at http://redis.io/ and includes many documentation resources.
- To use hash maps, install the latest version of Redis, which at the time of this writing is 2.8.9.
- The Python Redis driver, redis-py, can be installed with pip install redis or easy_install redis. The latest version at the time of this writing is 2.9.1.
- The redis-py home page is at http://github.com/andymccurdy/redis-py/.
- Once both are installed and the Redis server process is running, you are ready to go. Let's assume the Redis server is running on localhost on port 6379 (the default host and port).
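As a quick check that the setup works, a connection can be opened with redis-py. This is a minimal sketch: the helper name connect is hypothetical, and it assumes a server on the default host and port mentioned above.

```python
def connect(host="localhost", port=6379):
    """Return a redis-py client after confirming the server responds.

    `connect` is a hypothetical helper name used only for illustration.
    """
    # Imported here so this snippet can be loaded even where redis-py
    # is not installed; install it with `pip install redis`.
    import redis

    client = redis.Redis(host=host, port=port)
    client.ping()  # raises redis.exceptions.ConnectionError if unreachable
    return client
```

Calling connect() with no arguments targets localhost:6379; it fails with a connection error if no Redis server is listening there.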
How does it work?
- The FreqDist class extends the standard library's collections.Counter class, which makes FreqDist a thin wrapper with a few additional methods such as N().
- The N() method returns the number of sample outcomes, which is the sum of all values in the frequency distribution.
- An API-compatible class is built on top of Redis by extending RedisHashMap and then implementing the N() method.
- RedisHashFreqDist (defined in redisprob.py) sums all the values in the hashmap for the N() method.
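To make the role of N() concrete, here is a minimal in-memory sketch of the same idea. SimpleFreqDist is a hypothetical stand-in, not NLTK's actual FreqDist; it only shows a Counter subclass whose N() sums the stored counts.

```python
from collections import Counter


class SimpleFreqDist(Counter):
    """Hypothetical stand-in for FreqDist: a Counter plus an N() method."""

    def N(self):
        # Total number of recorded sample outcomes: the sum of all counts.
        return sum(self.values())


# Count each character of a short string as one sample outcome.
fd = SimpleFreqDist("abracadabra")
print(fd["a"])  # count for a single sample
print(fd.N())   # total number of outcomes across all samples
```

The Redis-backed version follows the same pattern, except that the counts live in a Redis hashmap rather than in the object itself.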
Code: explaining how it works
This class can be used in the same way as FreqDist. To create it, pass the Redis connection and our hashmap name. The name must be a unique reference to that particular FreqDist so that it does not conflict with other keys in Redis.
Most of the work is done in the RedisHashMap class, which extends collections.MutableMapping (collections.abc.MutableMapping in Python 3) and overrides every method that needs a Redis-specific command. Each of these methods and the Redis command it uses:
- __len__(): uses the hlen command to get the number of entries in the hashmap
- __contains__(): uses the hexists command to check whether an item is in the hashmap
- __getitem__(): uses the hget command to get a value from the hashmap
- __setitem__(): uses the hset command to set a value in the hashmap
- __delitem__(): uses the hdel command to remove a value from the hashmap
- keys(): uses the hkeys command to get all the keys in the hashmap
- values(): uses the hvals command to get all the values in the hashmap
- items(): uses the hgetall command to get a dictionary containing all the keys and values in the hashmap
- clear(): uses the delete command to remove the entire hashmap from Redis
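The method-to-command mapping above can be sketched as follows. This is an illustrative reconstruction, not the contents of redisprob.py: the attribute names _r and _name, the KeyError handling, and the choice to return 0 for missing samples are assumptions. InMemoryHashClient is a hypothetical stub that mimics only the hash commands used here, so the example runs without a server; with a live server, pass a redis-py client instead.

```python
from collections.abc import MutableMapping


class RedisHashMap(MutableMapping):
    """Dict-like view of one Redis hash; `r` is a redis-py style client."""

    def __init__(self, r, name):
        self._r, self._name = r, name

    def __len__(self):
        return self._r.hlen(self._name)

    def __contains__(self, key):
        return bool(self._r.hexists(self._name, key))

    def __getitem__(self, key):
        value = self._r.hget(self._name, key)
        if value is None:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        self._r.hset(self._name, key, value)

    def __delitem__(self, key):
        if not self._r.hdel(self._name, key):
            raise KeyError(key)

    def __iter__(self):
        return iter(self.keys())

    def keys(self):
        return self._r.hkeys(self._name)

    def values(self):
        return self._r.hvals(self._name)

    def items(self):
        return self._r.hgetall(self._name).items()

    def clear(self):
        self._r.delete(self._name)


class RedisHashFreqDist(RedisHashMap):
    def N(self):
        # Redis returns stored values as bytes or strings, so convert first.
        return sum(int(v) for v in self.values())

    def __getitem__(self, key):
        # Assumption: a sample that was never recorded has a count of 0.
        return int(self._r.hget(self._name, key) or 0)


class InMemoryHashClient:
    """Hypothetical stub of the Redis hash commands, for running offline."""

    def __init__(self):
        self._h = {}

    def hlen(self, name): return len(self._h.get(name, {}))
    def hexists(self, name, key): return key in self._h.get(name, {})
    def hget(self, name, key): return self._h.get(name, {}).get(key)
    def hset(self, name, key, value): self._h.setdefault(name, {})[key] = str(value)
    def hdel(self, name, key): return int(self._h.get(name, {}).pop(key, None) is not None)
    def hkeys(self, name): return list(self._h.get(name, {}))
    def hvals(self, name): return list(self._h.get(name, {}).values())
    def hgetall(self, name): return dict(self._h.get(name, {}))
    def delete(self, name): self._h.pop(name, None)


r = InMemoryHashClient()  # with a live server: redis.Redis()
rhfd = RedisHashFreqDist(r, "test")
for word in ["foo", "bar", "foo"]:
    rhfd[word] += 1  # one read (hget) followed by one write (hset)
```

With a real redis-py client, keys and values come back as bytes unless the client is created with decode_responses=True, so comparisons against plain strings would need that option or explicit decoding.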