NLP | Distributed Tagging with Execnet - Part 2

The gateway's remote_exec() method takes a single argument, which can be one of the following three types:

  • A string of code to execute remotely
  • The name of a pure function to be serialized and executed remotely
  • The name of a pure module whose source is executed remotely

Code: The remote_tag.py module, used with the third option

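import pickle

# execnet sets __name__ to '__channelexec__' when it runs this module remotely
if __name__ == '__channelexec__':
    # The first message is the pickled tagger; channel is provided by execnet
    tagger = pickle.loads(channel.receive())

    # Tag every sentence received until the channel is closed
    for sentence in channel:
        channel.send(tagger.tag(sentence))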

What is a pure module?

  • A pure module is self-contained: it can only access the Python modules that are available where it runs, and it has no access to any variables or state that exist where the gateway was originally created.
  • Likewise, a pure function is a self-contained function with no external dependencies.
  • To detect that a module is being executed by execnet, check the __name__ variable. If it equals '__channelexec__', the module is being run as the remote end of a channel.
  • This is similar to checking if __name__ == '__main__' to see whether a module is being executed from the command line.
  • The first thing to do is call channel.receive() to get the serialized tagger, which is loaded with pickle.loads().
  • Notice that channel is not imported anywhere, because it is included in the module's global namespace. Any module that execnet executes remotely has access to the channel variable to communicate with the channel creator.
  • Once the tagger is loaded, tag() is called on each tokenized sentence received by iterating over the channel.
  • This allows the user to tag as many sentences as the sender wants to send, since the iteration does not stop until the channel is closed.
  • This creates a compute node for part-of-speech tagging that devotes 100% of its resources to tagging whatever sentences it receives. As long as the channel remains open, the node is available for processing.
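The receive-once, then iterate-and-reply pattern described above can be tried locally without execnet. The following is only a minimal sketch: FakeChannel is a hypothetical in-process stand-in (it is not part of execnet), and a plain dictionary stands in for a trained tagger so that no model is needed.

```python
import pickle


class FakeChannel:
    """Hypothetical in-process stand-in for an execnet channel (not execnet API)."""

    def __init__(self, incoming):
        self._incoming = list(incoming)  # messages waiting to be received
        self.sent = []                   # messages sent back to the creator

    def receive(self):
        # Return the next waiting message.
        return self._incoming.pop(0)

    def send(self, item):
        self.sent.append(item)

    def __iter__(self):
        # Yield the remaining messages; running out of messages plays
        # the role of the channel being closed.
        while self._incoming:
            yield self._incoming.pop(0)


def run_remote_body(channel):
    # Same shape as remote_tag.py: unpickle once, then handle each sentence.
    lookup = pickle.loads(channel.receive())
    for sentence in channel:
        # Stand-in for tagger.tag(sentence): unknown words default to "NN".
        channel.send([(word, lookup.get(word, "NN")) for word in sentence])


# A plain dict stands in for a trained tagger (an assumption of this sketch).
pickled_lookup = pickle.dumps({"the": "DT", "runs": "VBZ"})
channel = FakeChannel([pickled_lookup, ["the", "dog", "runs"], ["the", "cat"]])
run_remote_body(channel)
print(channel.sent)
```

The first message is consumed by receive(), so the loop only ever sees sentences. With a real execnet channel the loop would keep waiting for new sentences until the sender closes the channel.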

Execnet can do much more, such as opening multiple channels to increase parallel processing and opening gateways to remote hosts over SSH to perform distributed processing.

Create multiple channels
Create multiple channels, one per gateway, to make processing more parallel. Each gateway creates a new subprocess (or a remote interpreter if using an SSH gateway), and one channel per gateway is used for communication. Once the two channels have been created, they can be combined using the MultiChannel class, which allows the user to iterate over the channels and create a receive queue that collects messages from every channel.
After each channel is created and the tagger is sent, the channels are cycled through so that an equal number of sentences is sent to each channel for tagging. Then all responses are collected from the queue. Each queue.get() call returns a 2-tuple, (channel, message), in case you need to know which channel a message came from. After all the tagged sentences are collected, the gateways can be exited.

Code:

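import itertools
import execnet
import remote_tag
from nltk.corpus import treebank

# pickled_tagger is assumed to be defined earlier as a pickled
# part-of-speech tagger (for example, created with pickle.dumps()).

# One gateway (subprocess) per channel
gw1 = execnet.makegateway()
gw2 = execnet.makegateway()

# Run remote_tag in each gateway and send it the tagger
ch1 = gw1.remote_exec(remote_tag)
ch1.send(pickled_tagger)
ch2 = gw2.remote_exec(remote_tag)
ch2.send(pickled_tagger)

# Combine the channels and create a shared receive queue
mch = execnet.MultiChannel([ch1, ch2])
queue = mch.make_receive_queue()
channels = itertools.cycle(mch)

# Alternate between the channels when sending sentences
for sentence in treebank.sents()[:4]:
    channel = next(channels)
    channel.send(sentence)

tagged_sentences = []

# Each queue item is a (channel, message) 2-tuple
for i in range(4):
    channel, tagged_sentence = queue.get()
    tagged_sentences.append(tagged_sentence)

print("Length:", len(tagged_sentences))

gw1.exit()
gw2.exit()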

Output:

 Length: 4 
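The round-robin distribution and the (channel, message) tuples can be seen in isolation with the standard library alone. This is only an illustrative sketch: the strings "ch1" and "ch2" are hypothetical stand-ins for the real channels, queue.Queue stands in for the receive queue, and upper-casing stands in for tagging.

```python
import itertools
import queue

# Hypothetical channel names standing in for ch1 and ch2.
channel_names = ["ch1", "ch2"]
sentences = ["s0", "s1", "s2", "s3"]

# itertools.cycle repeats the channels endlessly: ch1, ch2, ch1, ch2, ...
assignments = {name: [] for name in channel_names}
channels = itertools.cycle(channel_names)
for sentence in sentences:
    assignments[next(channels)].append(sentence)

print(assignments)

# Stand-in receive queue holding (channel, message) 2-tuples.
receive_queue = queue.Queue()
for name, batch in assignments.items():
    for sentence in batch:
        receive_queue.put((name, sentence.upper()))

# Collect exactly as many replies as sentences were sent.
results = []
for _ in range(len(sentences)):
    name, message = receive_queue.get()
    results.append((name, message))

print(results)
```

Each stand-in channel ends up with two of the four sentences. With real gateways, replies may arrive in a different order than the sentences were sent, which is when the channel element of each tuple becomes useful.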

In the example code, only four sentences are sent, but in real use thousands may need to be sent. One computer can tag four sentences very quickly, but when there are thousands or hundreds of thousands of sentences to tag, sending sentences to multiple computers can be much faster than waiting for one computer to do all of the work.