
Attention Mechanism in a Nutshell

Hi, I’m Mohammad Namvarpour, and I’m going to give you an intuition about the attention mechanism in deep learning. The attention mechanism is inspired by the human visual processing system.

When you read a page in a book, the majority of what is in your field of vision is actually disregarded.

And you pay more attention to the word that you are currently reading.

This allows your brain to focus on what matters most, while ignoring everything else.

In order to imitate the same effect in our deep learning models, we assign an attention weight to each of our inputs.

These weights represent the relative importance of each input element to the other input elements.

This way, we guide our model to pay greater attention to particular inputs that are more critical to performing the task at hand.
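
As a minimal sketch of this idea (in NumPy, with made-up numbers), an attention weight vector is just a set of non-negative values summing to one that decides how much each input contributes to the result:

```python
import numpy as np

# Three input vectors (e.g. word representations); the values are made up.
inputs = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [1.0, 1.0]])

# One attention weight per input: non-negative and summing to 1.
weights = np.array([0.1, 0.7, 0.2])

# The attended output is a weighted combination of the inputs,
# dominated by the inputs with the largest weights.
output = weights @ inputs
print(output)  # [0.3 1.6] -- shaped mostly by the second input
```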

The attention mechanism in deep learning was first used by Bahdanau et al. for machine translation.

The traditional method of machine translation was to use sequence-to-sequence models.

In these models, we pass the input sentence to an RNN that functions as an encoder.

RNNs, as you may know, have a hidden state in addition to their output; I represent these with the letter h for the encoder and s for the decoder, which is also an RNN.

These hidden states can contain information from all the previous words in our sentence.

Using this capability of hidden states, a context vector is constructed from the last hidden state of the encoder RNN, which captures the content of the source sentence.

This is then passed to the decoder, so that the decoder can translate the words into the target language.

The challenge with this approach was that if the sentence was long, all of the information could not be compressed into the last hidden state.

Hence, our translation would be incorrect and inaccurate if the input sentence was long and detailed.
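
To make this bottleneck concrete, here is a schematic NumPy sketch (a simple tanh RNN cell with random placeholder weights, which is my simplification, not Bahdanau's exact architecture). Note that the classic sequence-to-sequence decoder only ever sees the last hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # hidden state size
W_x = rng.normal(size=(d, d))            # input-to-hidden weights (learned in practice)
W_h = rng.normal(size=(d, d))            # hidden-to-hidden weights (learned in practice)

def encode(xs):
    """Run a simple tanh RNN over the source sentence."""
    h = np.zeros(d)
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)   # hidden state update
        states.append(h)
    return states

source = [rng.normal(size=d) for _ in range(50)]   # a long sentence
states = encode(source)

# Classic seq2seq: the decoder is initialized from the LAST hidden state
# only -- everything about all 50 words squeezed into just d numbers.
context = states[-1]
```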

The main idea of attention, which Bahdanau also used in his paper, is that we give the context vector access to the entire input sequence, instead of just the last hidden state.

In this way, even if the length of the sentence increases, the context vector can still contain the content of the sentence.

Now, we just need to assign an attention weight to each of those inputs so that the decoder can focus on the relevant positions in the input sequence.

But how can this be achieved? Well, as you may have noticed, in this slide we have translated only two words.

Let's see how the third word is translated using our new attention-based model.

We take the current decoder hidden state and every encoder hidden state, and feed them into a score function.

What does this function do? The idea behind the score function is to measure the similarity between two vectors. Using the score function allows our model to selectively concentrate on helpful parts of the input sequence and thereby learn the alignments between source and target.

There are many ways to calculate this score.

Here I’ve outlined four options for doing so: dot, general, concat, and a location-based function in which the alignment scores are computed solely from the target decoder state.
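
As a rough sketch of these four variants (the shapes and random weights are my assumptions; in a real model the W and v parameters are learned), they look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                        # hidden size, source sentence length
s = rng.normal(size=d)             # current decoder hidden state
H = rng.normal(size=(n, d))        # encoder hidden states, one row per word

W_general  = rng.normal(size=(d, d))      # learned in practice
W_concat   = rng.normal(size=(d, 2 * d))  # learned in practice
v_concat   = rng.normal(size=d)           # learned in practice
W_location = rng.normal(size=(n, d))      # learned in practice

score_dot      = H @ s                            # dot:      s . h_i
score_general  = H @ (W_general @ s)              # general:  s^T W h_i
score_concat   = np.array([v_concat @ np.tanh(W_concat @ np.concatenate([s, h]))
                           for h in H])           # concat:   v^T tanh(W [s; h_i])
score_location = W_location @ s                   # location: uses s only
```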

The next step is to calculate the alignment vector. As the formula in this slide shows, we simply use the softmax function to convert our score values into probabilities.

Now we have the attention weights we were searching for.

Given the alignment vector as weights, the context vector is computed as the weighted average over all the source hidden states.
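
Putting these last two steps together, here is a self-contained sketch (again with random placeholder values) of the softmax normalization and the weighted average:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
H = rng.normal(size=(n, d))        # encoder hidden states
scores = H @ rng.normal(size=d)    # e.g. dot scores against the decoder state

def softmax(x):
    e = np.exp(x - x.max())        # subtract max for numerical stability
    return e / e.sum()

alignment = softmax(scores)        # attention weights: non-negative, sum to 1
context = alignment @ H            # weighted average of all source states
print(alignment.round(2), alignment.sum())
```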

Now we can pass the context vector into the decoder, so that our decoder can access the entire input sequence and also focus on the relevant positions in it.

So, to put it simply, an attention model works like an accounting notebook: for every query, which in our example was the last hidden state of the decoder, attention gives us a table that shows how much attention we owe to each of the keys, which in our case were the encoder hidden states.

As you can see here, attention is employed in deep learning in a variety of ways.

If you're interested in learning more about them, I recommend watching my video on the review paper "An Attentive Survey of Attention Models," which dives deeper into the topic.
