NLP | IOB tags

What are chunks?
Chunks are groups of words, and the word types inside a chunk are defined by part-of-speech tags. It is also possible to define a pattern of words that must not be part of a chunk; such words are known as chinks.
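To make the chunk and chink idea concrete, here is a minimal pure-Python sketch. The helper name chunk_with_chinks and the choice of verb tags as chinks are assumptions made for illustration; this is not how NLTK implements chunking, only a simplified picture of it.

```python
def chunk_with_chinks(tagged_tokens, chink_tags):
    """Group consecutive (word, pos) tokens into chunks, splitting at chinks.

    Hypothetical helper for illustration: any token whose POS tag is in
    chink_tags is left out of every chunk and ends the chunk before it.
    """
    chunks, current = [], []
    for word, pos in tagged_tokens:
        if pos in chink_tags:
            # A chink closes the chunk being built and is itself skipped.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append((word, pos))
    if current:
        chunks.append(current)
    return chunks


sentence = [
    ("Mr.", "NNP"), ("Meador", "NNP"),
    ("had", "VBD"), ("been", "VBN"),
    ("executive", "JJ"), ("vice", "NN"), ("president", "NN"),
]

# Treat the verb tags as chinks, so the two noun groups remain as chunks.
noun_groups = chunk_with_chinks(sentence, chink_tags={"VBD", "VBN"})
for group in noun_groups:
    print(group)
```

Here the two verbs act as chinks, so the tokens on either side of them end up in separate chunks.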

What are IOB tags?
IOB is a format for annotating chunks. IOB tags are similar to part-of-speech tags, but they mark whether a word is inside, outside, or at the beginning of a chunk. The format is not limited to noun phrases; several chunk types can be annotated at once.

Example: this is an excerpt from the conll2000 corpus. Each word is on its own line, followed by its part-of-speech tag and then its IOB tag:

 Mr. NNP B-NP
 Meador NNP I-NP
 had VBD B-VP
 been VBN I-VP
 executive JJ B-NP
 vice NN I-NP
 president NN I-NP
 of IN B-PP
 Balcor NNP B-NP
 . . O

What does this mean?
B-NP: the word begins a noun phrase.
I-NP: the word is inside the current noun phrase.
O: the word is outside any chunk (for example, sentence-final punctuation).
B-VP and I-VP: the word begins, or is inside, a verb phrase.
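The meaning of the B, I and O prefixes can be sketched in plain Python by grouping tagged words back into chunks. The function name iob_to_chunks is hypothetical, and treating a stray I- tag as the start of a new chunk is an assumption of this sketch rather than part of the format.

```python
def iob_to_chunks(iob_triples):
    """Group (word, pos, iob) triples into (chunk_type, [words]) pairs."""
    chunks = []
    current_type, current_words = None, []
    for word, pos, iob in iob_triples:
        # "B-NP" -> ("B", "NP"); a bare "O" -> ("O", "")
        prefix, _, chunk_type = iob.partition("-")
        if prefix == "I" and chunk_type == current_type:
            # The word continues the chunk that is currently open.
            current_words.append(word)
            continue
        # Any other tag closes the open chunk, if there is one.
        if current_words:
            chunks.append((current_type, current_words))
        if prefix == "O":
            current_type, current_words = None, []
        else:
            # "B-*" starts a new chunk (a stray "I-*" is treated the same way).
            current_type, current_words = chunk_type, [word]
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


iob_lines = """Mr. NNP B-NP
Meador NNP I-NP
had VBD B-VP
been VBN I-VP
executive JJ B-NP
vice NN I-NP
president NN I-NP
of IN B-PP
Balcor NNP B-NP
. . O"""

triples = [tuple(line.split()) for line in iob_lines.splitlines()]
phrase_chunks = iob_to_chunks(triples)
for chunk_type, words in phrase_chunks:
    print(chunk_type, words)
```

Words tagged O, such as the final period, do not appear in any chunk.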

Code #1: How it works. Reading words and chunks from a file with IOB tags.

# Load the corpus reader
from nltk.corpus.reader import ConllChunkCorpusReader

# Read every .iob file in the current directory and keep NP, VP and PP chunks
reader = ConllChunkCorpusReader('.', r'.*\.iob', ('NP', 'VP', 'PP'))

# Chunks as Tree objects
print(reader.chunked_words())

# (word, pos, iob) triples
print(reader.iob_words())

Output:

 [Tree('NP', [('Mr.', 'NNP'), ('Meador', 'NNP')]), Tree('VP', [('had', 'VBD'), ('been', 'VBN')]), ...]
 [('Mr.', 'NNP', 'B-NP'), ('Meador', 'NNP', 'I-NP'), ...]

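Code #2: How it works. Reading whole sentences from a file with IOB tags.

# Load the corpus reader
from nltk.corpus.reader import ConllChunkCorpusReader

# Read every .iob file in the current directory and keep NP, VP and PP chunks
reader = ConllChunkCorpusReader('.', r'.*\.iob', ('NP', 'VP', 'PP'))

# One Tree per sentence, with the chunks as subtrees
print(reader.chunked_sents())

# One list of (word, pos, iob) triples per sentence
print(reader.iob_sents())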

Output:

 [Tree('S', [Tree('NP', [('Mr.', 'NNP'), ('Meador', 'NNP')]), Tree('VP', [('had', 'VBD'), ('been', 'VBN')]), Tree('NP', [('executive', 'JJ'), ('vice', 'NN'), ('president', 'NN')]), Tree('PP', [('of', 'IN')]), Tree('NP', [('Balcor', 'NNP')]), ('.', '.')])]
 [[('Mr.', 'NNP', 'B-NP'), ('Meador', 'NNP', 'I-NP'), ('had', 'VBD', 'B-VP'), ('been', 'VBN', 'I-VP'), ('executive', 'JJ', 'B-NP'), ('vice', 'NN', 'I-NP'), ('president', 'NN', 'I-NP'), ('of', 'IN', 'B-PP'), ('Balcor', 'NNP', 'B-NP'), ('.', '.', 'O')]]

Let’s look at the code above:

  • The ConllChunkCorpusReader class is used to read a corpus in IOB format.
  • The file has no paragraph separation; sentences are separated by blank lines, so the para_* methods are not available.
  • The third argument to ConllChunkCorpusReader is a tuple or list naming the chunk types in the file, such as ('NP', 'VP', 'PP').
  • The iob_words() and iob_sents() methods return lists of 3-tuples of the form (word, pos, iob).
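For readers who want to try the reader without preparing a corpus by hand, the following sketch writes the example sentence to a temporary file first. The file name sample.iob and the temporary directory are assumptions made for this illustration; the code above expects the .iob file to already exist in the current directory.

```python
import os
import tempfile

from nltk.corpus.reader import ConllChunkCorpusReader

# The example sentence: one "word POS IOB" triple per line.
IOB_TEXT = """Mr. NNP B-NP
Meador NNP I-NP
had VBD B-VP
been VBN I-VP
executive JJ B-NP
vice NN I-NP
president NN I-NP
of IN B-PP
Balcor NNP B-NP
. . O
"""

with tempfile.TemporaryDirectory() as root:
    # "sample.iob" is a hypothetical file name chosen for this sketch.
    with open(os.path.join(root, "sample.iob"), "w", encoding="utf-8") as f:
        f.write(IOB_TEXT)

    reader = ConllChunkCorpusReader(root, r".*\.iob", ("NP", "VP", "PP"))

    # The reader is lazy, so materialize the results before the
    # temporary directory is removed.
    iob_sentences = [list(sent) for sent in reader.iob_sents()]
    chunked_sentences = list(reader.chunked_sents())

print(iob_sentences[0][:2])
print(chunked_sentences[0][0])
```

Each entry in iob_sentences is one sentence as a list of (word, pos, iob) triples, and each entry in chunked_sentences is a Tree whose subtrees are the chunks.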

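Code #3: Tree leaves, that is, the tagged tokens.

# Load the corpus reader
from nltk.corpus.reader import ConllChunkCorpusReader

# Initialize the reader as before
reader = ConllChunkCorpusReader('.', r'.*\.iob', ('NP', 'VP', 'PP'))

# Tagged tokens of the first chunk
print(reader.chunked_words()[0].leaves())

# Tagged tokens of the first sentence
print(reader.chunked_sents()[0].leaves())

# Tagged tokens of the first sentence of the first paragraph.
# ConllChunkCorpusReader has no para_* methods (see the notes above), so this
# call needs a paragraph-aware reader such as nltk's ChunkedCorpusReader.
# print(reader.chunked_paras()[0][0].leaves())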

Output:

 [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS')]
 [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS'), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), ('300', 'CD'), ('jobs', 'NNS'), (',', ','), ('the', 'DT'), ('spokesman', 'NN'), ('said', 'VBD'), ('.', '.')]
 [('Earlier', 'JJR'), ('staff-reduction', 'NN'), ('moves', 'NNS'), ('have', 'VBP'), ('trimmed', 'VBN'), ('about', 'IN'), ('300', 'CD'), ('jobs', 'NNS'), (',', ','), ('the', 'DT'), ('spokesman', 'NN'), ('said', 'VBD'), ('.', '.')]
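The relationship between a chunk tree, its leaves, and the IOB tags can also be explored without any corpus file. This sketch uses the conlltags2tree and tree2conlltags helpers from nltk.chunk; the sentence is the Mr. Meador example typed in by hand, and the variable names are chosen for illustration.

```python
from nltk.chunk import conlltags2tree, tree2conlltags

# The example sentence as (word, pos, iob) triples.
iob_sentence = [
    ("Mr.", "NNP", "B-NP"), ("Meador", "NNP", "I-NP"),
    ("had", "VBD", "B-VP"), ("been", "VBN", "I-VP"),
    ("executive", "JJ", "B-NP"), ("vice", "NN", "I-NP"),
    ("president", "NN", "I-NP"), ("of", "IN", "B-PP"),
    ("Balcor", "NNP", "B-NP"), (".", ".", "O"),
]

# Build the chunk tree from the IOB tags.
tree = conlltags2tree(iob_sentence)

# leaves() drops the chunk structure and returns the tagged tokens.
sentence_tokens = tree.leaves()
first_chunk_tokens = tree[0].leaves()

# Converting the tree back gives the IOB triples again.
round_trip = tree2conlltags(tree)

print(first_chunk_tokens)
print(sentence_tokens)
```

Calling leaves() on a subtree returns only that chunk's tokens, while calling it on the sentence tree returns every tagged token in order.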
