Syntactic Parsing or Dependency Parsing is the task of recognizing a sentence and assigning a syntactic structure to it. The most widely used syntactic structure is the parse tree which can be generated using some parsing algorithms. These parse trees are useful in various applications like grammar checking or more importantly it plays a critical role in the semantic analysis stage. For example to answer the question “Who is the point guard for the LA Laker in the next game ?” we need to figure out its subject, objects, attributes to help us figure out that the user wants the point guard of the LA Lakers specifically for the next game.
Now the task of Syntactic parsing is quite complex due to the fact that a given sentence can have multiple parse trees which we call as ambiguities. Consider a sentence “Book that flight.” which can form multiple parse trees based on its ambiguous part of speech tags unless these ambiguities are resolved. Choosing a correct parse from the multiple possible parses is called as syntactic disambiguation. Parsing algorithms like the Cocke-Kasami-Younger (CKY), Earley algorithm or the Chart parsing algorithms uses a dynamic programming approach to deal with the ambiguity problems.
In this post, we will actually try to implement a few Syntactic parsers from different libraries:
spaCy dependency parser provides token properties to navigate the generated dependency parse tree. Using the dep attribute gives the syntactic dependency relationship between the head token and its child token. The syntactic dependency scheme is used from the ClearNLP. The generated parse tree follows all the properties of a tree and each child token has only one head token although a head token can have multiple children. We can obtain the head token with the
token.head property and its children by the
token.children property. A subtree of a token can also be extracted using the
token.subtree property. Similarly, ancestors for a token can be obtained with
token.ancestors. To obtain the rightmost and leftmost token of a token’s syntactic descendants the
token.left_edge can be used. It is also worth mentioning that to extract the neighboring token we can use
token.nbor. spaCy doesn’t provide an inbuilt tree representation although you can use the NLTK’s tree representation. Here’s a code snippet for it:
return "_".join([tok.orth_, tok.tag_, tok.dep_])
if node.n_lefts + node.n_rights > 0:
return Tree(tok_format(node), [to_nltk_tree(child) for child in node.children])
command = "Submit debug logs to project lead today at 9:00 AM"
en_doc = en_nlp(u'' + command)
[to_nltk_tree(sent.root).pretty_print() for sent in en_doc.sents]
Here’s the output format (
Token_POS Tags_Dependency Tag):-
Let’s try extracting the headword from a question to understand how dependency works. A headword in a question can be extracted using various dependency relationships. But for now, we will try to extract the Nominal Subject
nsubj from the question as the headword. Here’s how you can get a subject from the sentence.
head_word = "null"
question = "What films featured the character Popeye Doyle ?"
en_doc = en_nlp(u'' + question)
for sent in en_doc.sents:
for token in sent:
if token.dep == nsubj and (token.pos == NOUN or token.pos == PROPN):
head_word = token.text
elif token.dep == attr and (token.pos == NOUN or token.pos == PROPN):
head_word = token.text
Here we get the output with headword as “films” which is pretty close and you can improve its accuracy by detecting more dependency relationships and headword rules:
What films featured the character Popeye Doyle ? (films)
spaCy also has a dependency visualizer displaCy here is the demo with our input question:
To install spaCy refer this Setting up Natural Language Processing Environment with Python
Further Reading :
- How to get the dependency tree with spaCy? [Stack Overflow discussion]
- Using the dependency parse [spaCy Documentation]
- Parsing English in 500 Lines of Python [Parsing a simple tutorial]
- displaCy: dependency parse tree visualization with CSS [Making the visualizer]
- Syntactic Dependency Parsing usage [A Reddit thread]
- Syntactic Dependency Parsing Annotations [pdf on CLEAR NLP style]
Working on a similar problem statement? Want to discuss it? Get in touch with me firstname.lastname@example.org
As originally published on December 23rd, 2016; shirishkadam.com