A Cognitive Study of Lexicons in Natural Language Processing

What are Lexicons?

Mar 30, 2019

A word in any language is made of a root or stem word, optionally combined with one or more affixes. The way affixes attach to stems is governed by rules. Orthographic rules, in particular, define the spelling changes that happen when a word is composed during the Morphological Parsing phase. A lexicon is a list of such stem words and affixes, and it is a vital requirement for constructing a Morphological Parser. Morphological parsing means either building a word up from its component morphemes or breaking a word down into them, for example analysing ‘cats’ as the stem ‘cat’ plus a plural suffix. It is a necessary phase in spell checking, search term disambiguation in web search engines, part-of-speech tagging and machine translation.
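To make the stem + affix idea concrete, here is a minimal sketch of the "breaking down" direction of morphological parsing. The tiny lexicon (`STEMS`, `SUFFIXES`) and the feature labels are hypothetical, and the sketch deliberately ignores orthographic rules, so it only works when a word is a plain concatenation of stem and suffix.

```python
# A minimal stem + suffix parser over a tiny, hypothetical lexicon.
# It ignores orthographic (spelling) rules, so it only handles plain concatenation.

# Hypothetical stems and their part of speech.
STEMS = {"cat": "noun", "bat": "noun", "walk": "verb", "jump": "verb"}

# Hypothetical suffixes: suffix -> list of (part of speech it attaches to, feature it adds).
SUFFIXES = {
    "s": [("noun", "plural"), ("verb", "3sg-present")],
    "ed": [("verb", "past")],
    "ing": [("verb", "progressive")],
}


def parse(word):
    """Break `word` into stem + optional suffix and return every valid analysis."""
    analyses = []
    for i in range(1, len(word) + 1):
        stem, rest = word[:i], word[i:]
        pos = STEMS.get(stem)
        if pos is None:
            continue  # this prefix is not a known stem
        if rest == "":
            analyses.append((stem, pos, "base"))
            continue
        for attaches_to, feature in SUFFIXES.get(rest, []):
            if attaches_to == pos:  # the suffix must fit the stem's part of speech
                analyses.append((stem, pos, feature))
    return analyses


print(parse("cats"))    # [('cat', 'noun', 'plural')]
print(parse("walked"))  # [('walk', 'verb', 'past')]
print(parse("dogs"))    # [] because 'dog' is not in the lexicon
```

Note how the same suffix -s gets a different analysis depending on the stem it follows; that dependency is exactly what the lexicon's part-of-speech information provides.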

The simplest lexicon would just list every possible word, i.e. every stem + affix combination in the language. This is impractical in real-time applications: such a list grows very large, and searching for and retrieving a specific word becomes a challenge because the lexicon has no structure. If the lexicon is instead organized into stems and affixes, building a word from it becomes much simpler. So, what kind of structure are we talking about here? The most common structure used for modeling morphotactics (the rules for which morphemes may follow which) is the Finite-State Automaton (FSA).
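A character trie is one simple way to give a lexicon structure, shown here as a minimal sketch (the trie is my choice for illustration; the post itself goes on to use an FSA over morphemes). A trie is in fact a tree-shaped finite-state automaton: each node is a state, each character is a transition, and nodes that complete a stem are accepting states. Stems that share a prefix share states, and a lookup walks one transition per character instead of scanning a flat list. `build_trie`, `stem_prefixes` and the sample stems are hypothetical names.

```python
# A character trie: a tree-shaped finite-state structure for a lexicon (hypothetical helpers).
# Each node is a state, each character a transition, and ACCEPT marks accepting states.
ACCEPT = "<accept>"  # multi-character key, so it can never clash with a single character


def build_trie(stems):
    """Insert every stem into a nested-dict trie."""
    root = {}
    for stem in stems:
        node = root
        for ch in stem:
            node = node.setdefault(ch, {})
        node[ACCEPT] = True
    return root


def stem_prefixes(trie, word):
    """Walk `word` through the trie once, collecting every stored stem that is a prefix of it."""
    found = []
    node = trie
    for i, ch in enumerate(word):
        if ch not in node:
            break  # no stored stem continues with this character
        node = node[ch]
        if ACCEPT in node:
            found.append(word[: i + 1])
    return found


trie = build_trie(["cat", "catalog", "bat"])
print(stem_prefixes(trie, "catalogs"))  # ['cat', 'catalog']
print(stem_prefixes(trie, "cats"))      # ['cat']
```

The candidate stems returned here are what a parser would then try to combine with an affix from the rest of the word.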

Let us look at a simple finite-state model for English nominal inflection:

In this FSM, a regular noun is the stem word, and it is concatenated with the plural suffix -s:

regular_noun(bat) + plural_suffix(s) = bats
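The FSM described above can be sketched as a transition table over morpheme classes. This sketch assumes the word has already been split into morphemes (for example `['bat', 's']`); the state names `q0`, `q1`, `q2` and the toy `REG_NOUNS` set are illustrative labels, not taken from the original figure.

```python
# Toy finite-state recognizer for English nominal inflection (illustrative state names):
#   q0 --reg-noun--> q1 --plural -s--> q2, with q1 and q2 accepting.
REG_NOUNS = {"bat", "cat", "dog"}  # hypothetical lexicon of regular nouns
PLURAL_SUFFIX = "s"

TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q1", "plural-s"): "q2",
}
ACCEPTING = {"q1", "q2"}


def classify(morpheme):
    """Map a morpheme to the arc label the automaton reads (None if unknown)."""
    if morpheme in REG_NOUNS:
        return "reg-noun"
    if morpheme == PLURAL_SUFFIX:
        return "plural-s"
    return None


def accepts(morphemes):
    """Run the FSA over a morpheme sequence and report whether it ends in an accepting state."""
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, classify(m)))
        if state is None:
            return False  # no arc for this morpheme: reject
    return state in ACCEPTING


def pluralize(noun):
    """Generate the plural by plain concatenation, as the simple FSM does."""
    if not accepts([noun, PLURAL_SUFFIX]):
        raise ValueError(f"{noun!r} is not a regular noun in this lexicon")
    return noun + PLURAL_SUFFIX


print(accepts(["bat", "s"]))       # True
print(accepts(["bat", "s", "s"]))  # False
print(pluralize("bat"))            # bats
```

Adding ‘company’ or ‘mouse’ to `REG_NOUNS` would make `pluralize` return ‘companys’ and ‘mouses’, which is exactly the kind of failure discussed next.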

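This FSM fails on exceptions such as foot => feet, mouse => mice and company => companies; blind concatenation would produce ‘foots’, ‘mouses’ and ‘companys’. Truly irregular plurals like feet and mice have to be listed in the lexicon as separate irregular forms. Regular spelling changes like company => companies (y becomes ie before -s) are where orthographic rules come into action: they define the spelling rules for the particular kinds of stems that would otherwise be exceptions. With irregular entries and orthographic rules added, the FSM can be improved accordingly.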
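Here is a minimal sketch of the improved model, assuming the simplest possible design: irregular forms are stored as whole entries, and a single orthographic rule (consonant + y becomes ies before -s) is applied at the morpheme boundary. Real morphological parsers typically implement such rules as finite-state transducers; here `join`, `pluralize` and `analyse` are hypothetical helpers, and `analyse` parses by generating every plural and comparing, which is fine for a toy lexicon but would not scale to a large one.

```python
# Improved toy model (hypothetical lexicon): irregular plurals are listed in the lexicon,
# and one orthographic rule handles consonant + y -> ies.
REG_NOUNS = {"bat", "cat", "company", "city", "day"}   # hypothetical regular nouns
IRREGULAR_PLURALS = {"foot": "feet", "mouse": "mice"}  # stored whole, not derived by rule
VOWELS = set("aeiou")


def join(stem, suffix):
    """Apply the y -> ie spelling rule when a consonant + y stem meets -s."""
    if suffix == "s" and len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ie" + suffix  # company + s -> companies
    return stem + suffix  # day + s -> days (vowel before y), bat + s -> bats


def pluralize(noun):
    """Generate a plural: irregular lookup first, then stem + suffix with spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun in REG_NOUNS:
        return join(noun, "s")
    raise KeyError(f"{noun!r} is not in the lexicon")


def analyse(surface):
    """Parse a surface form back to (stem, number) by generate-and-test over the lexicon."""
    if surface in IRREGULAR_PLURALS or surface in REG_NOUNS:
        return surface, "singular"
    for stem in IRREGULAR_PLURALS.keys() | REG_NOUNS:
        if pluralize(stem) == surface:
            return stem, "plural"
    return None  # not derivable from this lexicon


for noun in ["bat", "company", "day", "foot", "mouse"]:
    print(noun, "->", pluralize(noun))
print(analyse("companies"))  # ('company', 'plural')
print(analyse("feet"))       # ('foot', 'plural')
```

Keeping the irregular list separate from the spelling rule mirrors the split above: feet and mice are memorized, while companies is computed.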

Cognitive Computing:

“It addresses complex situations that are characterized by ambiguity and uncertainty; in other words it handles human kinds of problems.” – Cognitive Computing Consortium

“Cognitive computing refers to systems that learn at scale, reason with purpose and interact with humans naturally. Rather than being explicitly programmed, they learn and reason from their interactions with us and from their experiences with their environment.” – Dr. John E. Kelly III; Computing, cognition and the future of knowing, IBM.

If cognitive computing is the simulation of the human thought process in a computerized model, then it could help with many of the ambiguity issues we face in Natural Language Processing. Let us first reason about how the human mind resolves ambiguity in Morphological Parsing.

How does our human mind construct its Mental Lexicons?

Let us say I give you the word ‘cat’. Your brain immediately recognizes that it is a noun referring to a cute little animal with fur, and it can also recall its pronunciation. Sometimes, though, the brain cannot recognize a word or recall any information about it. If you see the word ‘wug’, for example, you might be able to work out its pronunciation, but you could not assign it a part of speech or a meaning. Once I tell you that it is a noun referring to a small creature, however, you know its part of speech and can use it in a sentence, e.g. “I saw a wug today.”
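The ‘wug’ example maps neatly onto a lexicon lookup: an unknown word has no entry, and once its part of speech is supplied, the regular rules for that category apply to it immediately. Below is a minimal sketch with a hypothetical `MentalLexicon` class; it is an analogy for the process described above, not a model of how the brain actually stores words.

```python
class MentalLexicon:
    """Toy lexicon mapping a word to its part of speech (hypothetical design)."""

    def __init__(self):
        self.entries = {"cat": "noun"}  # words we already know

    def lookup(self, word):
        """Return the stored part of speech, or None for a word we have never seen."""
        return self.entries.get(word)

    def learn(self, word, pos):
        """Add a new entry, e.g. after someone tells us that 'wug' is a noun."""
        self.entries[word] = pos

    def plural(self, word):
        """Apply the regular plural rule, which only makes sense for known nouns."""
        if self.lookup(word) != "noun":
            raise ValueError(f"don't know that {word!r} is a noun")
        return word + "s"


lexicon = MentalLexicon()
print(lexicon.lookup("wug"))  # None: pronounceable, but no part of speech yet
lexicon.learn("wug", "noun")
print(lexicon.plural("wug"))  # wugs
```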

Similarly, take a word like ‘cluvious’: even if you don’t know its meaning, you can infer some information about it, because most English words with this ending are adjectives (ambitious, curious, anxious, envious…). This helps you predict its meaning when it occurs in a sentence, for example “You look cluvious today”. From this sentence one can easily infer that ‘cluvious’ describes the physical appearance of an entity.
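The same inference can be sketched as a rule-of-thumb tagger that guesses the part of speech of an unseen word from its longest known suffix. The `SUFFIX_POS` table and `guess_pos` are hypothetical; English has many exceptions (‘friendly’ ends in -ly but is an adjective), so the result is a guess, not a reliable label.

```python
# Hypothetical suffix -> part-of-speech rules of thumb (plenty of English exceptions exist).
SUFFIX_POS = {
    "ious": "adjective",  # ambitious, curious, anxious, envious
    "ous": "adjective",
    "ful": "adjective",
    "less": "adjective",
    "ness": "noun",
    "tion": "noun",
    "ize": "verb",
    "ly": "adverb",
}


def guess_pos(word):
    """Guess the part of speech of an unseen word from its longest known suffix."""
    for suffix in sorted(SUFFIX_POS, key=len, reverse=True):
        # Require at least one character of stem in front of the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return SUFFIX_POS[suffix]
    return None  # no known suffix: the word stays unlabelled, like 'wug'


print(guess_pos("cluvious"))  # adjective
print(guess_pos("wug"))       # None
```

Rather than a hand-written table, NLTK provides an `AffixTagger` that learns a suffix-to-tag mapping from a tagged corpus.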

You can even reason about words that you haven’t seen before, like ‘traftful’ and ‘traftless’, and figure out that they are most likely opposites. This is because the pair resembles many pairs of English words that share this structure and an antonym relationship, such as careful/careless and hopeful/hopeless.
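The ‘traftful’/‘traftless’ reasoning can be written as a small heuristic: two words are flagged as likely antonyms when they share a stem and one ends in -ful while the other ends in -less. `OPPOSITE_SUFFIXES` and `likely_antonyms` are hypothetical names, and the rule is only a heuristic: ‘shameful’ and ‘shameless’ pass the test even though they are not really opposites.

```python
# Hypothetical pairs of suffixes that usually signal opposite meanings on the same stem.
OPPOSITE_SUFFIXES = [("ful", "less")]


def likely_antonyms(a, b):
    """Return True if a and b share a non-empty stem and carry an opposing suffix pair."""
    for positive, negative in OPPOSITE_SUFFIXES:
        for x, y in ((a, b), (b, a)):  # accept the pair in either order
            if x.endswith(positive) and y.endswith(negative):
                stem_x = x[: -len(positive)]
                stem_y = y[: -len(negative)]
                if stem_x and stem_x == stem_y:
                    return True
    return False


print(likely_antonyms("traftful", "traftless"))  # True
print(likely_antonyms("careful", "hopeless"))    # False: different stems
```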

With the observations above, one can build a more effective Morphological Parser, one that can also cope with words it has never seen before. You can also read my other post on how to set up a Natural Language Processing environment in Python.

Written by Shirish Kadam