In this article I would like to show why an NLI (Natural Language Interface) is often so hard to build. To illustrate, I’ll use a semi-trivial example and show that even for this simple use case a natural language interface is a formidable problem to solve.
For our example, let’s imagine we want to build (yet another) weather bot. Our bot will answer weather-related questions for a given city and date range. It will also support past, present and future (forecast) weather requests. Our goal is a natural language interface that is as close to human cognition as possible, i.e. one that lets users talk to the bot as if they were talking to a real human being, in pursuit of that elusive free-form natural language comprehension.
Let’s start with simple and obvious examples of the requests we need to support:
What’s the current weather in New York?
Show me San Jose forecast for the next 5 days
These requests seem rather trivial to encode using common intent-based matching. You basically have to detect three main entities: a weather request indicator, a city and a date range.
Once you’ve built the model (in whatever tool you prefer) to detect all three types of entities, you can relatively quickly build an action (i.e. an intent callback) that returns weather information for a given city and date range.
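To make the idea of an intent callback concrete, here is a minimal sketch in Python. It assumes the three entities are the weather request indicator, the city and the date range that the rest of this article refers to. The names `WeatherRequest`, `fetch_weather` and `on_weather_intent` are hypothetical and not tied to any particular NLU toolkit, and `fetch_weather` is only a placeholder for a real weather service.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WeatherRequest:
    """The three entities the model is assumed to detect."""
    is_weather_request: bool  # the weather request indicator
    city: str                 # e.g. "New York"
    start: date               # first day of the date range
    end: date                 # last day of the date range (inclusive)


def fetch_weather(city: str, day: date) -> str:
    """Hypothetical stand-in for a call to a real weather service."""
    return f"{city} on {day.isoformat()}: (weather data)"


def on_weather_intent(req: WeatherRequest) -> list[str]:
    """Intent callback: one report line per day in the requested range."""
    if not req.is_weather_request:
        return []
    days = (req.end - req.start).days + 1
    return [fetch_weather(req.city, req.start + timedelta(days=i))
            for i in range(days)]
```

At this stage every field is mandatory, which is exactly the simplification that stops holding once real users start talking to the bot.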
Most of the tutorials and examples stop right here. However, this is far from being even remotely equal to how real humans converse about weather...
A pretty obvious initial improvement is to treat the city and date range elements as optional. Indeed, if the city isn’t present the user is likely asking about her current location, and if the date isn’t present she’s asking about the current date. These seem to be reasonable assumptions:
What’s my current weather?
What’s Chicago’s weather?
Any chance of snow this Friday?
However, these assumptions have to be processed in a special way by conversation management.
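The two defaults described above could be sketched as follows. This is only an illustration: `apply_defaults` is a hypothetical helper, and it assumes the user’s location and today’s date are supplied by the surrounding application.

```python
from datetime import date
from typing import Optional, Tuple

DateRange = Tuple[date, date]


def apply_defaults(
    city: Optional[str],
    date_range: Optional[DateRange],
    user_location: str,
    today: date,
) -> Tuple[str, DateRange]:
    """Fill in missing entities with the assumed defaults."""
    # No city in the request: assume the user's current location.
    resolved_city = city if city is not None else user_location
    # No date in the request: assume the current date.
    resolved_range = date_range if date_range is not None else (today, today)
    return resolved_city, resolved_range
```

For "What’s Chicago’s weather?" only the date would be defaulted; for "What’s my current weather?" the city would be defaulted as well.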
Another thing you’ll notice right away is that you need to support conversational context. When people inquire about the weather they frequently don’t ask just a single question but follow up with more. For example:
What’s the current Moscow weather?
Hm, what about tomorrow?
Any chance of rain?
While in everyday life these seem rather trivial, the programmatic logic for supporting this type of conversation management is far from trivial. For example:
Depending on the framework you use this can be a significant project on its own.
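As a rough illustration of what carrying context between turns involves, the sketch below fills the city and the time reference of a follow-up question from the previous turn. The `Turn` structure and `merge_with_context` function are hypothetical; a real framework would also have to deal with context expiry, topic changes and much more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Turn:
    city: Optional[str] = None
    when: Optional[str] = None       # kept as raw text for simplicity
    condition: Optional[str] = None  # e.g. "rain"


def merge_with_context(current: Turn, previous: Optional[Turn]) -> Turn:
    """Fill the city and time missing from the current turn from the previous one."""
    if previous is None:
        return current
    return Turn(
        city=current.city or previous.city,
        when=current.when or previous.when,
        condition=current.condition,  # the asked-about condition is not carried over
    )


if __name__ == "__main__":
    context = None
    # "What's the current Moscow weather?" / "Hm, what about tomorrow?" / "Any chance of rain?"
    for turn in [Turn(city="Moscow", when="now"), Turn(when="tomorrow"), Turn(condition="rain")]:
        context = merge_with_context(turn, context)
        print(context)
```

After the third turn the merged request is about rain in Moscow tomorrow, even though the last sentence mentions neither a city nor a date.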
Yet another problem you’ll discover pretty quickly as you let users play with your bot is that your current model does not distinguish between these two sentences:
What’s the local Moscow weather?
What’s the local weather?
In the first example the user is clearly asking about the current Moscow weather, while in the second she’s likely asking about her current location. But this conflicts with the conversation support we discussed above: the city element “Moscow” is optional and can be picked up from the conversation context, which would make the second example equivalent to the first! We have a contradiction...
That’s where things get complicated and naive conversation management doesn’t cut it anymore. One rule you could come up with to resolve this dilemma: if the sentence contains the word “local” (or one of its semantic siblings) and no city, then the user is asking about the weather at her current location; otherwise, fall back to default conversation management.
As a side note, your NLP toolkit should clearly disambiguate between New York (state) and New York (city), Moscow (Russia) and Moscow (Idaho, USA), etc. It should also support common slang and abbreviations like LA (for Los Angeles, not the State of Louisiana), Big Apple, NYC, SF, etc.
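The rule above can be sketched in a few lines. The function name, the contents of `LOCAL_WORDS` and the assumption that the sentence arrives already tokenized (with punctuation stripped) are all illustrative choices, not a prescribed design.

```python
from typing import List, Optional

# Assumed "semantic siblings" of "local"; a real model would need a richer set.
LOCAL_WORDS = {"local", "here", "nearby"}


def resolve_city(
    tokens: List[str],
    city_in_sentence: Optional[str],
    context_city: Optional[str],
    user_location: str,
) -> str:
    """Pick the city a request refers to."""
    # A city named in the sentence always wins.
    if city_in_sentence is not None:
        return city_in_sentence
    # "local" without a city: the user's current location, ignoring context.
    if LOCAL_WORDS & {t.lower() for t in tokens}:
        return user_location
    # Default conversation management: context first, then current location.
    return context_city if context_city is not None else user_location
```

With "Moscow" in the conversation context and a user located elsewhere, "What’s the local weather?" resolves to the user’s location, while "Any chance of rain?" still resolves to Moscow.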
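The slang and abbreviation part of this can be illustrated with a simple lookup table. This is a deliberately naive sketch: `CITY_ALIASES` and `normalize_city` are hypothetical, and a table like this does nothing to tell Moscow (Russia) from Moscow (Idaho), which needs additional context such as the user’s location.

```python
# Assumed alias table; a real toolkit would ship a far larger gazetteer.
CITY_ALIASES = {
    "la": "Los Angeles",
    "nyc": "New York City",
    "big apple": "New York City",
    "sf": "San Francisco",
}


def normalize_city(mention: str) -> str:
    """Map a slang name or abbreviation to a canonical city name."""
    cleaned = mention.strip()
    # Unknown mentions are returned unchanged.
    return CITY_ALIASES.get(cleaned.lower(), cleaned)
```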
Another, more subtle, problem arises when we try to deal with date ranges. Look at these examples:
What’s my current forecast?
What’s the precipitation forecast for Sep 25 — Sep 30th?
What was the ice storm forecast last week?
All three examples contain the word “forecast”, which by default implies the future. However, the second example also specifies an explicit date range, and the third uses the word “forecast” while asking about a past date range. The situation gets even more confusing when we account for conversational context.
You can probably come up with some basic set of rules:
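One hypothetical set of such rules, offered purely as an assumption and not necessarily the set intended here, is: an explicit date range always wins; otherwise the word “forecast” implies a future range; otherwise the request is about today. The function name and the five-day default horizon are illustrative.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

DateRange = Tuple[date, date]


def resolve_date_range(
    mentions_forecast: bool,
    explicit_range: Optional[DateRange],
    today: date,
    forecast_days: int = 5,  # assumed default forecast horizon
) -> DateRange:
    """Decide which dates a request refers to."""
    # Rule 1: an explicit range wins, even if it lies in the past.
    if explicit_range is not None:
        return explicit_range
    # Rule 2: "forecast" with no dates means the upcoming days.
    if mentions_forecast:
        return today, today + timedelta(days=forecast_days)
    # Rule 3: nothing specified means the current date.
    return today, today
```

Under these rules "What was the ice storm forecast last week?" keeps its past range, because the explicit dates override the future reading of “forecast”. Conversational context is deliberately left out of this sketch.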
Another complication concerns the weather request indicator we mentioned at the very beginning.
Essentially, a weather request is some form of a question about a meteorological condition. In our example model for this type of bot we have almost 10,000 different ways to express that... This makes it almost impossible to simply train the model in a supervised fashion. You need some formalized way to encode this model effectively, one that allows for proper versioning, testing, future extension, etc.
Make sure that whatever tool you select to build this bot, you are not asked to list all these 10,000 utterances manually!
If you are somewhat confused by now, that’s absolutely fine. You should be. The problem is that even for this trivialized example a free-form natural language interface is a decidedly non-trivial task. A lot of people jump head first into creating different NLI/NLU apps and chatbots only to realize that users hate the interaction experience because it falls short of human cognition by a l-o-n-g mile. Technology is still developing in this space.
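To see why a compact, formalized encoding matters, consider a toy template expansion. This is only a sketch of the general idea, not the encoding any specific tool uses; the word lists are made up for illustration, and some of the generated phrases are not perfectly grammatical.

```python
from itertools import product

# Hypothetical building blocks of a weather request.
OPENERS = ["what's", "what is", "show me", "tell me"]
QUALIFIERS = ["", "the current", "the local"]
SUBJECTS = ["weather", "forecast", "temperature", "chance of rain"]


def expand(*slots):
    """Yield every phrase formed by picking one option from each slot."""
    for combo in product(*slots):
        # Skip empty options so optional slots leave no double spaces.
        yield " ".join(part for part in combo if part)


UTTERANCES = list(expand(OPENERS, QUALIFIERS, SUBJECTS))
```

Here 11 short fragments expand into 48 distinct utterances, and every additional slot multiplies that number. A definition like this can be versioned and tested as code, which is hard to do with thousands of hand-listed sentences.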