Below we’ll cover the key concepts that are important for NLPCraft. Note that many of them will be covered in more detail later in this guide.
The concept of semantic modeling is at the core of NLPCraft capabilities. Semantic modeling (or semantic grammar) defines a formal way to understand a natural language sentence and translate it into actions. NLPCraft provides one of the most sophisticated tools and APIs for semantic modeling applications.
Introduced in the mid-1970s, semantic grammar differs from the traditional approach based on linguistic grammar, which deals only with linguistic categories like nouns, verbs, etc. The easiest way to understand the difference between a semantic grammar and a linguistic grammar is to look at the following illustration:
In Fig. 2 the lower and upper sentences are the same, but they are parsed differently. The lower part is parsed using traditional linguistic analysis, where each word is tagged with a PoS (Part-of-Speech) tag (e.g. NN for noun, JJ for adjective, and so on). The upper part, however, is parsed using semantic grammar: instead of strictly individual words being tagged, potentially multiple words form high-level semantic groups.
Put a bit more formally, a semantic grammar is a type of grammar whose non-terminals are not generic structural or linguistic categories like nouns or verbs, but rather semantic categories like ENTITY. Unlike linguistic grammar, semantic grammar makes it easy to resolve the standard ambiguities prevalent in linguistic grammar, e.g.:
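To make the distinction concrete, here is a toy sketch of the semantic approach. This is purely illustrative (not the NLPCraft API); the dictionary and category names are invented. Instead of tagging each word with a PoS tag, phrases — possibly multi-word — are mapped to domain-specific semantic categories:

```java
import java.util.*;

// Toy illustration of semantic-grammar parsing (hypothetical, not NLPCraft).
// Surface forms map to semantic categories rather than PoS tags.
class SemanticSketch {
    // Hypothetical domain dictionary: phrase -> semantic category.
    private static final Map<String, String> DICT = Map.of(
        "weather",   "WEATHER",
        "forecast",  "WEATHER",
        "london",    "CITY",
        "berlin",    "CITY",
        "next week", "DATE"
    );

    // Greedily match two-word phrases first, then single words;
    // free words not in the dictionary are simply skipped.
    public static List<String> parse(String sentence) {
        List<String> out = new ArrayList<>();
        String[] toks = sentence.toLowerCase().split("\\s+");
        for (int i = 0; i < toks.length; ) {
            if (i + 1 < toks.length && DICT.containsKey(toks[i] + " " + toks[i + 1])) {
                out.add(DICT.get(toks[i] + " " + toks[i + 1]));
                i += 2;
            } else if (DICT.containsKey(toks[i])) {
                out.add(DICT.get(toks[i]));
                i += 1;
            } else {
                i += 1;
            }
        }
        return out;
    }
}
```

Note how "next week" is consumed as a single DATE group — exactly the multi-word grouping a word-by-word PoS tagger cannot express.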
In Fig. 3, even though the linguistic signatures of both sentences are almost the same, the semantic meaning is completely different. Resolving such ambiguity in a linguistic approach requires very sophisticated context analysis (if and when it is available) and in many cases is simply impossible to do deterministically. Semantic grammar, on the other hand, resolves this ambiguity cleanly, in a simpler and fully deterministic way.
An astute reader may notice the similarities between classic Named Entity Resolution (NER) and model elements. Indeed, both try to identify semantic categories rather than generic structural or linguistic categories.
However, there are a number of important distinctions:
It's important to note that some NLP toolkits provide a variation of NER sometimes called Normalized Named Entity Resolution (NNER) that is based on rule-based logic or semantic modeling. For example, DATE or CURRENCY named entities can be normalized by the Stanford CoreNLP toolkit (one of the base NLP engines that NLPCraft can be configured with).
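To illustrate what "normalized" means here — a minimal hypothetical sketch, not CoreNLP's actual output format — a currency mention can be both detected and converted to a canonical machine-readable value, rather than merely labeled:

```java
import java.util.regex.*;

// Hypothetical sketch of normalized NER: the entity is not just tagged,
// its value is also converted to a canonical form.
class Nner {
    // Matches a simple US-dollar amount like "$10" or "$10.50".
    private static final Pattern CUR = Pattern.compile("\\$([0-9]+(?:\\.[0-9]+)?)");

    // Returns a canonical representation of the first currency mention, if any.
    public static String normalize(String text) {
        Matcher m = CUR.matcher(text);
        if (m.find())
            return "CURRENCY(USD, " + m.group(1) + ")";
        return "NONE";
    }
}
```

A plain NER system would stop at tagging "$10.50" as CURRENCY; the normalized variant additionally extracts the unit and numeric value.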
Assuming a simplified NER terminology, one could argue that Semantic Modeling allows you to build your own reliable named entity resolver specific to your own domain area.
NLPCraft provides automatic conversation context management right out of the box. Conversation management is based on the idea of short-term memory (STM). STM is automatically maintained by NLPCraft for each user and data model. Essentially, NLPCraft "remembers" the context of the conversation and can supply the currently missing elements from its memory (i.e. from STM). The STM implementation is also conveniently integrated with the intent solver utility class.
Maintaining conversation state is necessary for effective context resolution, so that users can ask, for example, the following sequence of questions using the example weather model:
1. User gets the current London weather. STM is empty at this point, so NLPCraft expects to get all the necessary information from the user sentence. The meaningful parts of the sentence are stored in STM.
2. User gets the current Berlin weather. The only useful data in the user sentence is the name of the city, Berlin. But since NLPCraft now has data from the previous question in its STM, it can safely deduce that we are asking about the weather. Berlin overrides London in STM.
3. User gets the next week forecast for Berlin. Again, the only useful data in the user sentence is next week and forecast. STM supplies Berlin. Next week forecast overrides the current weather in STM.
Note that STM is maintained per user and per data model. The conversation management implementation is also smart enough to clear STM after a certain period of time, i.e. it "forgets" the conversational context after a few minutes of inactivity. Note also that conversational context can be cleared explicitly via the REST API.
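The STM behavior in the conversation above can be sketched as follows. This is a toy illustration of the idea only, not the actual NLPCraft implementation; the category names are assumptions. Missing elements are supplied from memory, and newer values override older ones:

```java
import java.util.*;

// Toy short-term-memory sketch (hypothetical, not the NLPCraft implementation).
// One instance would be kept per user and per data model.
class Stm {
    private final Map<String, String> memory = new HashMap<>();

    // Merge the entities found in the current sentence with STM content:
    // values from the new sentence override remembered ones, and anything
    // missing from the sentence is supplied from memory.
    public Map<String, String> resolve(Map<String, String> found) {
        memory.putAll(found);
        return new HashMap<>(memory); // the fully resolved request
    }
}
```

For instance, resolving {ACTION=weather, CITY=london} and then just {CITY=berlin} yields {ACTION=weather, CITY=berlin} — the action is supplied from memory while the newer city overrides the older one. A real implementation would also expire memory after a period of inactivity, as described above.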
When working with NLPCraft you will most often be dealing with two concepts you have already seen above:
If you haven’t done so already, we highly recommend looking over the Getting Started guide, which will give you a quick dive into how these concepts interconnect.
A Data Model is essentially a small piece of Java/Scala/Groovy (or any other JVM-based language) code that you develop to define how user input is translated into specific actions for a specific data source. Even though it may sound complex, in reality most models are fairly simple and can be developed rather quickly. A data model does not necessarily have to work with only one type of endpoint, but it is good practice to handle different types of endpoints with different models.
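The shape of such a model can be sketched in plain Java. This is a hypothetical illustration of the input-to-action mapping idea only — the real NLPCraft model API differs — with invented method and category names:

```java
// Hypothetical sketch of a data model (not the actual NLPCraft API):
// it maps recognized semantic entities to a concrete action result.
class WeatherModel {
    // A real model would be hosted by a data probe and wired to intents;
    // here we only show the "entities in, action out" idea.
    public static String onMatch(String action, String city) {
        if (action.equals("weather"))
            return "Current weather in " + city;
        if (action.equals("forecast"))
            return "Forecast for " + city;
        return "Unsupported request";
    }
}
```

The point is that the model code stays small: the heavy lifting of turning free-form text into the semantic entities it consumes is done by NLPCraft itself.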
A Data Probe is an application whose main purpose is to deploy and host user-defined data models. A separate application is required because NLPCraft will often be connected to a private data source (a corporate database, for example) to which outside connectivity is not possible; hence the need for a secure proxy-type application in between. The data probe is a secure application that employs end-to-end encryption, HTTP tunneling, and router ingress-only connectivity. Each data probe can host multiple models, and you can have multiple data probes. Data probes can be deployed anywhere as long as there is outbound connectivity, and they are typically deployed in a DMZ or close to your private data sources. A data probe can be launched in-process for easier development and testing, or as a command-line tool for production usage.