4

Modelling Natural Language Expressions with Ontologies

 

In this chapter we discuss the ontological approach to representing natural language expressions so as to reveal their underlying conceptual structure. We then turn to the status of ontologies in the field of Natural Language Processing (NLP) and to the relationship between natural language and ontology from the NLP point of view.

 

Humans require words and expressions, or at least symbols, to talk about things in the world and to communicate efficiently. However, words can be mapped to things only indirectly: the mapping is achieved by creating concepts that refer to things. As Kuhn puts it in [81], “meaning is about languages (in information systems or elsewhere), not about the world. Languages are about conceptualizations, and conceptualizations are about the world”. The meaning of an expression, however, depends on how a speaker or listener understands it. This understanding, and therefore the interpreted meaning, is strongly related to the socio-cultural context of the speakers and listeners.

 

The same situations and things in the world can be conceptualized in multiple ways, because they are understood differently depending on context. This is best observed in the expressions of natural language, which exhibit many ways of talking about the same phenomena. Conversely, a single expression can carry several conceptualizations. For example, “Beetle” has at least two: the first is a particular type of insect and the second is the famous car from the 70s. Ontologies, predominantly domain ontologies, provide ways to explicate the conceptualizations behind the expressions of natural languages. In other words, from the point of view of NLP, ontologies are models of the meaning of the expressions used in a language. As such, they predict or prescribe the use of an expression to refer to a conceptualization of something in the world. Put another way, they reduce the one-to-many relationship between language expressions and the concepts they denote to a one-to-one relationship for a given language.

 

Given this relationship between natural language expressions and the concepts they denote, Bateman [11] discusses how the relationship is treated in ontological models. He distinguishes three types of ontologies.

The first type of ontology assumes that there is no theoretical difference between lexical and non-lexical information. That is, the expressions of natural language and the concepts they denote are treated in the same way. More precisely, in this type of ontology lexical information is simply subordinated to non-lexical information.

The second and third types of ontology do draw a distinction between lexical and non-lexical information. In these types, non-lexical information that is related to psychology is conceptual, whereas non-lexical information that is related to sociology is contextual.

In [12], representatives of each type of ontology are introduced and discussed in detail; it is also stated there that the second and third types have the most representatives and are the suggested ones.

 

Bateman, who defends the co-existence of the lexical and the conceptual level, advocates the third type of ontology. This type distinguishes between lexical and conceptual information and relates the conceptual information to socio-cultural reality: “But the commonsense world, where humans live is just as much, if not more, the world of social reality than it is the perceptual world of direct interaction and it is this socially-infused commonsense world for which accounts are necessary when more sophisticated behavior is to be explained or modelled” and “Each language and corresponding culture will have its own particular classes and combinations, its ways of giving meanings to the ground ontological attributes.” Thus, Bateman defends the idea that the commonsense world and its associated conceptualizations go hand in hand with social reality, which is reflected, among other things, in the language of the associated culture.

 

John Sowa’s view of the relationship between natural language and conceptualizations is closer to the first type of ontology, and he has therefore been heavily criticized by Bateman. Sowa argues in [64] that languages, be they natural or artificial, are made up of symbols organized in well-defined syntactic structures, whereas the real world is made up of an endless variety of things. It is therefore not possible to capture the full richness of the world by means of languages, and linguistic knowledge is subordinated to real-world knowledge. There is nevertheless one point on which Bateman and Sowa agree: Sowa also holds that a language represents the concepts that exist in the environment and in the culture of the people who speak it. Thus, the natural language expressions about a domain deliver information about the underlying conceptual structure of that domain with respect to some culture or society.

 

In [14] Madsen, Thomsen and Vikmar argue that natural language expressions alone do not suffice to learn about meaning, as they merely lexicalize concepts. To understand the meaning, we need to look at the conceptual structure that the expressions lexicalize. Yet the relation between the conceptual structure and its linguistic representation is rather complex. As mentioned above, the same expressions or word forms may have multiple conceptualizations, which leads to ambiguity. Lenci in [41] identifies at least three reasons that give rise to ambiguities: (i) the heterogeneous and implicitly structured nature of natural language, (ii) polysemy, i.e. one word having multiple meanings, and (iii) a word sense that does not clearly denote the concept.

 

Let us look at an example. According to the Merriam-Webster Dictionary, the word “day” has eight different senses:

1. the time of light between one night and the next.

2. the period of rotation of a planet (as earth) or a moon on its axis.

3. the mean solar day of 24 hours beginning at mean midnight.

4. a specified day or date.

5. a specified time or period as in <grandfather’s day>.

6. the conflict or contention of the day as in <played hard and won the day>.

7. the time established by usage or law for work, school, or business.

8. a period of existence or prominence of a person or thing.

 

A person who uses the word “day” may be referring to any one of these eight senses, that is, to any of the concepts listed above. In other words, one form at the lexical level maps to many different concepts at the conceptual level.
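The one-to-many mapping from a lexical form to concepts can be sketched in a few lines of Python. This is only an illustration: the names `SENSES_BY_FORM`, `senses_of` and `is_ambiguous` are hypothetical, and the glosses are abridged from the eight senses listed above.

```python
# Lexical level: one word form. Conceptual level: the senses it may denote.
# SENSES_BY_FORM and both helper functions are hypothetical names chosen for
# this sketch; the glosses are abridged from the eight senses listed above.
SENSES_BY_FORM = {
    "day": [
        "the time of light between one night and the next",
        "the period of rotation of a planet or a moon on its axis",
        "the mean solar day of 24 hours beginning at mean midnight",
        "a specified day or date",
        "a specified time or period",
        "the conflict or contention of the day",
        "the time established by usage or law for work, school, or business",
        "a period of existence or prominence of a person or thing",
    ],
}


def senses_of(form):
    """Return every concept (sense) that the given word form may denote."""
    return SENSES_BY_FORM.get(form.lower(), [])


def is_ambiguous(form):
    """A form is ambiguous when it maps to more than one concept."""
    return len(senses_of(form)) > 1


if __name__ == "__main__":
    # List the senses a speaker might intend when using the form "day".
    for number, gloss in enumerate(senses_of("day"), start=1):
        print(number, gloss)
```

Nothing in the form itself tells us which of the listed senses a speaker intends; that information has to come from the conceptual level.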

 

We have now seen that the same matters are conceptualized in multiple ways as a result of the associated social context, that multiple word forms denote those conceptualizations, and that looking at the expressions alone does not suffice to understand the conceptualizations. We therefore need to confront the question of how to represent the expressions by means of an ontology. In our case: how are we going to represent socio-cultural time expressions related to nations, religions, business life and education by means of an ontology? As we have seen, there are two different approaches. According to the first, we subordinate the socio-cultural time expressions to the socio-cultural time concepts, i.e. we subordinate the lexical level to the conceptual level. According to the second, we distinguish between the socio-cultural time expressions and the socio-cultural time concepts, i.e. between the lexical and the conceptual level, accept that the two co-exist, and create the ontological model accordingly.

 

Following the first approach, a single ontology would suffice to model the domain. The second approach, however, would require two ontologies to treat the two levels appropriately. In our view, the second approach delivers a more accountable model of the domain, and it is also the suggested approach, as we have seen. Under this approach, the first ontology refers to the lexical level: it models the expressions of socio-cultural time that denote the concepts of socio-cultural time. Although it refers to the meaning to a certain extent, it does not describe the meaning unambiguously and precisely. Rather, it demonstrates the use of the socio-cultural time expressions in natural language and thereby represents their ambiguity.

 

The second ontology models the concepts denoted. It therefore does provide an explicit and unambiguous description of the meaning, in that it predicts or prescribes the use of an expression to refer to a conceptualization of something in the world. In other words, the second ontology, at the conceptual level, determines an interpretation (a fixed meaning) for the expression. Consequently, the syntactic structure of the two ontologies remains similar, whereas their semantic structure differs. In the ontology at the lexical level, expressions are necessarily assigned to multiple categories in order to exhibit the ambiguity. In the ontology at the conceptual level, this is strictly avoided in order to describe the meaning precisely. Furthermore, only the ontology at the conceptual level can include binary relationships such as subordinate Concept or superordinate Concept, as these relationships explicitly refer to concepts, i.e. to the meaning.
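The difference between the two levels can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the model developed in this work: the class names, the expression “holiday”, the category labels and the concept names are all hypothetical. The lexical level permits an expression to belong to several categories, whereas the conceptual level fixes one interpretation per expression and carries the superordinate and subordinate relations between concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LexicalOntology:
    """Lexical level: an expression may be assigned to many categories."""
    categories: Dict[str, Set[str]] = field(default_factory=dict)

    def assign(self, expression: str, category: str) -> None:
        self.categories.setdefault(expression, set()).add(category)

    def is_ambiguous(self, expression: str) -> bool:
        return len(self.categories.get(expression, set())) > 1


@dataclass
class ConceptualOntology:
    """Conceptual level: one fixed interpretation per expression."""
    # concept -> its single superordinate concept (None for a root concept)
    superordinate: Dict[str, Optional[str]] = field(default_factory=dict)
    # expression -> the one concept prescribed as its interpretation
    interpretation: Dict[str, str] = field(default_factory=dict)

    def add_concept(self, concept: str, parent: Optional[str] = None) -> None:
        if concept in self.superordinate:
            raise ValueError(f"concept already defined: {concept}")
        if parent is not None and parent not in self.superordinate:
            raise ValueError(f"unknown superordinate concept: {parent}")
        self.superordinate[concept] = parent

    def fix_interpretation(self, expression: str, concept: str) -> None:
        if concept not in self.superordinate:
            raise ValueError(f"unknown concept: {concept}")
        if expression in self.interpretation:
            # A second interpretation would reintroduce ambiguity.
            raise ValueError(f"interpretation already fixed: {expression}")
        self.interpretation[expression] = concept

    def subordinate_concepts(self, concept: str) -> List[str]:
        return sorted(c for c, p in self.superordinate.items() if p == concept)


def build_example():
    """Build both levels for one hypothetical expression."""
    lexical = LexicalOntology()
    for category in ("religion", "nation", "education"):  # assumed labels
        lexical.assign("holiday", category)

    conceptual = ConceptualOntology()
    conceptual.add_concept("SocioCulturalTimeConcept")  # assumed root concept
    conceptual.add_concept("ReligiousHoliday", parent="SocioCulturalTimeConcept")
    conceptual.fix_interpretation("holiday", "ReligiousHoliday")
    return lexical, conceptual
```

In this sketch, a second call to `fix_interpretation` for the same expression raises an error, which mirrors the requirement that multiple assignment is strictly avoided at the conceptual level.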

 

Finally, such an ontological model would not be a linguistically motivated one. To be linguistically motivated, the purpose of the ontology would have to be to explicate the relations between all the lexical forms (words, phrases, collocations, etc.) of a given language and all the concepts that each lexical form denotes. Such an ontology would then represent, for example, the word “day” and all the concepts it denotes, i.e. all eight senses listed above. It would then relate the word “day” to each of these senses by means of semantic relations, for example has Wordsense, synonym, antonym, etc. WordNet, as mentioned earlier, is an example of an ontology that has this kind of purpose for the English language. The purpose of the ontological model we have described is rather to provide consensus about a given domain by laying down, or prescribing, one interpretation of a given lexical form among many others. In our view, most domain ontologies follow this principle. For example, a domain ontology about financing would most probably determine the interpretation of “bank” as a financial institution, whereas a domain ontology about carpentry would determine the interpretation of the same word as a piece of (wooden) furniture.
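The contrast between a linguistically motivated ontology and a domain ontology can be sketched as follows. The sketch is hypothetical throughout: `LEXICAL_REFERENCE`, `DomainOntology` and `interpret` are names invented for illustration, and the two senses of “bank” are simply those mentioned in the text.

```python
# A linguistically motivated resource lists every sense of a form.
# Only the two senses mentioned in the text are included here.
LEXICAL_REFERENCE = {
    "bank": ["financial institution", "piece of (wooden) furniture"],
}


class DomainOntology:
    """Prescribes exactly one interpretation per lexical form in a domain."""

    def __init__(self, domain, interpretations):
        self.domain = domain
        self._interpretations = dict(interpretations)

    def interpret(self, form):
        """Return the single prescribed sense, or None if the form is unknown."""
        return self._interpretations.get(form.lower())


# Two hypothetical domain ontologies that fix different senses of "bank".
financing = DomainOntology("financing", {"bank": "financial institution"})
carpentry = DomainOntology("carpentry", {"bank": "piece of (wooden) furniture"})

if __name__ == "__main__":
    for ontology in (financing, carpentry):
        print(ontology.domain, "->", ontology.interpret("bank"))
```

The lexical reference keeps the one-to-many relationship, whereas each domain ontology reduces it to a one-to-one relationship within its own domain.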

 

Let us summarize the issues discussed. Humans require words and expressions to communicate, and they communicate by creating concepts, which refer to things and which are denoted by the expressions. The creation of concepts is highly dependent on the socio-cultural context in which a person lives. Therefore, there can be multiple conceptualizations. Moreover, the same expression may denote multiple concepts. Domain ontologies provide ways to explicate the conceptualizations behind the expressions of natural languages. They do this by predicting or prescribing the use of an expression to refer to a conceptualization of something in the world. Linguistically motivated ontologies list all the expressions and all the concepts they denote, and they function as lexical references. From the NLP point of view, there are three types of ontologies that deal with this situation. The first type does not distinguish between concepts and expressions and assumes that the latter are subordinated to the former. The second and third types do distinguish between expressions and concepts, or between expressions and concepts that depend on contexts.

All views agree that the meaning cannot be explicated by looking at the expressions alone; this requires a conceptual level.

 

Consequently, the third type of ontology, which partitions the domain into a lexical and a conceptual level, is the suggested approach. We consider it the appropriate approach for modelling socio-cultural time expressions related to nations, religions, business life and education, because most of these expressions denote multiple concepts and therefore require a conceptual level. In this way, we can explicate the meaning and lay down an interpretation to avoid ambiguities concerning the domain. The lexical level is nevertheless also necessary, as the expressions of socio-cultural time deliver information about the concepts of socio-cultural time, even though the expressions themselves do not explicate the meaning precisely and unambiguously.

 

This chapter has discussed the NLP viewpoint on representing knowledge about the real world by means of ontologies. The first point was the relationship between the conceptual structure of real-world knowledge and its representation in natural language. The second point was how different ontological approaches treat this relationship. We have also discussed the opinions on whether the conceptual structure of the real world is revealed in the expressions of natural language, and we have pointed out the problems that arise when natural language expressions denote more than one concept.

 

 
