2 State of the Art in Ontological Engineering

 

Ontology[1] is a term that appears in contexts as diverse as knowledge engineering [45],[52], knowledge representation [32],[64],[60], natural language processing [11],[12], database design [14],[15], information retrieval and extraction [62],[2],[33], knowledge management and organization [53] and multiagent systems [65],[38]. For a very long time, Ontology has been a subject of philosophy, as a theory concerned with the nature of existence. More precisely, Ontology in philosophy describes what kinds of things exist in reality and explicates the relationships between them [63].

 

In Artificial Intelligence (AI), ontologies have been used to explicitly declare the knowledge embedded in a knowledge-based system and to facilitate knowledge sharing and re-use. In multiagent systems, ontologies describe the context in which agents interact with each other. Recently, the term ontology has come to be used more often within the context of the Web. There, ontologies are understood as devices that bring a machine-readable conceptual structure to the Web, generating the Semantic Web [9],[47],[4].

 

Today, the Web is said to have three major characteristics: first, it has a predominantly syntactic structure; second, it is intended for human users; and third, it is a place to find things rather than to do things [68],[66]. The next step in the evolution of the Web is to extend it with conceptual structure, or meaning, so that it becomes a place to do things by means of so-called intelligent Web agents, and so that it facilitates successful communication between humans and semantic inter-operability between software systems.

 

Providing meaning to the Web has been understood as delivering an explicit description of the meaning of the documents found on the Web. More precisely, there should be references for the documents on the Web that explain the meaning of the elements found in each Web document. Such references would naturally be located on the Web together with the associated documents. Thus, any application that processes a particular Web document can consult its reference to understand the meaning of the document at hand. These references have to fulfil certain requirements in order to be understood by application programs. The basic requirement is that the information contained in the references has to be explicit and well structured, so that no ill-defined terms or confusions remain. Given this picture, it has become clear that ontology could be a device for designing such references, as it promises an explicit and unambiguous description and a common understanding of any domain. Consequently, ontology was declared to be capable of bringing the necessary conceptual organization to the Web [4]. Since then, it has become an ever-growing research field, particularly within the context of the Web.

 

The goal of this chapter is to deliver a survey of the state of the art in ontological engineering. Ontological engineering refers to the set of activities involved in the process of ontology development [5].

In the first section, we will begin with the definition of ontology and we will clarify the understanding of Ontology in philosophy and ontology in computer science.

The second section deals with the question of why ontologies are necessary and it briefly discusses the related ideas of communication, inter-operability, re-usability and knowledge sharing.

 

Ontologies can be grouped into at least three kinds according to their purpose: meta-level ontologies, commonsense (or general) ontologies, and domain-specific ontologies. Thus, the third section, Kinds of Ontologies, will discuss the characteristics of the three kinds of ontologies and will briefly introduce representative ontologies for each kind.

In the subsequent sections, we will cover methodologies for building ontologies as well as the tools and languages used for developing them. The fourth section introduces an ontology development methodology called METHONTOLOGY [3].

In the fifth section, some well-known ontology languages, such as KIF [29], CycL [57], RDFS [72] and OWL [42], are discussed.

Section six introduces some ontology development environments, focusing on two ontology editors: OilEd from the University of Manchester, UK, and Protégé 2000 from Stanford University, USA.

Section seven demonstrates the range of purposes ontologies may serve. These may be as diverse as applications on the Web and applications in multi-agent systems. Among them, we will outline three application areas for ontologies: natural language processing, multiagent architectures and the vision of the Semantic Web. For each application area we describe the role of ontologies.

Section eight compares and contrasts three ontologies along three dimensions: what the ontology is for, how its hierarchy is organized, and in which applications it is used.

Finally, in section nine we will conclude this brief survey of the state of the art in ontological engineering with a discussion that evaluates the current situation on the basis of the information provided.

 

2.1 What is Ontology?

 

Many definitions of the term ontology have been provided both in philosophy and in computer science [31],[34],[5]. Yet there has not always been consensus about the meaning of the term. In [34], Guarino reports that similar ideas and issues are frequently addressed in different contexts and fields using different terminology, and the term ontology is one such case. For instance, the idea behind the term conceptual schema used in the database community is strongly similar to the idea behind the term ontology. In order to provide some terminological clarification, in this section we take a closer look at various definitions of Ontology in the philosophical sense and of ontology in the computer science sense.

 

Ontology in its philosophical sense denotes the process of classifying entities in every area of reality [63]. Ontology in computer science, on the other hand, is understood as an abstract view of a part of the real world that is intended to be represented for computational purposes. The representation of this abstract view consists of the description of the concepts and relationships that exist for that part of the world. As such, a computer science ontology is “a specification of a conceptualization” [31], a definition we return to in Section 2.1.2.

 

Considering the two definitions of the term ontology, three aspects seem to differentiate the understanding of the philosophical Ontology and the computer science ontology.

The first aspect concerns the representation of existence. From the point of view of philosophy, existence is what exists in every area of reality, whereas from the point of view of computer science, what exists is what you can represent.

The second aspect concerns the different purposes of philosophical Ontology and computer science ontology. The purpose of philosophical Ontology is to provide a complete description and explanation of all the goings-on in the universe. The computer science viewpoint, on the other hand, holds that it is impossible to represent the world in its entirety and in its full richness of detail. Thus, ontology in computer science has the purpose of representing a part of the world, so that the representation can be used to accomplish a certain task.

The last aspect is related to the function of philosophical Ontology. Philosophical Ontology is not understood as a device that could be used for a specific application; rather, it acts as a reference that provides a better understanding of reality and existence. In computer science, however, an ontology is developed to be deployed in some concrete application.

 

So far we have discussed the use of the term Ontology in philosophy and the term ontology in computer science by comparing and contrasting their intended meanings in both disciplines. In the following subsections we will discuss ontology as a subject matter of research in both disciplines. We start with the ontology in philosophy.

 

2.1.1 Ontology in Philosophy

 

Ontology as a branch of philosophy dates back to the time of Aristotle, who first worked out a theory of categories. Plato’s theory of forms and Russell’s theory of types are also considered to be among the earliest theories of Ontology [22]. Many further definitions of Ontology have been given, starting with early philosophers such as Gottfried Wilhelm Leibniz (1646-1716), who provides a rather enigmatic description: “Ontology or the science of something and of nothing, of being and not-being, of the thing and the mode of the thing, of substance and accident” [1]. Christian Wolff (1679-1754) describes ontology as the first philosophy: “That part of philosophy which treats of being in general and of the general affections of being is called ontology, or first philosophy” [1].

 

More recent definitions of Ontology, such as Guarino’s [34], tend to take a comparative view: “In the philosophical sense, we may refer to an ontology as a particular system of categories accounting for a certain vision of the world. As such, this system does not depend on a particular language. On the other hand, in its most prevalent use in AI, an ontology refers to an engineering artefact, constituted by a specific vocabulary used to describe a certain reality” [34].

 

Comparative definitions of Ontology seem to have become more frequent, especially after the AI community turned its attention to Ontology. In [10], Smith and Welty explain why this happened. According to them, difficulties arose when different groups of database and knowledge-base system designers attempted to share, represent and reuse each other’s frameworks and applications. Each group had built its frameworks and applications using its own terms and concepts, corresponding to its own needs and purposes, without paying particular attention to compatibility. As a result, different knowledge bases and databases employ identical terms for different applications and frameworks, or refer to the same applications using different terms. In order to solve this incompatibility problem, the computer science community turned to Ontology. Thus, philosophical Ontology has been applied in the context of computer science to provide a common vocabulary and a common understanding of the applications and frameworks in a given domain. In the following subsection we examine ontology in computer science more closely.

 

2.1.2 Ontology in Computer Science

 

As we have mentioned, Tom Gruber defines ontology as “a specification of a conceptualization” [31], where he explains a conceptualization as a collection of objects, concepts and other entities that are presumed to exist in some domain and that are tied together by relationships. As such, a conceptualization for Gruber is a simplified view of the world, a way of thinking about some domain. As we engage with the world daily and become involved in events and situations, we deal with such conceptualizations. The main concern of ontology in computer science is the question of whether or not its own conceptualizations correspond to the conceptualizations of the real world [63].

 

An extended version of Gruber’s definition is provided by Fensel in [26], who says an ontology is a “formal, explicit specification of a shared conceptualization”. This definition introduces three further requirements: explicitness, formality and being shared. Accordingly, the conceptualization should be explicitly documented, it should be formal enough to be machine processable, and it should be shared by a community so that there can be consensus about it. When application systems share an ontology, they are said to commit to that ontology and are expected to take actions that are consistent with the definitions in the shared ontology.

 

Another definition views ontology as an engineering artefact, constituted by a specific vocabulary and by a set of explicit assumptions regarding the intended meaning of the vocabulary [49]. The vocabulary together with the assumptions should describe a certain reality. In the simplest case, an ontology is a hierarchy of concepts related by an is-a relationship, also known as a taxonomy.
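The simplest case mentioned above, a hierarchy of concepts related by is-a, can be illustrated with a minimal sketch. The concept names and the dictionary layout are hypothetical and chosen only for illustration; they are not taken from any of the ontologies discussed in this chapter.

```python
# Hypothetical toy taxonomy: each concept maps to its direct superconcept (is-a).
IS_A = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "animal": "thing",
}


def ancestors(concept: str) -> list[str]:
    """Return all superconcepts of `concept`, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept: str, candidate: str) -> bool:
    """True if `concept` equals `candidate` or inherits from it via is-a."""
    return concept == candidate or candidate in ancestors(concept)
```

Because is-a is followed transitively, `is_a("dog", "animal")` holds even though only the direct link from "dog" to "mammal" is stored. Richer ontologies add properties, restrictions and further relations on top of such a backbone.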

 

We conclude with Deborah McGuinness’s discussion in [47] of what can be considered an ontology. Accordingly, specifications that meet the following criteria can be considered simple ontologies:

 

• has a finite controlled (extensible) vocabulary.

• there are strict hierarchical subclass relationships between classes.

• has an unambiguous interpretation of classes and term relationships.

The following properties are considered typical but not mandatory:

• each class has property specifications.

• individuals are included in the ontology.

• each class has value restriction specifications.

Finally, the following properties may be desirable but are neither mandatory nor typical:

• disjoint classes are specified.

• arbitrary logical relationships between terms are specified, such as inverse and part-whole relationships.

• each class has property specifications.

 

In this section we have discussed various definitions of ontology in two disciplines, computer science and philosophy. We have compared and contrasted the understanding of ‘Ontology’ in philosophy and of ‘ontology’ in computer science. In the remainder of the thesis we will deal with ontology in computer science.

 

2.2 Why Ontologies?

 

In this section we discuss why there is a need for ontologies. Ontologies are thought to help in at least three areas: communication between humans, inter-operability between heterogeneous computer systems, and re-usability and knowledge sharing between these systems [45],[46].

 

Humans necessarily communicate with each other, yet communication is not always straightforward. People, depending on their contexts and individualities, have different viewpoints and understandings of the same matters. Communicating does not always mean understanding each other. Therefore, humans need consensus about the matters at hand in order to succeed in communication.

 

Software applications are designed according to the context specific requirements of their developers and the intended users. They are coded in different languages, they consist of different software components and they use different terminology to describe their components. It is clear that heterogeneous software application systems need a mediator that facilitates their cooperation with each other.

 

Over and over methods and applications are being developed to solve the same problems and to accomplish the same tasks because existing solutions are unknown or they cannot be re-used. That causes loss of time, loss of effort and loss of resources. Therefore, a medium is needed to make existing solutions accessible to everyone in a re-usable format. Ontologies are considered as a step along the path of finding answers to these needs.

 

In the next subsections we will examine how ontologies try to accomplish this.

 

2.2.1 Communication

 

Humans can communicate successfully if they have a shared understanding or a shared viewpoint of some domain. Such a shared understanding can be achieved if the domain is described explicitly without conceptual and terminological confusion so that it can be understood in the same way by everyone. An ontology, in this respect, corresponds to such an explication.

 

Ontologies facilitate communication by providing an explicit specification of what a domain ought to look like, which is also referred to as the normative model of a domain [45]. Moreover, ontologies help to ensure consistency and lack of ambiguity in the description of knowledge about a given domain. As a result, confusions and misunderstandings become unlikely.

 

A final aspect about how ontologies support communication between humans is that they can integrate different perspectives of the users. When users, who have different perspectives about one domain, share an ontology they have one standardised perspective of the domain. To sum up, we can state that ontologies facilitate human communication because they describe the normative model of a domain, they eliminate any possible misunderstandings and confusions about the domain and because they provide consensus regarding the domain for a community of people.

 

2.2.2 Inter-Operability between Systems

 

Inter-operability between systems refers to the extent to which different software application systems can cooperate with each other. As we have mentioned previously, incompatibilities occur between different software application systems because they have been built by different groups of system designers, who use their own terms and their own concepts depending on their contexts.

 

One of the initial approaches to handling such problems was to provide ad hoc solutions. In other words, researchers tried to find an appropriate solution each time an incompatibility occurred. However, as the amount of interaction between different systems increased with developing technology, researchers came to the conclusion that an ultimate solution was needed, one that would be more effective than finding a separate solution for every single instance of incompatibility.

 

The ultimate solution has been envisioned as a single ontology, a so-called backbone taxonomy that unifies the underlying conceptual structure of heterogeneous computer and software systems [63]. This idea initially constituted the inter-operability approach. Yet, in the course of time, it has been recognized that designing one ultimate ontology to unify all different computer and software systems would be far too difficult a task to accomplish. Moreover, it was not possible to determine how widely a single ontology would be adopted by the broad population of different computer science communities. Thus, the inter-operability approach shifted its focus from the design of a backbone ontology to the exploration of integrative ways to support inter-operability.

 

According to the new direction, ontologies support inter-operability in ways other than that provided by the single ontology approach. One way is that an ontology can function as an interlingua [45] between different systems. That is, it can be used to translate between different application languages and different representation schemes. When two application systems that have different conceptual structures or that are written in different languages need to interact, translations between the two systems become necessary. Moreover, because the translation must work in both directions, two separate translation processes are needed. When more than two systems interact, the number of translations grows with every pair of systems: n systems require n(n-1) one-way translations. By deploying an ontology, the number of translations between different application systems can be reduced. In that case, each application only needs to translate its contents to and from the one language provided by the ontology, instead of translating them to the languages of all the other applications and receiving all their translations.
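The reduction in the number of translations can be made concrete with a short calculation. The sketch below assumes that every pair of systems needs one translator per direction, and that with an interlingua each system needs one translator to the shared ontology and one from it; the function names are illustrative only.

```python
def pairwise_translators(n: int) -> int:
    """One-way translators needed when each of n systems translates to every other."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One-way translators needed when each of n systems translates to and from one shared ontology."""
    return 2 * n


# Compare the two approaches for a few system counts.
for n in (2, 3, 5, 10):
    print(n, pairwise_translators(n), interlingua_translators(n))
```

Under these assumptions the interlingua pays off only for more than three systems: for three systems both approaches need six translators, while for ten systems the pairwise approach needs 90 and the interlingua approach 20.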

 

The second way in which ontologies may facilitate inter-operability is the integration of knowledge from different domains. In other words, with the objective of describing a unified domain or accomplishing a common task, an attempt can be made to integrate ontologies from different domains, each containing a different kind of knowledge.

 

A final way in which ontologies can support inter-operability is by integrating different vocabularies for the same domain. That is, when several ontologies exist that describe the same domain using different vocabularies, these ontologies can be integrated by means of ontology integration methods so that they share the same vocabulary. Different tools for the domain can then commit to the integrated ontology and share one vocabulary.
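As a rough illustration of what committing to a shared vocabulary means in practice, the following sketch maps the terms of two tools onto one integrated vocabulary. The tool names, terms and mapping table are hypothetical; actual ontology integration methods are considerably more involved than a lookup table.

```python
# Hypothetical mappings from two tool-specific vocabularies to one shared vocabulary.
SHARED_TERMS = {
    "tool_a": {"automobile": "Car", "lorry": "Truck"},
    "tool_b": {"car": "Car", "truck": "Truck"},
}


def to_shared(tool: str, term: str) -> str:
    """Translate a tool-specific term into the shared vocabulary."""
    try:
        return SHARED_TERMS[tool][term]
    except KeyError:
        raise KeyError(f"no shared term for {term!r} in vocabulary {tool!r}") from None


def same_concept(tool_x: str, term_x: str, tool_y: str, term_y: str) -> bool:
    """True if both tool-specific terms denote the same shared concept."""
    return to_shared(tool_x, term_x) == to_shared(tool_y, term_y)
```

With such a mapping, `same_concept("tool_a", "automobile", "tool_b", "car")` holds, so the two tools can exchange data about the same concept although they name it differently.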

 

2.2.3 Knowledge Sharing and Re-usability 

 

Knowledge sharing refers to the idea that when knowledge about some domain is formally described and documented, it can be made public so that others can also benefit from it [30],[40],[54],[52]. Re-usability and knowledge sharing can be achieved if knowledge components are explicitly specified and agreed upon by a community of agents. The idea behind the re-usability approach is that instead of developing from scratch, new computer and software applications should be built by assembling knowledge components that have already been built by others. The purpose of such a practice is to decrease the high cost of software development and maintenance and to avoid loss of time.

 

Ontologies facilitate knowledge sharing and re-usability because they provide formal and explicit definitions of knowledge components that can be made public, for example on the Web, and can then be shared with and reused by others. So-called ontology library systems can be used for these purposes. An ontology library system is an easily accessible system that offers various functions for managing, adapting and standardizing groups of ontologies [55],[69],[54],[67]. Thus, when an ontology is ready, it can be uploaded to an ontology library system, where a larger set of developers has free access to it. If needed, developers may download the ontology from the library and re-use it for their own purposes. Ontolingua[2], WonderWeb[3], semWebCentral[4] and the DAML ontology library[5] are examples of such ontology libraries for the Web. An extensive survey of Ontolingua and other current ontology libraries is available in [55].

 

The focus of this section has been on the reasons why ontologies are necessary. We have identified the major areas where problems can occur when humans and different software application systems interact: communication between humans, inter-operability between heterogeneous computer application systems, and knowledge sharing and re-usability. We have then discussed how ontologies help to overcome these problems. To conclude, we summarize these issues. Ontologies support communication when humans agree on a specific ontology. In doing so, humans agree on a single interpretation of some domain, which reduces the possibility of confusion or misinterpretation. Ontologies support inter-operability when heterogeneous computer application systems commit to an ontology. In that case, the ontology can act as an interlingua between those systems. Ontologies also facilitate inter-operability by supporting knowledge integration and by preventing duplication of vocabularies. Finally, knowledge captured in ontologies can be shared and re-used, for example by uploading the ontologies to ontology libraries on the Web.

 

2.3 Kinds of Ontologies

 

Ontologists and researchers of AI have distinguished between several kinds of ontologies that have been determined according to varying criteria. Among them three kinds of ontologies seem to be more representative. These are meta-level ontologies, commonsense ontologies and domain ontologies.

 

For Uschold, kinds of ontologies can be determined according to three dimensions “under which the ontology is desired” [67]. These dimensions are formality, the purpose and the subject matter. Consequently, an ontology can be highly informal by being defined in natural language, it can be rigorously formal by being written in a very formal language with formal semantics or it can be at a level in between.

 

An ontology may be more or less generic in terms of its purpose. For example, ontologies representing very general knowledge, such as commonsense knowledge, are considered generic. At the other end, ontologies that concentrate on a particular application are less generic.

 

Finally, according to the subject matter an ontology can be a domain ontology, which describes knowledge about some specific area. It can be a task ontology, which is designed to deal with solving a specific problem or it can be a meta-level ontology, which describes information about data.

 

2.3.1 Meta-Level Ontologies

 

Meta-knowledge or meta-data is data about data [26]. Information about who produced a given piece of information, when it was produced, what format it is in and so forth is all regarded as meta-data. As such, it is believed to be necessary for efficient access to and intelligent management of data. For example, a library catalogue card can be considered metadata: it contains data about the nature and the location of a book, so it describes data and is also data itself.

 

A meta-level ontology, also called a metadata ontology, is similar: it structures data and as such is itself data. The Dublin Core Meta-Level Ontology[6] is an example. It is the outcome of an initiative prompted by the need to structure the data present on the Web. Its purpose is to facilitate efficient search and information retrieval despite the vast amount of information on the Web. To this end, the Dublin Core Ontology attempts to organize Web content by providing bibliographic information for the documents on the Web. More precisely, it defines fifteen properties, which can be inserted directly into the HTML code of Web pages in the form of so-called meta-tags. Using meta-tags such as ‘Creator’, ‘Subject’ or ‘Title’, each Web document can be annotated with further information about its contents, such as the related people and organizations, the creation time and date, the subject matter and so forth.
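To show what such meta-tags can look like, the following sketch generates HTML meta elements for a few Dublin Core properties. It assumes the common convention of writing the property name in lower case with a "DC." prefix in the name attribute; the function name and the sample values are hypothetical.

```python
from html import escape

# The fifteen elements of the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_meta_tags(metadata: dict[str, str]) -> list[str]:
    """Render one HTML meta tag per Dublin Core property in `metadata`."""
    tags = []
    for element, value in metadata.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        # Escape the value so that quotes and ampersands cannot break the attribute.
        tags.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return tags


# Hypothetical sample document description.
for tag in dublin_core_meta_tags({"title": "Ontologies & the Web", "creator": "Jane Doe"}):
    print(tag)
```

The resulting lines would be placed in the head of the Web page, where a search tool can read them without having to interpret the page body.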

 

2.3.2 Commonsense Ontologies

 

Merriam-Webster’s Dictionary defines two senses of commonsense knowledge: “the unreflective opinions of ordinary people” and “sound and prudent but often unsophisticated judgement”. From the AI point of view, commonsense knowledge is knowledge that is not explicitly stated but is implicitly present in humans’ minds. For example, reading the sentences “Mary saw the dog in the window. She wanted it”, we know by commonsense that it is the dog that Mary wants and not the window, although this is not explicitly stated [60].

 

Commonsense ontologies, also called upper, top-level or general ontologies, have the purpose of making such implicit knowledge explicit, so that it can be understood, shared and reused. The requirement for commonsense ontologies is that the knowledge they describe has to remain domain independent. That allows domain ontologies to be constructed on top of the domain-independent concepts of the commonsense ontology [56].

 

The Standard Upper Ontology (SUO)[7] is one example of a commonsense ontology; it is the outcome of collaborative efforts to create a general-purpose formal ontology [70]. It is promoted by the IEEE Standard Upper Ontology working group, and it was approved as an IEEE standards project in December 2000. The participants in the development process were representatives of government, academia and industry from several countries. There are currently two versions of SUO: the IFF (Information Flow Framework) Foundation Ontology and the SUMO (Suggested Upper Merged Ontology). The purpose of SUMO is to create a comprehensive and consistent top-level ontology from some of the best public resources, such as the work of CNR’s mereotopology group, other upper-level ontologies, and theories of time such as James Allen’s, among others [5]. Another very well-known commonsense ontology is the Cyc[8] ontology, which we will discuss in detail in the forthcoming sections.

 

2.3.3 Domain-Specific Ontologies

 

In the literature, domain-specific or domain ontologies are defined as declarative conceptualizations of the terminology and knowledge in one domain [19]. The purpose of a domain ontology is to facilitate the use of knowledge across different tasks and applications. That is, if knowledge about some domain is captured formally in a domain ontology, then the knowledge can be easily accessed, distributed and reused just by publishing and deploying the ontology. There are two important criteria for the design of a domain ontology. The first is that it should be designed in such a way that it can be integrated into a more general ontology when necessary. The second is that it should be possible to integrate the domain ontology with another domain ontology to show how the two domains are related to each other.

 

Today, there is a vast variety of domain ontologies, most of which can be accessed through the ontology libraries mentioned previously. Linguistic ontologies such as WordNet[9] and SENSUS[10], engineering ontologies such as the EngMath Ontology[11], and enterprise ontologies such as the Enterprise Ontology[12] are examples of domain ontologies.

 

In this section we have discussed various kinds of ontologies that have been distinguished according to different criteria. We have introduced three kinds, namely meta-level ontologies, commonsense ontologies and domain ontologies, and we have discussed their purposes. For each kind we have briefly mentioned representative ontologies.

 

2.4 Methodologies in Ontology Building

 

The existence of a vast number of ontologies, developed by different groups with different approaches and techniques, has made it clear that a systematic approach to constructing ontologies is necessary. Moreover, researchers such as Charlet et al. and Bench-Capon et al. have criticised the lack of organization and of standardized activities in ontological engineering, and have described the state of the art as closer to an art than to engineering [20],[37]. Consequently, several methodologies have been proposed to assist the process of ontology development. In this section we give a brief overview of some existing methodologies, focusing on METHONTOLOGY.

 

The most representative methodologies are Uschold and King’s Enterprise Methodology [67], Grüninger and Fox’s TOVE (Toronto Virtual Enterprise) methodology [43], and METHONTOLOGY [3]. The Enterprise Methodology emerged from the experience of the two ontologists during the development of the Enterprise Ontology. In a similar way, the TOVE methodology came into being during the construction of the TOVE ontology. Later, METHONTOLOGY was proposed as an official ontological engineering methodology. A comprehensive survey of these methodologies is provided in [27].

 

Gómez-Pérez [71] lists some general principles that should be taken into consideration while developing an ontology. Some of these principles are (i) clarity and objectivity, (ii) completeness and coherence, (iii) maximum monotonic extendibility and (iv) ontological distinction. Taken together, these principles say that the ontology should define the meaning of terms and provide complete definitions, that it should come with natural language documentation, that it should permit consistent inference, and that adding new knowledge to the ontology should not result in inconsistencies. Additionally, the ontology should make as few claims as possible about the world being modelled; that is, the ontology should not be over-specified. Finally, names in the ontology should be standardized wherever possible.

 

In most cases, the methodologies or ontology development approaches seem to have grown out of the experience of developing an ontology for some purpose in a given domain, as with the Enterprise and TOVE ontologies. Therefore, we assume they can provide good support when developing ontologies for similar purposes in similar domains. However, they may not be equally effective for other purposes in other domains. METHONTOLOGY seems to constitute an exception, since it has been developed as a stand-alone, domain-independent methodology for ontological engineering. Therefore, we will investigate this methodology further in the following section.

 

2.4.1 METHONTOLOGY

 

METHONTOLOGY is a “well-structured methodology used to build ontologies from scratch” [3] and it enables the construction of ontologies at the knowledge level [50]. It has been developed in the Laboratory of Artificial Intelligence at the Polytechnic University of Madrid. METHONTOLOGY is supported by a software tool called Ontological Design Environment (ODE) and it consists of two processes: the ontology development process and the ontology life cycle.

 

The ontology development process consists of three sub-processes that run synchronously. The first is management. It involves determining how much time and which resources are needed for the construction of the ontology, checking whether the planned tasks are eventually completed, and verifying that the final results are satisfactory.

 

The second is the technical sub-process. It consists of the activities that concern the actual development of the ontology. These activities are specification, conceptualization, formalization, implementation and maintenance.

The specification activity states the purpose and the scope of the ontology. In other words, it states why the ontology is being built, what it will be used for and who will use it. At the end of the specification activity a Specification Document is produced, in the form of a table that summarizes the overall information about the ontology.

The conceptualization activity identifies the concepts, instances, relations and properties (attributes) related to the domain and it provides documentation. It starts with the definition of a Glossary of Terms, which is a table consisting of a name and a description for each term that shows up in the domain of the ontology. Then Concept Classification Trees are built, in which the concepts of the domain are organized into taxonomies. After that, the Concept Dictionary is defined on the basis of the concepts organized in the concept classification trees. The Concept Dictionary is a table that lists the meanings, attributes and instances of all concepts in the classification trees. Each instance of every concept in the concept dictionary is also listed in separate tables, which show the characteristics of the instances. At the end of the conceptualization activity, every piece of information that is present in the ontology should be explicitly defined and written down, so that it is clear exactly what the ontology is going to talk about.
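The intermediate representations produced during conceptualization can be pictured as simple data structures. The following Python sketch is only an illustration: the term names, the dictionary layout and the helper ancestors are assumptions made for this example and are not taken from METHONTOLOGY or ODE.

```python
# Glossary of Terms: one name and one description per domain term.
glossary = {
    "Musician": "A person who plays a musical instrument.",
    "JazzMusician": "A musician who performs jazz.",
    "Instrument": "A device used to produce music.",
}

# Concept Classification Tree: each concept points to its parent (None = root).
classification_tree = {
    "Musician": None,
    "JazzMusician": "Musician",
    "Instrument": None,
}

# Concept Dictionary: description, attributes and instances per concept.
concept_dictionary = {
    concept: {"description": glossary[concept], "attributes": [], "instances": []}
    for concept in classification_tree
}
concept_dictionary["Musician"]["attributes"].append("name")
concept_dictionary["JazzMusician"]["instances"].append("Count Basie")


def ancestors(concept):
    """Return the chain of superconcepts of `concept`, nearest first."""
    chain = []
    parent = classification_tree[concept]
    while parent is not None:
        chain.append(parent)
        parent = classification_tree[parent]
    return chain
```

Building the concept dictionary from the classification tree mirrors the order of the activities described above: the glossary comes first, the taxonomy second, and the dictionary is derived from both.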

Next, all of this information is formalized and implemented in a computational language.

Finally, the maintenance activity concerns the continuous carrying out of knowledge acquisition, evaluation and documentation of the ontology.

The last part of the development process is the support sub-process, which, as the name suggests, provides support to the management and technical sub-processes.

 

The second and last process of METHONTOLOGY is the ontology life cycle. It is an ordering relation over the sub-processes defined in the ontology development process. In other words, the ontology life cycle determines which sub-processes occur first and which activities are to be carried out with priority.

 

METHONTOLOGY seems to be a highly detailed and systematic methodology that provides clear-cut guidelines for the construction of ontologies. It has been adopted for the construction of, among others, the CHEMICALS ontology, the Environmental Pollutants ontologies and the Reference-Ontology [27]. It has also been recommended by the Foundation for Intelligent Physical Agents (FIPA) for ontology construction[13].

 

In this section we have described how methodologies for the development of ontologies came about and have discussed several methodologies, focusing on METHONTOLOGY.

 

2.5 Languages for Writing Ontologies

 

Ontologies are formal theories about a specific domain; therefore they require a formal logical language to express them. Most languages for formalizing ontologies seem to have emerged from one of two approaches: first-order predicate logic (FOL) and XML-RDF. In this section we describe the ideas behind both approaches and mention their most representative languages. Concerning the first approach, we examine KIF and CycL. With regard to the second approach, we first provide an overview of XML and RDF as a language and a modelling structure, respectively, and then look at RDFS and OWL. Most emphasis is devoted to OWL[14], which was proposed by the World Wide Web Consortium (W3C) as the standard for developing Web-based ontologies in February 2004.

 

2.5.1 First Order Predicate Based Languages: KIF and CycL

 

This subsection discusses the ideas behind the development of KIF and CycL, both of which are known as ontology languages. Both languages extend FOL by using second order concepts.

FOL based knowledge representation languages have emerged from the ideas of a community of mathematicians and computer scientists who wanted to define expressive languages for computer systems. Their motivation was the shortfall of existing database and object-oriented languages in describing information in its highest generality. Their goal was to define languages that are as expressive as natural languages but do not suffer from the imprecision or ambiguity of natural language. Thus, persuaded by the expressivity and power of FOL, they set out to define FOL based languages for computer systems. KIF and CycL are languages that follow this approach. KIF was initially designed not as an ontology language but as a language for knowledge interchange. Yet, it is considered to qualify as an ontology language because of its high level of generality. CycL, on the other hand, was developed as an ontology language, in particular to represent the knowledge embedded in the CYC common-sense ontology.

 

KIF (Knowledge Interchange Format) is a computer oriented language that was introduced by Genesereth and Fikes in 1992 [29]. It is designed as a knowledge exchange format between different computer systems to facilitate knowledge sharing. As such, it can be applied to the specification of ontologies, to software agent communication, to automated deduction and to constraint satisfaction. Typically, KIF works in the following way: a program reads a knowledge base in KIF and converts what it has read into its own internal implementation language. The program performs all computation on the information using its own language. Later, when the program needs to communicate with another program, it maps its data from its own implementation language back into KIF. The communication partner in turn translates the data from KIF into its own implementation language in a similar way. Thus KIF acts as an interlingua between different computer application systems.
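The interlingua idea can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: it handles only flat, KIF-style prefix sentences such as (plays ChetBaker Trumpet), and the two "programs", their internal formats and all names are hypothetical. It is not a KIF parser.

```python
def to_kif(facts):
    """Serialize (predicate, arg1, arg2, ...) tuples as KIF-style sentences."""
    return "\n".join("(" + " ".join(fact) + ")" for fact in sorted(facts))


def from_kif(text):
    """Parse flat KIF-style sentences back into tuples (nesting is not handled)."""
    facts = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            facts.add(tuple(line[1:-1].split()))
    return facts


# Program A keeps its knowledge as a set of tuples.
program_a = {
    ("plays", "ChetBaker", "Trumpet"),
    ("instance-of", "Trumpet", "Instrument"),
}


def load_into_b(facts):
    """Program B indexes the same knowledge by predicate."""
    index = {}
    for predicate, *args in facts:
        index.setdefault(predicate, []).append(tuple(args))
    return index


message = to_kif(program_a)                  # A exports to the interlingua
program_b = load_into_b(from_kif(message))   # B imports into its own format
```

Neither program needs to know the other's internal representation; each only needs a translation to and from the shared exchange format.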

 

The KIF specification identifies three major characteristics for the language. First, it has a declarative semantics, i.e. the meaning of expressions in the representation language can be understood as is. Second, it is logically comprehensive, i.e. any sentence in the first-order predicate calculus can be expressed. Third, it provides for the representation of knowledge about knowledge. Hence, users of the language can introduce new knowledge representation constructs without modifying the language. The KIF language consists of constants, expressions, definitions and forms; a KIF knowledge base is a finite, unordered set of forms. Conjunctions, disjunctions, implications, equivalences and quantification can be defined in KIF using the appropriate operators. KIF extends FOL by using second order concepts such as reification (a statement about a statement) of formulas as terms in other formulas.

 

The CycL[15] ontology representation language has been developed as a part of the Cyc project, which aims at constructing the largest existing knowledge base in order to provide computers with common sense. We will refer to Cyc in the forthcoming sections. CycL is the medium for representing the Cyc ontology. Like KIF, CycL is based on FOL and extends it with features that go beyond plain FOL, such as reification, equality, default reasoning and non-monotonic reasoning.

 

One specific aspect of CycL is the presence of so-called microtheories, which are also called contexts. They are Cyc constants denoting sets of assertions that are grouped together because they share a set of assumptions. Thus, a microtheory consists of assertions, and each assertion must be explicitly stated to be true in at least one microtheory. An example of a microtheory in Cyc is the naive theory of physics (NTP), which says, using Cyc expressions, that if something is not supported it is going to fall [57]. As this theory cannot always hold (e.g. for balloons or astronauts), microtheories in CycL come with additional specifications of when and where they should be applied. CycL extends FOL by allowing reification (predicates and formulas in CycL are treated as terms that can show up in other formulas) and by defining contexts to assert the truth of formulas.
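The role of microtheories as contexts can be pictured with a small Python sketch. The microtheory names, the predicates and the helper functions below are hypothetical and chosen for illustration only; they are not Cyc constants or CycL syntax.

```python
from collections import defaultdict

# Each microtheory (context) holds the assertions that are true in it.
microtheories = defaultdict(set)


def assert_in(mt, assertion):
    """Record that `assertion` holds in microtheory `mt`."""
    microtheories[mt].add(assertion)


def holds(mt, assertion):
    """An assertion is only evaluated relative to a given microtheory."""
    return assertion in microtheories[mt]


# Naive physics: unsupported things fall.
assert_in("NaivePhysicsMt", ("falls-when-unsupported", "Rock"))
# A different context covers the cases where naive physics does not apply.
assert_in("BuoyancyMt", ("rises-when-unsupported", "Balloon"))
```

The same assertion can therefore be true in one context and absent from another, which is how exceptions such as balloons can coexist with the naive theory without contradiction.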

 

2.5.2 XML and RDF Based: RDFS and OWL

 

XML [72] was developed to define a machine readable language that allows the syntactic structuring of documents. XML by itself therefore does not address issues concerning the semantics of documents, such as explicating the meaning of Web documents and providing terminological consensus on the Web.

 

Thus, researchers set out to develop languages that support semantics and that build on XML to benefit from its advantages, such as the syntactic structure. The RDFS and OWL languages are outcomes of such an attempt. Both languages are based on RDF (Resource Description Framework), which is a data model developed for describing Web resources with metadata. RDF is thus not a language but a data model that is independent of any domain or implementation [9], [24].

 

As a data model RDF is graph based and it consists of nodes and edges. Nodes correspond to objects or resources and the edges correspond to properties. The labels on the nodes and on the edges are Uniform Resource Identifiers (URIs). Resources are all things being described by RDF expressions. A resource may be an HTML document, it can be a part of a Web page e.g. a specific HTML or XML element within the document source or it can be a collection of pages e.g. an entire Web site. Properties are specific attributes that describe resources and they have a defined meaning.

 

A property together with its value for a specific resource makes a statement about that resource. Statements consist of a specific resource together with a named property plus the value of that property for that resource. Thus, an RDF statement is a triple, whose parts are the subject, the predicate and the object. The object of a statement, that is the property value, can be another resource, for example one specified by a URI, or it can be a literal, that is a simple string or some other primitive datatype defined by XML. Reification is possible in RDF, so statements can be made about statements. A detailed documentation of RDF can be found in the World Wide Web Consortium (W3C) RDF Primer [24]. Although RDF itself does not define any primitives for creating ontologies, it provides the basis for several other ontology definition languages such as RDFS.
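The triple structure can be sketched directly in Python. This is a minimal illustration, not an RDF library: the namespace http://example.org/jazz#, the property names and the helper match are assumptions made for the example.

```python
# Each statement is a (subject, predicate, object) triple.
EX = "http://example.org/jazz#"   # hypothetical namespace

triples = {
    (EX + "ChetBaker", EX + "playsInstrument", EX + "Trumpet"),  # object is a resource
    (EX + "ChetBaker", EX + "name", "Chet Baker"),               # object is a literal
    (EX + "MilesDavis", EX + "playsInstrument", EX + "Trumpet"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

A call such as match(p=EX + "playsInstrument") walks the edges of the graph that carry a given label, which is the basic operation on which RDF query facilities build.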

 

RDF Schema or RDFS [72] has been developed in order to define the vocabulary used in RDF data models by specifying which kinds of properties apply to which kinds of objects, what values the objects can take and what kinds of relations between those objects exist. Therefore, RDFS is considered as a first move towards an ontology language for the Web.

 

RDFS offers a fixed set of modelling primitives such as rdfs:Class, rdf:Property or the rdfs:subClassOf relationship to define RDF vocabularies for some specific application. In RDFS it is possible to define classes of classes, classes of properties, classes of literals (strings, integers, booleans and so forth) and classes of statements. Using the properties rdf:type, rdfs:subClassOf and rdfs:subPropertyOf, it is possible to define the instanceOf relationship between resources and classes, the subsumption relationship between classes and the subsumption relationship between properties, respectively. Using the rdfs:domain and rdfs:range properties it is possible to restrict the resources that can be subjects or objects of a property.
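How these primitives interact can be illustrated with a small inference sketch in Python. It applies only three patterns (type inheritance along rdfs:subClassOf, and typing of subjects and objects through rdfs:domain and rdfs:range) and treats domain and range as a source of inferred types, which is one common reading. The ex: vocabulary is hypothetical, and the sketch is not a complete RDFS implementation.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

graph = {
    ("ex:JazzMusician", SUBCLASS, "ex:Musician"),
    ("ex:Musician", SUBCLASS, "ex:Person"),
    ("ex:playsInstrument", DOMAIN, "ex:Musician"),
    ("ex:playsInstrument", RANGE, "ex:Instrument"),
    ("ex:CountBasie", RDF_TYPE, "ex:JazzMusician"),
    ("ex:ChetBaker", "ex:playsInstrument", "ex:Trumpet"),
}


def rdfs_closure(triples):
    """Apply three RDFS-style inference patterns until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))   # instance inherits the superclass
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))   # subject is typed by the domain
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))   # object is typed by the range
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(graph)
```

In this example nothing states directly that ex:ChetBaker is a person; that fact follows from the domain of ex:playsInstrument combined with the class hierarchy.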

  

As we have mentioned, RDFS is regarded as only a first move towards an ontology language because it is considered to be not expressive enough to qualify as a full ontology language. There are a number of things that cannot be said in RDFS. For example, disjoint, union, intersection and complement classes cannot be defined, cardinality restrictions are not present and properties cannot be declared as transitive, symmetric or inverse of each other. Yet, researchers have determined that such features are essential for an ontology language if it is to provide efficient reasoning support. Therefore, they have set out for the development of a more expressive ontology language.

 

OWL (Web Ontology Language)[16] has been developed with such a motivation. It is an outcome of the collaborative efforts of US American and European researchers, whose goal has been to develop an ontology language other than RDFS that can be commonly adopted and that will facilitate the semantic inter-operability on the Web. The Web Ontology Working Group of World Wide Web Consortium (W3C) describes OWL as “a language designed for use by applications that need to process the content of information instead of just presenting information to humans” [42].

 

Besides its predecessor DAML+OIL [18], the influences on the OWL language have been Description Logics [23], the Frames paradigm and RDFS. The OWL language has three species: OWL Full, OWL DL (for Description Logics) and OWL Lite.

 

The first species, OWL Full, is the most expressive of the three. It is upward compatible with RDF, thus every valid OWL Full document is an RDF document. However, it is undecidable, so complete automated reasoning over OWL Full ontologies cannot be guaranteed.

 

OWL DL, as the name suggests, is closely related to Description Logics and it constrains OWL Full with ideas from Description Logics. OWL DL is computationally complete and decidable, hence it is possible to automatically compute the classification hierarchy and check for inconsistencies in an OWL DL ontology.

 

OWL Lite is the least expressive sublanguage and its intended use concerns situations, where only a simple class hierarchy and simple constraints are needed. Yet, as a result of its restricted expressivity, OWL Lite can provide effective reasoning support. In [42] a detailed description of OWL and the characteristics of OWL ontologies are provided.

 

OWL ontologies have three components. These are classes, individuals (also called instances) and properties. In other formalisms properties are sometimes called roles, relations or attributes.

 

OWL classes are interpreted as sets that contain individuals. Classes can be organised into a superclass-subclass hierarchy. When a class is declared to be the subclass of another, then every instance of the first class will also be the instance of the second one. In OWL DL, the superclass-subclass relationships can be computed automatically by an automatic inference mechanism. Classes can be declared to be union, intersection and complement classes. They can also be equivalent to each other. Finally, there are enumerative classes in OWL, which are classes that are defined by precisely listing the individuals that are the members of the class. Exactly these individuals make up the class. For example, the class Kansas City Jazz Musicians can be defined as being made up of exactly the members (the individuals) “Count Basie” and “Dizzy Gillespie”.
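The set reading of classes can be made concrete with Python sets. The sketch below uses a small, closed list of individuals for illustration, whereas OWL itself does not assume that the listed individuals are the only ones that exist. All variable names, and the class of trumpeters, are assumptions made for the example.

```python
# The domain of discourse: all individuals we talk about in this example.
domain = {"Count Basie", "Dizzy Gillespie", "Miles Davis", "Chet Baker"}

# An enumerated class lists exactly its members (the example from the text).
kansas_city_jazz_musicians = frozenset({"Count Basie", "Dizzy Gillespie"})
trumpeters = frozenset({"Dizzy Gillespie", "Miles Davis", "Chet Baker"})
jazz_musicians = frozenset(domain)


def is_subclass(sub, sup):
    """Under the set reading, subclass means subset."""
    return sub <= sup


union = kansas_city_jazz_musicians | trumpeters
intersection = kansas_city_jazz_musicians & trumpeters
complement = domain - trumpeters   # complement relative to the domain
```

Here every member of the enumerated class is also a member of jazz_musicians, which is exactly what the subclass declaration described above requires.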

 

OWL individuals are the objects of the domain that we are interested in. Referring to the example above, “Count Basie” and “Dizzy Gillespie” are some of the individuals of our domain, say, the domain of Jazz Musicians. Further individuals could then be “Billie Holiday”, “Miles Davis”, “Thelonious Monk”, “Duke Ellington” and so forth.

 

OWL properties are binary relations on individuals, i.e. they link two individuals together. There are two types of properties in OWL. Object Properties relate objects to other objects, as in “Chet Baker” plays Instrument “Trumpet”. Datatype Properties relate objects to datatype values. For example, “Chet Baker” died at the Age of “59”. As in RDFS, properties in OWL also have domains and ranges.

 

Similar to the case with classes, OWL properties may have subproperties, so that it is possible to form hierarchies of properties. For example, the property is Jazz Musician may have the more specific property is West Coast Jazz Musician as its subproperty.

 

Restrictions in OWL are the quantifier restrictions, the has-value restriction and the cardinality restrictions. The quantifier restrictions are declared using the two OWL constructs owl:allValuesFrom (semantically equivalent to the universal quantifier “∀”) and owl:someValuesFrom (semantically equivalent to the existential quantifier “∃”). The has-value restriction is declared using the construct owl:hasValue (“∋”). It restricts the value that a property can take by specifying exactly what that value is. For example, is the city of Olympic Games 2004 owl:hasValue “Athens”.

Using the cardinality restrictions on properties, we can describe the class of individuals that have at least (“≥”), at most (“≤”) or exactly (“=”) a specified number of relationships with other individuals or datatype values. Properties in OWL can be declared to be transitive, like the property is Older than; they can be symmetric, like the property is Married To; or they can be functional, which states that a property has at most one value, such as the property age.
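The meaning of these restrictions and property characteristics can be spelled out over a finite relation. The following Python sketch models a property as a set of (subject, value) pairs; the function names, the example data and the class brass are assumptions for illustration and do not correspond to any OWL API.

```python
# A property is modelled as a set of (subject, value) pairs.
plays = {
    ("Chet Baker", "Trumpet"),
    ("Chet Baker", "Flugelhorn"),
    ("Count Basie", "Piano"),
}
brass = {"Trumpet", "Flugelhorn"}


def values(prop, individual):
    """All values the property takes for the given individual."""
    return {v for s, v in prop if s == individual}


def all_values_from(prop, cls, individual):
    """owl:allValuesFrom: every value of the property lies in cls."""
    return values(prop, individual) <= cls


def some_values_from(prop, cls, individual):
    """owl:someValuesFrom: at least one value of the property lies in cls."""
    return bool(values(prop, individual) & cls)


def has_value(prop, value, individual):
    """owl:hasValue: the property takes this specific value."""
    return value in values(prop, individual)


def min_cardinality(prop, n, individual):
    """At least n values for the property."""
    return len(values(prop, individual)) >= n


def is_functional(prop):
    """Functional: no individual has more than one value."""
    return all(len(values(prop, s)) <= 1 for s, _ in prop)


def transitive_closure(prop):
    """Smallest transitive relation that contains prop."""
    closure = set(prop)
    while True:
        extra = {(a, d) for a, b in closure for c, d in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra
```

Note that the universal restriction is satisfied vacuously by an individual with no values at all, which is why it is often combined with an existential restriction in practice.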

 

One benefit of writing ontologies in OWL (more precisely OWL DL or OWL Lite) is that they can be processed by an inference mechanism, i.e. by a reasoner. Thus, it is possible for a reasoner to check for subsumption relations in OWL ontologies and to compute the inferred class hierarchy. A reasoner can also check OWL ontologies for consistency and can determine whether or not it is possible for a class to have any instances. At least two such reasoners based on Description Logics, RACER[17] and FaCT[18], provide reasoning support for OWL ontologies. Several future extensions are being discussed for the OWL language, such as enabling the definition of rules in OWL, which is currently not possible. Related research is being conducted [36].
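The two reasoning services mentioned here, computing the inferred hierarchy and detecting classes that cannot have instances, can be illustrated with a deliberately simplified Python sketch. It only follows told subclass links and checks declared disjointness; Description Logic reasoners such as those named above use far more general procedures. The class names, including the deliberately inconsistent SingingTrumpet, are hypothetical.

```python
# Told subclass axioms: each class maps to its directly stated superclasses.
subclass_of = {
    "WestCoastJazzMusician": {"JazzMusician"},
    "JazzMusician": {"Musician"},
    "Musician": set(),
    "Instrument": set(),
    "SingingTrumpet": {"Musician", "Instrument"},   # deliberately inconsistent
}

# Declared disjointness: no individual may belong to both classes of a pair.
disjoint = {frozenset({"Musician", "Instrument"})}


def superclasses(cls):
    """All direct and inferred superclasses of cls (transitive closure)."""
    found, stack = set(), list(subclass_of[cls])
    while stack:
        sup = stack.pop()
        if sup not in found:
            found.add(sup)
            stack.extend(subclass_of[sup])
    return found


def is_satisfiable(cls):
    """A class can have instances only if it is not under two disjoint classes."""
    ancestors = superclasses(cls) | {cls}
    return not any(pair <= ancestors for pair in disjoint)
```

The inferred superclass of WestCoastJazzMusician is Musician although this is never stated directly, and SingingTrumpet is reported as unsatisfiable because it would have to be both a musician and an instrument.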

 

Let us summarize this section. We have inspected four different ontology languages. Two of them are based on FOL, whereas the other two are based on XML and RDF. We have pointed out that the main distinction between the FOL based and the XML based ontology languages seems to be that the former have not been developed with the specific purpose of applying them to the Web. KIF is an interchange format, which enables the exchange of information between different computer application systems and is general enough to qualify as an ontology language. CycL is an ontology language that has been developed to represent the Cyc commonsense ontology. Neither KIF nor CycL is specifically devoted to accomplishing a task concerning the Web, whereas this has been the major concern of the XML and RDF based ontology languages. RDFS and OWL are examples of such languages. They allow sharing ontologies on the Web. OWL has been declared the latest standard for Web ontology languages.

 

2.6 Environments for Building Ontologies

 

In this section we introduce two ontology editors that provide environments for the convenient editing of ontologies. The first editor is Protégé 2000, which has been developed by Stanford’s Medical Informatics Section in the USA. The second editor is OilEd, which has been developed by the Computer Science Department of the University of Manchester, UK.

 

There are many other ontology editors. We have decided to introduce these two editors for three reasons. First, both of them seem to be compatible with the latest standards in the field of ontological engineering. Protégé allows direct editing in OWL, whereas OilEd is capable of (partially) reading and saving in OWL. Both have a well developed import and export mechanism for OWL and for other recent ontology languages. Second, both editors are open source and can be freely obtained from the World Wide Web. Third, both of them can be used on various operating systems such as Windows and Unix/Linux.

 

2.6.1 Protégé 2000

 

Noy et al. in [51] define Protégé 2000 as “a graphical tool for ontology editing and knowledge acquisition that we can adopt to enable conceptual modelling with new and evolving Semantic Web languages”. Protégé is a computer program, which should be installed on the local computer and it can be downloaded as freeware from the Website of Protégé 2000[19]. It is available on different platforms like Windows, Mac OS, Solaris, Linux, Unix and its capabilities can be extended by downloading various plug-ins that are designed for the tool. Protégé 2000 can be used to construct a domain ontology, to customize knowledge acquisition user interface and to enter domain knowledge [44].

 

Classes (or concepts) of the domain to be modelled are visualized in a taxonomic hierarchy in Protégé. It is possible to define the instances of the model, so that for each class the associated instances can be created directly in the model. The instances automatically become related to their classes by the instanceOf relationship. Slots in Protégé describe properties of classes and instances. Facets specify constraints on allowed slot values. Axioms and rules cannot be explicitly represented; extra plug-ins need to be downloaded for these purposes.

 

Protégé does not allow synchronous editing of an ontology by multiple users, yet it is possible to import and export ontologies in different formats such as text files, database tables and RDF files. Since OWL has become the standard ontology language for the Web, Protégé supports the editing of OWL ontologies through an OWL plug-in. This can be downloaded separately and integrated into the editor. The primitives of the OWL language thus become available in Protégé for producing OWL ontologies.

 

The reasoner RACER provides reasoning support for Protégé. This tool can be downloaded separately to the local computer. When it is run, it checks the consistency of the ontologies created with Protégé and infers the classification tree of the ontology based on the subclass-superclass relationships. Several mailing lists such as protégé-users, protégé-discussion and protégé-beta exist, which are very active and helpful for developers. Every year, the International Protégé Workshop brings together researchers developing or using Protégé development methodologies and tools.

 

2.6.2 OilEd

 

OilEd is a simple ontology editor that supports the construction of OIL-based ontologies [59]. The developers of OilEd acknowledge that other ontology editors such as Protégé have influenced the design of the tool, although OilEd is intended to put more emphasis on efficient reasoning support. OilEd is a freeware computer program that can be downloaded and installed on the local computer[20]. It is available at least on platforms such as Windows, Linux and Unix. Using OilEd, it is possible to create and edit ontologies, to check their consistency and to infer the classification hierarchy of the ontology. As in Protégé, it visualizes the classes of the domain in a taxonomic hierarchy and it allows the definition of instances in the model. As in Protégé, slots in OilEd describe properties of classes and instances. Facets impose constraints on allowed slot values.

 

Reasoning services for OilEd are currently provided by the FaCT system, which is a Description Logics classifier. It tests the consistency of the ontology and infers the classification hierarchy with respect to subclass-superclass relationships. The FaCT reasoner does not require a separate download as it comes with the editor itself. It can be installed on the local computer, and the editor can connect to it at any suitable point to request the verification of the ontology at hand. The reasoner then checks for consistency and infers the classification hierarchy.

 

OilEd does not provide support for working with multiple ontologies and it does not enable the migration and integration of ontologies. A mailing list called oiled-discussion also exists, however it is not as active as that of Protégé.

 

In this section, we have had a glimpse of two ontology editors: Protégé from Stanford University and OilEd from the University of Manchester. We have briefly discussed how they support the creation and editing of ontologies for the Web and have mentioned their capabilities. We have also indicated where they can be obtained.

 

2.7 Some Application Areas of Ontologies

 

So far we have considered what ontologies are, why we need ontologies, what kinds of ontologies exist and which languages and environments we can use to build ontologies. In this section we inspect how ontologies can be applied concretely. We refer to three application areas of ontologies: the first concerns the Semantic Web Vision, the second is natural language processing and the third concerns multi-agent architectures.

 

2.7.1 The Semantic Web Vision

 

Berners-Lee, Hendler and Lassila define the Semantic Web vision as an extension of the current Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [66]. As such, the Semantic Web should be a place where information can be better discovered, automatically processed, integrated and shared across various applications. The precondition for the Semantic Web is viewed as providing the documents on today’s Web with machine processable contents. In other words, Web documents should be furnished with information whose context dependent meaning can be interpreted by software programs and applications.

 

According to Uschold’s argument in [68], today’s Web has a syntax that is defined through a huge collection of documents marked up in HTML/XML. It lacks meaning, commonsense, context and adaptability. Moreover, it requires human intervention. Uschold adds, however, that the Web is evolving from a place to find things to a place to do things, such as online shopping.

 

There is a list of expectations from tomorrow’s Semantic Web. Accordingly, it should understand the meaning and user background, it should enable inter-operability between heterogeneous applications and it should provide a platform for intelligent web agents and adaptive web systems to operate on. Eventually, it should require less human intervention. As such the ultimate goal set for Semantic Web is that it should assist human users in their daily on-line activities by exhibiting a higher level intelligence.

 

Ontologies should facilitate Semantic Web in various ways [9]. They can assist Web searches and they can interpret the retrieved information. We will refer to these issues related to natural language processing in the next subsection. Finally, ontologies are thought to be used for establishing communication between agents on the Semantic Web. As such a Semantic Web agent is considered as a software program that works autonomously. An example is Carnegie Mellon University’s Retsina Calendar Web Agent[21]. It receives tasks and preferences from its user and sets out on the Web to find, to collect and to compare information to accomplish the tasks. It does this by communicating with other agents to profit from their information and their capabilities, where agents can reach a shared understanding among each other by exchanging their ontologies.

 

2.7.2 Natural Language Processing

 

Ontologies and natural language processing are mentioned together within the context of the (Semantic) Web mostly with reference to the quest for a more efficient Web search [9],[74]. A more efficient Web search is understood to fulfil at least two requirements. The first is that when we send a query to a search engine, we want to retrieve the relevant documents, and only the relevant documents, as answers to our query. The second is that, ideally, we would like to see only the relevant parts of the retrieved documents instead of the entire documents. In other words, we are only interested in the answers to our queries.

On today’s Web, however, search looks quite different. The first requirement is fulfilled, but not satisfactorily. The second requirement is far from being fulfilled. For example, even if the main relevant pages are retrieved for our query, thousands or maybe hundreds of thousands of other documents are retrieved as well. It also happens that the query returns only a few relevant pages or none. Another common problem is that the query does not deliver the expected results because of different terminology. For example, a person who is looking for information about the jazz music group “Weather Report” may receive all kinds of pages containing weather forecasts, meteorological information, current weather conditions and so forth. The reason for these problems is the insufficiency of keyword search, because in most cases keywords in Web documents are too ambiguous to deliver relevant matches. Also, instead of the relevant answers, users retrieve whole documents as results of their queries, each of which they must go through to extract the information they need [74].

 

Given this background, ontologies should help to overcome such Web problems by enabling so-called semantic annotation. In [2] Kiryakov et al. define semantic annotation as “a specific metadata generation and usage schema, aiming to enable new information access methods and to extend the existing ones.” The idea behind semantic annotation is to assign to the words, phrases or expressions present in Web documents semantic descriptions that interpret their meaning. More precisely, so-called semantic tags, whose meanings are precisely defined through concepts and relations in the ontology, are attached to appropriate words, phrases or expressions in the Web documents to associate them with the meaning provided in the ontology. Hence, search engines can retrieve Web documents according to the relevance of this semantic mark-up. As the retrieval of Web pages is done on the basis of concepts instead of keywords, the problem of ambiguous keywords can be sidestepped.
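The difference between keyword-based and concept-based retrieval can be sketched in Python using the "Weather Report" example. The two documents, the concept names JazzGroup and WeatherForecast, and both search functions are invented for illustration and do not describe the annotation schema of [2].

```python
documents = {
    "doc1": "Weather Report released the album Heavy Weather in 1977.",
    "doc2": "The weather report for Vienna predicts rain tomorrow.",
}

# Semantic tags attach an ontology concept to a phrase in a document.
annotations = {
    "doc1": {"Weather Report": "JazzGroup"},
    "doc2": {"weather report": "WeatherForecast"},
}


def keyword_search(query):
    """Return documents containing the query string, ignoring case."""
    return sorted(d for d, text in documents.items() if query.lower() in text.lower())


def concept_search(query, concept):
    """Return documents where the query phrase is annotated with the concept."""
    return sorted(
        d for d, tags in annotations.items()
        if any(
            phrase.lower() == query.lower() and tag == concept
            for phrase, tag in tags.items()
        )
    )
```

The keyword search cannot tell the band from the forecast and returns both documents, whereas the concept search returns only the document in which the phrase is tagged with the intended concept.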

 

2.7.3 Multiagent Architectures

 

In [65] Sycara defines multiagent architectures as “systems in which many intelligent agents interact with each other. The agents are considered to be autonomous entities, such as software programs or robots. Their interactions can be either cooperative or selfish. That is, the agents can share a common goal (e.g. an ant colony), or they can pursue their own interests (as in the free market economy).” In a nutshell, multiagent systems are systems that consist of a group of agents that work together to accomplish a common task such as executing a system.

 

DeLoach, DiLeo and Jacobs [38] state that ontologies in multiagent systems define the information domain of the system. The agents in the system interact with each other by passing messages. However, an agent can only make sense of a message if it has information about the context of the message, in other words if the domain of the message is specified. Thus, ontologies specify the domain for the multiagent system to enable successful communication and efficient interaction between the agents. DeLoach, DiLeo and Jacobs add that an ontology is essential to guarantee the re-usability of the multiagent system because it specifies the view of the system on a given domain. Others who want to reuse the constructed multiagent system need to ensure that the ontology of the system complies with that of the new system.

 

The concrete use of the ontology in a multiagent system can be observed in the messages of its agents. The messages include references, in the form of attributes, to the objects of the domain. The ontology defines the types of these objects so that each agent can perform the necessary reasoning. For example, when the sender agent sends the receiver agent a message that refers to an object, say ‘hole’, the receiver agent can consult the system ontology, infer that ‘hole’ is an obstacle and perform the appropriate behaviour.
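
A minimal Python sketch of this kind of lookup is given below. The ontology contents, the message fields and the behaviour names are hypothetical and serve only as an illustration; they are not taken from [38].

```python
# Hypothetical system ontology: each object type maps to its direct supertype.
SYSTEM_ONTOLOGY = {
    "hole": "obstacle",
    "wall": "obstacle",
    "obstacle": "thing",
    "target": "thing",
}


def is_a(obj_type, ancestor, ontology):
    """Follow supertype links upward until the ancestor is found or the chain ends."""
    current = obj_type
    while current is not None:
        if current == ancestor:
            return True
        current = ontology.get(current)
    return False


def handle_message(message, ontology):
    """Pick a behaviour based on the ontological type of the referenced object."""
    if is_a(message["object_type"], "obstacle", ontology):
        return "avoid"
    return "ignore"


message = {"sender": "scout", "receiver": "rover", "object_type": "hole"}
print(handle_message(message, SYSTEM_ONTOLOGY))
```

The receiving agent does not need a special rule for ‘hole’: it only needs a rule for obstacles, and the shared ontology supplies the missing link.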

 

This section has summarized some of the many application areas of ontologies. We have first briefly explained the Semantic Web vision, which is considered the third generation of the Web, and have discussed the role of ontologies on the way toward the Semantic Web. As a second application area, we have referred to the field of natural language processing and have mentioned that ontologies are used in this context mainly to improve Web search. Finally, we have observed the application of ontologies in a context other than the Web, namely in multiagent architectures. After citing the definition of these architectures, we have pointed out that ontologies in these systems are deployed to define the context of the system and to aid agent communication.

 

2.8 Examples of Ontologies

 

Here we present some examples of ontologies that are currently in use and that can be (partially) accessed via the Web. The first is the Cyc commonsense ontology, which is known as the largest knowledge base in the world that formally describes commonsense knowledge. The second is the WordNet ontology, which provides a lexical reference for the English language. The third and last ontology we discuss is EFGT Net, which is a resource for the systematic representation and organization of so-called named entities. The first is a commonsense ontology, whereas the latter two are linguistically motivated. For each ontology we mention its purpose, its hierarchical structure, the classification criteria, the properties and relations, some of its top level categories and, where available, its applications.

 

2.8.1 CYC General Ontology of Common-Sense Knowledge

Doug Lenat, the founder of Cycorp[22], initiated the Cyc project in 1984 with the vision of creating the world’s first true artificial intelligence that has both common sense and the ability to reason with it [57]. More precisely, the purpose of the project is to make common-sense knowledge accessible to and processable by computer programs. The Cyc system consists of a very large ontology (of nearly two hundred thousand terms and several dozen assertions about each term), an inference engine, a representation language, a natural language processing subsystem and some further components. New knowledge is entered into Cyc both by human knowledge providers and by the system itself as a product of its inference process. An overview of the Cyc system is provided in [61].

 

The ontology of the Cyc knowledge base is centred on categories, also called classes or collections, which are organized in a generalization-specialization hierarchy. The structure of the hierarchy corresponds to a directed graph, which allows one category to have several direct generalizations, i.e. supercategories. Categories have instances, which represent their members and which are specified through the instanceOf relationship.
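
Such a hierarchy can be sketched in a few lines of Python. The category and individual names below are hypothetical placeholders chosen for illustration; they are not actual Cyc constants, and the sketch does not use CycL or the Cyc inference engine.

```python
# Direct generalizations (supercategories); a category may have several.
GENERALIZATIONS = {
    "JazzSinger": ["Singer", "JazzMusician"],
    "Singer": ["Person"],
    "JazzMusician": ["Person"],
    "Person": ["Thing"],
    "Thing": [],
}

# instanceOf assertions: individual -> categories it is a direct instance of.
INSTANCE_OF = {"BillieHoliday": ["JazzSinger"]}


def all_generalizations(category):
    """Collect every direct and indirect supercategory of a category."""
    seen = set()
    stack = list(GENERALIZATIONS.get(category, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GENERALIZATIONS.get(current, []))
    return seen


def is_instance_of(individual, category):
    """True if the individual belongs to the category directly or by inheritance."""
    for direct in INSTANCE_OF.get(individual, []):
        if direct == category or category in all_generalizations(direct):
            return True
    return False


print(sorted(all_generalizations("JazzSinger")))
print(is_instance_of("BillieHoliday", "Person"))
```

Because the structure is a directed graph rather than a tree, "JazzSinger" inherits from both "Singer" and "JazzMusician", and the traversal has to guard against visiting shared ancestors such as "Person" twice.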

 

The top level category of Cyc is the category Thing. It is partitioned and further subcategorized by following the so-called distinctions approach. Thus, the top level category has three partitions: Individual Object vs. Collection, Represented Thing vs. Internal Machine Thing, and Intangible vs. Tangible Object vs. Composite Tangible Intangible Object. Each further category of the ontology must belong to one and only one of these partitions. New categories can be defined by combining the existing ones. Accordingly, entries such as “The Eiffel Tower” or “Billie Holiday” are assigned to the category Individual Object, and entries such as “Places To See” or “Jazz Singer” are assigned to the category Collection. Internal Machine Thing is the category of everything that concerns the internals of the Cyc system, such as strings, numbers and so forth. Represented Thing is everything else. Intangible is anything that has no mass, such as numbers; Tangible is anything that has mass and energy, e.g. an animal. Finally, Composite Tangible Intangible Object is something that has both a physical and an intangible extent, such as a particular person who has both a body and a soul.

 

OpenCyc[23] is the open source version of the Cyc ontology, which provides free access to the upper levels of the ontology. It can either be accessed and browsed online or be downloaded to the local computer.

 

2.8.2 WordNet Linguistic Ontology

 

The development of WordNet [74], which is known as an electronic lexical database, was started in 1985 by the Cognitive Science Laboratory at Princeton University under the direction of Professor George A. Miller. WordNet can either be accessed and browsed online or be downloaded and used freely on the local computer. Meanwhile, several systems have integrated WordNet into their platforms to provide lexical reference support for their users, e.g. the editor Protégé 2000. OpenCyc is also linked to WordNet.

 

The objective of WordNet is twofold: first, to produce a combination of a dictionary and a thesaurus that is more usable than either alone, and second, to support automatic text analysis in artificial intelligence applications. The design of WordNet is inspired by current psycholinguistic theories of human lexical memory. It groups English nouns, verbs, and adjectives into sets of synonyms called synsets, and provides short definitions for them.

 

Synsets can also be considered as concepts in an ontology. For example, {living thing, organism}, {person, human being} and {plant, flora} are synsets that consist of nouns with similar meanings. If a word has more than one sense, it shows up in more than one synset. Synsets are related to each other through different semantic relations such as hyponymy, hyperonymy, meronymy, familiarity and so forth. Hyperonymy-hyponymy can also be seen as a superclass-subclass relationship that organizes the synsets in a hierarchical order, whereas meronymy corresponds to the part-whole relationship.

As such, WordNet is a taxonomy: it does not have the structured concepts or axioms that are typical of an ontology. In other words, concepts in the hierarchy of WordNet do not have any properties or attributes. Adjectives and verbs in WordNet are organized by additional relationships. Besides the synonymy and familiarity relationships, synsets of adjectives are related to each other by an antonymy relationship. For example, the synset of the adjective ‘dry’ is related to the synset of the adjective ‘wet’ through the antonymy relationship. Synsets of verbs in WordNet are related to each other primarily by the entailment relationship. Hence, the verb ‘walk’, for example, entails the verb ‘step’.
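
The synset organization can be illustrated with a small self-contained Python sketch. It does not use the WordNet database or any WordNet library: the synset identifiers are invented for this illustration, and the second sense of ‘plant’ (an industrial plant) is an assumption added to show a word occurring in more than one synset.

```python
# Toy synset store: identifier -> member words and direct hyperonym (superclass).
SYNSETS = {
    "organism.n": {"words": ["living thing", "organism"], "hyperonym": None},
    "person.n": {"words": ["person", "human being"], "hyperonym": "organism.n"},
    "plant.n.flora": {"words": ["plant", "flora"], "hyperonym": "organism.n"},
    # Assumed second sense of 'plant', added for illustration only.
    "plant.n.factory": {"words": ["plant", "works"], "hyperonym": None},
}

# Symmetric antonymy between adjectives and entailment between verbs,
# using the two examples from the text.
ANTONYMS = {"dry": "wet", "wet": "dry"}
ENTAILS = {"walk": ["step"]}


def synsets_of(word):
    """Return the identifiers of all synsets that contain the word."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["words"])


def hyperonym_chain(synset_id):
    """Walk upward through the hyperonymy hierarchy from a synset."""
    chain = []
    current = SYNSETS[synset_id]["hyperonym"]
    while current is not None:
        chain.append(current)
        current = SYNSETS[current]["hyperonym"]
    return chain


print(synsets_of("plant"))
print(hyperonym_chain("person.n"))
```

Note that each synset carries only its member words and its links to other synsets, which reflects the observation that WordNet concepts have no properties or attributes of their own.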

 

The EuroWordNet project[24] has produced WordNets for several European languages, including Dutch, Italian, Spanish, German, French, Czech and Estonian, and has linked them together. These, however, are not freely available. The Global WordNet project attempts to coordinate the production and linking of WordNets for all languages.

 

2.8.3 EFGT Net Resource for Representing Named Entities

 

EFGT Net is a project initiated by Prof. Klaus U. Schulz, Levin Brunner and Felix Weigel at the Computational Linguistics Department of the University of Munich. EFGT Net [62] is a linguistically motivated ontology whose purpose is to represent formal knowledge about named entities and to organize them in a systematic way. Named entities are phrases that contain the names of persons, organizations, locations, times and quantities [75]. “Mount Ararat”, “Ella Fitzgerald” and “The Victorian Era” are examples of such named entities.

 

With a systematic description and organization of named entities, EFGT Net aims to support the semantic annotation, indexing, retrieval and querying of Web documents and text documents. EFGT Net is intended as a free resource, but it is not yet publicly available. It covers three languages, German, English and Bulgarian, of which German is the primary language.

 

Named entities that appear in a document give a picture of the contents of the document and usually make the document easier to understand. In order to benefit from such information in automated document processing, the information embedded in named entities needs to be captured and made explicit. With this motivation, Schulz and Weigel describe in [62] a hierarchy for classifying named entities w.r.t. thematic-geographical-temporal relations.

 

The hierarchy of the EFGT Net ontology is organized around so-called fields or categories and individual entities. A category refers to a set of entities, whereas an individual entity refers to one particular entity. There are four types of categories: the category of entities, the category of geographic areas, the category of temporal periods and the category of thematic fields. There are three types of individual entities: the individual entity, the individual geographic area and the individual temporal period. Accordingly, Novelists would be an example of a category of entities, “Emily Bronte” an example of an individual entity, Centuries an example of a category of temporal periods, “17th cc.” an example of an individual temporal period, and so forth. Each entity and each category can be of one and only one of these types.

 

The structure of the hierarchy is a directed graph instead of a tree, so that one category can have multiple parents. This way, each named entity that belongs to a category can be reached in the hierarchy from several different starting categories. For example [62], in order to reach the event “Olympic Games Munich 1972” we can start from the category Munich and move down through the subcategories History of Munich, Munich in the 1970s, Events in Munich in the 1970s and Sports Events in Munich in the 1970s, eventually arriving at “Olympic Games Munich 1972”. Further paths leading to the event “Olympic Games Munich 1972” could start out from the category Sports or from the category the 1970s.
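
The effect of multiple parents can be sketched in Python as follows. The path starting at Munich follows the example above; the two links from Sports and from the 1970s are assumptions made for this illustration, since the intermediate categories on those paths are not given here, and the function name is hypothetical.

```python
# Parent -> children links of a small directed graph of categories.
CHILDREN = {
    "Munich": ["History of Munich"],
    "History of Munich": ["Munich in the 1970s"],
    "the 1970s": ["Munich in the 1970s"],                  # assumed link
    "Munich in the 1970s": ["Events in Munich in the 1970s"],
    "Events in Munich in the 1970s": ["Sports Events in Munich in the 1970s"],
    "Sports": ["Sports Events in Munich in the 1970s"],    # assumed link
    "Sports Events in Munich in the 1970s": ["Olympic Games Munich 1972"],
}


def find_paths(start, target):
    """Return every downward path from start to target (depth-first search)."""
    if start == target:
        return [[start]]
    paths = []
    for child in CHILDREN.get(start, []):
        for tail in find_paths(child, target):
            paths.append([start] + tail)
    return paths


for root in ("Munich", "Sports", "the 1970s"):
    for path in find_paths(root, "Olympic Games Munich 1972"):
        print(" > ".join(path))
```

In a tree, only one of the three starting categories could lead to the event; in the directed graph all three do, because "Munich in the 1970s" and "Sports Events in Munich in the 1970s" each have two parents.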

 

As seen, category combination is allowed in the EFGT Net ontology: new categories can be obtained by combining appropriate categories with each other in predefined ways. Both categories and individual entities have properties such as a main name that is given in all three languages, synonyms, an ID, parents and ancestors as well as children and descendants, URLs, an explanation and so forth. The ID property furnishes each category and each individual entity with a unique identification number for organizational purposes. The parents, children, ancestors and descendants properties give information about the subcategories and supercategories of the entities for navigational purposes, which can be followed up by means of the ID property. The URLs property provides information about related Web resources that point to the entities, and the explanation property provides textual data about each entity in natural language. Additionally, the entities in EFGT Net can be linked to the entities in other classification systems by means of various other properties.

 

Relations between the entities in EFGT Net include, besides the generalization-specialization and membership relationships, other relationships such as overlaps, is_capital_of, is_the_of_in, is_location_of and so forth.

 

The relationships can have any arity between unary and quaternary. These relations link categories and individual entities with each other. For example, the overlaps relation links two geographical entities to each other, as in “Turkey” and “Europe” overlap. Other examples are “Washington D.C” is_capital_of “USA”, “San Francisco” is_location_of “Golden Gate Bridge” and “Princess Sylvia” is_the “Queen” of “Sweden” in “2004”.
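
Relations of differing arity can be represented as plain tuples, as the following Python sketch suggests. The relation names and example facts are those given above; the arity table, the helper functions and the tuple layout are assumptions made for illustration and do not reflect the internal representation of EFGT Net.

```python
# Declared arity of each relation (number of arguments it takes).
RELATION_ARITY = {
    "overlaps": 2,
    "is_capital_of": 2,
    "is_location_of": 2,
    "is_the_of_in": 4,
}


def make_fact(relation, *arguments):
    """Build a fact tuple after checking the argument count against the arity."""
    expected = RELATION_ARITY[relation]
    if len(arguments) != expected:
        raise ValueError(
            f"{relation} expects {expected} arguments, got {len(arguments)}"
        )
    return (relation,) + arguments


FACTS = [
    make_fact("overlaps", "Turkey", "Europe"),
    make_fact("is_capital_of", "Washington D.C", "USA"),
    make_fact("is_location_of", "San Francisco", "Golden Gate Bridge"),
    make_fact("is_the_of_in", "Princess Sylvia", "Queen", "Sweden", "2004"),
]


def facts_about(entity):
    """Return all facts in which the entity occurs as an argument."""
    return [fact for fact in FACTS if entity in fact[1:]]


print(facts_about("Sweden"))
```

Checking the arity at construction time keeps a quaternary relation such as is_the_of_in from being stored with a missing role or time argument.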

 

The top level category of the EFGT Net ontology includes subcategories such as World, Politics, Finances, Sports, Organizations, Events and so forth, each of which includes further subcategories. Categories are determined on the basis of encyclopaedic criteria and are positioned in the hierarchy by combining two approaches: the analytical approach and the relevance-based approach.

 

According to the former approach, categories that have the same analytical status are introduced at the same branching level and depth. According to the latter, categories that are considered more important are given precedence and are defined at the higher levels of the hierarchy. For example, when classifying jazz music in Brazil, the two entities “Stan Getz” and “Antonio Carlos Jobim” are entered into the hierarchy at the same level and depth under the analytical ordering because they are both jazz musicians.

 

Under the relevance ordering, however, “Antonio Carlos Jobim” would have a higher position because he is a Brazilian jazz musician. Knowledge entry into EFGT Net is currently done by human knowledge providers, but the authors point to the need for semi-automatic methods.

 

Although other resources with a similar motivation exist, such as WordNet, EuroWordNet or the Getty Thesaurus of Geographic Names[25], EFGT Net differs from them in at least one respect: it concentrates on named entities. Additionally, most of these resources seem to focus on particular domains, such as only geography or only linguistics, whereas EFGT Net aims at formalizing a wider spectrum of encyclopaedic knowledge. Moreover, work is being carried out to support EFGT Net with an inference engine that allows the automatic derivation of new knowledge from the EFGT Net knowledge base.

 

In this section we have referred to three examples of ontologies that are currently in use. For each ontology we have stated the motivation and have discussed the classification hierarchy, the classification criteria, and the categories, properties, relations and individuals present in the ontology. The three ontologies have different characteristics that distinguish them from each other. Although WordNet and EFGT Net are both linguistically motivated ontologies, they have different subject matters. WordNet is an ontology that has the function of a lexicon and a thesaurus; thus it is focused on words, phrases, collocations and their meanings.

 

EFGT Net, on the other hand, is focused on named entities that deliver encyclopaedic information. The Cyc ontology is part of the Cyc system, whose purpose is to capture and formalize real world knowledge, or commonsense knowledge, for use by computers.

 

2.9 Discussion

 

Ontology, having wandered from the field of philosophy into fields of computer science as diverse as AI, Natural Language Processing, Database Systems and Multiagent Systems, has become a popular research topic in recent times. As we have seen, there is a variety of definitions for the term ontology, and there are different kinds of ontologies, which can be represented using a number of different languages. Ontology development is meanwhile considered an engineering process; therefore methodologies and development tools exist to aid the process. Many ontologies are currently being used in various application areas.

 

 

Although ontology has many advocates in computer science [64],[68],[9],[34], who see it as the precondition for knowledge sharing, for knowledge re-use and for the future success of the Semantic Web, it also has its critics [25],[28]. According to the critics, ontologies do not suffice to guarantee knowledge sharing and re-use, since these two practices also require the sharing of inferences and rules and not only the sharing and re-use of declarative knowledge [25]. They believe that this issue has not been taken into account by today’s ontologists and ontologies. Other critical voices point to the fact that a vast number of ontologies exist which use various ontology languages and which conceptualize the same domains differently, so that there is still a shortfall in common understanding and unity.

 

This difficulty has also been recognized by the ontology advocates themselves. That is why the state of the art in ontological engineering is now focused on so-called ontology integration. Ontology integration is a general term that refers to several activities, such as ontology combining, merging, aligning, mapping, translating, transforming and so forth. More concretely, ontology integration comprises the process of finding the places where the ontologies overlap, the process of linking the concepts that have related meanings through equivalence and subclass-superclass relations, and finally the process of verifying the consistency of the outcome. A detailed discussion of numerous approaches to ontology integration is provided in [58]. All in all, ontology integration has the goal of bringing related ontologies together to provide a unified view of a given domain and to facilitate the re-use of existing ontologies.
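
The first two steps, finding overlaps and linking related concepts, can be sketched in Python with a deliberately naive technique: exact matching of normalized labels. This is not one of the approaches surveyed in [58]; the two toy ontologies, the concept identifiers and the function names are hypothetical, and consistency checking is left out entirely.

```python
def normalize(label):
    """Lower-case a label and unify separators so that trivial variants match."""
    return label.lower().replace("_", " ").replace("-", " ").strip()


def align(ontology_a, ontology_b):
    """Propose equivalence links between concepts that share a normalized label."""
    index_b = {}
    for concept, labels in ontology_b.items():
        for label in labels:
            index_b.setdefault(normalize(label), set()).add(concept)

    links = set()
    for concept, labels in ontology_a.items():
        for label in labels:
            for match in index_b.get(normalize(label), ()):
                links.add((concept, "equivalent_to", match))
    return sorted(links)


# Two hypothetical ontologies: concept identifier -> list of labels.
ONTOLOGY_A = {"a:Automobile": ["automobile", "car"], "a:Person": ["person"]}
ONTOLOGY_B = {"b:Car": ["Car"], "b:Human": ["human being"]}

print(align(ONTOLOGY_A, ONTOLOGY_B))
```

The sketch links "a:Automobile" to "b:Car" but fails to relate "a:Person" to "b:Human", which hints at why purely lexical matching is insufficient and why integration remains an open research problem.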

 

On another level, ontologies as a classification device have been discussed in relation to controlled vocabularies, to taxonomies and to thesauri [76],[77].

 

A controlled vocabulary is a list of unambiguous terms that have been enumerated and defined explicitly. This list is controlled by a relevant registration authority. There are at least two requirements that a list of terms must meet to qualify as a controlled vocabulary. First, if the same term is commonly used to mean different concepts in different contexts, then its name needs to be explicitly qualified to remove this ambiguity. Second, if multiple terms are used to mean the same thing, then one of the terms needs to be identified as the preferred term in the controlled vocabulary.

 

A taxonomy is a collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more subclass-superclass relationships to other terms in the taxonomy. Subclass-superclass relationships can be of different types, such as whole-part or type-instance relationships.

 

Thesauri build upon taxonomies by allowing other kinds of relationships to be defined in addition to subclass-superclass relationships. Thesauri can be monolingual, as described in the ISO 2788 standard, or multilingual, as described in ISO 5964. Some of the most important relationships in thesauri, which exist alongside the subclass-superclass relationships, are (according to the ISO 2788 standard) synonymy, broader term (i.e. the term higher in the hierarchy), use (i.e. a term which is to be preferred over the current one), top term (i.e. the topmost ancestor of the current term), related term (i.e. a term that is neither a synonym nor a broader or narrower term of the current term but is still related to it) and so forth. Thesauri are not necessarily controlled by a relevant registration authority.
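
A minimal Python sketch of such thesaurus relationships is given below. The terms, the dictionary layout and the function names are hypothetical; the sketch only mirrors the use, broader term, top term and related term relationships named above and is not an implementation of ISO 2788.

```python
# Toy thesaurus: term -> broader term, related terms and preferred ('use') term.
THESAURUS = {
    "vehicles": {"broader": None, "related": [], "use": None},
    "cars": {"broader": "vehicles", "related": ["roads"], "use": None},
    "automobiles": {"broader": None, "related": [], "use": "cars"},
    "roads": {"broader": None, "related": ["cars"], "use": None},
}


def preferred(term):
    """Follow 'use' references until the preferred term is reached."""
    while THESAURUS[term]["use"] is not None:
        term = THESAURUS[term]["use"]
    return term


def top_term(term):
    """Return the topmost ancestor of the preferred form of a term."""
    term = preferred(term)
    while THESAURUS[term]["broader"] is not None:
        term = THESAURUS[term]["broader"]
    return term


print(preferred("automobiles"))
print(top_term("automobiles"))
print(THESAURUS["cars"]["related"])
```

The 'use' link implements the preferred-term requirement of a controlled vocabulary, the 'broader' link supplies the taxonomy, and the 'related' link is what a thesaurus adds on top of both.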

 

A formal ontology is a taxonomy plus a collection of types, i.e. concepts, properties, relationships, instances and assertions about a domain of interest. It is expressed in an ontology representation language. Of the four classification devices, the ontology has the highest expressivity.

 

Finally, controlled vocabularies, taxonomies, thesauri and ontologies have the following aspects in common: they structure, classify, model, and represent the concepts and relationships of some subject matter that is of interest to some community. Additionally, they are intended to establish consensus about the subject matter within that community.

 

In conclusion, we can state that ontological engineering has been a popular research field in recent years, not only within the context of AI but also within the context of the Web. Especially with regard to the vision of the Semantic Web, expectations of ontologies are high [66],[68],[9], as they promise a machine-understandable description of knowledge and consensus about that knowledge. Nevertheless, there are several difficulties that the current state of the art in ontological engineering needs to overcome. Ontology construction is a time-consuming and costly process. Therefore, (semi-)automatic methods for ontology construction are being investigated [9], but these are far from mature. Furthermore, there is currently a considerably large number of ontologies on the Web that are written in different languages and that have different conceptual structures. Therefore, ontology sharing and re-use will remain a challenging task as long as the successful integration of ontologies is not possible. Even so, given the current state of research, we predict that the popularity of ontologies will increase rather than decrease in the near future.

 


[1] In [49] Guarino and Giaretta distinguish between the terms “Ontology” (with a capital ‘O’) and “ontology” (with a lowercase ‘o’). Accordingly, the former refers to ontology in its philosophical sense and the latter refers to its understanding and use in AI. Henceforth, we adopt the same convention and use “Ontology” (with a capital ‘O’) only when we refer to philosophical ontology.

[2] http://www.ksl.stanford.edu/software/ontolingua

[3] http://wonderweb.semanticweb.org/index.shtml

[4] http://www.semwebcentral.org/

[5] http://www.daml.org/ontologies/

[6] http://dublincore.org/

[7] http://ontology.teknowledge.com:8080/rsigma/arch.html

[8] http://www.cyc.com/cyc-2-1/cover.html

[9] http://www.cogsci.princeton.edu/~wn

[10] http://mozart.isi.edu:8003/sensus2/

[11] http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html

[12] http://www.aiai.ed.ac.uk/project/enterprise/enterprise/ontology.html

[13] http://www.fipa.org/

[14] http://www.w3.org/2002/07/owl

[15] http://www.cyc.com/doc/handbook/oe/02-the-syntax-of-cycl.html

[16] http://www.w3c.org/TR/owl-guide/

[17] http://www.sts.tu-harburg.de/~r.f.moeller/racer/

[18] http://www.cs.man.ac.uk/~horrocks/FaCT/

[19] http://protege.stanford.edu

[20] http://oiled.man.ac.uk/

[21] http://www.daml.ri.cmu.edu/Cal/

[22] http://www.cyc.com/

[23] http://www.opencyc.org

[24] http://www.hum.uva.nl/~ewn/

[25] http://www.getty.edu/research/conducting_research/vocabularies/tgn/