Foundations of Knowledge Systems
with Applications to Databases and Agents
Gerd Wagner
Kluwer Academic Publishers, Boston/Dordrecht/London
Contents
Preface
Introduction

Part I Tables and Objects

1. Conceptual Modeling of Knowledge Bases
   1.1 Introduction
   1.2 Predicate Logic
   1.3 Entity-Relationship Modeling
   1.4 Identifying Qualified Predicates
   1.5 Identifying Incomplete Predicates
   1.6 Identifying Intensional Predicates
   1.7 Entity-Relationship Modeling in 7 Steps
   1.8 Agent-Object-Relationship Modeling
   1.9 Summary
   1.10 Further Reading
   1.11 Exercises

2. Relational Databases
   2.1 Introduction
   2.2 Basic Concepts
   2.3 Query Answering
   2.4 Conjunctive Queries
   2.5 General Queries
   2.6 Integrity Constraints
   2.7 Transforming an ER Model into a Relational Database Schema
   2.8 Information Order and Relative Information Distance
   2.9 Updating Relational Databases
   2.10 Incomplete Information in Relational Databases
   2.11 A Relational Database without Nulls Represents a Herbrand Interpretation
   2.12 Reiter's Database Completion Theory
   2.13 Database Inference as Preferential Entailment Based on Minimal Models
   2.14 Deficiencies of the Relational Database Model
   2.15 Summary
   2.16 Further Reading
   2.17 Exercises

3. Object-Relational Databases
   3.1 ADT-Relational Databases
   3.2 Introducing Objects and Classes
   3.3 Further Reading
   3.4 Exercises

Part II Adding Rules

4. Reaction Rules
   4.1 Introduction
   4.2 Communication Events
   4.3 Epistemic and Communicative Reactions
   4.4 Operational Semantics of Reaction Rules
   4.5 Multidatabases
   4.6 Summary
   4.7 Further Reading
   4.8 Exercises

5. Deduction Rules
   5.1 Introduction
   5.2 Closure Semantics
   5.3 Model-Theoretic Semantics
   5.4 Summary
   5.5 Further Reading
   5.6 Exercises

Part III Positive Knowledge Systems: Concepts, Properties and Examples

6. Principles of Positive Knowledge Systems
   6.1 Introduction
   6.2 Basic Concepts
   6.3 Formal Properties of Knowledge Systems
   6.4 Vivid Knowledge Systems
   6.5 Knowledge Update and Knowledge Integration
   6.6 Summary

7. Temporal Databases
   7.1 Introduction
   7.2 The Timestamp Algebra
   7.3 Temporal Tables
   7.4 Natural Inference in Temporal Databases
   7.5 Updates and Information Order
   7.6 Indexical Temporal Null Values
   7.7 Temporal Table Operations
   7.8 Temporal Deduction Rules
   7.9 Bitemporal Databases
   7.10 Further Reading
   7.11 Summary
   7.12 Exercises

8. Fuzzy Databases
   8.1 Introduction
   8.2 Certainty Scales
   8.3 Fuzzy Tables
   8.4 Natural Inference in Fuzzy Databases
   8.5 Update, Knowledge Integration and Information Order
   8.6 Fuzzy Table Operations
   8.7 Fuzzy Deduction Rules
   8.8 Further Reading
   8.9 Summary

9. Further Examples of Positive Knowledge Systems
   9.1 Multi-Level Secure Databases
   9.2 Lineage Databases
   9.3 Disjunctive Databases
   9.4 S5-Epistemic Databases

Part IV Admitting Negative Information: From Tables to Bitables

10. Principles of Non-Positive Knowledge Systems
    10.1 Introduction
    10.2 Basic Concepts
    10.3 Standard Logics as Knowledge Systems

11. Relational Factbases
    11.1 Introduction
    11.2 Birelational Databases
    11.3 Relational Factbases

12. Possibilistic Databases
    12.1 Introduction
    12.2 Possibilistic Tables
    12.3 Natural Inference in Possibilistic Databases
    12.4 Updates and Information Order
    12.5 Knowledge Integration
    12.6 Possibilistic Deductive Databases
    12.7 Possibilistic Deduction Rules and EMYCIN

13. Further Examples of Non-Positive Knowledge Systems
    13.1 Multi-Level Secure Factbases
    13.2 Lineage Factbases
    13.3 Disjunctive Factbases

Part V More on Reaction and Deduction Rules

14. Communication and Cooperation
    14.1 Multi-Knowledge Bases
    14.2 Distributed Updating and Query Answering
    14.3 Cooperative Knowledge Bases

15. Deductive Knowledge Systems
    15.1 Introduction
    15.2 Deductive Knowledge Bases
    15.3 The Immediate Consequence Operator
    15.4 Operational versus Constraint Semantics of Rules
    15.5 Ampliative DKBs
    15.6 Non-Ampliative DKBs

16. Advanced Knowledge and Reasoning Services
    16.1 Explanations
    16.2 Diagnoses
    16.3 Automated Planning

Appendices

A. Partial Logics with Two Kinds of Negation
B. Compositional Possibilistic Logic
C. On the Logic of Temporally Qualified Information

Preface
One of the main uses of computer systems is the management of large amounts
of symbolic information representing the state of some application domain
such as information about all the people I communicate with in my personal
address database, or about relevant parts of outer space in the
knowledge base of a NASA space mission. While database management systems
offer only the basic services of information manipulation and retrieval,
more powerful knowledge systems offer in addition advanced services such
as deductive and abductive reasoning for the purpose of finding explanations
and diagnoses, or generating plans.
In order to design and understand database and knowledge-based applications
it is important to build upon well-established conceptual and mathematical
foundations. What are the principles behind database and knowledge systems?
What are their major components? Which are the important cases of knowledge
systems? What are their limitations? Addressing these questions, and
discussing the fundamental issues of information update, knowledge assimilation,
integrity maintenance, and inference-based query answering, is the purpose
of this book.
Databases, Logic and SQL
Since relational databases are the most fundamental case of a knowledge
system, they are treated in depth in connection with the standard database
language SQL. The chapter on the foundations of relational databases is
both an introduction to mathematical logic and a course in SQL programming.
It is supplemented with an additional chapter on object-relational databases,
a new development that dominates the current progress in database technology
and SQL.
Various Kinds of Information
Depending on the type of application domain, we have to be able to deal
with various kinds of information. The simplest case is that of business and administrative
domains where the information to be processed is complete, and it is assumed
that all information sources are perfect (absolutely reliable, honest and
competent). But in more empirical domains, such as in outer space, even
if all information sources are perfect, there will be various forms of
incomplete information, e.g. involving partial predicates, and disjunctively
imprecise or gradually uncertain information. In addition to incompleteness,
information items may be asserted relative to some time spans in which
they are valid and believed, and they may be qualified in a number of ways.
In particular, different pieces of information may be assigned different
levels of security, and different degrees of reliability.
The book presents a comprehensive framework for the conceptual integration
and uniform treatment of all these types of information. This includes
the definition of generic knowledge system models capable of handling fuzzy,
temporal, confidential, and unreliable information.
Various Types of Rules
Advanced knowledge services (such as deductive query answering, reactive
input processing, and the abductive generation of explanations, diagnoses
and plans) are based on rules. The book presents an in-depth treatment
of two kinds of rules: deduction rules and reaction rules.
In addition to information about the current and possible state of affairs
in the form of facts and integrity constraints, a knowledge base may also
represent terminological and heuristic knowledge in the form of deduction
rules. This type of rule is well-known from deductive databases and logic
programs. It is shown how the concept of deduction rules can be applied
to the various kinds of information that have to be processed in practical
knowledge systems, leading to such powerful concepts as temporal and
fuzzy deduction rules. Typically, the semantics of these rules involves
nonmonotonic inference on the basis of the Closed-World Assumption.
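As a minimal sketch of this idea (with invented predicate names, not the book's formal notation), deduction rules can be applied bottom-up until a fixpoint is reached, and queries for underivable atoms are then answered negatively under the Closed-World Assumption:

```python
# Minimal sketch: deduction rules applied bottom-up to a fixpoint; atoms
# that are not derivable count as false under the Closed-World Assumption.

facts = {("parent", "ann", "bob"), ("parent", "bob", "carl")}

def ancestor_rules(atoms):
    """One application of two illustrative rules:
       ancestor(x,y) <- parent(x,y)
       ancestor(x,z) <- ancestor(x,y), ancestor(y,z)"""
    derived = set()
    for (p, x, y) in atoms:
        if p == "parent":
            derived.add(("ancestor", x, y))
    for (p1, x, y) in atoms:
        for (p2, y2, z) in atoms:
            if p1 == "ancestor" and p2 == "ancestor" and y == y2:
                derived.add(("ancestor", x, z))
    return derived

def closure(atoms, rules):
    # iterate until no new atoms are derivable (the closure of the KB)
    while True:
        new = rules(atoms) - atoms
        if not new:
            return atoms
        atoms = atoms | new

kb = closure(facts, ancestor_rules)
print(("ancestor", "ann", "carl") in kb)   # True: derivable
print(("ancestor", "carl", "ann") in kb)   # False: underivable, hence false under CWA
```

The negative answer in the last line is nonmonotonic: adding the fact parent(carl, ann) would turn it into a positive one.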
Reaction rules can be used to specify the reactive behavior of a knowledge
base when it receives messages, either from the external world, or from
other nodes of a network it participates in. This concerns, in particular,
the communication and cooperation taking place in multidatabases and between
cooperative knowledge bases.
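The flavor of such rules can be sketched as follows; the event types, conditions, and actions are purely illustrative, not the book's formal syntax:

```python
# Sketch of a reaction rule: ON event IF condition DO action.
# All names here are invented for illustration.

class ReactiveKB:
    def __init__(self):
        self.facts = set()
        self.outbox = []     # outgoing messages to other nodes
        self.rules = []      # list of (event_type, condition, action)

    def add_rule(self, event_type, condition, action):
        self.rules.append((event_type, condition, action))

    def receive(self, event_type, payload):
        # fire every rule whose triggering event and condition match
        for etype, cond, act in self.rules:
            if etype == event_type and cond(self, payload):
                act(self, payload)

kb = ReactiveKB()
# ON ask(p) IF p is believed DO reply with tell(p)
kb.add_rule("ask",
            lambda k, p: p in k.facts,
            lambda k, p: k.outbox.append(("tell", p)))
kb.facts.add("open(shop)")
kb.receive("ask", "open(shop)")
print(kb.outbox)  # [('tell', 'open(shop)')]
```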
How to Use this Book
Foundations of Databases and Knowledge Systems covers both basic and advanced
topics. It may be used as the textbook of a course offering a broad introduction
to databases and knowledge bases, or it may be used as an additional textbook
in a course on databases or Artificial Intelligence. Professionals and
researchers interested in learning about new developments will benefit
from the encyclopedic character of the book, which provides organized access
to many advanced concepts in the theory of databases and knowledge bases.
Gerd Wagner
Berlin, April 1998
Introduction
The Success of Database Systems
In the sixties and seventies, pushed by the need to store and process large
data collections, powerful database software was developed on the basis of
the file system technology available at that time. These systems were
called "hierarchical" and "network" databases, after their respective
type of file organization. Although these systems were able to process
large amounts of data efficiently, their limitations in terms of flexibility
and ease of use were severe. Those difficulties were caused by the unnatural
character of the conceptual user interface of hierarchical and network
databases, which consisted of the rather low-level data access operations
dictated by their way of implementing storage and retrieval.
Both database models thus later turned out to be cognitively inadequate.
Only through the formal conceptualization of relational databases
by Codd in the early seventies did it become possible to overcome the inadequacy
of the database technology prevailing at that time. Only the logic-based
formal concepts of the relational database model have led to more cognitive
adequacy, and have thus constituted the conceptual basis for further progress
(towards object-relational, temporal, deductive, etc. databases). Unlike
much of theoretical computer science, Codd's theory of relational databases
is an example of a practical theory.
The Difficulties of `Expert Systems'
In the field of expert systems, on the other hand, there has never been
a conceptual breakthrough like that of the relational database model. It
seems that the field is still at a pre-scientific stage, and there has been
almost no measurable progress since the classical prototype MYCIN.
There is neither a formal definition of an `expert system', nor is there
any clear semantics of rules. The field has rather developed a variety
of notions (such as `production rules,' or `certainty factors') which have
often not been clearly defined, or have not been based on a logical semantics.
Lacking a clear (and formally defined) conceptual framework, it is difficult
for the expert system community to compare different systems and to identify
shortcomings and measure progress.
One may argue that these problems are due to the inherent difficulties
of knowledge representation and processing, but it rather seems that the
expert system community has failed to establish scientific foundations
and has also failed to learn from its neighbor fields of logic programming
and deductive databases which have successfully developed a formal concept
of rules based on a logical semantics.
What is a Knowledge System?
Knowledge representation and reasoning systems, or knowledge
systems for short, are based on two fundamental operations:
- New incoming pieces of information must be assimilated into a knowledge
base by means of an update operation. This includes simple insertion
of new information, deletion (retraction) of old information, and possibly
a number of more involved change operations for assimilating new pieces
of information which are in conflict with some already-stored pieces. The
update operation should satisfy some principle of minimal mutilation.
- Queries posed to the knowledge base must be answered by means of an
inference-based query answering operation. The inference procedure should
be complete with respect to some well-understood logic, and sound with
respect to a preferential entailment relation in that logic.
In general, there are no specific restrictions on the internal structure
of a knowledge base. It appears, however, that a computational design can
be achieved by `compiling' incoming information into some normal form rather
than leaving it in the form of complex formulas.
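The two fundamental operations can be sketched as a minimal interface; the conflict handling below (overwriting the old value of a functional fact) is only a crude, hypothetical stand-in for a principle of minimal mutilation:

```python
# Minimal sketch of a knowledge system: an update operation that
# assimilates incoming facts, and an inference-based ask operation.
# The fact format (predicate, subject, value) is illustrative.

class KnowledgeBase:
    def __init__(self):
        self.facts = set()

    def update(self, fact, mode="insert"):
        pred, subj = fact[0], fact[1]
        if mode == "insert":
            # a crude stand-in for minimal mutilation: treat the last
            # argument as a functional value and retract the old one
            self.facts = {f for f in self.facts
                          if not (f[0] == pred and f[1] == subj)}
            self.facts.add(fact)
        elif mode == "delete":
            self.facts.discard(fact)

    def ask(self, fact):
        # inference is bare membership here; a real system would use a
        # sound and complete procedure for its underlying logic
        return fact in self.facts

kb = KnowledgeBase()
kb.update(("status", "flight007", "delayed"))
kb.update(("status", "flight007", "cancelled"))  # conflicting info replaces the old
print(kb.ask(("status", "flight007", "cancelled")))  # True
print(kb.ask(("status", "flight007", "delayed")))    # False
```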
The concept of a knowledge system constitutes a useful framework for
the classification and comparison of various computational systems and
formalisms like, e.g., relational, temporal and deductive databases. It
is more general than that of a logic (i.e. a consequence relation). A standard
logic can be viewed as a special kind of knowledge system. On the other
hand, by suitably defining the inference and update operations, knowledge
systems can serve as the basis for the operational definition of logics.
Knowledge Systems and Logic
The relationship between knowledge systems and logic is only obvious for
the inference operation: while in a logical system theorems are proved
from (or entailed by) a set of axioms (or 'theory') by means of the underlying
consequence relation, queries in a knowledge system are answered relative
to a given knowledge base by means of the underlying inference operation.
It turns out that knowledge-based inference is nonmonotonic: query answering
is not
based on all models of a knowledge base but solely on the set of all
intended models. For instance, in the case of relational databases, which
can be viewed as the most fundamental knowledge system, the intended models
are the minimal ones. Also, unlike a logical theory, a knowledge base is
not just a set of well-formed formulas. Rather, it is subject to specific
restrictions (such as, e.g., allowing only atomic sentences as in relational
databases), or more generally, consisting of facts, rules and integrity
constraints. While facts have the form of certain sentences of some suitably
restricted language, rules do not, in general, correspond to implicational
formulas, but rather to meta-logical expressions requiring a specific semantics.
Integrity constraints are sentences which have to be satisfied in all evolving
states of a knowledge base. They stipulate meaningful domain-specific restrictions
on the class of admissible knowledge bases. In addition to information
about the current and possible state of affairs (in the form of facts and
integrity constraints), and terminological and heuristic knowledge (in
the form of deduction rules), a knowledge base may also represent action
knowledge (e.g., in the form of reaction rules), and various types of meta-knowledge.
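The difference between answering over all models and answering over only the intended (minimal) models can be illustrated for a purely atomic, propositional knowledge base; the atoms p and q are illustrative:

```python
# Sketch: query answering relative to all models versus the minimal
# (intended) model, for a propositional knowledge base of atoms.
from itertools import chain, combinations

atoms = {"p", "q"}
kb = {"p"}   # the knowledge base asserts only p

def models(kb, atoms):
    # every interpretation (a subset of the atoms) that makes all kb atoms true
    subsets = chain.from_iterable(combinations(sorted(atoms), r)
                                  for r in range(len(atoms) + 1))
    return [set(s) for s in subsets if kb <= set(s)]

all_models = models(kb, atoms)
minimal = min(all_models, key=len)   # here the minimal model is kb itself

# classically, "not q" is not entailed: some model makes q true
print(all("q" not in m for m in all_models))   # False
# under preferential (minimal-model) entailment, "not q" holds
print("q" not in minimal)                      # True
```

This is exactly the relational-database behavior: a tuple absent from the database is answered "no", even though classical entailment would leave it open.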
Knowledge systems and the deficiencies of logic
Knowledge Systems          | Logical Systems
facts and deduction rules  | axioms
knowledge base             | theory
nonmonotonic inference     | monotonic consequence relation
confirmed if-queries       | theorems
open queries               | formulas with free variables
answers                    | ?
update operation           | ?
integrity constraints      | ?
reaction rules             | ?
Logic has provided a model of theory-based reasoning by means of consequence
relations, but it has not been concerned with the assimilation of new sentences
into a changing theory. For a theory of knowledge systems, however, we
need a model of both knowledge-based inference and of knowledge assimilation.
About this Book
Traditionally, books on `expert systems' and knowledge-based systems
have been composed of a loose collection of concepts and techniques from
Artificial Intelligence and related fields, such as pattern matching, search
methods, `rule-based systems', models of temporal, spatial and uncertain
reasoning, and their application in diagnosis, design and control. A great
number of prototype systems have been described in the literature, but there
is hardly any generic concept of knowledge-based systems. On the other
hand, knowledge representation research in Artificial Intelligence has
proceeded in a very theoretical manner and has been occupied to a great
extent with solving ``anomalies within formal systems which are never used
for any practical task'' (Brooks 1991). Many of the formalisms proposed
make, or follow, conceptual and ontological stipulations which are not
grounded in the practice of information and knowledge processing.
The theory of knowledge systems presented in this book should be construed
as an attempt at a practical theory that aims at establishing solid conceptual
foundations and generic models to be used in the software engineering of
knowledge and information systems. It is my firm belief that knowledge
systems have to evolve from, and have to be integrated with, database and
information system technology. The starting points in this evolution are
the entity-relationship modeling method and the system of
relational
databases, which are presented in Chapters 1 and 2. These and the following
chapters on object-relational databases, reaction rules, deduction rules,
temporal databases and fuzzy databases are intended for classroom use
in computer science at the graduate level. More advanced topics, such
as representing negative information using bitables, or elements
of a general theory of deduction rules, are covered in the remaining chapters.
The main technical contributions of this book are:
- a formal model of object-relational databases;
- a general account of reaction rules in connection with the concept
of knowledge- and perception-based agents, originally proposed in
(Wagner 1996);
- a new semantics of deduction rules that is based on stable generated
closures and derived from the notion of stable generated models
of (Herre and Wagner 1997);
- a new generalization of database tables, called bitables, for representing
negative information in connection with incomplete predicates (the
logical ideas of this generalization have already been presented in (Wagner
1991));
- a new model of fuzzy and possibilistic databases based on
a new compositional definition of possibilistic logic;
- a new definition of inference-based query answering in multi-level secure
(MLS) databases; and
- a new model of lineage databases for tracking the lineage and reliability
of information items.
The integration and uniform treatment of all kinds of databases and knowledge
bases in a rigorous conceptual framework appears to be the greatest achievement
of the present work. It makes it possible to identify generic concepts such as deduction
and reaction rules, and to define them in a way that is independent of
the particular choice of knowledge system.
The knowledge system concepts proposed in this book have obvious applications
in advanced database and in agent-based systems. An agent system needs
to include a knowledge system as one of its main components. In
particular, the knowledge systems of MLS factbases for protecting
confidential information, and of lineage factbases for tracking
and weighting the reliability of different information sources, may be
relevant to agent systems. Even more important is the concept of reaction
rules that allows the declarative specification of the reactive behavior
of agents. This book is for those who want to design knowledge-based agents,
and it is for those who want to develop innovative database systems. It
can be used both by computer science students and by professionals in computer
science and other fields with some background in logic to obtain organized
access to many advanced concepts in the theory of databases and knowledge
representation.
Those sections that are primarily of interest to logic-oriented readers
are indicated with "pure logic" next to the section head. Additional non-classical
logic background, not required for understanding the book, is provided
in the appendix as an option for the logic-oriented reader to gain deeper
insights into the connections between knowledge systems and logic.
This is probably the first book to cover such a broad range of diverse
database and knowledge systems. Necessarily, not all relevant questions
are addressed and the discussion of certain advanced topics may be incomplete,
biased, or even erroneous. As a human with limited resources, I may have
overlooked important issues and ignored important related work I should
have mentioned. I apologize for this and promise to do my best to provide
necessary corrections and supplements on the book's Web page at http://www.informatik.uni-leipzig.de/~gwagner/ks.html
Notice that this Web page may also contain solutions to exercises and
other supplementary material which could not be included in the book.
1 Conceptual Modeling of Knowledge Bases
The task of a knowledge base is to represent a relevant body of knowledge
about a specific application domain in order to support knowledge-based
application programs and to be able to answer ad-hoc queries about the
state of the application domain. It is essential to develop an adequate
conceptual model of the application domain as the basis for designing a
knowledge base. This involves the formalization of intuitions and of informal
descriptions. We discuss the most important methodology for conceptual
modeling: entity-relationship modeling which may be viewed as an
extension of predicate logic. The emphasis of this chapter is on
identifying the different types of predicates needed for capturing more
domain knowledge: normal extensional, qualified, incomplete and intensional
predicates.
1.1 Introduction
For designing a specific knowledge base, the first activity is to analyze
the given application domain, and to develop a conceptual model
of it. From this model, which may be established at different levels of
abstraction, the knowledge base schema is derived. The schema of
a knowledge base defines its type and structure, and the domain-specific
vocabulary that is used to represent knowledge and to store and retrieve
particular pieces of information in the knowledge base. Conceptual modeling
methods (or meta-models) are based on suitable abstractions of the data
and processes application domains consist of. Traditional methods, such
as predicate logic and entity-relationship modeling, have been exclusively
focused on static representations of a domain, that is, they have been
only concerned with its state. More recently, the need for integrating
both static and dynamic aspects, i.e. both state and behavior, in a meta-model
has become obvious. This is documented by the search for an integrated
object-oriented modeling method both for general software engineering and
for information systems. Although large research communities are occupied
with this problem, and although the recent proposal of the Unified Modeling Language
(UML), a combination of several previously proposed methods, has
received great attention in software engineering, there is still no real
breakthrough, and it seems that the right abstractions have not been found
yet. Apparently, the principles and concepts of object-orientation are
not sufficient for capturing all relevant static and dynamic aspects of
information systems. Rather, the abstractions sought may be found in
the research areas of multi-agent systems and agent-oriented programming.
1.2 Predicate Logic
Conceptual modeling was an issue in philosophy long before computers
and computerized information systems were invented. In philosophy
of science and in logic, conceptual models of scientific domains have been
a major concern. The main result of these principled philosophical investigations
was the conceptual meta-model of predicate logic, as defined by Frege and
Tarski.
In predicate logic, all basic entities (or individuals) of a given universe
of discourse are conceived of as forming a flat set without any further
classification. Normally, but not necessarily, they have names called constant
symbols (such as 1,2,3, ..., I, II, III, ..., Venus, 007, `James Bond',
etc.). An entity, in predicate logic, may have zero, one or more names.
For example, `2', `II' and `two' denote the same natural number.
Likewise, `007' and `James Bond' denote the same person. Properties of,
and relationships among, entities are expressed as sentences (or assertions)
with the help of predicate symbols. The meaning of a predicate, in standard
predicate logic, is given by its extension, i.e. the set of all
tuples to which the predicate applies (forming a relation in the
sense of set theory).
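These notions can be sketched directly: a denotation function maps constant symbols to entities (so several names may denote the same entity), and a predicate is interpreted by its extension, a set of tuples. All names below are invented for illustration:

```python
# Sketch of the predicate-logic meta-model: constants denote entities,
# and a predicate's meaning is its extension (a set of tuples).

# two constant symbols denoting the same entity, one denoting another
denotation = {"007": "person_17", "James Bond": "person_17", "M": "person_3"}

# extension of the binary predicate worksFor, over entity identifiers
works_for = {("person_17", "MI6"), ("person_3", "MI6")}

def holds(predicate, *names):
    # a sentence is true iff the tuple of denoted entities is in the extension
    return tuple(denotation.get(n, n) for n in names) in predicate

print(holds(works_for, "007", "MI6"))         # True
print(holds(works_for, "James Bond", "MI6"))  # True: same entity, other name
```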
Notice that in the meta-model of predicate logic, there is no distinction
between properties and relationships: both are expressed by predicates.
On the other hand, in first-order predicate logic, there is a strict distinction
between basic entities (denoted by constants), functions and predicates,
while in natural discourse, both functions and predicates are treated as
entities as well. These simplifications made by the predicate calculus
do not pose any problem in the domain of mathematics (in fact, they proved
to be very fertile). But in other, more empirical, application domains
they are cognitively inadequate. For such domains, standard predicate logic
is conceptually too flat, i.e. it does not offer sufficient support for
those concepts needed to capture the essential semantics of empirical domains.
Therefore, the research discipline of conceptual modeling in computer
science had to come up with new meta-models in order to capture those
parts of domain semantics that are relevant for database and knowledge
system applications. The most fundamental of these meta-models is the Entity-Relationship
Modeling method proposed by Chen (1976).
1.3 Entity-Relationship Modeling
In entity-relationship (ER) modeling, basic entities are conceptualized
in conjunction with their properties and attributes which are (as opposed
to predicate logic) strictly distinguished from relationships between entities.
It is, however, straightforward to translate an ER model into classical
predicate logic: both entity and relationship types correspond to predicates,
and values from attribute domains correspond to constants. In fact, this
translation is performed whenever an ER model is implemented as a relational
database. In the current practice of database application development,
this methodology (of first constructing an ER model and then implementing
it with a relational database system) is state of the art. Notice, however,
that the ER meta-model is more general than the relational database model.
It requires neither elementary attributes nor primary keys. It is therefore
also compatible with the object-relational database model
(see Chapter 3) where object-IDs replace primary keys, and attributes are
no longer required to be elementary but may have complex values or may
be derived by method invocation. While the original proposal of Chen (1976)
is restricted to the more basic ER constructs that can be easily implemented
in relational databases, we enrich the standard ER meta-model in such a
way that it can be used for the design of more general databases and knowledge
bases (including the important case of object-relational databases).
In addition to defining a semi-formal terminology, the ER modeling method
assigns visual representations to its concepts. The resulting ER diagrams
are a convenient and concise way of expressing an ER model.
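The translation into predicate logic mentioned above can be sketched with invented types Student, Course and enrolledIn: entity and relationship types become predicates (represented here as sets of tuples), and attribute values become constants:

```python
# Sketch of translating an ER model into predicate-logic style facts.
# The types Student(ssn, name), Course(code, title) and the relationship
# enrolledIn are invented for illustration.

# entity types -> predicates over attribute tuples
student = {("111", "Ann"), ("222", "Bob")}
course = {("cs101", "Databases")}

# relationship type -> predicate over the related entities' keys
enrolled_in = {("111", "cs101"), ("222", "cs101")}

# attribute values appear as constants in the logical translation
print(("111", "Ann") in student)                 # True: Student(111, Ann) holds
print(any(c == "cs101" for _, c in enrolled_in)) # True: someone is enrolled in cs101
```

This is also the translation performed when an ER model is implemented as a relational database: each set of tuples becomes a table.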
Entities and Entity Types
An entity is something that occurs in our universe of discourse
(i.e. it may be the value of a quantified variable) and that can be uniquely
identified and distinguished from other entities. Notice that this characterization
refers to our universe of discourse and not to the `real world'. Although
it is commonly assumed that there is a close correspondence between the
two, and in the database literature the universe of discourse (or even
the content of a database) is often naively identified with `the real world',
we prefer to leave this issue to philosophy, and try to avoid any unnecessary
reference to such metaphysical notions as `the real world'.
Obviously, in an information processing system such as a database or
a knowledge system, one does not deal with the entity itself (in the ontological
sense) but rather with some representation of it. For linguistic simplicity,
however, the term `entity', in ER modeling, is used in its epistemic (representational)
sense. Examples of entities are the natural numbers 1,2,3,..., Leo (our
dog), James Bond (alias 007), Berlin, the planet Venus, my Alfa Romeo automobile,
the 200th birthday of Schubert, the letter `K', or the disease hepatitis
A.
Entities of the same type share a number of attributes and attribute
functions representing their properties or characteristics. An attribute
associates with each entity a stored value from its domain. An attribute
function associates with each entity a value to be computed by invoking
the function. Together, the values of all attributes of an entity form
its state. There may be different entities having the same
state. In that case, one needs an appropriate naming mechanism for identifying
and distinguishing entities independently of their state.
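A sketch of these distinctions, using an invented Employee type: stored attributes determine the state, an attribute function computes its value on invocation, and a standard name identifies entities even when their states coincide:

```python
# Sketch: stored attributes, a computed attribute function, and a standard
# name (here emp_id, playing the role of a primary key). All names invented.
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:
    emp_id: int        # standard name: identifies the entity for its life cycle
    name: str          # stored attribute
    birth_date: date   # stored attribute

    def age(self, today: date) -> int:
        # attribute function: value computed on invocation, not stored
        return (today - self.birth_date).days // 365

e1 = Employee(101, "Ann Smith", date(1970, 5, 1))
e2 = Employee(102, "Ann Smith", date(1970, 5, 1))
# the two entities have the same state but remain distinguishable
print(e1.emp_id != e2.emp_id)  # True
```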
A standard name is a unique identifier assigned to an entity
during its entire life cycle. If a primary key is defined for an
entity type, it may be used as a standard name for entities of that type.
An example of a primary key is the social security number assigned to employees.
We assume, however, that entities can be identified independently of the
existence of a primary key. How this is achieved need not be defined in
the ER meta-model. Whenever an ER model is implemented in some practical
knowledge system, there has to be a method to assign standard names to
entities such as, e.g., primary keys in relational databases, or object
IDs in object-relational databases. In contrast to predicate logic,
in the ER meta-model and in database theory, it is commonly assumed that
all entities have (standard) names.
There may be certain restrictions on the admissible values for attributes,
called integrity constraints. A fundamental type of an integrity
constraint is a key dependency, that is, a minimal set of attributes
whose values identify an entity. A key is a constraint since it implies
that there must be no pair of entities of the same type having the same
values for their key attributes. One of these key dependencies may be designated
as the primary key of the entity type. Natural primary keys, such as social
security numbers or flight numbers, are often used to identify entities
and to access them in an information system. Sometimes there is no natural
primary key, or even no key at all, in the definition of an entity type.
It is a matter of implementation, and not of conceptual modeling, how to
deal with this situation. In a relational database, an artificial primary
key attribute has to be added to maintain entity identity while in an object-relational
database an immutable object ID is automatically assigned to an entity
on its creation.
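To illustrate the idea of artificial, immutable standard names, the following Python sketch assigns a surrogate identifier to each entity on creation, in the style of object IDs in object-relational databases. The EntityStore API is hypothetical, introduced only for illustration:

```python
import itertools

class EntityStore:
    """Minimal sketch (hypothetical API): assigns an immutable surrogate ID
    to each entity on creation, independently of any natural key."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._entities = {}

    def create(self, **attributes):
        oid = next(self._next_id)        # artificial, immutable identifier
        self._entities[oid] = dict(attributes)
        return oid

    def state(self, oid):
        """The state of an entity: the values of all its attributes."""
        return self._entities[oid]

store = EntityStore()
kim = store.create(name="Kim", department="CS")
kim2 = store.create(name="Kim", department="CS")
assert kim != kim2   # same state, yet two distinct entities
```

Note that the two entities are distinguishable even though their states coincide, which is exactly what a naming mechanism independent of state provides.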
In summary, an entity type is defined by a name, a list of attributes
with associated domains, a list of attribute functions with associated
signatures and implementations, and a possibly empty set of integrity constraints
including key dependencies. Notice that an entity type, unlike the set-theoretic
notion of a data type, is not a purely extensional concept. There may be
two different entity types, say Lecturers and Researchers,
with exactly the same list of attributes (and attribute functions) and
possibly with the same extension. In contrast, two data type definitions,
say ULONG and DWORD, having the same extension are equal, by the axioms
of set theory.
At any moment, an entity type defined for a particular knowledge base
is associated with its current extension consisting of all entities of
that type which are currently represented in the knowledge base. An entity
type is like a container, and the associated extension is its content.
Thus, an entity type corresponds to a table in a natural way: the
columns of the table correspond to the attributes, a populated table contains
the current extension of the entity type, and a table row, forming an attribute
value tuple, corresponds to a specific entity.
Entity types are visually represented in the form of a labeled box which
may be partitioned into a head and a body part, if the attribute list is
to be included in the diagram. In that case, the name of the entity type
is written in the head of the box, and the attributes are listed in its
body.
The domain of an attribute is either an intensionally or an extensionally
defined set of possible values. Typically, an intensional domain
is defined by a formal grammar, while an extensional domain is defined
by an explicit set of values. Examples of intensional domains are the integers,
strings with at most 20 characters, or JPEG pictures. Examples of extensional
domains are the names of foreign currencies or the names of colors. An
extensional domain of an attribute of some entity type may be defined by
the extension of another (single attribute) entity type.
Notice that different entities may belong to different epistemic categories.
There are
-
concrete physical objects, comprising passive objects (such as the planet
Venus or my Alfa Romeo) and physically embodied agents (such as the Star
Wars robot R2, our dog Leo, or James Bond),
-
software objects (such as the word processor window I am currently writing
in),
-
software agents (such as the Microsoft Office97 assistant `Clippit'),
-
events (such as the 200th birthday of Schubert, or the mouse click at the
beginning of this line),
-
locations (such as Berlin),
-
analytical concepts (such as the natural number 1),
-
empirical concepts (such as the disease hepatitis A),
-
symbols (such as the letter `K').
The traditional ER modeling method has not used any such classification
of entity types because it did not seem relevant in enterprise modeling
for business applications, which is the classical application domain of
ER modeling. In other domains, however, some of the above distinctions
may be essential for capturing more domain semantics. But even in business
domains, the distinction between objects and agents makes it possible to capture
more semantics concerning such important concepts as business transactions
and roles.
Relationships and Relationship Types
A relationship among entity types is called a relationship type.
For example, attends is a relationship type used to name the particular
relationship that holds between a student S and a course C
if S attends C.
A relationship between two entities is symbolically represented as an
ordered pair labeled with the name of the relationship type. For instance,
the expression attends(Kim, Logic1)
represents a relationship of type attends between the entity
`Kim' of type Student and the entity `Logic1' of type Course.
Mostly, relationships are binary, like attends, but in some
cases three or more entities are related, which is represented by corresponding
n-tuples.
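A relationship type can thus be modeled directly as a named set of tuples. The following Python sketch makes this concrete; the relationship names and entities are illustrative:

```python
# A relationship type as a named set of tuples (illustrative sketch).
attends = {("Kim", "Logic1"), ("Lee", "Logic1")}   # binary relationship

def holds(rel, *entities):
    """Does the relationship hold for the given entity tuple?"""
    return tuple(entities) in rel

assert holds(attends, "Kim", "Logic1")

# A ternary relationship is represented by 3-tuples in the same way.
supplies = {("Acme", "Widget", "StoreA")}
assert holds(supplies, "Acme", "Widget", "StoreA")
```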
There are two special relationships between entity types which are independent
of the application domain and which are therefore distinguished from domain
relationships: subclass and aggregation relationships.
The subtype relationship between abstract data types has to be
distinguished from the subclass relationship between entity
types. Unlike objects in object-oriented programming, entities are not
instances of abstract data types (often called `classes' in OO programming)
but of entity types. Recall that an entity type is a named container with
an associated abstract data type, namely the tuple type defining the attributes
and attribute functions of all entities of that type. There may be several
distinct entity types associated with the same abstract data type. A tuple
type is a specialization (or a subtype) of another tuple type if
it has additional attributes. If an entity type E is a specialization
of an entity type D in the sense that every entity of type E
is also an entity of type D, then E is called a subclass
of D. This implies that every attribute of D is also an attribute
of E, and E may have additional attributes (i.e. the data
type of E is a subtype of that of D). The subclass relationship
between two entity types is sometimes called `isa relationship' because
we can say ``E isa D''. For instance, a sales engineer is
a sales person, and hence the entity type SalesEngineer is a subclass
of SalesPerson. We use the set-theoretic notation E isIncludedIn D
for expressing a subclass relationship and visualize it by placing a subclass
below its superclasses and connecting them with simple lines.
In an aggregation relationship, a number of part entities belong
exclusively to an aggregate entity and do not exist independently of such
an assignment. For instance, Room entities belong to a Building
entity, which is expressed as Building HAS Room. An
aggregation relationship type is visually represented by drawing the part
entity box within the aggregate entity box.
Depending on how many entities from one entity type can be associated
with how many entities of another entity type, three kinds of relationship
types are distinguished:
-
One-to-one: For each entity of either type there is at most one
associated entity of the other type.
-
Many-to-one from E1 to E2: Each entity of
type E2 is associated with zero or more E1 entities,
and each E1 entity is associated with at most one E2
entity. A many-to-one relationship from E1 to E2
is a (partial) function from E1 to E2.
-
Many-to-many: each entity of either type is related to zero or more
entities of the other type.
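The characterization of a many-to-one relationship as a (partial) function can be checked mechanically. A small sketch (the relationship worksFor is illustrative):

```python
def is_many_to_one(rel):
    """True if a binary relationship is a (partial) function from E1 to E2,
    i.e. each E1 entity is related to at most one E2 entity."""
    seen = {}
    for e1, e2 in rel:
        if seen.setdefault(e1, e2) != e2:
            return False    # e1 related to two distinct E2 entities
    return True

worksFor = {("Kim", "Sales"), ("Lee", "Sales")}   # many-to-one
assert is_many_to_one(worksFor)
assert not is_many_to_one({("Kim", "Sales"), ("Kim", "HR")})
```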
A relationship type is visually represented using a labeled diamond, where
the label consists of the name of the relationship type, connected to the
related entity boxes by straight lines. Its functionality type is visualized
by drawing crow's feet to represent a many-side of the relationship
type. So, a many-to-many relationship type is represented by drawing crow's
feet at the ends of both connections to the related entity boxes. Since
relationships are also entities (of a higher level), they may have attributes
as well. For example, if a relationship holds only for some time, there
may be an attribute for its duration.
Application Domain Predicates
Each name of an entity type and each name of a relationship type denotes
an application domain predicate which can be used to express the
properties of, and the relationships between, entities in the formal manner
of predicate logic. Thus, an ER model of an application domain defines
a formal language consisting of
-
the finite set of application domain predicates established by the entity-relationship
analysis, and
-
the set of all possible attribute values, i.e. the union of all attribute
domains, which forms the set of constant symbols.
In the sequel, we also briefly say domain predicate instead of `application
domain predicate'.
ER Modeling Is Object-Oriented
ER modeling is object-oriented since it accommodates object identity, complex-valued
and virtual attributes (`methods'), aggregation, and subclass hierarchies
(`inheritance'). Many of the recently proposed object-oriented modeling
methods are just notational variants of ER diagrams.
Object-oriented methods often include attempts to model the behavior
of a system. Behavior modeling is a significant extension to the ER meta-model.
However, the proposed methods, such as Petri nets and state transition
diagrams, are not well-integrated with the constructs of state modeling
and therefore not satisfactory. Finding a modeling method that integrates
both state and behavior is still an open problem.
1.4 Identifying Qualified Predicates
Certain domain predicates, on closer inspection, may turn out to be associated
with some kind of qualification such as
-
a valid time span during which a predicate is asserted to apply
to a tuple,
-
a degree of uncertainty with which a property or a relationship
holds for an entity,
-
a level of security with which the information that a predicate
applies to a tuple is classified, or
-
an information source according to which a property or relationship
holds for an entity (in addition, a certain degree of reliability
may be associated with the information source).
Qualified predicates require specialized logics (such as temporal logic,
fuzzy logic, etc.) if their semantics is to be defined according to the
paradigm of classical predicate logic. They can be represented in tables
with designated columns containing the qualifications of tuples.
Qualified predicates are visualized in an ER diagram by including the
qualification as a special attribute at the end of the attribute list separated
by a line.
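As an illustration of a table with a designated qualification column, the following sketch stores a valid-time span with each tuple of a qualified predicate; the predicate employedBy and its tuples are hypothetical:

```python
from datetime import date

# A valid-time qualified predicate: each tuple carries a designated
# qualification column, here a (start, end) valid-time span (illustrative).
employedBy = [
    (("Kim", "Acme"), (date(2020, 1, 1), date(2022, 12, 31))),
    (("Lee", "Acme"), (date(2021, 6, 1), date(9999, 12, 31))),
]

def holds_at(qrel, tup, day):
    """Does the qualified predicate apply to the tuple on the given day?"""
    return any(t == tup and start <= day <= end
               for t, (start, end) in qrel)

assert holds_at(employedBy, ("Kim", "Acme"), date(2021, 3, 15))
assert not holds_at(employedBy, ("Kim", "Acme"), date(2023, 1, 1))
```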
1.5 Identifying Incomplete Predicates
An ER model may contain predicates for which there is no guarantee
that complete information about them is available. For such predicates
the Closed-World Assumption (CWA) is not justified since not all
tuples to which the predicate applies are known. Thus, one cannot infer
that it is false that the predicate applies to a tuple simply because the
tuple is not recorded in the database. The CWA requires that the owner
of the database and its information suppliers have perfect control of the
represented domain and know everything about it. Whenever this strong assumption
is not plausible, some domain predicates may be incomplete (more precisely,
their representation in the database may be incomplete).
There are two kinds of incomplete predicates: total and partial
ones. While total predicates may, in principle, be completely represented
in a database, partial predicates may not; they are necessarily incomplete.
For partial predicates, falsity does not follow from non-truth: if it is
not true that the predicate applies to a tuple, this does not mean
that it is therefore false. In general, predicates may have truth-value
gaps, and falsity is not equal to non-truth. The classical law of
the excluded middle applies only to total predicates, no matter whether their
representation is complete or incomplete. It does not hold for incomplete
predicates that are partial.
Incomplete predicates are best explained by partial logic with
weak and strong negation. The extension of an incomplete predicate is represented
in a bitable consisting of an upper and a lower part. While the
upper part contains the truth extension, the lower part contains
the falsity extension of the represented incomplete predicate. Thus,
bitables make it possible to store both positive and negative information.
An incomplete predicate is visually represented in an ER diagram by
drawing a horizontal line dividing the head of an entity box or of a relationship
diamond in an upper and a lower part.
As an example, consider the information system of an organization that
wants to record the personal preferences of its employees towards each
other by means of a relationship type Likes. The information whether
an employee likes or dislikes another employee is obtained by interviews.
Obviously, the information collected in this way is incomplete: the employee
John need not like or dislike the employee Peter, he may be neutral with
respect to Peter. So, it does not follow from John's not liking Peter that
he does therefore dislike him. The predicate representing the Likes
relationship type is incomplete and, consequently, corresponds to a bitable
containing all pairs for which Likes holds, i.e. its truth extension
consisting of all pairs (x,y) such that x
likes y, and, in addition, all pairs for which Likes
definitely does not hold, i.e. its falsity extension consisting of all pairs
(x,y) such that x dislikes y.
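The Likes bitable and its three possible query answers can be sketched as follows (the names are the ones used in the example; the code itself is illustrative, not the book's formalism):

```python
# A bitable: truth extension (upper part) and falsity extension (lower part).
likes_true = {("John", "Mary")}    # pairs for which Likes holds
likes_false = {("John", "Peter")}  # pairs for which Likes definitely fails

def answer(pair):
    """Three-valued query answering for an incomplete predicate."""
    if pair in likes_true:
        return "yes"
    if pair in likes_false:
        return "no"
    return "unknown"   # truth-value gap: no Closed-World Assumption

assert answer(("John", "Mary")) == "yes"
assert answer(("John", "Peter")) == "no"
assert answer(("John", "Bob")) == "unknown"
```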
1.6 Identifying Intensional Predicates
As opposed to an extensional predicate, an intensional predicate is not
represented by means of a table but is defined with the help of other (intensional
and extensional) predicates. The extension of an intensional predicate
is not explicitly stored in the form of a base table but has to be computed
from the extension of those other predicates occurring in its definition.
Intensional predicates are defined by means of deduction rules
consisting of a head (or conclusion) and a body (or premise).
They can be used to represent various kinds of relationships between concepts,
such as subsumption or synonymy, or to represent properties and relationships
based on heuristics and on default assumptions.
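How the extension of an intensional predicate is computed from deduction rules can be sketched as follows. The predicates parent and ancestor are illustrative, and the naive fixpoint computation is only one possible evaluation strategy:

```python
# Computing an intensional predicate from an extensional one via rules:
#   rule 1: ancestor(x, y) <- parent(x, y)
#   rule 2: ancestor(x, z) <- parent(x, y), ancestor(y, z)
parent = {("Ann", "Bob"), ("Bob", "Carl")}   # extensional base table

def ancestor(parent_rel):
    """Naive fixpoint evaluation of the deduction rules above."""
    anc = set(parent_rel)                    # rule 1
    changed = True
    while changed:                           # apply rule 2 until nothing new
        changed = False
        for (x, y) in parent_rel:
            for (y2, z) in list(anc):
                if y == y2 and (x, z) not in anc:
                    anc.add((x, z))
                    changed = True
    return anc

assert ("Ann", "Carl") in ancestor(parent)   # derived, not stored
```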
In summary, incomplete, qualified and intensional predicates extend
the concept of normal predicates and lead to conservative extensions of
relational databases. Since these extensions are orthogonal, they form
a rather large space of possible combinations.
Orthogonal extensions of relational databases

predicate category    | representation                  | logic
normal predicates     | relational database tables      | classical logic
incomplete predicates | bitables                        | partial logics
qualified predicates  | tables with designated columns  | specialized (temporal, fuzzy, etc.) logics
intensional predicates| deduction rules                 | constructive logics
1.7 Entity-Relationship Modeling in 7 Steps
The following procedure represents a simplified method for obtaining an
ER model. Notice that after finishing each of the seven steps, it
is a good idea to review the results of all previous steps.
Step 1: List all potential entity types.
Write a list of candidate entity types which covers the entire scope of
the application domain.
Step 2: Write a summary paragraph for each entity type.
Write a summary description of the knowledge base model by writing a summary
paragraph for each entity type identified in step 1. Use a restricted declarative
style of language and enumerate paragraphs and sentences.
Step 3: Identify relationship types.
Identify relationship types by analyzing the summary description and looking
for verbs. Start by looking for subclass and for aggregation relationships,
and after that identify the remaining domain relationship types. The result
of this should be three lists: a list of subclass relationships of the
form E1 isIncludedIn E2, a list of aggregation
relationships of the form E1 HAS E2, and a
list of domain relationship names together with associated entity types
using either infix notation such as E1 isRelatedTo E2
or prefix notation such as R(E1, E2, ..., En).
Step 4: Determine the epistemic category of all domain predicates.
For each predicate obtained from the list of entity types and the list
of domain relationship types, determine its epistemic category:
-
Is it a complete extensional predicate good for absolute assertions?
-
Is it a qualified predicate such that assertions made with it are informationally
correct only relative to some qualification (like a valid-time span, a
degree of uncertainty, or a security level)?
-
Is it an incomplete predicate for which we do not have sufficient information
competence to guarantee that its complete extension is available?
-
Is it an intensional predicate whose extension is not explicitly stored
in the knowledge base but is derived from other predicates via deduction
rules?
Step 5: Draw an ER diagram.
Draw an ER diagram using all entity types identified in step 1 and all
relationship types identified in step 3. Visualize the subclass hierarchies
by placing subclass entity boxes below their superclasses connecting them
with simple lines. If an entity type has several subclasses, indicate whether
they form a complete partition and whether they may overlap or have to
be disjoint. Visualize aggregation relationships by drawing the box of
the part entity type within the larger box of the aggregate entity type.
For each domain relationship type, determine whether it is one-to-one,
one-to-many or many-to-many, and draw the corresponding crow's feet in
the relationship connection lines accordingly.
Step 6: Determine the attributes of entities and relationships.
For each entity type and each relationship type, determine its attributes
together with their domains and its attribute functions with their signatures.
Mark complex-valued attributes and attributes with extensional domains.
Step 7: Complete the diagram by deriving additional entity types.
Inspect all attributes with extensional domains marked in step 6.
Determine whether they refer to an entity type rather than to a data type,
and derive additional entity types from this analysis. Add them to the
ER diagram together with appropriate relationship links.
1.8 Agent-Object-Relationship Modeling
This section contains a brief outlook on the future of conceptual modeling.
ER modeling does not account for the dynamic aspects of information
and knowledge processing systems. These aspects are related to notions
like communication, interaction, events, activities or processes. As pointed
out above, it may be useful to distinguish between different kinds of entity
types. In particular, for capturing semantic aspects related to the dynamics
of knowledge systems, it is necessary to distinguish between agents
and passive objects. While both objects and agents are represented
in the system, only agents interact with it, and the possible interactions
may have to be represented in the system as well.
Being entities, agents and objects of the same type share a number of
attributes representing their properties or characteristics. So, in agent-object-relationship
(AOR) modeling, there are the same notions as in ER modeling (such as attribute
domain, state, integrity constraint, key, etc.). In fact, objects in AOR
modeling are treated exactly like entities in ER modeling.
While ER modeling supports the design of object-oriented
information systems realized with the help of relational and object-relational
databases, AOR modeling allows the high-level design of agent-oriented
information systems. An agent-oriented information system represents
agents and their potential interactions in addition to ordinary objects.
It need not be designed as an agent itself, however. Thus, the AOR meta-model
is not intended as a methodology for the design of agent-based systems
in general but rather of agent-oriented information systems.
Interaction Frames and Agent Roles
From an enterprise information system (EIS) perspective, examples of agents
are customers and suppliers, as well as those employees whose role is to
interact with customers and suppliers. Another EIS, e.g. one involved
in automated electronic commerce, may also be an agent from the perspective
of a system dealing with it. So, agents in this context may be natural
or artificial systems that are involved in some kind of business interaction.
If the preconditions and effects of these interactions are represented
in the EIS, the system is able to check the consistency and completeness
of business transactions.
Notice that it is a matter of choice whether to represent active entities
as agents or as objects. Agents are special objects. One may simply ignore
their agent-specific properties and treat them as objects. For instance,
if the supplier-to-customer interaction does not have to be explicitly
modeled in an enterprise information system, customers may simply be represented
as objects in the same way as orders and bank accounts.
But even if it is not necessary to represent certain agent types in
the target system and track their interactions, it may be important to
consider them in the conceptual model in order to be able to derive a transaction
and process model from the possible interactions with that type of agent.
The possible interactions between two agent types (or roles, say
between customers and salespersons, or between patients and nurses) are
defined by an interaction frame which lists for both participating
roles:
-
All types of communication acts that may take place in the agent-to-agent
communication frame. Since, from the perspective of a `listener', the communication
acts of other agents are events, we also speak of communication event
types.
-
All relevant types of physical actions which may be performed in
response to other actions of (or which are otherwise related to) the interaction
frame.
-
All relevant types of environment events which may be perceived
by an agent and which are related to the interaction frame.
-
A set of correctness properties defining the correct behavior of
agents within the interaction frame. Correctness properties are temporal
logic assertions expressing either a safety or a progress property. Typically,
these properties refer to communication events (or messages) and possibly
to the mental state of agents.
A communication event type is defined by its name and a formal parameter
list. A communication event is represented as a typed message together
with the sender and the receiver address. The message type corresponds
to the communication event type. There are domain-independent message types
such as request, tell, ask and reply. But message types may also be domain-specific.
In an electronic commerce interaction frame, examples of domain-specific
message types are: requestOffer, submitOrder and cancelOrder.
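A communication event as a typed message with sender and receiver can be sketched as a simple record type. The field names and example values are illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    """A communication event: a typed message with sender and receiver
    addresses (illustrative sketch)."""
    msg_type: str     # e.g. 'tell', or domain-specific: 'submitOrder'
    sender: str
    receiver: str
    content: tuple    # the formal parameter values of the event type

m = Message("submitOrder", sender="customer1", receiver="shopEIS",
            content=("Widget", 3))
assert m.msg_type == "submitOrder"
```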
Agents may be related to objects and to other agents by means of relationships
as in ER modeling. An interaction frame can be viewed as a special agent-to-agent
relationship.
Mental Attributes
While the state of an object does not have any application-independent
structure or semantics, the state of an agent consists of mental
components such as beliefs and commitments. In AOR modeling,
the mental state of agents may be (partially) represented by means of mental
attributes. Whenever an interaction frame contains correctness properties
referring to the mental state of agents, the corresponding mental attributes
must be included in the AOR model.
1.9 Summary
Conceptual modeling methods support the definition of a formal language
and of associated integrity constraints needed to formalize an application
domain. Standard predicate logic has to be enriched by including further
conceptual distinctions, such as the entity-relationship dichotomy or the
agent-object-relationship trichotomy, in order to capture a sufficiently
structured formal model of an application domain.
1.10 Further Reading
Elaborate presentations of the ER modeling approach to relational database
design can be found in (Teo94, BCN92). Another interesting approach for
modeling information systems, focusing on the formal-linguistic analysis
of specification sentences in requirement documents, is proposed
in (NH89).
1.11 Exercises
Problem 1-1 Establish an ER model (and draw the corresponding
diagram) of the university domain consisting of the entity types Person,
Employee, Student, Course, Teacher, TeachingAssistant, and MasterThesis.
Notice that the involved predicates are all extensional and complete. However,
it is an option to model Student, Course and TeachingAssistant, as well
as the relationships associated with them, as being qualified with a valid
time.
Problem 1-2 Assume that the university is interested in
information about its employees being smokers or non-smokers. This information,
however, can only be obtained on a voluntary basis. Extend the university
ER model from the previous exercise in an appropriate way.
Problem 1-3 Establish an ER model of the hospital domain
consisting of the entity types Person, Patient, Room,
Physician, Diagnosis and Treatment. Notice that diagnoses are uncertain
and that information about patients (including their diagnoses and treatments)
must be associated with a security level to protect it from unauthorized
access.
2. The Fundamental Case: Relational Databases
The system of relational databases represents the most fundamental case
of a knowledge system. It already contains all the basic components and
operations of a knowledge system such as the distinction between the query,
the input, and the representation language, as well as an update, an inference,
and an inference-based query answering operation. It is one of the central
claims of this book that every knowledge system should be upwards compatible
with, and conservatively extend, the system of relational databases. Knowledge
systems satisfying this requirement are called vivid in Section
6.4.
After introducing the basic concepts of relational databases, three
major components of databases are discussed: queries, integrity constraints
and updates. It is shown that query answering is based on logical inference,
and that updates are subject to certain logical conditions involving the
specified integrity constraints. Then, the problem of null values in databases
is presented as a violation of the database completeness assumption. Finally,
we present Reiter's database completion theory and discuss the relation
of databases to predicate logic model theory.
2.1 Introduction
The main purpose of a database is to store and retrieve information given
in an explicit linguistic format. As opposed to certain other types of
information that are also processed in cognitive systems (such as humans),
this type of information is essentially propositional, that is,
it can be expressed in a formal language composed of names for entities
(or objects) and relationships, and of complex expressions formed from
elementary ones by means of sentential connectives, such as negation, conjunction
and disjunction. While only elementary pieces of information corresponding
to atomic sentences can be stored in a relational database, the retrieval
of specific information is achieved by submitting complex queries corresponding
to complex logical formulas. Thus, it becomes obvious that predicate logic,
with its account of formal languages composed of constant and relation
symbols, and of sentential connectives, is the mathematical basis of query
answering in databases.
As early as 1970, Edgar F. Codd published his pioneering article ``A
Relational Model of Data for Large Shared Data Banks'' in the Communications
of the ACM, where he defined the principles of the relational database
model. This was the first convincing conceptualization of a general purpose
database model, and it is not an accident that it relies on formal logic.
Finally, in the mid-eighties, IBM presented its database management system
DB2, the first industrial-strength implementation of the relational model,
which continues to be one of the most successful systems for mainframe
computers today. There are now numerous other relational database management
systems that are commercially available. The most popular ones include
Informix, Ingres, Oracle, Sybase and Microsoft SQL Server. To a great extent,
the overwhelming success of these systems is due to the standardization
of the database programming language SQL originally developed at
IBM in the seventies. SQL is a declarative language for defining, modifying
and querying relational database tables. SQL queries correspond to first
order predicate logic formulas.
While most well-established information processing systems and tools
such as programming languages, operating systems or word processors have
evolved from practical prototypes, the unprecedented success story of the
relational database model is one of the rare examples -- certainly the
most important one -- where a well-established and widely used major software
system is based on a formal model derived from a mathematical theory (in
this case, set theory and mathematical logic).
Logically, a relational database is a finite set of finite set-theoretic
relations over elementary data types (called tables), corresponding
to a finite set of atomic sentences. Such a collection of ground atoms
can also be viewed as a finite interpretation of the formal language associated
with the database in the sense of first order model theory.
The information represented in a relational database is updated by inserting
or deleting atomic sentences corresponding to table rows (or tuples of
some set-theoretic relation). Since a relational database is assumed to
have complete information about the domain represented in its tables, if-queries
are answered either by yes or by no. There is no third type
of answer such as unknown. Open queries (with free variables) are
answered by returning the set of all answer substitutions satisfying the
query formula.
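The view of a relational database as a finite set of ground atoms, with yes/no answers under the Closed-World Assumption and answer substitutions for open queries, can be sketched as follows (the predicate and entity names are illustrative):

```python
# A relational database as a finite set of ground atoms (illustrative).
db = {("attends", "Kim", "Logic1"), ("attends", "Lee", "Logic1")}

def ask(pred, *args):
    """If-query: answered yes or no under the Closed-World Assumption;
    there is no third answer such as 'unknown'."""
    return "yes" if (pred, *args) in db else "no"

def query(pred, *pattern):
    """Open query: None marks a free variable; the answer is the set of
    all satisfying tuples (answer substitutions)."""
    return {atom[1:] for atom in db
            if atom[0] == pred
            and all(p is None or p == a
                    for p, a in zip(pattern, atom[1:]))}

assert ask("attends", "Kim", "Logic1") == "yes"
assert ask("attends", "Kim", "Logic2") == "no"   # CWA: absence means false
assert query("attends", None, "Logic1") == {("Kim", "Logic1"),
                                            ("Lee", "Logic1")}
```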
2.14 Deficiencies of the Relational Database Model
Although relational databases represent a major breakthrough in the history
of information processing, they still have a number of deficiencies calling
for extensions and improvements. This concerns, in particular, the problems
of entity identity and of complex-valued attributes and attribute functions.
Also, as a requirement from conceptual modeling, the subclass (and inheritance)
relationship between entity types should be supported. Furthermore, practical
applications require base type extensibility and support for large data
objects. And finally, for incorporating more semantics into databases and
for knowledge management the support of rules is required. For expressing
procedural knowledge in the database schema independently of specific application
programs, the concept of reaction rules has to be supported. For
representing terminological and heuristic knowledge, there should be support
for deduction rules.
Besides these general purpose issues, there are several special purpose
extensions of relational databases that are required to handle qualifications
(such as valid time, security and reliability) and various forms of incompleteness
(such as incomplete predicates and disjunctive information).
Adequate Support for Standard Names
In SQL92 databases, entities are identified with the help of primary keys.
This requires that for any entity table, one of its keys is designated
as primary and used as the standard name of the entities represented
by the rows of that table. There are, however, cases where no natural key
is available at all, and therefore an artificial primary key has to be
introduced. Even more troublesome is the problem that an attribute participating
in a primary key is, like any other attribute, modifiable. If such an attribute
is, for some reason, modified, this leads to a change of the entity identity
which is clearly undesirable. A standard name should be a unique identifier
which is associated with an entity throughout its entire life cycle implying
that it must be immutable.
Complex-Valued Attributes
For simplicity, Codd banned complex-valued attributes from the relational
database model (this requirement has been called `first normal form').
Since in practice, many attributes have complex values, such as sets or
tuples, this restriction leads to rather unnatural representations.
Attribute Functions
There are attributes whose values are not explicitly stored in the database
but rather have to be computed in some way. These virtual
attributes are associated with a specific function that is invoked whenever
their value is queried. Relational databases, and SQL92, do not provide
any means to define attribute functions.
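Although SQL92 has no attribute functions, a virtual attribute can be approximated by a view that computes its value on demand; the following sketch, with invented table and attribute names, illustrates the idea:

```python
import sqlite3

# Sketch of a virtual attribute: 'total' is not stored in the base
# table but computed by a view whenever it is queried.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE order_item (item TEXT, price REAL, qty INTEGER)")
con.execute("INSERT INTO order_item VALUES ('pen', 1.5, 4)")
con.execute("""CREATE VIEW order_item_v AS
    SELECT item, price, qty, price * qty AS total FROM order_item""")
total = con.execute("SELECT total FROM order_item_v").fetchone()[0]
print(total)  # 6.0
```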
Subclass Hierarchy and Inheritance
In the conceptual model of an application domain, one often obtains hierarchical
subclass relationships between entity types. Such a subclass relationship
implies that entities of the subclass inherit all attributes from
the superclass entities by means of subtyping. However, this mechanism
is not supported in relational databases.
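In the absence of built-in inheritance, a subclass relationship is usually simulated by a shared key and a join; the following sketch, with invented tables, reassembles the inherited attributes in a view:

```python
import sqlite3

# Sketch: Student as a subclass of Person, simulated by a shared key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE student (
    id INTEGER PRIMARY KEY REFERENCES person(id),
    university TEXT)""")
con.execute("INSERT INTO person VALUES (1, 'Ann')")
con.execute("INSERT INTO student VALUES (1, 'TU Cottbus')")
# The view plays the role of the subclass table with the attributes
# 'inherited' from the superclass.
con.execute("""CREATE VIEW student_full AS
    SELECT p.id, p.name, s.university
    FROM person p JOIN student s ON p.id = s.id""")
row = con.execute("SELECT name, university FROM student_full").fetchone()
print(row)  # ('Ann', 'TU Cottbus')
```

The point of the deficiency is that this inheritance is entirely hand-coded: the database system itself knows nothing of the subclass relationship.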
User-Defined Types and Large Objects
Since much of the information processed by human beings is of an audio-visual
nature, there is a growing tendency to include pictures, sound samples
and videos in the characterization of entities. Also, advanced software
applications create and manipulate complex documents such as spreadsheets,
project plans or hypertext documents, which may have to be included in
the characterization of entities as well. So, there is a need for special
attributes associated with user-defined data types of arbitrary length
that can be queried with the help of user-defined functions and predicates.
The BLOB (`binary large object') data type offered by most commercial database
systems is only an ad-hoc solution to this problem, since it is not well-integrated
with the other data types in the query language and access mechanisms.
Reaction Rules
SQL databases support a restricted form of reaction rules, called triggers.
Triggers are bound to update events. Depending on some condition on the
database state, they may lead to an update action and to system-specific
procedure calls. In Chapter 4, we propose a general form of reaction rules
which can be used to specify the communication in multidatabases and, more
generally, the interoperation between cooperative knowledge bases.
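The following sketch, with invented table names, shows a trigger in this restricted event-condition-action form: an update event, a condition on the new database state, and an update action:

```python
import sqlite3

# Sketch of a trigger as a restricted reaction rule: when an update
# on 'stock' leaves a quantity below 5, a reorder request is recorded.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
con.execute("CREATE TABLE reorder (item TEXT)")
con.execute("""CREATE TRIGGER low_stock AFTER UPDATE ON stock
    WHEN NEW.qty < 5
    BEGIN
        INSERT INTO reorder VALUES (NEW.item);
    END""")
con.execute("INSERT INTO stock VALUES ('pen', 10)")
con.execute("UPDATE stock SET qty = 3 WHERE item = 'pen'")
requests = con.execute("SELECT item FROM reorder").fetchall()
print(requests)  # [('pen',)]
```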
Deduction Rules
Deduction rules may be used for defining intensional predicates and for
representing heuristic knowledge. Intensional predicates express properties
of, and relationships between, entities on the basis of other (intensional
and extensional) predicates. Heuristic knowledge is typically represented
in the form of default rules which are most naturally expressed
using the weak and strong negation of partial logic. While relational
databases make it possible to define non-recursive intensional predicates
with the help of views, they do not support default rules or any other form
of heuristic knowledge.
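A non-recursive intensional predicate defined as a view can be sketched as follows (the `parent'/`grandparent' example is a standard illustration, not taken from the text):

```python
import sqlite3

# Sketch: 'grandparent' as an intensional predicate derived by a view
# from the extensional predicate 'parent'.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parent (child TEXT, par TEXT)")
con.executemany("INSERT INTO parent VALUES (?, ?)",
                [("carl", "bob"), ("bob", "ann")])
con.execute("""CREATE VIEW grandparent AS
    SELECT p1.child AS child, p2.par AS gpar
    FROM parent p1 JOIN parent p2 ON p1.par = p2.child""")
rows = con.execute("SELECT * FROM grandparent").fetchall()
print(rows)  # [('carl', 'ann')]
```

Recursive predicates such as `ancestor', by contrast, cannot be defined as SQL92 views.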
Valid Time and Belief Time
In certain applications, it is essential to deal with temporally qualified
information. This includes both the time span during which a sentence (or
piece of information) is valid, and the time span during which a sentence
is believed and recorded in the database. The addition of valid time and
belief time requires special query capabilities which cannot be achieved
by using the temporal data types of SQL92.
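The following sketch, with invented schema and dates, shows valid-time qualification by explicit time-span attributes, together with a `timeslice' query reconstructing an ordinary snapshot table; genuine temporal query capabilities, as in TSQL2, go well beyond this:

```python
import sqlite3

# Sketch: each row carries the time span during which it is valid;
# a timeslice query restores the snapshot holding at a given date.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employment (
    person TEXT, company TEXT, valid_from TEXT, valid_to TEXT)""")
con.executemany("INSERT INTO employment VALUES (?, ?, ?, ?)", [
    ("ann", "Acme",   "1995-01-01", "1997-06-30"),
    ("ann", "Globex", "1997-07-01", "9999-12-31")])
snapshot = con.execute("""SELECT person, company FROM employment
    WHERE valid_from <= '1996-05-01'
      AND '1996-05-01' <= valid_to""").fetchall()
print(snapshot)  # [('ann', 'Acme')]
```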
Spatial and Geographic Information
Spatial information is related to geometric concepts like points, lines,
rectangles, polygons, surfaces, and volumes. Spatial database systems include
direct support of these concepts. They are used for realizing geographic
information systems.
Incomplete Predicates
There are domain predicates, especially in empirical domains, for which
the database completeness assumption is not justified. For these incomplete
predicates, the falsity extension is not simply the set-theoretic complement
of the truth extension, and therefore both positive as well as negative
cases have to be treated on a par. An ad-hoc solution for incomplete predicates
in relational databases would be to represent them in two tables: one for
their truth and another one for their falsity extension. But there would
still be no support of this concept in the query and update mechanisms.
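The two-table representation, and the resulting three-valued query answering, can be sketched as follows (the predicate and the data are invented):

```python
import sqlite3

# Sketch: an incomplete predicate 'likes' split into a truth extension
# (likes_t) and a falsity extension (likes_f); tuples in neither table
# are undetermined rather than false.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE likes_t (person TEXT, dish TEXT)")
con.execute("CREATE TABLE likes_f (person TEXT, dish TEXT)")
con.execute("INSERT INTO likes_t VALUES ('ann', 'pasta')")
con.execute("INSERT INTO likes_f VALUES ('ann', 'tripe')")

def truth_value(con, person, dish):
    """Three-valued query answering over the split predicate."""
    if con.execute("SELECT 1 FROM likes_t WHERE person=? AND dish=?",
                   (person, dish)).fetchone():
        return "true"
    if con.execute("SELECT 1 FROM likes_f WHERE person=? AND dish=?",
                   (person, dish)).fetchone():
        return "false"
    return "undetermined"

print(truth_value(con, "ann", "pasta"))  # true
print(truth_value(con, "ann", "sushi"))  # undetermined
```

The hand-written `truth_value` function stands in for exactly the support that the query mechanism itself fails to provide.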
Security
SQL databases only support security requirements at the schema level, that
is, user-specific access restrictions can be defined for entire tables
or for sensitive columns which are then non-modifiable or non-observable
for unauthorized users. In practice, these possibilities are not sufficient
to protect confidential information. Rather, one needs contents-based access
restrictions that prevent unauthorized users from querying or updating
sensitive tuples of a table.
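A contents-based restriction can be approximated by granting unauthorized users only a view that filters out sensitive tuples; in this sketch the table, the sensitivity flag, and the data are invented:

```python
import sqlite3

# Sketch: unauthorized users see only the view, which hides sensitive
# tuples rather than entire tables or columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salary (person TEXT, amount INTEGER, sensitive INTEGER)")
con.executemany("INSERT INTO salary VALUES (?, ?, ?)",
                [("ann", 50000, 0), ("ceo", 500000, 1)])
con.execute("""CREATE VIEW salary_public AS
    SELECT person, amount FROM salary WHERE sensitive = 0""")
visible = con.execute("SELECT person FROM salary_public").fetchall()
print(visible)  # [('ann',)]
```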
Uncertainty and Reliability
There are domains where information is typically uncertain, or where the
information sources are not completely reliable. In these cases, it is
necessary to assign degrees of uncertainty to new pieces of information,
or to label them with their source which may be associated with a degree
of reliability.
Disjunctive Information
The most difficult extension of databases and knowledge bases concerns
the integration of disjunctive information. While the kind of incompleteness
created by disjunctive information is an important characteristic of commonsense
knowledge representation and reasoning, theoretical investigations have
not yet succeeded in establishing convincing semantics and efficient computation
methods for storing and retrieving disjunctive information. Questions about
the right semantics arise when disjunctive information is combined with
negation (on the basis of the database completeness assumption) and rules.
Efficiency problems arise from the combinatorial explosion created by disjunctive
information.
2.15 Summary
The system of relational databases represents the most fundamental case
of a knowledge system. Consisting of elementary-valued tables, a relational
database is restricted to atomic sentences. More complex tables and more
complex sentences, such as negated, disjunctive or qualified sentences,
which are needed to represent various forms of incomplete and qualified
information, are not admitted. They require specific extensions of relational
databases leading to object-relational, temporal, fuzzy, deductive, and
other advanced database and knowledge systems.
3 Object-Relational Databases
Object-relational databases (ORDBs) have evolved from relational databases
by adding several extensions derived from conceptual modeling requirements
and from object-oriented programming concepts. It is expected that ORDB
applications will outnumber relational database applications in the
near future, because ORDBs allow the seamless integration of multimedia data
types and large application objects such as text documents, spreadsheets
and maps, with the fundamental concept of database tables. Many object-relational
extensions will be included in SQL3, the successor to SQL92.
Historically, the successful application of object-oriented programming
languages such as Smalltalk, C++ and Java, has led to the development of
a number of object-oriented database (OODB) systems which
support the storage and manipulation of persistent objects. These systems
have been designed as programming tools to facilitate the development of
object-oriented application programs. However, although they are called
`database systems', their emphasis is not on knowledge representation but
rather on persistent object management. Any database concept which is
intended as an implementation platform for information systems and
knowledge representation must support tables as its basic representation
concept on which query answering is based. Tables correspond to extensional
predicates, and each table row corresponds to a sentence or proposition.
This correspondence is a fundamental requirement for true database systems.
If it is violated, as in the case of OODBs, one is dealing with a new notion
of `database system', and it would be less confusing to use another term
instead (e.g. `persistent object management system') as proposed by (Kim
1995).
One can view the evolution of relational to object-relational databases
in two steps:
-
The addition of abstract data types (ADTs) leads to ADT-relational
databases consisting of complex-valued tables. ADTs include user-defined
base types and complex types, together with user-defined functions and type
predicates, and the possibility of forming a type hierarchy in which a
subtype of a tuple type inherits all attributes defined for its supertype.
-
The addition of object identity, object references, and the possibility
of defining subtables within an extensional subclass hierarchy results
in object-relational databases consisting of complex-valued tables and
object
tables.
There are two notable differences between object-relational databases and
object-oriented programming:
-
Object IDs in ORDBs are logical pointers. They are not bound
to a physical location (like C++ pointers).
-
In addition to the intensional subtype hierarchy of the type system, ORDBs
have an extensional subclass (or subtable) hierarchy that respects the
subtype relationships defined in their type system.
7 Temporal Databases
Temporal databases are, in practice, the most important logical extension
of relational databases, since temporal information plays an essential role
in many application domains. It is expected that the temporal database
language standard TSQL2 proposed in (Snodgrass 1995) will be included in
future versions of SQL. We show how the notions of a database table,
of natural inference and of table operations have to be generalized in
order to accommodate temporally qualified information.
8 Fuzzy Databases
Uncertain information shows up in various ways, typically not in business
and administration domains but rather in more empirical domains such as
medicine, criminology, and all kinds of empirical research. After
discussing different types of uncertainty, we present a fuzzy database
model for storing and retrieving gradually uncertain information.
8.1 Introduction
In various application domains, one encounters uncertain and imprecise
information. While the fuzzy set theory of (Zadeh 1965) makes it possible
to handle imprecise or vague information, fuzzy logic and possibilistic
logic, which are based on possibility theory (Zadeh 1979, Dubois and
Prade 1980, Dubois, Lang and Prade 1994), form the theoretical basis for
processing gradually uncertain information.
Classical probability theory, based on the axioms of Kolmogorov, on
the other hand, makes very strong assumptions. In many practical cases,
it is therefore not applicable to the problem of handling uncertain information.
Also, probabilistic methods are computationally much more costly than fuzzy
methods because they are non-compositional.
Imprecise information often comes in the form of disjunctive or vague
sentences, while uncertain information is often expressed in the form of
sentences qualified by a degree of uncertainty like, for instance,
`very
likely' or `with 60% certainty'. Sometimes, such a qualification
is called a subjective probability or a degree of belief.
A particular form of uncertainty is given by statistical information.
Sentences qualified by statistical uncertainty are based on statistical
samples measuring relative frequencies. An example of such a sentence is
"80% of all jaundice patients have hepatitis". Through their empirical
grounding, these sentences are more reliable than sentences qualified by
a mere subjective degree of belief.
In the literature, there is no clear taxonomy of uncertainty in databases
and knowledge bases. An obvious distinction, however, concerns the uncertainty
expressed at the level of attribute values, or at the level of the applicability
of a predicate to a tuple. While the former is related to the issue of
handling vagueness, the latter is related to an account of certainty-qualified
sentences. In our formal exposition, we do not treat vagueness expressed
by fuzzy-set-valued attributes, but only the more fundamental issue of
gradual uncertainty based on fuzzy relations over ordinary (`crisp') attributes.
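A fuzzy relation over crisp attributes, and the min-based combination of degrees in a conjunctive query, can be sketched as follows (the predicates and the degrees are invented):

```python
# Sketch of fuzzy relations: each tuple of crisp values carries a
# degree of uncertainty; tuples not listed have degree 0.
has_hepatitis = {("ann",): 0.8, ("bob",): 0.3}
has_jaundice = {("ann",): 0.9}

def degree(rel, tup):
    return rel.get(tup, 0.0)

# Degree of the conjunctive query 'has_jaundice(x) AND has_hepatitis(x)',
# combining the degrees of the conjuncts with min, as in fuzzy logic.
def conj_degree(x):
    return min(degree(has_jaundice, (x,)), degree(has_hepatitis, (x,)))

print(conj_degree("ann"))  # 0.8
print(conj_degree("bob"))  # 0.0
```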