Foundations of Knowledge Systems
with Applications to Databases and Agents
Gerd Wagner
Kluwer Academic Publishers, Boston/Dordrecht/London
Contents
Preface
Introduction

Part I Tables and Objects

1. Conceptual Modeling of Knowledge Bases
   1.1 Introduction
   1.2 Predicate Logic
   1.3 Entity-Relationship Modeling
   1.4 Identifying Qualified Predicates
   1.5 Identifying Incomplete Predicates
   1.6 Identifying Intensional Predicates
   1.7 Entity-Relationship Modeling in 7 Steps
   1.8 Agent-Object-Relationship Modeling
   1.9 Summary
   1.10 Further Reading
   1.11 Exercises

2. Relational Databases
   2.1 Introduction
   2.2 Basic Concepts
   2.3 Query Answering
   2.4 Conjunctive Queries
   2.5 General Queries
   2.6 Integrity Constraints
   2.7 Transforming an ER Model into a Relational Database Schema
   2.8 Information Order and Relative Information Distance
   2.9 Updating Relational Databases
   2.10 Incomplete Information in Relational Databases
   2.11 A Relational Database without Nulls Represents a Herbrand Interpretation
   2.12 Reiter's Database Completion Theory
   2.13 Database Inference as Preferential Entailment Based on Minimal Models
   2.14 Deficiencies of the Relational Database Model
   2.15 Summary
   2.16 Further Reading
   2.17 Exercises

3. Object-Relational Databases
   3.1 ADT-Relational Databases
   3.2 Introducing Objects and Classes
   3.3 Further Reading
   3.4 Exercises

Part II Adding Rules

4. Reaction Rules
   4.1 Introduction
   4.2 Communication Events
   4.3 Epistemic and Communicative Reactions
   4.4 Operational Semantics of Reaction Rules
   4.5 Multidatabases
   4.6 Summary
   4.7 Further Reading
   4.8 Exercises

5. Deduction Rules
   5.1 Introduction
   5.2 Closure Semantics
   5.3 Model-Theoretic Semantics
   5.4 Summary
   5.5 Further Reading
   5.6 Exercises

Part III Positive Knowledge Systems: Concepts, Properties and Examples

6. Principles of Positive Knowledge Systems
   6.1 Introduction
   6.2 Basic Concepts
   6.3 Formal Properties of Knowledge Systems
   6.4 Vivid Knowledge Systems
   6.5 Knowledge Update and Knowledge Integration
   6.6 Summary

7. Temporal Databases
   7.1 Introduction
   7.2 The Timestamp Algebra
   7.3 Temporal Tables
   7.4 Natural Inference in Temporal Databases
   7.5 Updates and Information Order
   7.6 Indexical Temporal Null Values
   7.7 Temporal Table Operations
   7.8 Temporal Deduction Rules
   7.9 Bitemporal Databases
   7.10 Further Reading
   7.11 Summary
   7.12 Exercises

8. Fuzzy Databases
   8.1 Introduction
   8.2 Certainty Scales
   8.3 Fuzzy Tables
   8.4 Natural Inference in Fuzzy Databases
   8.5 Update, Knowledge Integration and Information Order
   8.6 Fuzzy Table Operations
   8.7 Fuzzy Deduction Rules
   8.8 Further Reading
   8.9 Summary

9. Further Examples of Positive Knowledge Systems
   9.1 Multi-Level Secure Databases
   9.2 Lineage Databases
   9.3 Disjunctive Databases
   9.4 S5-Epistemic Databases

Part IV Admitting Negative Information: From Tables to Bitables

10. Principles of Non-Positive Knowledge Systems
    10.1 Introduction
    10.2 Basic Concepts
    10.3 Standard Logics as Knowledge Systems

11. Relational Factbases
    11.1 Introduction
    11.2 Birelational Databases
    11.3 Relational Factbases

12. Possibilistic Databases
    12.1 Introduction
    12.2 Possibilistic Tables
    12.3 Natural Inference in Possibilistic Databases
    12.4 Updates and Information Order
    12.5 Knowledge Integration
    12.6 Possibilistic Deductive Databases
    12.7 Possibilistic Deduction Rules and EMYCIN

13. Further Examples of Non-Positive Knowledge Systems
    13.1 Multi-Level Secure Factbases
    13.2 Lineage Factbases
    13.3 Disjunctive Factbases

Part V More on Reaction and Deduction Rules

14. Communication and Cooperation
    14.1 Multi-Knowledge Bases
    14.2 Distributed Updating and Query Answering
    14.3 Cooperative Knowledge Bases

15. Deductive Knowledge Systems
    15.1 Introduction
    15.2 Deductive Knowledge Bases
    15.3 The Immediate Consequence Operator
    15.4 Operational versus Constraint Semantics of Rules
    15.5 Ampliative DKBs
    15.6 Non-Ampliative DKBs

16. Advanced Knowledge and Reasoning Services
    16.1 Explanations
    16.2 Diagnoses
    16.3 Automated Planning

Appendices

A. Partial Logics with Two Kinds of Negation
B. Compositional Possibilistic Logic
C. On the Logic of Temporally Qualified Information

Preface
One of the main uses of computer systems is the management of large amounts
of symbolic information representing the state of some application domain
such as information about all the people I communicate with in my personal
address database, or about relevant parts of outer space in the
knowledge base of a NASA space mission. While database management systems
offer only the basic services of information manipulation and retrieval,
more powerful knowledge systems offer in addition advanced services such
as deductive and abductive reasoning for the purpose of finding explanations
and diagnoses, or generating plans.
In order to design and understand database and knowledge-based applications
it is important to build upon well-established conceptual and mathematical
foundations. What are the principles behind database and knowledge systems?
What are their major components? Which are the important cases of knowledge
systems? What are their limitations? Addressing these questions, and
discussing the fundamental issues of information update, knowledge assimilation,
integrity maintenance, and inference-based query answering, is the purpose
of this book.
Databases, Logic and SQL
Since relational databases are the most fundamental case of a knowledge
system, they are treated in depth in connection with the standard database
language SQL. The chapter on the foundations of relational databases is
both an introduction to mathematical logic and a course in SQL programming.
It is supplemented with an additional chapter on object-relational databases,
a new development that dominates the current progress in database technology
and SQL.
Various Kinds of Information
Depending on the type of application domain, we have to be able to deal
with various kinds of information. The simplest case is that of business and administrative
domains where the information to be processed is complete, and it is assumed
that all information sources are perfect (absolutely reliable, honest and
competent). But in more empirical domains, such as in outer space, even
if all information sources are perfect, there will be various forms of
incomplete information, e.g. involving partial predicates, and disjunctively
imprecise or gradually uncertain information. In addition to incompleteness,
information items may be asserted relative to some time spans in which
they are valid and believed, and they may be qualified in a number of ways.
In particular, different pieces of information may be assigned different
levels of security, and different degrees of reliability.
The book presents a comprehensive framework for the conceptual integration
and uniform treatment of all these types of information. This includes
the definition of generic knowledge system models capable of handling fuzzy,
temporal, confidential, and unreliable information.
Various Types of Rules
Advanced knowledge services (such as deductive query answering, reactive
input processing, and the abductive generation of explanations, diagnoses
and plans) are based on rules. The book presents an in-depth treatment
of two kinds of rules: deduction rules and reaction rules.
In addition to information about the current and possible state of affairs
in the form of facts and integrity constraints, a knowledge base may also
represent terminological and heuristic knowledge in the form of deduction
rules. This type of rule is well-known from deductive databases and logic
programs. It is shown how the concept of deduction rules can be applied
to the various kinds of information that have to be processed in practical
knowledge systems, leading to such powerful concepts as temporal and
fuzzy deduction rules. Typically, the semantics of these rules involves
nonmonotonic inference on the basis of the Closed-World Assumption.
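As a minimal sketch of this idea (with invented predicate names, not the book's formal notation), deduction rules can be applied bottom-up until a fixpoint is reached, and queries for underivable atoms are then answered negatively under the Closed-World Assumption:

```python
# Minimal sketch: deduction rules applied bottom-up to a fixpoint; atoms
# that are not derivable count as false under the Closed-World Assumption.

facts = {("parent", "ann", "bob"), ("parent", "bob", "carl")}

def ancestor_rules(atoms):
    """One application of two illustrative rules:
       ancestor(x,y) <- parent(x,y)
       ancestor(x,z) <- ancestor(x,y), ancestor(y,z)"""
    derived = set()
    for (p, x, y) in atoms:
        if p == "parent":
            derived.add(("ancestor", x, y))
    for (p1, x, y) in atoms:
        for (p2, y2, z) in atoms:
            if p1 == "ancestor" and p2 == "ancestor" and y == y2:
                derived.add(("ancestor", x, z))
    return derived

def closure(atoms, rules):
    # iterate until no new atoms are derivable (the closure of the KB)
    while True:
        new = rules(atoms) - atoms
        if not new:
            return atoms
        atoms = atoms | new

kb = closure(facts, ancestor_rules)
print(("ancestor", "ann", "carl") in kb)   # True: derivable
print(("ancestor", "carl", "ann") in kb)   # False: underivable, hence false under CWA
```

The negative answer in the last line is nonmonotonic: adding the fact parent(carl, ann) would turn it into a positive one.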
Reaction rules can be used to specify the reactive behavior of a knowledge
base when it receives messages, either from the external world, or from
other nodes of a network it participates in. This concerns, in particular,
the communication and cooperation taking place in multidatabases and between
cooperative knowledge bases.
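The flavor of such rules can be sketched as follows; the event types, conditions, and actions are purely illustrative, not the book's formal syntax:

```python
# Sketch of a reaction rule: ON event IF condition DO action.
# All names here are invented for illustration.

class ReactiveKB:
    def __init__(self):
        self.facts = set()
        self.outbox = []     # outgoing messages to other nodes
        self.rules = []      # list of (event_type, condition, action)

    def add_rule(self, event_type, condition, action):
        self.rules.append((event_type, condition, action))

    def receive(self, event_type, payload):
        # fire every rule whose triggering event and condition match
        for etype, cond, act in self.rules:
            if etype == event_type and cond(self, payload):
                act(self, payload)

kb = ReactiveKB()
# ON ask(p) IF p is believed DO reply with tell(p)
kb.add_rule("ask",
            lambda k, p: p in k.facts,
            lambda k, p: k.outbox.append(("tell", p)))
kb.facts.add("open(shop)")
kb.receive("ask", "open(shop)")
print(kb.outbox)  # [('tell', 'open(shop)')]
```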
How to Use this Book
Foundations of Databases and Knowledge Systems covers both basic and advanced
topics. It may be used as the textbook of a course offering a broad introduction
to databases and knowledge bases, or it may be used as an additional textbook
in a course on databases or Artificial Intelligence. Professionals and
researchers interested in learning about new developments will benefit
from the encyclopedic character of the book, which provides organized access
to many advanced concepts in the theory of databases and knowledge bases.
Gerd Wagner
Berlin, April 1998
Introduction
The Success of Database Systems
In the sixties and seventies, pushed by the need to store and process large
data collections, powerful database software was developed on the basis of
the file system technology available at that time. These systems were
called "hierarchical" and "network" databases, after their respective
type of file organization. Although these systems were able to process
large amounts of data efficiently, their limitations in terms of flexibility
and ease of use were severe. Those difficulties were caused by the unnatural
character of the conceptual user interface of hierarchical and network
databases, which consisted of the rather low-level data access operations
dictated by their way of implementing storage and retrieval.
Both database models thus later turned out to be cognitively inadequate.
Only through the formal conceptualization of relational databases
by Codd in the early seventies did it become possible to overcome the inadequacy
of the database technology prevailing at that time. Only the logic-based
formal concepts of the relational database model have led to more cognitive
adequacy, and have thus constituted the conceptual basis for further progress
(towards object-relational, temporal, deductive, etc. databases). Unlike
much of theoretical computer science, Codd's theory of relational databases
is an example of a practical theory.
The Difficulties of `Expert Systems'
In the field of expert systems, on the other hand, there has never been
a conceptual breakthrough like that of the relational database model. It
seems that the field is still at a pre-scientific stage, and there has been
almost no measurable progress since the classical prototype MYCIN.
There is neither a formal definition of an `expert system', nor is there
any clear semantics of rules. The field has rather developed a variety
of notions (such as `production rules,' or `certainty factors') which have
often not been clearly defined, or have not been based on a logical semantics.
Lacking a clear (and formally defined) conceptual framework, it is difficult
for the expert system community to compare different systems and to identify
shortcomings and measure progress.
One may argue that these problems are due to the inherent difficulties
of knowledge representation and processing, but it rather seems that the
expert system community has failed to establish scientific foundations
and has also failed to learn from its neighbor fields of logic programming
and deductive databases which have successfully developed a formal concept
of rules based on a logical semantics.
What is a Knowledge System?
Knowledge representation and reasoning systems, or knowledge
systems for short, are based on two fundamental operations:
- New incoming pieces of information must be assimilated into a knowledge
base by means of an update operation. This includes simple insertion
of new information, deletion (retraction) of old information, and possibly
a number of more involved change operations for assimilating new pieces
of information which are in conflict with some already-stored pieces. The
update operation should satisfy some principle of minimal mutilation.
- Queries posed to the knowledge base must be answered by means of an
inference-based query answering operation. The inference procedure should
be complete with respect to some well-understood logic, and sound with
respect to a preferential entailment relation in that logic.
In general, there are no specific restrictions on the internal structure
of a knowledge base. It appears, however, that a computational design can
be achieved by `compiling' incoming information into some normal form rather
than leaving it in the form of complex formulas.
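The two fundamental operations can be sketched as a minimal interface; the conflict handling below (overwriting the old value of a functional fact) is only a crude, hypothetical stand-in for a principle of minimal mutilation:

```python
# Minimal sketch of a knowledge system: an update operation that
# assimilates incoming facts, and an inference-based ask operation.
# The fact format (predicate, subject, value) is illustrative.

class KnowledgeBase:
    def __init__(self):
        self.facts = set()

    def update(self, fact, mode="insert"):
        pred, subj = fact[0], fact[1]
        if mode == "insert":
            # a crude stand-in for minimal mutilation: treat the last
            # argument as a functional value and retract the old one
            self.facts = {f for f in self.facts
                          if not (f[0] == pred and f[1] == subj)}
            self.facts.add(fact)
        elif mode == "delete":
            self.facts.discard(fact)

    def ask(self, fact):
        # inference is bare membership here; a real system would use a
        # sound and complete procedure for its underlying logic
        return fact in self.facts

kb = KnowledgeBase()
kb.update(("status", "flight007", "delayed"))
kb.update(("status", "flight007", "cancelled"))  # conflicting info replaces the old
print(kb.ask(("status", "flight007", "cancelled")))  # True
print(kb.ask(("status", "flight007", "delayed")))    # False
```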
The concept of a knowledge system constitutes a useful framework for
the classification and comparison of various computational systems and
formalisms like, e.g., relational, temporal and deductive databases. It
is more general than that of a logic (i.e. a consequence relation). A standard
logic can be viewed as a special kind of knowledge system. On the other
hand, by suitably defining the inference and update operations, knowledge
systems can serve as the basis for the operational definition of logics.
Knowledge Systems and Logic
The relationship between knowledge systems and logic is only obvious for
the inference operation: while in a logical system theorems are proved
from (or entailed by) a set of axioms (or 'theory') by means of the underlying
consequence relation, queries in a knowledge system are answered relative
to a given knowledge base by means of the underlying inference operation.
It turns out that knowledge-based inference is nonmonotonic: query answering
is not
based on all models of a knowledge base but solely on the set of all
intended models. For instance, in the case of relational databases, which
can be viewed as the most fundamental knowledge system, the intended models
are the minimal ones. Also, unlike a logical theory, a knowledge base is
not just a set of well-formed formulas. Rather, it is subject to specific
restrictions (such as, e.g., allowing only atomic sentences as in relational
databases), or more generally, consisting of facts, rules and integrity
constraints. While facts have the form of certain sentences of some suitably
restricted language, rules do not, in general, correspond to implicational
formulas, but rather to meta-logical expressions requiring a specific semantics.
Integrity constraints are sentences which have to be satisfied in all evolving
states of a knowledge base. They stipulate meaningful domain-specific restrictions
on the class of admissible knowledge bases. In addition to information
about the current and possible state of affairs (in the form of facts and
integrity constraints), and terminological and heuristic knowledge (in
the form of deduction rules), a knowledge base may also represent action
knowledge (e.g., in the form of reaction rules), and various types of meta-knowledge.
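The difference between answering over all models and answering over only the intended (minimal) models can be illustrated for a purely atomic, propositional knowledge base; the atoms p and q are illustrative:

```python
# Sketch: query answering relative to all models versus the minimal
# (intended) model, for a propositional knowledge base of atoms.
from itertools import chain, combinations

atoms = {"p", "q"}
kb = {"p"}   # the knowledge base asserts only p

def models(kb, atoms):
    # every interpretation (a subset of the atoms) that makes all kb atoms true
    subsets = chain.from_iterable(combinations(sorted(atoms), r)
                                  for r in range(len(atoms) + 1))
    return [set(s) for s in subsets if kb <= set(s)]

all_models = models(kb, atoms)
minimal = min(all_models, key=len)   # here the minimal model is kb itself

# classically, "not q" is not entailed: some model makes q true
print(all("q" not in m for m in all_models))   # False
# under preferential (minimal-model) entailment, "not q" holds
print("q" not in minimal)                      # True
```

This is exactly the relational-database behavior: a tuple absent from the database is answered "no", even though classical entailment would leave it open.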
Knowledge systems and the deficiencies of logic
Knowledge Systems          | Logical Systems
facts and deduction rules  | axioms
knowledge base             | theory
nonmonotonic inference     | monotonic consequence relation
confirmed if-queries       | theorems
open queries               | formulas with free variables
answers                    | ?
update operation           | ?
integrity constraints      | ?
reaction rules             | ?
Logic has provided a model of theory-based reasoning by means of consequence
relations, but it has not been concerned with the assimilation of new sentences
into a changing theory. For a theory of knowledge systems, however, we
need a model of both knowledge-based inference and of knowledge assimilation.
About this Book
Traditionally, books on `expert systems' and knowledge-based systems
have been composed of a loose collection of concepts and techniques from
Artificial Intelligence and related fields, such as pattern matching, search
methods, `rule-based systems', models of temporal, spatial and uncertain
reasoning, and their application in diagnosis, design and control. A great
number of prototype systems have been described in the literature, but there
is hardly any generic concept of knowledge-based systems. On the other
hand, knowledge representation research in Artificial Intelligence has
proceeded in a very theoretical manner and has been occupied to a great
extent with solving ``anomalies within formal systems which are never used
for any practical task'' (Brooks 1991). Many of the formalisms proposed
make, or follow, conceptual and ontological stipulations which are not
grounded in the practice of information and knowledge processing.
The theory of knowledge systems presented in this book should be construed
as an attempt at a practical theory that aims at establishing solid conceptual
foundations and generic models to be used in the software engineering of
knowledge and information systems. It is my firm belief that knowledge
systems have to evolve from, and have to be integrated with, database and
information system technology. The starting points in this evolution are
the entity-relationship modeling method and the system of
relational
databases, which are presented in Chapters 1 and 2. These and the following
chapters on object-relational databases, reaction rules, deduction rules,
temporal databases and fuzzy databases are intended for classroom use
in computer science at the graduate level. More advanced topics, such
as representing negative information using bitables, or elements
of a general theory of deduction rules, are covered in the remaining chapters.
The main technical contributions of this book are:
- a formal model of object-relational databases;
- a general account of reaction rules in connection with the concept
of knowledge- and perception-based agents, originally proposed in
(Wagner 1996);
- a new semantics of deduction rules that is based on stable generated
closures and derived from the notion of stable generated models
of (Herre and Wagner 1997);
- a new generalization of database tables, called bitables, for representing
negative information in connection with incomplete predicates (the
logical ideas of this generalization have already been presented in (Wagner
1991));
- a new model of fuzzy and possibilistic databases based on
a new compositional definition of possibilistic logic;
- a new definition of inference-based query answering in multi-level secure
(MLS) databases; and
- a new model of lineage databases for tracking the lineage and reliability
of information items.
The integration and uniform treatment of all kinds of databases and knowledge
bases in a rigorous conceptual framework appears to be the greatest achievement
of the present work. It makes it possible to identify generic concepts such as deduction
and reaction rules, and to define them in a way that is independent of
the particular choice of knowledge system.
The knowledge system concepts proposed in this book have obvious applications
in advanced database and in agent-based systems. An agent system needs
to include a knowledge system as one of its main components. In
particular, the knowledge systems of MLS factbases for protecting
confidential information, and of lineage factbases for tracking
and weighting the reliability of different information sources, may be
relevant to agent systems. Even more important is the concept of reaction
rules that allows the declarative specification of the reactive behavior
of agents. This book is for those who want to design knowledge-based agents,
and it is for those who want to develop innovative database systems. It
can be used both by computer science students and by professionals in computer
science and other fields with some background in logic to obtain organized
access to many advanced concepts in the theory of databases and knowledge
representation.
Those sections that are primarily of interest to logic-oriented readers
are indicated with "pure logic" next to the section head. Additional non-classical
logic background, not required for understanding the book, is provided
in the appendix as an option for the logic-oriented reader to gain deeper
insights into the connections between knowledge systems and logic.
This is probably the first book to cover such a broad range of diverse
database and knowledge systems. Necessarily, not all relevant questions
are addressed and the discussion of certain advanced topics may be incomplete,
biased, or even erroneous. As a human with limited resources, I may have
overlooked important issues and ignored important related work I should
have mentioned. I apologize for this and promise to do my best to provide
necessary corrections and supplements on the book's Web page at http://www.informatik.uni-leipzig.de/~gwagner/ks.html
Notice that this Web page may also contain solutions to exercises and
other supplementary material which could not be included in the book.
1 Conceptual Modeling of Knowledge Bases
The task of a knowledge base is to represent a relevant body of knowledge
about a specific application domain in order to support knowledge-based
application programs and to be able to answer ad-hoc queries about the
state of the application domain. It is essential to develop an adequate
conceptual model of the application domain as the basis for designing a
knowledge base. This involves the formalization of intuitions and of informal
descriptions. We discuss the most important methodology for conceptual
modeling: entity-relationship modeling which may be viewed as an
extension of predicate logic. The emphasis of this chapter is on
identifying the different types of predicates needed for capturing more
domain knowledge: normal extensional, qualified, incomplete and intensional
predicates.
1.1 Introduction
For designing a specific knowledge base, the first activity is to analyze
the given application domain, and to develop a conceptual model
of it. From this model, which may be established at different levels of
abstraction, the knowledge base schema is derived. The schema of
a knowledge base defines its type and structure, and the domain-specific
vocabulary that is used to represent knowledge and to store and retrieve
particular pieces of information in the knowledge base. Conceptual modeling
methods (or meta-models) are based on suitable abstractions of the data
and processes application domains consist of. Traditional methods, such
as predicate logic and entity-relationship modeling, have been exclusively
focused on static representations of a domain, that is, they have been
only concerned with its state. More recently, the need for integrating
both static and dynamic aspects, i.e. both state and behavior, in a meta-model
has become obvious. This is documented by the search for an integrated
object-oriented modeling method both for general software engineering and
for information systems. Although large research communities are occupied
with this problem, and although the recent proposal of the Unified Modeling Language
(UML), a combination of several previously proposed methods, has
received great attention in software engineering, there is still no real
breakthrough, and it seems that the right abstractions have not been found
yet. Apparently, the principles and concepts of object-orientation are
not sufficient for capturing all relevant static and dynamic aspects of
information systems. Rather, the abstractions sought may be found in
the research areas of multi-agent systems and agent-oriented programming.
1.2 Predicate Logic
Conceptual modeling was an issue in philosophy long before computers
and computerized information systems were invented. In philosophy
of science and in logic, conceptual models of scientific domains have been
a major concern. The main result of these principled philosophical investigations
was the conceptual meta-model of predicate logic, as defined by Frege and
Tarski.
In predicate logic, all basic entities (or individuals) of a given universe
of discourse are conceived of as forming a flat set without any further
classification. Normally, but not necessarily, they have names called constant
symbols (such as 1,2,3, ..., I, II, III, ..., Venus, 007, `James Bond',
etc.). An entity, in predicate logic, may have zero, one or more names.
For example, `2', `II' and `two' denote the same natural number.
Likewise, `007' and `James Bond' denote the same person. Properties of,
and relationships among, entities are expressed as sentences (or assertions)
with the help of predicate symbols. The meaning of a predicate, in standard
predicate logic, is given by its extension, i.e. the set of all
tuples to which the predicate applies (forming a relation in the
sense of set theory).
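These notions can be sketched directly: a denotation function maps constant symbols to entities (so several names may denote the same entity), and a predicate is interpreted by its extension, a set of tuples. All names below are invented for illustration:

```python
# Sketch of the predicate-logic meta-model: constants denote entities,
# and a predicate's meaning is its extension (a set of tuples).

# two constant symbols denoting the same entity, one denoting another
denotation = {"007": "person_17", "James Bond": "person_17", "M": "person_3"}

# extension of the binary predicate worksFor, over entity identifiers
works_for = {("person_17", "MI6"), ("person_3", "MI6")}

def holds(predicate, *names):
    # a sentence is true iff the tuple of denoted entities is in the extension
    return tuple(denotation.get(n, n) for n in names) in predicate

print(holds(works_for, "007", "MI6"))         # True
print(holds(works_for, "James Bond", "MI6"))  # True: same entity, other name
```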
Notice that in the meta-model of predicate logic, there is no distinction
between properties and relationships: both are expressed by predicates.
On the other hand, in first-order predicate logic, there is a strict distinction
between basic entities (denoted by constants), functions and predicates,
while in natural discourse, both functions and predicates are treated as
entities as well. These simplifications made by the predicate calculus
do not pose any problem in the domain of mathematics (in fact, they proved
to be very fertile). But in other, more empirical, application domains
they are cognitively inadequate. For such domains, standard predicate logic
is conceptually too flat, i.e. it does not offer sufficient support for
those concepts needed to capture the essential semantics of empirical domains.
Therefore, the research discipline of conceptual modeling in computer
science had to come up with new meta-models in order to capture those
parts of domain semantics that are relevant for database and knowledge
system applications. The most fundamental of these meta-models is the Entity-Relationship
Modeling method proposed by Chen (1976).
1.3 Entity-Relationship Modeling
In entity-relationship (ER) modeling, basic entities are conceptualized
in conjunction with their properties and attributes which are (as opposed
to predicate logic) strictly distinguished from relationships between entities.
It is, however, straightforward to translate an ER model into classical
predicate logic: both entity and relationship types correspond to predicates,
and values from attribute domains correspond to constants. In fact, this
translation is performed whenever an ER model is implemented as a relational
database. In the current practice of database application development,
this methodology (of first constructing an ER model and then implementing
it with a relational database system) is state of the art. Notice, however,
that the ER meta-model is more general than the relational database model.
It requires neither elementary attributes nor primary keys. It is therefore
also compatible with the object-relational database model
(see Chapter 3) where object-IDs replace primary keys, and attributes are
no longer required to be elementary but may have complex values or may
be derived by method invocation. While the original proposal of Chen (1976)
is restricted to the more basic ER constructs that can be easily implemented
in relational databases, we enrich the standard ER meta-model in such a
way that it can be used for the design of more general databases and knowledge
bases (including the important case of object-relational databases).
In addition to defining a semi-formal terminology, the ER modeling method
assigns visual representations to its concepts. The resulting ER diagrams
are a convenient and concise way of expressing an ER model.
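The translation into predicate logic mentioned above can be sketched with invented types Student, Course and enrolledIn: entity and relationship types become predicates (represented here as sets of tuples), and attribute values become constants:

```python
# Sketch of translating an ER model into predicate-logic style facts.
# The types Student(ssn, name), Course(code, title) and the relationship
# enrolledIn are invented for illustration.

# entity types -> predicates over attribute tuples
student = {("111", "Ann"), ("222", "Bob")}
course = {("cs101", "Databases")}

# relationship type -> predicate over the related entities' keys
enrolled_in = {("111", "cs101"), ("222", "cs101")}

# attribute values appear as constants in the logical translation
print(("111", "Ann") in student)                 # True: Student(111, Ann) holds
print(any(c == "cs101" for _, c in enrolled_in)) # True: someone is enrolled in cs101
```

This is also the translation performed when an ER model is implemented as a relational database: each set of tuples becomes a table.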
Entities and Entity Types
An entity is something that occurs in our universe of discourse
(i.e. it may be the value of a quantified variable) and that can be uniquely
identified and distinguished from other entities. Notice that this characterization
refers to our universe of discourse and not to the `real world'. Although
it is commonly assumed that there is a close correspondence between the
two, and in the database literature the universe of discourse (or even
the content of a database) is often naively identified with `the real world',
we prefer to leave this issue to philosophy, and try to avoid any unnecessary
reference to such metaphysical notions as `the real world'.
Obviously, in an information processing system such as a database or
a knowledge system, one does not deal with the entity itself (in the ontological
sense) but rather with some representation of it. For linguistic simplicity,
however, the term `entity', in ER modeling, is used in its epistemic (representational)
sense. Examples of entities are the natural numbers 1,2,3,..., Leo (our
dog), James Bond (alias 007), Berlin, the planet Venus, my Alfa Romeo automobile,
the 200th birthday of Schubert, the letter `K', or the disease hepatitis
A.
Entities of the same type share a number of attributes and attribute
functions representing their properties or characteristics. An attribute
associates with each entity a stored value from its domain. An attribute
function associates with each entity a value to be computed by invoking
the function. Together, the values of all attributes of an entity form
its state. There may be different entities having the same
state. In that case, one needs an appropriate naming mechanism for identifying
and distinguishing entities independently of their state.
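A sketch of these distinctions, using an invented Employee type: stored attributes determine the state, an attribute function computes its value on invocation, and a standard name identifies entities even when their states coincide:

```python
# Sketch: stored attributes, a computed attribute function, and a standard
# name (here emp_id, playing the role of a primary key). All names invented.
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:
    emp_id: int        # standard name: identifies the entity for its life cycle
    name: str          # stored attribute
    birth_date: date   # stored attribute

    def age(self, today: date) -> int:
        # attribute function: value computed on invocation, not stored
        return (today - self.birth_date).days // 365

e1 = Employee(101, "Ann Smith", date(1970, 5, 1))
e2 = Employee(102, "Ann Smith", date(1970, 5, 1))
# the two entities have the same state but remain distinguishable
print(e1.emp_id != e2.emp_id)  # True
```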
A standard name is a unique identifier assigned to an entity
during its entire life cycle. If a primary key is defined for an
entity type, it may be used as a standard name for entities of that type.
An example of a primary key is the social security number assigned to employees.
We assume, however, that entities can be identified independently of the
existence of a primary key. How this is achieved need not be defined in
the ER meta-model. Whenever an ER model is implemented in some practical
knowledge system, there has to be a method to assign standard names to
entities such as, e.g., primary keys in relational databases, or object
IDs in object-relational databases. In contrast to predicate logic,
in the ER meta-model and in database theory, it is commonly assumed that
all entities have (standard) names.
There may be certain restrictions on the admissible values for attributes,
called integrity constraints. A fundamental type of an integrity
constraint is a key dependency, that is, a minimal set of attributes
whose values identify an entity. A key is a constraint since it implies
that there must be no pair of entities of the same type having the same
values for their key attributes. One of these key dependencies may be designated
as the primary key of the entity type. Natural primary keys, such as social
security numbers or flight numbers, are often used to identify entities
and to access them in an information system. Sometimes there is no natural
primary key, or even no key at all, in the definition of an entity type.
It is a matter of implementation, and not of conceptual modeling, how to
deal with this situation. In a relational database, an artificial primary
key attribute has to be added to maintain entity identity while in an object-relational
database an immutable object ID is automatically assigned to an entity
on its creation.
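To illustrate the idea of artificial, immutable standard names, the following Python sketch assigns a surrogate identifier to each entity on creation, in the style of object IDs in object-relational databases. The EntityStore API is hypothetical, introduced only for illustration:

```python
import itertools

class EntityStore:
    """Minimal sketch (hypothetical API): assigns an immutable surrogate ID
    to each entity on creation, independently of any natural key."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._entities = {}

    def create(self, **attributes):
        oid = next(self._next_id)        # artificial, immutable identifier
        self._entities[oid] = dict(attributes)
        return oid

    def state(self, oid):
        """The state of an entity: the values of all its attributes."""
        return self._entities[oid]

store = EntityStore()
kim = store.create(name="Kim", department="CS")
kim2 = store.create(name="Kim", department="CS")
assert kim != kim2   # same state, yet two distinct entities
```

Note that the two entities are distinguishable even though their states coincide, which is exactly what a naming mechanism independent of state provides.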
In summary, an entity type is defined by a name, a list of attributes
with associated domains, a list of attribute functions with associated
signatures and implementations, and a possibly empty set of integrity constraints
including key dependencies. Notice that an entity type, unlike the set-theoretic
notion of a data type, is not a purely extensional concept. There may be
two different entity types, say Lecturers and Researchers,
with exactly the same list of attributes (and attribute functions) and
possibly with the same extension. In contrast, two data type definitions,
say ULONG and DWORD, having the same extension are equal, by the axioms
of set theory.
At any moment, an entity type defined for a particular knowledge base
is associated with its current extension consisting of all entities of
that type which are currently represented in the knowledge base. An entity
type is like a container, and the associated extension is its content.
Thus, an entity type corresponds to a table in a natural way: the
columns of the table correspond to the attributes, a populated table contains
the current extension of the entity type, and a table row, forming an attribute
value tuple, corresponds to a specific entity.
Entity types are visually represented in the form of a labeled box which
may be partitioned into a head and a body part, if the attribute list is
to be included in the diagram. In that case, the name of the entity type
is written in the head of the box, and the attributes are listed in its
body.
The domain of an attribute is either an intensionally or an extensionally
defined set of possible values. Typically, an intensional domain
is defined by a formal grammar, while an extensional domain is defined
by an explicit set of values. Examples of intensional domains are the integers,
strings with at most 20 characters, or JPEG pictures. Examples of extensional
domains are the names of foreign currencies or the names of colors. An
extensional domain of an attribute of some entity type may be defined by
the extension of another (single attribute) entity type.
Notice that different entities may belong to different epistemic categories.
There are
-
concrete physical objects, comprising passive objects (such as the planet
Venus or my Alfa Romeo) and physically embodied agents (such as the Star
Wars robot R2, our dog Leo, or James Bond),
-
software objects (such as the word processor window I am currently writing
in),
-
software agents (such as the Microsoft Office97 assistant `Clippit'),
-
events (such as the 200th birthday of Schubert, or the mouse click at the
beginning of this line),
-
locations (such as Berlin),
-
analytical concepts (such as the natural number 1),
-
empirical concepts (such as the disease hepatitis A),
-
symbols (such as the letter `K').
The traditional ER modeling method has not used any such classification
of entity types because it did not seem relevant in enterprise modeling
for business applications, which is the classical application domain of
ER modeling. In other domains, however, some of the above distinctions
may be essential for capturing more domain semantics. But even in business
domains, the distinction between objects and agents makes it possible to capture
more semantics concerning such important concepts as business transactions
and roles.
Relationships and Relationship Types
A relationship among entity types is called a relationship type.
For example, attends is a relationship type used to name the particular
relationship that holds between a student S and a course C
if S attends C.
A relationship between two entities is symbolically represented as an
ordered pair labeled with the name of the relationship type. For instance,
the expression attends(Kim, Logic1)
represents a relationship of type attends between the entity
`Kim' of type Student and the entity `Logic1' of type Course.
Mostly, relationships are binary, like attends, but in some
cases three or more entities are related, which is represented by corresponding
n-tuples.
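A relationship type can thus be modeled directly as a named set of tuples. The following Python sketch makes this concrete; the relationship names and entities are illustrative:

```python
# A relationship type as a named set of tuples (illustrative sketch).
attends = {("Kim", "Logic1"), ("Lee", "Logic1")}   # binary relationship

def holds(rel, *entities):
    """Does the relationship hold for the given entity tuple?"""
    return tuple(entities) in rel

assert holds(attends, "Kim", "Logic1")

# A ternary relationship is represented by 3-tuples in the same way.
supplies = {("Acme", "Widget", "StoreA")}
assert holds(supplies, "Acme", "Widget", "StoreA")
```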
There are two special relationships between entity types which are independent
of the application domain and which are therefore distinguished from domain
relationships: subclass and aggregation relationships.
The subtype relationship between abstract data types has to be
distinguished from the subclass relationship between entity
types. Unlike objects in object-oriented programming, entities are not
instances of abstract data types (often called `classes' in OO programming)
but of entity types. Recall that an entity type is a named container with
an associated abstract data type, namely the tuple type defining the attributes
and attribute functions of all entities of that type. There may be several
distinct entity types associated with the same abstract data type. A tuple
type is a specialization (or a subtype) of another tuple type if
it has additional attributes. If an entity type E is a specialization
of an entity type D in the sense that every entity of type E
is also an entity of type D, then E is called a subclass
of D. This implies that every attribute of D is also an attribute
of E, and E may have additional attributes (i.e. the data
type of E is a subtype of that of D). The subclass relationship
between two entity types is sometimes called `isa relationship' because
we can say ``E isa D''. For instance, a sales engineer is
a sales person, and hence the entity type SalesEngineer is a subclass
of SalesPerson. We use the set-theoretic notation E isIncludedIn D
for expressing a subclass relationship and visualize it by placing a subclass
below its superclasses and connecting them with simple lines.
In an aggregation relationship, a number of part entities belong
exclusively to an aggregate entity and do not exist independently of such
an assignment. For instance, Room entities belong to a Building
entity, which is expressed as Building HAS Room. An
aggregation relationship type is visually represented by drawing the part
entity box within the aggregate entity box.
Depending on how many entities from one entity type can be associated
with how many entities of another entity type, three kinds of relationship
types are distinguished:
-
One-to-one: For each entity of either type there is at most one
associated entity of the other type.
-
Many-to-one from E1 to E2: Each entity of
type E2 is associated with zero or more E1 entities,
and each E1 entity is associated with at most one E2
entity. A many-to-one relationship from E1 to E2
is a (partial) function from E1 to E2.
-
Many-to-many: each entity of either type is related to zero or more
entities of the other type.
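The characterization of a many-to-one relationship as a (partial) function can be checked mechanically. A small sketch (the relationship worksFor is illustrative):

```python
def is_many_to_one(rel):
    """True if a binary relationship is a (partial) function from E1 to E2,
    i.e. each E1 entity is related to at most one E2 entity."""
    seen = {}
    for e1, e2 in rel:
        if seen.setdefault(e1, e2) != e2:
            return False    # e1 related to two distinct E2 entities
    return True

worksFor = {("Kim", "Sales"), ("Lee", "Sales")}   # many-to-one
assert is_many_to_one(worksFor)
assert not is_many_to_one({("Kim", "Sales"), ("Kim", "HR")})
```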
A relationship type is visually represented using a labeled diamond, where
the label consists of the name of the relationship type, connected to the
related entity boxes by straight lines. Its functionality type is visualized
by drawing crow's feet to represent a many-side of the relationship
type. So, a many-to-many relationship type is represented by drawing crow's
feet at the ends of both connections to the related entity boxes. Since
relationships are also entities (of a higher level), they may have attributes
as well. For example, if a relationship holds only for some time, there
may be an attribute for its duration.
Application Domain Predicates
Each name of an entity type and each name of a relationship type denotes
an application domain predicate which can be used to express the
properties of, and the relationships between, entities in the formal manner
of predicate logic. Thus, an ER model of an application domain defines
a formal language consisting of
-
the finite set of application domain predicates established by the entity-relationship
analysis, and
-
the set of all possible attribute values, i.e. the union of all attribute
domains, which forms the set of constant symbols.
In the sequel, we also briefly say domain predicate instead of `application
domain predicate'.
ER Modeling Is Object-Oriented
ER modeling is object-oriented since it accommodates object identity, complex-valued
and virtual attributes (`methods'), aggregation, and subclass hierarchies
(`inheritance'). Many of the recently proposed object-oriented modeling
methods are just notational variants of ER diagrams.
Object-oriented methods often include attempts to model the behavior
of a system. Behavior modeling is a significant extension to the ER meta-model.
However, the proposed methods, such as Petri nets and state transition
diagrams, are not well-integrated with the constructs of state modeling
and therefore not satisfactory. Finding a modeling method that integrates
both state and behavior is still an open problem.
1.4 Identifying Qualified Predicates
Certain domain predicates, on closer inspection, may turn out to be associated
with some kind of qualification such as
-
a valid time span during which a predicate is asserted to apply
to a tuple,
-
a degree of uncertainty with which a property or a relationship
holds for an entity,
-
a level of security with which the information that a predicate
applies to a tuple is classified, or
-
an information source according to which a property or relationship
holds for an entity (in addition, a certain degree of reliability
may be associated with the information source).
Qualified predicates require specialized logics (such as temporal logic,
fuzzy logic, etc.) if their semantics is to be defined according to the
paradigm of classical predicate logic. They can be represented in tables
with designated columns containing the qualifications of tuples.
Qualified predicates are visualized in an ER diagram by including the
qualification as a special attribute at the end of the attribute list separated
by a line.
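As an illustration of a table with a designated qualification column, the following sketch stores a valid-time span with each tuple of a qualified predicate; the predicate employedBy and its tuples are hypothetical:

```python
from datetime import date

# A valid-time qualified predicate: each tuple carries a designated
# qualification column, here a (start, end) valid-time span (illustrative).
employedBy = [
    (("Kim", "Acme"), (date(2020, 1, 1), date(2022, 12, 31))),
    (("Lee", "Acme"), (date(2021, 6, 1), date(9999, 12, 31))),
]

def holds_at(qrel, tup, day):
    """Does the qualified predicate apply to the tuple on the given day?"""
    return any(t == tup and start <= day <= end
               for t, (start, end) in qrel)

assert holds_at(employedBy, ("Kim", "Acme"), date(2021, 3, 15))
assert not holds_at(employedBy, ("Kim", "Acme"), date(2023, 1, 1))
```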
1.5 Identifying Incomplete Predicates
An ER model may contain predicates for which there is no guarantee
that complete information about them is available. For such predicates
the Closed-World Assumption (CWA) is not justified since not all
tuples to which the predicate applies are known. Thus, one cannot infer
that it is false that the predicate applies to a tuple simply because the
tuple is not recorded in the database. The CWA requires that the owner
of the database and its information suppliers have perfect control of the
represented domain and know everything about it. Whenever this strong assumption
is not plausible, some domain predicates may be incomplete (more precisely,
their representation in the database may be incomplete).
There are two kinds of incomplete predicates: total and partial
ones. While total predicates may, in principle, be completely represented
in a database, partial predicates may not; they are necessarily incomplete.
For partial predicates, falsity does not follow from non-truth: if it is
not true that the predicate applies to a tuple, this does not mean
that it is therefore false. In general, predicates may have truth-value
gaps, and falsity is not equal to non-truth. The classical law of
the excluded middle applies only to total predicates, no matter whether their
representation is complete or incomplete. It does not hold for incomplete
predicates that are partial.
Incomplete predicates are best explained by partial logic with
weak and strong negation. The extension of an incomplete predicate is represented
in a bitable consisting of an upper and a lower part. While the
upper part contains the truth extension, the lower part contains
the falsity extension of the represented incomplete predicate. Thus,
bitables make it possible to store both positive and negative information.
An incomplete predicate is visually represented in an ER diagram by
drawing a horizontal line dividing the head of an entity box or of a relationship
diamond in an upper and a lower part.
As an example, consider the information system of an organization that
wants to record the personal preferences of its employees towards each
other by means of a relationship type Likes. The information whether
an employee likes or dislikes another employee is obtained by interviews.
Obviously, the information collected in this way is incomplete: the employee
John need not like or dislike the employee Peter, he may be neutral with
respect to Peter. So, it does not follow from John's not liking Peter that
he does therefore dislike him. The predicate representing the Likes
relationship type is incomplete and, consequently, corresponds to a bitable
containing all pairs for which Likes holds, i.e. its truth extension
consisting of all pairs (x,y) such that x
likes y, and, in addition, all pairs for which Likes
definitely does not hold, i.e. its falsity extension consisting of all pairs
(x,y) such that x dislikes y.
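The Likes bitable and its three possible query answers can be sketched as follows (the names are the ones used in the example; the code itself is illustrative, not the book's formalism):

```python
# A bitable: truth extension (upper part) and falsity extension (lower part).
likes_true = {("John", "Mary")}    # pairs for which Likes holds
likes_false = {("John", "Peter")}  # pairs for which Likes definitely fails

def answer(pair):
    """Three-valued query answering for an incomplete predicate."""
    if pair in likes_true:
        return "yes"
    if pair in likes_false:
        return "no"
    return "unknown"   # truth-value gap: no Closed-World Assumption

assert answer(("John", "Mary")) == "yes"
assert answer(("John", "Peter")) == "no"
assert answer(("John", "Bob")) == "unknown"
```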
1.6 Identifying Intensional Predicates
As opposed to an extensional predicate, an intensional predicate is not
represented by means of a table but is defined with the help of other (intensional
and extensional) predicates. The extension of an intensional predicate
is not explicitly stored in the form of a base table but has to be computed
from the extension of those other predicates occurring in its definition.
Intensional predicates are defined by means of deduction rules
consisting of a head (or conclusion) and a body (or premise).
They can be used to represent various kinds of relationships between concepts,
such as subsumption or synonymy, or to represent properties and relationships
based on heuristics and on default assumptions.
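How the extension of an intensional predicate is computed from deduction rules can be sketched as follows. The predicates parent and ancestor are illustrative, and the naive fixpoint computation is only one possible evaluation strategy:

```python
# Computing an intensional predicate from an extensional one via rules:
#   rule 1: ancestor(x, y) <- parent(x, y)
#   rule 2: ancestor(x, z) <- parent(x, y), ancestor(y, z)
parent = {("Ann", "Bob"), ("Bob", "Carl")}   # extensional base table

def ancestor(parent_rel):
    """Naive fixpoint evaluation of the deduction rules above."""
    anc = set(parent_rel)                    # rule 1
    changed = True
    while changed:                           # apply rule 2 until nothing new
        changed = False
        for (x, y) in parent_rel:
            for (y2, z) in list(anc):
                if y == y2 and (x, z) not in anc:
                    anc.add((x, z))
                    changed = True
    return anc

assert ("Ann", "Carl") in ancestor(parent)   # derived, not stored
```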
In summary, incomplete, qualified and intensional predicates extend
the concept of normal predicates and lead to conservative extensions of
relational databases. Since these extensions are orthogonal, they form
a rather large space of possible combinations.
Orthogonal extensions of relational databases

predicate category    | representation                  | logic
normal predicates     | relational database tables      | classical logic
incomplete predicates | bitables                        | partial logics
qualified predicates  | tables with designated columns  | specialized (temporal, fuzzy, etc.) logics
intensional predicates| deduction rules                 | constructive logics
1.7 Entity-Relationship Modeling in 7 Steps
The following procedure represents a simplified method for obtaining an
ER model. Notice that after finishing each of the seven steps, it
is a good idea to review the results of all previous steps.
Step 1: List all potential entity types.
Write a list of candidate entity types which covers the entire scope of
the application domain.
Step 2: Write a summary paragraph for each entity type.
Write a summary description of the knowledge base model by writing a summary
paragraph for each entity type identified in step 1. Use a restricted declarative
style of language and enumerate paragraphs and sentences.
Step 3: Identify relationship types.
Identify relationship types by analyzing the summary description and looking
for verbs. Start by looking for subclass and for aggregation relationships,
and after that identify the remaining domain relationship types. The result
of this should be three lists: a list of subclass relationships of the
form E1 isIncludedIn E2, a list of aggregation
relationships of the form E1 HAS E2, and a
list of domain relationship names together with associated entity types
using either infix notation such as E1 isRelatedTo E2
or prefix notation such as R(E1, E2, ..., En).
Step 4: Determine the epistemic category of all domain predicates.
For each predicate obtained from the list of entity types and the list
of domain relationship types, determine its epistemic category:
-
Is it a complete extensional predicate good for absolute assertions?
-
Is it a qualified predicate such that assertions made with it are informationally
correct only relative to some qualification (like a valid-time span, a
degree of uncertainty, or a security level)?
-
Is it an incomplete predicate for which we do not have sufficient information
competence to guarantee that its complete extension is available?
-
Is it an intensional predicate whose extension is not explicitly stored
in the knowledge base but is derived from other predicates via deduction
rules?
Step 5: Draw an ER diagram.
Draw an ER diagram using all entity types identified in step 1 and all
relationship types identified in step 3. Visualize the subclass hierarchies
by placing subclass entity boxes below their superclasses connecting them
with simple lines. If an entity type has several subclasses, indicate whether
they form a complete partition and whether they may overlap or have to
be disjoint. Visualize aggregation relationships by drawing the box of
the part entity type within the larger box of the aggregate entity type.
For each domain relationship type, determine whether it is one-to-one,
one-to-many or many-to-many, and draw the corresponding crow's feet in
the relationship connection lines accordingly.
Step 6: Determine the attributes of entities and relationships.
For each entity type and each relationship type, determine its attributes
together with their domains and its attribute functions with their signatures.
Mark complex-valued attributes and attributes with extensional domains.
Step 7: Complete the diagram by deriving additional entity types.
Inspect all attributes with extensional domains marked in step 6.
Determine whether they refer to an entity type rather than to a data type,
and derive additional entity types from this analysis. Add them to the
ER diagram together with appropriate relationship links.
1.8 Agent-Object-Relationship Modeling
This section contains a brief outlook on the future of conceptual modeling.
ER modeling does not account for the dynamic aspects of information
and knowledge processing systems. These aspects are related to notions
like communication, interaction, events, activities or processes. As pointed
out above, it may be useful to distinguish between different kinds of entity
types. In particular, for capturing semantic aspects related to the dynamics
of knowledge systems, it is necessary to distinguish between agents
and passive objects. While both objects and agents are represented
in the system, only agents interact with it, and the possible interactions
may have to be represented in the system as well.
Being entities, agents and objects of the same type share a number of
attributes representing their properties or characteristics. So, in agent-object-relationship
(AOR) modeling, there are the same notions as in ER modeling (such as attribute
domain, state, integrity constraint, key, etc.). In fact, objects in AOR
modeling are treated exactly like entities in ER modeling.
While ER modeling supports the design of object-oriented
information systems realized with the help of relational and object-relational
databases, AOR modeling allows the high-level design of agent-oriented
information systems. An agent-oriented information system represents
agents and their potential interactions in addition to ordinary objects.
It need not be designed as an agent itself, however. Thus, the AOR meta-model
is not intended as a methodology for the design of agent-based systems
in general but rather of agent-oriented information systems.
Interaction Frames and Agent Roles
From an enterprise information system (EIS) perspective, examples of agents
are customers and suppliers, as well as those employees whose role is to
interact with customers and suppliers. Another EIS, e.g. one involved
in automated electronic commerce, may also be an agent from the perspective
of a system dealing with it. So, agents in this context may be natural
or artificial systems that are involved in some kind of business interaction.
If the preconditions and effects of these interactions are represented
in the EIS, the system is able to check the consistency and completeness
of business transactions.
Notice that it is a matter of choice whether to represent active entities
as agents or as objects. Agents are special objects. One may simply ignore
their agent-specific properties and treat them as objects. For instance,
if the supplier-to-customer interaction does not have to be explicitly
modeled in an enterprise information system, customers may simply be represented
as objects in the same way as orders and bank accounts.
But even if it is not necessary to represent certain agent types in
the target system and track their interactions, it may be important to
consider them in the conceptual model in order to be able to derive a transaction
and process model from the possible interactions with that type of agent.
The possible interactions between two agent types (or roles, say
between customers and salespersons, or between patients and nurses) are
defined by an interaction frame which lists for both participating
roles:
-
All types of communication acts that may take place in the agent-to-agent
communication frame. Since, from the perspective of a `listener', the communication
acts of other agents are events, we also speak of communication event
types.
-
All relevant types of physical actions which may be performed in
response to other actions of (or which are otherwise related to) the interaction
frame.
-
All relevant types of environment events which may be perceived
by an agent and which are related to the interaction frame.
-
A set of correctness properties defining the correct behavior of
agents within the interaction frame. Correctness properties are temporal
logic assertions expressing either a safety or a progress property. Typically,
these properties refer to communication events (or messages) and possibly
to the mental state of agents.
A communication event type is defined by its name and a formal parameter
list. A communication event is represented as a typed message together
with the sender and the receiver address. The message type corresponds
to the communication event type. There are domain-independent message types
such as request, tell, ask and reply. But message types may also be domain-specific.
In an electronic commerce interaction frame, examples of domain-specific
message types are: requestOffer, submitOrder and cancelOrder.
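A communication event as a typed message with sender and receiver can be sketched as a simple record type. The field names and example values are illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    """A communication event: a typed message with sender and receiver
    addresses (illustrative sketch)."""
    msg_type: str     # e.g. 'tell', or domain-specific: 'submitOrder'
    sender: str
    receiver: str
    content: tuple    # the formal parameter values of the event type

m = Message("submitOrder", sender="customer1", receiver="shopEIS",
            content=("Widget", 3))
assert m.msg_type == "submitOrder"
```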
Agents may be related to objects and to other agents by means of relationships
as in ER modeling. An interaction frame can be viewed as a special agent-to-agent
relationship.
Mental Attributes
While the state of an object does not have any application-independent
structure or semantics, the state of an agent consists of mental
components such as beliefs and commitments. In AOR modeling,
the mental state of agents may be (partially) represented by means of mental
attributes. Whenever an interaction frame contains correctness properties
referring to the mental state of agents, the corresponding mental attributes
must be included in the AOR model.
1.9 Summary
Conceptual modeling methods support the definition of a formal language
and of associated integrity constraints needed to formalize an application
domain. Standard predicate logic has to be enriched by including further
conceptual distinctions, such as the entity-relationship dichotomy or the
agent-object-relationship trichotomy, in order to capture a sufficiently
structured formal model of an application domain.
1.10 Further Reading
Elaborate presentations of the ER modeling approach to relational database
design can be found in (Teo94, BCN92). Another interesting approach for
modeling information systems, focusing on the formal-linguistic analysis
of specification sentences in requirement documents, is proposed
in (NH89).
1.11 Exercises
Problem 1-1 Establish an ER model (and draw the corresponding
diagram) of the university domain consisting of the entity types Person,
Employee, Student, Course, Teacher, TeachingAssistant, and MasterThesis.
Notice that the involved predicates are all extensional and complete. However,
it is an option to model Student, Course and TeachingAssistant, as well
as the relationships associated with them, as being qualified with a valid
time.
Problem 1-2 Assume that the university is interested in
information about its employees being smokers or non-smokers. This information,
however, can only be obtained on a voluntary basis. Extend the university
ER model from the previous exercise in an appropriate way.
Problem 1-3 Establish an ER model of the hospital domain
consisting of the entity types Person, Patient, Room,
Physician, Diagnosis and Treatment. Notice that diagnoses are uncertain
and that information about patients (including their diagnoses and treatments)
must be associated with a security level to protect it from unauthorized
access.
2. The Fundamental Case: Relational Databases
The system of relational databases represents the most fundamental case
of a knowledge system. It already contains all the basic components and
operations of a knowledge system such as the distinction between the query,
the input, and the representation language, as well as an update, an inference,
and an inference-based query answering operation. It is one of the central
claims of this book that every knowledge system should be upwards compatible
with, and conservatively extend, the system of relational databases. Knowledge
systems satisfying this requirement are called vivid in Section
6.4.
After introducing the basic concepts of relational databases, three
major components of databases are discussed: queries, integrity constraints
and updates. It is shown that query answering is based on logical inference,
and that updates are subject to certain logical conditions involving the
specified integrity constraints. Then, the problem of null values in databases
is presented as a violation of the database completeness assumption. Finally,
we present Reiter's database completion theory and discuss the relation
of databases to predicate logic model theory.
2.1 Introduction
The main purpose of a database is to store and retrieve information given
in an explicit linguistic format. As opposed to certain other types of
information that are also processed in cognitive systems (such as humans),
this type of information is essentially propositional, that is,
it can be expressed in a formal language composed of names for entities
(or objects) and relationships, and of complex expressions formed from
elementary ones by means of sentential connectives, such as negation, conjunction
and disjunction. While only elementary pieces of information corresponding
to atomic sentences can be stored in a relational database, the retrieval
of specific information is achieved by submitting complex queries corresponding
to complex logical formulas. Thus, it becomes obvious that predicate logic,
with its account of formal languages composed of constant and relation
symbols, and of sentential connectives, is the mathematical basis of query
answering in databases.
As early as 1970, Edgar F. Codd published his pioneering article ``A
Relational Model of Data for Large Shared Data Banks'' in the Communications
of the ACM, where he defined the principles of the relational database
model. This was the first convincing conceptualization of a general purpose
database model, and it is not an accident that it relies on formal logic.
Finally, in the mid-eighties, IBM presented its database management system
DB2, the first industrial-strength implementation of the relational model,
which continues to be one of the most successful systems for mainframe
computers today. There are now numerous other relational database management
systems that are commercially available. The most popular ones include
Informix, Ingres, Oracle, Sybase and Microsoft SQL Server. To a great extent,
the overwhelming success of these systems is due to the standardization
of the database programming language SQL originally developed at
IBM in the seventies. SQL is a declarative language for defining, modifying
and querying relational database tables. SQL queries correspond to first
order predicate logic formulas.
While most well-established information processing systems and tools
such as programming languages, operating systems or word processors have
evolved from practical prototypes, the unprecedented success story of the
relational database model is one of the rare examples -- certainly the
most important one -- where a well-established and widely used major software
system is based on a formal model derived from a mathematical theory (in
this case, set theory and mathematical logic).
Logically, a relational database is a finite set of finite set-theoretic
relations over elementary data types (called tables), corresponding
to a finite set of atomic sentences. Such a collection of ground atoms
can also be viewed as a finite interpretation of the formal language associated
with the database in the sense of first order model theory.
The information represented in a relational database is updated by inserting
or deleting atomic sentences corresponding to table rows (or tuples of
some set-theoretic relation). Since a relational database is assumed to
have complete information about the domain represented in its tables, if-queries
are answered either by yes or by no. There is no third type
of answer such as unknown. Open queries (with free variables) are
answered by returning the set of all answer substitutions satisfying the
query formula.
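The view of a relational database as a finite set of ground atoms, with yes/no answers under the Closed-World Assumption and answer substitutions for open queries, can be sketched as follows (the predicate and entity names are illustrative):

```python
# A relational database as a finite set of ground atoms (illustrative).
db = {("attends", "Kim", "Logic1"), ("attends", "Lee", "Logic1")}

def ask(pred, *args):
    """If-query: answered yes or no under the Closed-World Assumption;
    there is no third answer such as 'unknown'."""
    return "yes" if (pred, *args) in db else "no"

def query(pred, *pattern):
    """Open query: None marks a free variable; the answer is the set of
    all satisfying tuples (answer substitutions)."""
    return {atom[1:] for atom in db
            if atom[0] == pred
            and all(p is None or p == a
                    for p, a in zip(pattern, atom[1:]))}

assert ask("attends", "Kim", "Logic1") == "yes"
assert ask("attends", "Kim", "Logic2") == "no"   # CWA: absence means false
assert query("attends", None, "Logic1") == {("Kim", "Logic1"),
                                            ("Lee", "Logic1")}
```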
2.14 Deficiencies of the Relational Database Model
Although relational databases represent a major breakthrough in the history
of information processing, they still have a number of deficiencies calling
for extensions and improvements. This concerns, in particular, the problems
of entity identity and of complex-valued attributes and attribute functions.
Also, as a requirement from conceptual modeling, the subclass (and inheritance)
relationship between entity types should be supported. Furthermore, practical
applications require base type extensibility and support for large data
objects. And finally, for incorporating more semantics into databases and
for knowledge management the support of rules is required. For expressing
procedural knowledge in the database schema independently of specific application
programs, the concept of reaction rules has to be supported. For
representing terminological and heuristic knowledge, there should be support
for deduction rules.
Besides these general purpose issues, there are several special purpose
extensions of relational databases that are required to handle qualifications
(such as valid time, security and reliability) and various forms of incompleteness
(such as incomplete predicates and disjunctive information).
Adequate Support for Standard Names
In SQL92 databases, entities are identified with the help of primary keys.
This requires that for any entity table, one of its keys is designated
as primary and used as the standard name of the entities represented
by the rows of that table. There are, however, cases where no natural key
is available at all, and therefore an artificial primary key has to be
introduced. Even more troublesome is the problem that an attribute participating
in a primary key is, like any other attribute, modifiable. If such an attribute
is, for some reason, modified, this leads to a change of the entity identity
which is clearly undesirable. A standard name should be a unique identifier
which is associated with an entity throughout its entire life cycle implying
that it must be immutable.
Complex-Valued Attributes
For simplicity, Codd banned complex-valued attributes from the relational
database model (this requirement has been called `first normal form').
Since in practice, many attributes have complex values, such as sets or
tuples, this restriction leads to rather unnatural representations.
Attribute Functions
There are attributes whose values are not explicitly stored in the database
but rather have to be computed in some way. These virtual
attributes are associated with a specific function that is invoked whenever
their value is queried. Relational databases, and SQL92, do not provide
any means to define attribute functions.
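Although SQL92 has no attribute functions, a virtual attribute can be approximated by a view that computes its value on demand; the following sketch, with invented table and attribute names, illustrates the idea:

```python
import sqlite3

# Sketch of a virtual attribute: 'total' is not stored in the base
# table but computed by a view whenever it is queried.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE order_item (item TEXT, price REAL, qty INTEGER)")
con.execute("INSERT INTO order_item VALUES ('pen', 1.5, 4)")
con.execute("""CREATE VIEW order_item_v AS
    SELECT item, price, qty, price * qty AS total FROM order_item""")
total = con.execute("SELECT total FROM order_item_v").fetchone()[0]
print(total)  # 6.0
```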
Subclass Hierarchy and Inheritance
In the conceptual model of an application domain, one often obtains hierarchical
subclass relationships between entity types. Such a subclass relationship
implies that entities of the subclass inherit all attributes from
the superclass entities by means of subtyping. However, this mechanism
is not supported in relational databases.
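In the absence of built-in inheritance, a subclass relationship is usually simulated by a shared key and a join; the following sketch, with invented tables, reassembles the inherited attributes in a view:

```python
import sqlite3

# Sketch: Student as a subclass of Person, simulated by a shared key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE student (
    id INTEGER PRIMARY KEY REFERENCES person(id),
    university TEXT)""")
con.execute("INSERT INTO person VALUES (1, 'Ann')")
con.execute("INSERT INTO student VALUES (1, 'TU Cottbus')")
# The view plays the role of the subclass table with the attributes
# 'inherited' from the superclass.
con.execute("""CREATE VIEW student_full AS
    SELECT p.id, p.name, s.university
    FROM person p JOIN student s ON p.id = s.id""")
row = con.execute("SELECT name, university FROM student_full").fetchone()
print(row)  # ('Ann', 'TU Cottbus')
```

The point of the deficiency is that this inheritance is entirely hand-coded: the database system itself knows nothing of the subclass relationship.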
User-Defined Types and Large Objects
Since much of the information processed by human beings is of an audio-visual
nature, there is a growing tendency to include pictures, sound samples
and videos in the characterization of entities. Also, advanced software
applications create and manipulate complex documents such as spreadsheets,
project plans or hypertext documents, which may have to be included in
the characterization of entities as well. So, there is a need for special
attributes associated with user-defined data types of arbitrary length
that can be queried with the help of user-defined functions and predicates.
The BLOB (`binary large object') data type offered by most commercial database
systems is only an ad-hoc solution to this problem, since it is not well-integrated
with the other data types in the query language and access mechanisms.
Reaction Rules
SQL databases support a restricted form of reaction rules, called triggers.
Triggers are bound to update events. Depending on some condition on the
database state, they may lead to an update action and to system-specific
procedure calls. In Chapter 4, we propose a general form of reaction rules
which can be used to specify the communication in multidatabases and, more
generally, the interoperation between cooperative knowledge bases.
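The following sketch, with invented table names, shows a trigger in this restricted event-condition-action form: an update event, a condition on the new database state, and an update action:

```python
import sqlite3

# Sketch of a trigger as a restricted reaction rule: when an update
# on 'stock' leaves a quantity below 5, a reorder request is recorded.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
con.execute("CREATE TABLE reorder (item TEXT)")
con.execute("""CREATE TRIGGER low_stock AFTER UPDATE ON stock
    WHEN NEW.qty < 5
    BEGIN
        INSERT INTO reorder VALUES (NEW.item);
    END""")
con.execute("INSERT INTO stock VALUES ('pen', 10)")
con.execute("UPDATE stock SET qty = 3 WHERE item = 'pen'")
requests = con.execute("SELECT item FROM reorder").fetchall()
print(requests)  # [('pen',)]
```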
Deduction Rules
Deduction rules may be used for defining intensional predicates and for
representing heuristic knowledge. Intensional predicates express properties
of, and relationships between, entities on the basis of other (intensional
and extensional) predicates. Heuristic knowledge is typically represented
in the form of default rules which are most naturally expressed
using the weak and strong negation of partial logic. While relational
databases make it possible to define non-recursive intensional predicates
with the help of views, they do not support default rules or any other form
of heuristic knowledge.
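A non-recursive intensional predicate defined as a view can be sketched as follows (the `parent'/`grandparent' example is a standard illustration, not taken from the text):

```python
import sqlite3

# Sketch: 'grandparent' as an intensional predicate derived by a view
# from the extensional predicate 'parent'.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parent (child TEXT, par TEXT)")
con.executemany("INSERT INTO parent VALUES (?, ?)",
                [("carl", "bob"), ("bob", "ann")])
con.execute("""CREATE VIEW grandparent AS
    SELECT p1.child AS child, p2.par AS gpar
    FROM parent p1 JOIN parent p2 ON p1.par = p2.child""")
rows = con.execute("SELECT * FROM grandparent").fetchall()
print(rows)  # [('carl', 'ann')]
```

Recursive predicates such as `ancestor', by contrast, cannot be defined as SQL92 views.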
Valid Time and Belief Time
In certain applications, it is essential to deal with temporally qualified
information. This includes both the time span during which a sentence (or
piece of information) is valid, and the time span during which a sentence
is believed and recorded in the database. The addition of valid time and
belief time requires special query capabilities which cannot be achieved
by using the temporal data types of SQL92.
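The following sketch, with invented schema and dates, shows valid-time qualification by explicit time-span attributes, together with a `timeslice' query reconstructing an ordinary snapshot table; genuine temporal query capabilities, as in TSQL2, go well beyond this:

```python
import sqlite3

# Sketch: each row carries the time span during which it is valid;
# a timeslice query restores the snapshot holding at a given date.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employment (
    person TEXT, company TEXT, valid_from TEXT, valid_to TEXT)""")
con.executemany("INSERT INTO employment VALUES (?, ?, ?, ?)", [
    ("ann", "Acme",   "1995-01-01", "1997-06-30"),
    ("ann", "Globex", "1997-07-01", "9999-12-31")])
snapshot = con.execute("""SELECT person, company FROM employment
    WHERE valid_from <= '1996-05-01'
      AND '1996-05-01' <= valid_to""").fetchall()
print(snapshot)  # [('ann', 'Acme')]
```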
Spatial and Geographic Information
Spatial information is related to geometric concepts like points, lines,
rectangles, polygons, surfaces, and volumes. Spatial database systems include
direct support of these concepts. They are used for realizing geographic
information systems.
Incomplete Predicates
There are domain predicates, especially in empirical domains, for which
the database completeness assumption is not justified. For these incomplete
predicates, the falsity extension is not simply the set-theoretic complement
of the truth extension, and therefore both positive as well as negative
cases have to be treated on a par. An ad-hoc solution for incomplete predicates
in relational databases would be to represent them in two tables: one for
their truth and another one for their falsity extension. But there would
still be no support of this concept in the query and update mechanisms.
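The two-table representation, and the resulting three-valued query answering, can be sketched as follows (the predicate and the data are invented):

```python
import sqlite3

# Sketch: an incomplete predicate 'likes' split into a truth extension
# (likes_t) and a falsity extension (likes_f); tuples in neither table
# are undetermined rather than false.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE likes_t (person TEXT, dish TEXT)")
con.execute("CREATE TABLE likes_f (person TEXT, dish TEXT)")
con.execute("INSERT INTO likes_t VALUES ('ann', 'pasta')")
con.execute("INSERT INTO likes_f VALUES ('ann', 'tripe')")

def truth_value(con, person, dish):
    """Three-valued query answering over the split predicate."""
    if con.execute("SELECT 1 FROM likes_t WHERE person=? AND dish=?",
                   (person, dish)).fetchone():
        return "true"
    if con.execute("SELECT 1 FROM likes_f WHERE person=? AND dish=?",
                   (person, dish)).fetchone():
        return "false"
    return "undetermined"

print(truth_value(con, "ann", "pasta"))  # true
print(truth_value(con, "ann", "sushi"))  # undetermined
```

The hand-written `truth_value` function stands in for exactly the support that the query mechanism itself fails to provide.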
Security
SQL databases only support security requirements at the schema level, that
is, user-specific access restrictions can be defined for entire tables
or for sensitive columns which are then non-modifiable or non-observable
for unauthorized users. In practice, these possibilities are not sufficient
to protect confidential information. Rather, one needs contents-based access
restrictions that prevent unauthorized users from querying or updating
sensitive tuples of a table.
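A contents-based restriction can be approximated by granting unauthorized users only a view that filters out sensitive tuples; in this sketch the table, the sensitivity flag, and the data are invented:

```python
import sqlite3

# Sketch: unauthorized users see only the view, which hides sensitive
# tuples rather than entire tables or columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE salary (person TEXT, amount INTEGER, sensitive INTEGER)")
con.executemany("INSERT INTO salary VALUES (?, ?, ?)",
                [("ann", 50000, 0), ("ceo", 500000, 1)])
con.execute("""CREATE VIEW salary_public AS
    SELECT person, amount FROM salary WHERE sensitive = 0""")
visible = con.execute("SELECT person FROM salary_public").fetchall()
print(visible)  # [('ann',)]
```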
Uncertainty and Reliability
There are domains where information is typically uncertain, or where the
information sources are not completely reliable. In these cases, it is
necessary to assign degrees of uncertainty to new pieces of information,
or to label them with their source which may be associated with a degree
of reliability.
Disjunctive Information
The most difficult extension of databases and knowledge bases concerns
the integration of disjunctive information. While the kind of incompleteness
created by disjunctive information is an important characteristic of commonsense
knowledge representation and reasoning, theoretical investigations have
not yet succeeded in establishing convincing semantics and efficient computation
methods for storing and retrieving disjunctive information. Questions about
the right semantics arise when disjunctive information is combined with
negation (on the basis of the database completeness assumption) and rules.
Efficiency problems arise from the combinatorial explosion created by disjunctive
information.
2.15 Summary
The system of relational databases represents the most fundamental case
of a knowledge system. Consisting of elementary-valued tables, a relational
database is restricted to atomic sentences. More complex tables and more
complex sentences, such as negated, disjunctive or qualified sentences,
which are needed to represent various forms of incomplete and qualified
information, are not admitted. They require specific extensions of relational
databases leading to object-relational, temporal, fuzzy, deductive, and
other advanced database and knowledge systems.
3 Object-Relational Databases
Object-relational databases (ORDBs) have evolved from relational databases
by adding several extensions derived from conceptual modeling requirements
and from object-oriented programming concepts. It is expected that ORDB
applications will outnumber relational database applications in the
near future, because ORDBs allow the seamless integration of multimedia data
types and large application objects such as text documents, spreadsheets
and maps, with the fundamental concept of database tables. Many object-relational
extensions will be included in SQL3, the successor to SQL92.
Historically, the successful application of object-oriented programming
languages such as Smalltalk, C++ and Java, has led to the development of
a number of object-oriented database (OODB) systems which
support the storage and manipulation of persistent objects. These systems
have been designed as programming tools to facilitate the development of
object-oriented application programs. However, although they are called
`database systems', their emphasis is not on knowledge representation but
rather on persistent object management. Any database concept which is
intended as an implementation platform for information systems and
knowledge representation must support tables as its basic representation
concept on which query answering is based. Tables correspond to extensional
predicates, and each table row corresponds to a sentence or proposition.
This correspondence is a fundamental requirement for true database systems.
If it is violated, as in the case of OODBs, one is dealing with a new notion
of `database system', and it would be less confusing to use another term
instead (e.g. `persistent object management system') as proposed by (Kim
1995).
One can view the evolution of relational to object-relational databases
in two steps:
-
The addition of abstract data types (ADTs) leads to ADT-relational
databases consisting of complex-valued tables. ADTs include user-defined
base types and complex types, together with user-defined functions and type
predicates, and the possibility of forming a type hierarchy in which a
subtype of a tuple type inherits all attributes defined for its supertype.
-
The addition of object identity, object references, and the possibility
of defining subtables within an extensional subclass hierarchy results
in object-relational databases consisting of complex-valued tables and
object
tables.
There are two notable differences between object-relational databases and
object-oriented programming:
-
Object IDs in ORDBs are logical pointers. They are not bound
to a physical location (like C++ pointers).
-
In addition to the intensional subtype hierarchy of the type system, ORDBs
have an extensional subclass (or subtable) hierarchy that respects the
subtype relationships defined in their type system.
7 Temporal Databases
Temporal databases are, in practice, the most important logical extension
of relational databases, since temporal information plays an essential role
in many application domains. It is expected that the temporal database
language standard TSQL2 proposed in (Snodgrass 1995) will be included in
future versions of SQL. We show how the notions of a database table,
of natural inference and of table operations have to be generalized in
order to accommodate temporally qualified information.
8 Fuzzy Databases
Uncertain information shows up in various ways, typically not in business
and administration domains but rather in more empirical domains such as
medicine, criminology, and all kinds of empirical research. After
discussing different types of uncertainty, we present a fuzzy database
model for storing and retrieving gradually uncertain information.
8.1 Introduction
In various application domains, one encounters uncertain and imprecise
information. While the fuzzy set theory of (Zadeh 1965) makes it possible
to handle imprecise or vague information, fuzzy logic and possibilistic
logic, which are based on possibility theory (Zadeh 1979, Dubois and
Prade 1980, Dubois, Lang and Prade 1994), form the theoretical basis for
processing gradually uncertain information.
Classical probability theory, based on the axioms of Kolmogorov, on
the other hand, makes very strong assumptions. In many practical cases,
it is therefore not applicable to the problem of handling uncertain information.
Also, probabilistic methods are computationally much more costly than fuzzy
methods because they are non-compositional.
Imprecise information often comes in the form of disjunctive or vague
sentences, while uncertain information is often expressed in the form of
sentences qualified by a degree of uncertainty like, for instance,
`very
likely' or `with 60% certainty'. Sometimes, such a qualification
is called a subjective probability or a degree of belief.
A particular form of uncertainty is given by statistical information.
Sentences qualified by statistical uncertainty are based on statistical
samples measuring relative frequencies. An example of such a sentence is
"80% of all jaundice patients have hepatitis". Through their empirical
grounding, these sentences are more reliable than sentences qualified by
a mere subjective degree of belief.
In the literature, there is no clear taxonomy of uncertainty in databases
and knowledge bases. An obvious distinction, however, concerns the uncertainty
expressed at the level of attribute values, or at the level of the applicability
of a predicate to a tuple. While the former is related to the issue of
handling vagueness, the latter is related to an account of certainty-qualified
sentences. In our formal exposition, we do not treat vagueness expressed
by fuzzy-set-valued attributes, but only the more fundamental issue of
gradual uncertainty based on fuzzy relations over ordinary (`crisp') attributes.
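A fuzzy relation over crisp attributes, and the min-based combination of degrees in a conjunctive query, can be sketched as follows (the predicates and the degrees are invented):

```python
# Sketch of fuzzy relations: each tuple of crisp values carries a
# degree of uncertainty; tuples not listed have degree 0.
has_hepatitis = {("ann",): 0.8, ("bob",): 0.3}
has_jaundice = {("ann",): 0.9}

def degree(rel, tup):
    return rel.get(tup, 0.0)

# Degree of the conjunctive query 'has_jaundice(x) AND has_hepatitis(x)',
# combining the degrees of the conjuncts with min, as in fuzzy logic.
def conj_degree(x):
    return min(degree(has_jaundice, (x,)), degree(has_hepatitis, (x,)))

print(conj_degree("ann"))  # 0.8
print(conj_degree("bob"))  # 0.0
```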