.

Читать онлайн.
Название
Автор произведения
Жанр
Серия
Издательство
Год выпуска
isbn



Скачать книгу

taken a class from Maier on the magic of Logic Programming based on the notes for Computing with Logic. The earliest use we have found is a manuscript by Porter from October 1985 [Porter 1985]. We note that the name “Datalog” was also applied about the same time to a natural-language system for database query (written in Lisp) [Hafner and Godden 1985]. That use came from “database dialog,” and does not seem to have a connection to Prolog.

      In any case, by mid-1986, Datalog as the name for a database language had been established. Viewed from the programming language side, it was a restricted version of Prolog, with no function symbols and no extra-logical predicates, such as cut and var. From a logical perspective, it was based on definite Horn clauses over predicates. (A definite Horn clause is a disjunction of literals with exactly one positive literal.) As a database query language, it resembled domain relational calculus. There is some variation across authors about whether or not “pure Datalog” includes negation; here we will assume not. We do assume Datalog programs are safe; safety conditions need to be modified appropriately for some of the extensions below.

       1.4 Extensions to Datalog

      While pure Datalog has a simple syntax and a clean theory, it does lack direct support for features commonly found in database languages: negation (and set difference), arithmetic, sets and multi-sets, aggregation, updates, and typing and constraints. (There are also system capabilities such as concurrency and recovery that are orthogonal to expressiveness. We do not cover them in this chapter.) There have also been extensions to add object-oriented features (such as object identity, type hierarchies, and nested structures) and higher-order capabilities (such as variables over predicates). We treat negation in some detail, as it has a large effect on semantics and evaluation techniques, and touch on other extensions in varying degrees of detail.

      Pure Datalog cannot express all queries that the relational algebra can because, in particular, it lacks an analog for the difference operator (which requires negation in logic). For example, to find pairs of people who do not share a common advisor, we would want to use a negation operator and write

Image

      or something similar.7 One approach would be to extend Datalog to use a form of logic more general than definite clauses, and use an evaluation approach based on more general theorem-proving methods. However, such a generalization brings with it much higher computational costs, as the simple proof scheme (called linear resolution) is no longer complete for more general logics. In fact, this “classical” approach to negation does not easily extend to how negation is used in databases. Prolog makes use of the notion of “negation as finite failure,” whereby the negation of a goal is established if attempting to establish the goal fails. This approach is a form of default negation: if something cannot be determined to be true, it is assumed by default to be false. Traditional databases can be seen as using such a form of default negation, if we assume a relation is understood as a set of true facts. If a database query asking whether “Michael” is an employee finds no employee fact for “Michael,” then it answers “No,” that is, “Michael is not an employee.” It does not answer “I don’t know,” which would be an answer according to the classical logic, since neither “Michael is an employee” nor “Michael is not an employee” is logically implied by the set of (positive) employee facts in this case.

      The meaning of default negation is straightforward in traditional databases, but becomes problematic when queries are recursive. The simplest definition that raises questions has the form:

Image

      Interpreted as a rule to define shavesBarber in some “real world,” this seems to be saying that if shavesBarber is false in our world, then the body of the rule is true, and so it implies that shavesBarber is true. If, on the other hand, shavesBarber is true in our world then the body of the rule is false, and so there is no way to show that shavesBarber is true. Having failed to establish truth of this fact, by negation as failure one might conclude that shavesBarber should thus be false.

      The first proposal from the Prolog community to precisely define the meaning of negation in arbitrary rule sets was by Clark [1978] and is known as Clark’s completion. The idea is to “complete” the set of rules (thought of as implications) by essentially turning them into if-and-only-if rules (after combining clauses defining the same predicate using disjunction), and then to use first-order logical implication to determine which goals are true and which are false. So under the completion, the rule above becomes shavesBarber ⇔ ¬shavesBarber, which is inconsistent and has no models. So this situation can be taken to mean that shavesBarber is neither true nor false. This result seems reasonable for this particular one-rule program, but suppose we have a perfectly reasonable and consistent program and one adds the additional shaves-not-shaved rule above. Suddenly the entire database becomes inconsistent just because of one rule—even if this rule has nothing to do with the rest of the database.

      But problems with Clark’s completion do not end there. Consider the following positive rule:

Image

      What should this program say about iAmHappy? Under the usual semantics for positive rules, iAmHappy would be false since it is not in the minimum model of this program. Alternatively one can see that this rule (being a tautology) says nothing about anything (including iAmHappy) being true, so by default reasoning iAmHappy should be false. But if we take the completion of this program, we get iAmHappy ⇔ iAmHappy which is a tautology, so neither iAmHappy nor not iAmHappy is a logical consequence of this completed program. Thus, iAmHappy is not determined to be either true or false. This situation might not seem much of a problem for this program, but it has deeper consequences for other, more interesting programs. Consider a program defining the transitive closure of an edge relation:

Image

      If the edge relation has self-loops, there will be instances of the second rule that simplify to the form reachable(a,b) :– reachable(a,b). For example, if we simply have the single edge fact, edge(b,b), and take the instance of the second rule using X = a, Y = b, and Z = b, then edge(b,b) can be simplified away leaving the self-recursive form. In order to get the right answers for pairs of nodes that are unreachable, we must treat such an instance as determining that reachable(a,b) is definitely false, not that it is unknown. Thus, this example shows that the completion semantics gives the wrong answer for the fully positive program for transitive closure, i.e., the program above, under the completion semantics, does not define transitive closure. It is particularly unpleasant that this attempt to extend the semantics of definite programs (the least model) to programs with negations changes the meaning of definite programs. But this situation indeed is not surprising when we observe that the completion of a Datalog program is a first-order theory and that transitive closure is not first-order definable. And since, for many in the database community, the ability to define transitive closure was one of the key motivations for adding recursion to a query language, the completion semantics appeared to be a wrong semantics for Datalog programs with negation. Its problem was not even with negative recursion, but with positive recursion.

      So the search commenced for a semantics of logic programs with negation that would extend the least-fixed-point semantics of positive rule sets. Nicolas and Gallaire [1977] described the alternative approaches to formalizing database semantics in logic, using a theory or an interpretation. The completion semantics used a theory. The following semantics use interpretations.

      The first approach was to restrict the set of programs to those that were not problematic. If a program does not involve recursion through negation, there is no problem.