Eroxl's NotesGraph
Datalog

Datalog is a deductive database query language. Datalog allows for recursive queries and definitions which can not be represented in traditional SQL or relational algebra.

Syntax

Datalog programs are made up of 2 general components:

Facts

Facts form the initial "state" of the database they are statements which we hold to be true.

For example we can express the relationship between children and their parents as follows:

Parent("Dee", "Jan")
Parent("Jan", "Jamie")
Parent("Dee", "Wally")
Parent("Wally", "Jean")

Dee's parents are Jan and Wally. Jan's parent is Jamie and Wally's parent is Jean.

Rules

Rules describe how to deduce additional new facts based on the current facts.

Given our previous facts we can define a new relationship Grandparent as follows

Grandparent(A, C) :- Parent(A, B), Parent(B, C)

The comma between facts can be thought of as the and of the two, both facts must be true for the rule to be true (ie. B must be a parent of A and C must be the parent of B for C to be the grandparent of A).

If we instead wanted to find the direct relatives of a person we could do the following:

DirectRelative(A, B) :- Parent(A, B)
DirectRelative(A, B) :- Parent(B, A)

Repeating rules with the same name corresponds to a or between the two rules. In the above example DirectRelative would give any parents or children (the first defines parents of A, the second defines children of A). The variables names do not need to be the same between rules for the or to apply just the name of the rule needs to be.

The above rule gives us a new set of facts after it's evaluated (this can be thought of as a query):

Grandparent("Dee", "Jamie")
Grandparent("Dee", "Jean")

Dee's grandparents are Jamie and Jean as those are the parents of Jan and Wally respectively.

Anonymous Variables

If variables aren't needed in a rule they can be replaced by underscores:

Projected(N) :- Product(_, N, _, _, _)

Negation

Facts can be negated within a rule by preceding them with the NOT keyword:

AdoptedGrandchild(C, A) :- NOT Parent(A,B), Parent(B,C)

Explicit Queries

Some implementations of datalog don't deduce all possible results when rules are defined but only do so after a query:

?- Grandparent("Dee", X)

This query will return "Jamie" and "Jean" and asks who the grandparents of "Dee" are.

Relational Algebra Equivalents

Selection

Selection in datalog can be accomplished in two ways, either by repeating a variable or through using a constant.

Example

Consider the relations / rules defined as Company(cid, name, stock price, country) and Purchase(buyer-sin, seller-sin, store, pid)

Constant

In relational algebra we can select only the Canadian companies as follows:

The equivalent in datalog would be expressed as

CanadianCompanies(C, N, S, "Canada") :- Company(C, N, S, "Canada")

Repeated Variable

In relational algebra we can select only the purchases with the same buyer and seller as follows:

The equivalent in datalog would be expressed as

NullSales(B, B, S, P) :- Product(B, B, S, P)

Projection

Projection in datalog can be accomplished by creating a new rule with only the desired attributes.

Example

Consider the relation / rule defined as Product(pid, name, price, category, maker-cid)

In relational algebra we can project just the name of this relation as

The equivalent in datalog would be expressed as

Projected(N) :- Product(P, N, PR, C, M)

Joins

Joins can be approximated in datalog using a common variable between different rules.

Example

Consider the relations / rules defined as Purchase(buyer-sin, seller-sin, store, pid) and Person(sin, name, phone number, city).

In relational algebra we can use a join to find the stores where Fred made a purchase from

The equivalent in datalog would be expressed as

StoresFredBoughtFrom(S) :- Person(BSIN,  "Fred", T, C),
                           Purchase(BSIN, L, S, P)