Datalog is a deductive database query language. Datalog allows for recursive queries and definitions which can not be represented in traditional SQL or relational algebra.
Datalog programs are made up of 2 general components:
Facts form the initial "state" of the database they are statements which we hold to be true.
For example we can express the relationship between children and their parents as follows:
Parent("Dee", "Jan")
Parent("Jan", "Jamie")
Parent("Dee", "Wally")
Parent("Wally", "Jean")
Dee's parents are Jan and Wally. Jan's parent is Jamie and Wally's parent is Jean.
Rules describe how to deduce additional new facts based on the current facts.
Given our previous facts we can define a new relationship Grandparent as follows
Grandparent(A, C) :- Parent(A, B), Parent(B, C)
The comma between facts can be thought of as the and of the two, both facts must be true for the rule to be true (ie. B must be a parent of A and C must be the parent of B for C to be the grandparent of A).
If we instead wanted to find the direct relatives of a person we could do the following:
DirectRelative(A, B) :- Parent(A, B)
DirectRelative(A, B) :- Parent(B, A)
Repeating rules with the same name corresponds to a or between the two rules. In the above example DirectRelative would give any parents or children (the first defines parents of A, the second defines children of A). The variables names do not need to be the same between rules for the or to apply just the name of the rule needs to be.
The above rule gives us a new set of facts after it's evaluated (this can be thought of as a query):
Grandparent("Dee", "Jamie")
Grandparent("Dee", "Jean")
Dee's grandparents are Jamie and Jean as those are the parents of Jan and Wally respectively.
If variables aren't needed in a rule they can be replaced by underscores:
Projected(N) :- Product(_, N, _, _, _)
Facts can be negated within a rule by preceding them with the NOT keyword:
AdoptedGrandchild(C, A) :- NOT Parent(A,B), Parent(B,C)
Some implementations of datalog don't deduce all possible results when rules are defined but only do so after a query:
?- Grandparent("Dee", X)
This query will return "Jamie" and "Jean" and asks who the grandparents of "Dee" are.
Selection in datalog can be accomplished in two ways, either by repeating a variable or through using a constant.
Consider the relations / rules defined as Company(cid, name, stock price, country) and Purchase(buyer-sin, seller-sin, store, pid)
In relational algebra we can select only the Canadian companies as follows:
The equivalent in datalog would be expressed as
CanadianCompanies(C, N, S, "Canada") :- Company(C, N, S, "Canada")
In relational algebra we can select only the purchases with the same buyer and seller as follows:
The equivalent in datalog would be expressed as
NullSales(B, B, S, P) :- Product(B, B, S, P)
Projection in datalog can be accomplished by creating a new rule with only the desired attributes.
Consider the relation / rule defined as Product(pid, name, price, category, maker-cid)
In relational algebra we can project just the name of this relation as
The equivalent in datalog would be expressed as
Projected(N) :- Product(P, N, PR, C, M)
Joins can be approximated in datalog using a common variable between different rules.
Consider the relations / rules defined as Purchase(buyer-sin, seller-sin, store, pid) and Person(sin, name, phone number, city).
In relational algebra we can use a join to find the stores where Fred made a purchase from
The equivalent in datalog would be expressed as
StoresFredBoughtFrom(S) :- Person(BSIN, "Fred", T, C),
Purchase(BSIN, L, S, P)