# Relational Algebra, SQL, Relational Design Theory

These are some notes from the database class that Stanford offered online & free. Good times…

## Relational Algebra

Relational algebra forms the basis of database query languages. Operations on a relation produce a relation. The concept was known, but ignored, until E.F. Codd introduced the relational model in the 1970s.

Relational algebra can be used on finite relations only (unlike relational calculus, which allows operations on an infinite formula).

Certain operations are primitives of relational algebra. These core operations were defined by Codd. Primitive operations are: selection, projection, Cartesian product, set union, set difference, and rename. Other operations are derived from these six operations. The abbreviated operations are: intersection, division, natural join, and many more.

## SQL

Here’s a basic select statement in SQL. This example use relations Student(sID, sName, GPA, sizeHS) and Apply(sID, cName, major, decision).

SELECT sName, major

FROM Student, Apply

WHERE Student.sID = Apply.sID

Relational algebra constitutes the formal underpinnings, while SQL is a practical implementation. There are differences. For example:

– Relational algebra, which is based on the set model, allows duplicate values.

– SQL, which is based on the multiset model, does not allow duplicates.

#### Datalog

Database logic is called datalog and consists of if/then rules. The rules return TRUE or FALSE. For example, R(a1, a2, … , an) is TRUE if R(a1, a2, … , an) is a tuple in R. (See also: predicates, atoms, head, body, subqueries.)

Relations are represented by predicates. A predicate the name of a function that returns some (Boolean) values. An atom is a predicate followed by its arguments.

Given a relation R(a,b) with tuples (1,2) and (3,4), we can say

R(1,2) = TRUE

R(3,4) = TRUE

R(5,7) = FALSE

Predicates can take variables or constants. Using the example above, R(1,z) = TRUE if z=2.

The basic syntax of datalog is:

head ← body

The body consists of several subqueries, linked together with AND or NOT operators. Using datalog rules, we can construct new tuples with the tuples that satisfy subqueries of the body. Imagine iterating through all of the values in the body. If some combination of values makes the body = TRUE, then add that tuple to the relation whose predicate is the head.

#### SQL in a Server Environment

You can embed SQL in a program written in ordinary programming languages (e.g. C). One critical issue: how to move data between SQL relations and variables of the “host” language.

Persistent stored modules are pieces of code that are stored in the database and executed by the user.

A call-level interface allows you to program in conventional languages, and use a library of functions to access the database.

#### Three-tier architecture

A three-tier architecture looks something like this:

Database servers run database management systems (DBMS) and perform queries and modifications.

Application servers run the business logic of the system.

Web servers connect clients to the database system.

## Relational Design Theory

### Properties and Normal Forms

Normalization is the process of reducing redundancies in relational database design. Some normal forms are 3NF, Boyce-Codd Normal Form, and 4NF. Of those three, 3NF is the least restrictive and 4NF is the most restrictive.

Evaluate functional dependencies to test for Boyce Codd Normal Form.

Evaluate multivalued dependencies to test for Fourth Normal Form.

#### Functional Dependencies and Boyce Codd Normal Form

Relations are said to be in Boyce Codd Normal Form if, for every functional dependency A → B, then A is a key.

Functional dependencies are a generally useful concept.

– data storage, compression

– reasoning about queries, optimization

– generalization of the concept of keys

Definition of Functional Dependency

For every t,u in R

If t.A = u.A then t.B = u.B

Then A → B

Functional dependencies are based on knowledge of the real world. All instances of a relation must adhere to them.

#### Multivalued Dependencies and 4NF

Relations are said to be in Fourth Normal Form if, for every multivalued dependency A →→ B, then A is a key.

Rules to know:

– Splitting rule

– Combining rule

– Trivial dependency rules

– Transitive rule

– Closure of attributes

### Closure and Keys

Question: Is a set of attributes A a key for R?

Answer: Compute the closure of A, called A+ . If it’s equal to the set of all attributes, then A is a key.

Specifying Functional Dependencies (FDs) of a relation

S1 and S2 are sets of functional dependencies.

S2 “follows from” S2 if every relation instance satisfying S1 also satisfies S2.

S2 : { SSN → priority }

S1 : { SSN → GPA, GPA → priority }

(Hey linguists, computing closure is sorta like making sentences from phrase structure trees in syntax. Start with the head, then use sets of rules–ehem, functional dependencies–to add branches.)

How to test? Does A → B follow from S?

1. Compute the closure A+ based on functional dependencies S. Check if B is in the set.

2. Armstrong’s Axioms

Want: the minimal set of completely non-trivial FDs such that all FDs that hold on the relation follow from the dependencies in this set.

#### Boyce-Codd Normal Form

a.k.a. 3 ½ Normal Form

Decomposition of a relational schema…

Relational design by decomposition

– “mega” relations and properties of the data

#### BCNF decomposition algorithm

Input: Relation R and Functional Dependencies for R

Output: Decomposition of R into BCNF relations with “lossless join”

1. Compute keys for R (using FDs)

2. Repeat until all relations are in BCNF:

Pick any R’ with A → B that violates BCNF.

Decompose R’ into R1(A,B) and R2(A,rest)

Compute functional dependencies for R1 and R2

Compute keys for R1 and R2

Note: Sometimes you don’t want to decompose relations into BCNF. Larger relations can be useful, because you won’t have to join then back together for complicated queries. But, you have to take the query load into consideration.

#### Multivalued Dependency

Multivalued dependencies are based on knowledge of the real world data that is being captured by the database. All instances of the relation must adhere.

[ see notes for diagram ]

Multivalued dependencies are sometimes called tuple-generating dependencies.

Know these rules:

– FD-is-an-MVD rule

– Intersection rule

– Transitive rule

#### Fouth Normal Form

Relation R with MVDs is in 4NF if, for each nontrivial A →→ B, A is a key.

### Summary

Question: Why is a functional dependency called “functional”?

Answer: Because we can imagine some function f such that f(movie,year) = length. In other FD terms, movie, year → length.

Here’s a tip for decomposition: start by decomposing the largest functional dependency first. You’ll get a better design in the end.

Remember, decomposed != better design. You may need to access certain attributes together when querying the data, and the decomposed form will necessitate many joins. It may be better to leave those attributes together in the same relation.

Multivalued dependencies are a subset of functional dependencies.

Functional dependencies and BCNF : When we have relation R(A,B,C) with functional dependency A → B, we can factor out dependencies so we don’t repeat them over and over.

Multivalued dependencies and 4NF : relation R(A,B,C,D) with multivalued dependencies A →→ B. Consider the multiplicative effect of combinations. Put those in a separate relation.

4NF is a stronger design that BCNF.

When designing a database schema, remember that many designs are possible. Some are much better than others. Relational design theory helps us plan our schema. Normal form can lead to “good” relations. We learned how to design by decomposition. Usually, this process is intuitive and works well. But, it does have some shortcomings including dependency enforcement, query workload and over-decomposition.

## Leave a Reply