Skip to content
December 12, 2011 / ecschroder

Views, Transactions, Recursion, OLAP

These are some notes from the database class that Stanford offered online & free. Good times…

Views

Consider this three-level vision of databases:

Real world applications have lots of views. Views can be used to:
– hide some data from some users
– make some queries easier / more natural
– modularize database access

Views are important for customizing user authentication. Using views, it’s possible to give a user only to the tuples that meet certain criteria. (Views with aggregation are not possible.)

Modifications to views are a little tricky. Modifications must be rewritten to specify how they will modify the base tables. There are two ways to handle view modifications.

(1) Rewriting process specified explicitly by view creation.
PRO: can handle all modifications
CON: no guarantee of correctness (or meaningfulness)

(2) Restrict views and modifications so that translation to base table modifications are meaningful and unambiguous.
PRO: no user intervention
CON: restrictions are significant

Aggregation views are used for queries only. It doesn’t make sense to modify the aggregation view. What sort of modification would you make to the base tables? For example, to change the average GPA from 3.5 to 4.0, whose GPA would you increase?

Likewise, projections and distinct are keywords that can only be used in view queries, but not modifications.

Instead-of triggers are used to catch modifications to views and translate them into meaningful modifications to the underlying base tables.

Authorization

On a UNIX file system, the basic privileges are read, write and execute. The relational model is more complex than the UNIX file system, so it makes sense that there were be different (and more complex) types of privileges.

Types of privileges

There are nine types of privileges in SQL: SELECT, INSERT, DELETE, UPDATE, REFERENCE, USAGE, TRIGGER, EXECUTE, and UNDER.

SELECT, INSERT, DELETE and UPDATE are the basic privileges that apply to a relation.

REFERENCE is the right to use a relation in an integrity constraint.

EXECUTE is the right to execute code (for example, a persistent stored module).

UNDER is the right to create subtypes.

Obtaining privileges

The creator of a relation is the relation’s owner. The owner has all privileges and may grant privileges to others.

GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];

The grant option allows the user who received privileges to grant the same privileges to others. A user may revoke privileges with keywords CASCADE or RESTRICT.

Transactions

What happens if the system fails in the middle of a set of changes? How should sets of changes be interleaved?

A transaction is a collection of one or more operations on the database that must be executed atomically. In other words, all of them are done or none of them are done.

SQL defines four different isolation levels. From least stringent to most stringent, they are: read uncommitted, read committed, repeatable read, and serializable.

Recursion

SQL is not a Turing complete. In other words, its not possible to solve any computation problem with the data-manipulation rules available in SQL. However, SQL has the benefits of being simple, convenient and declarative. It is expressive enough for most database queries. Basic SQL cannot express unbounded computations.

One type of problem that is tricky in SQL: how to find all of Mary’s ancestors, given the relation ParentOf(parent,child)? If we know how many generations out data encompasses, it’s not problem. (We can simply use a SELECT statement UNION another SELECT statement, rinse and repeat once for each generation.) The trickiness comes into play when we don’t know how many generations are involved.

The SQL With statement is useful for recursive problems. It’s like setting up temporary views.

With R1 as (query-1),

R2 as (query-2),

Rn as (query-n)

<query involving R1, …, Rn (and other tables)>

Using the With statement, we can write recursive queries. Here’s an example:

With Recursive
R as (base query
Union
recursive query)
<query involving R (and other tables)>

Here’s an example of a linear recursive query based on the relation Flights(airline,frm,to,departs,arrives):

WITH RECURSIVE Reaches(frm,to) As
(SELECT frm, to FROM Flights
UNION
(SELECT R1.frm, R2.to
FROM Reaches R1, Reaches R2
WHERE R1.to = R2.frm)
SELECT * FROM Reaches;

This same query, written in datalog, looks like this:

[1] Reaches(x,y) ← Flights(a,x,y,d,r)
[2] Reaches(x,y) ← Reaches(x,z) AND Reaches(z,y)

(In human-speak, rule [1] gives us all of the direct flights, while rule [2] gives us flight combinations that include a layover. It’s a classic recursion problem.)

The first SELECT statement in the above query corresponds to datalog rule [1]. The second SELECT statement (the recursive part) corresponds to datalog rule [2].

Note: Remember, the Union operator in SQL automatically eliminates duplicates. That’s different from how the Union operator works in relational algebra. Relational algebra is based on the set model, while SQL is based on the multiset model.

Linear recursion allows only one reference to R in the recursive statement. If there are multiple references to R in the recursive statement, it is said to be non-linear. Non-linear recursion has some benefits. Namely, queries look cleaner and they converge faster. However, non-linear recursive statements are harder to implement.

Say we have a graph with one linear path through 1000 nodes. If we want to get all nodes using a linear statement, it will add nodes one at a time and require 1000 steps. If we use a non-linear statement, nodes will be added in just log2(1000) = 10 steps.

The SQL standard only requires linear recursion. It is implemented only in PostgreSQL (not MySQL or SQLite). Non-linear recursion is not supported by any of these three DBMS systems. In fact, it’s not even part of the SQL standard.

Hubs and Authorities was a nifty little algorithm developed around the same time as Google page rank for searching web pages.

A unary relation is a relation with only one attribute.

A statement S is a monotone if S adds one or more tuples to a relation R, or leaves R unchanged, but does not delete from R. (Remember, from differential eq., monotonic functions are entirely non-increasing or entirely non-decreasing. The first derivative does not change the sign. Similarly, monotonic sequences are such that ai+1 > a1 for every i > 1 or ai+1 < a1 for every i < 1.)

On-Line Analytical Processing

There are two broad classes of database activity. Online Transaction Processing (OLTP) consists of short transactions and simple queries that touch small portions of data. They imply frequent updates. On the other hand, Online Analytical Processing (OLAP) consists of long transactions and complex queries that touch large portions of the data. They imply infrequent updates.

OLTP and OLAP are not completely discrete. In fact, we can think of a spectrum of transactions between short and simple (OLTP) and long and complex (OLAP).

Data warehousing is the concept of bringing data from OLTP sources into a single “warehouse” for OLAP analysis.

Decision support systems (DSS) …

A star schema consists of one large fact table with many smaller dimension tables. The fact table is updated infrequently and is only append-only. The dimension tables are updated frequently.

Two types of queries were added to SQL to handle OLAP.
– with cube is not yet implemented in any DBMS.
– with rollup is implemented only in MySQL. It is similar to summaries in Excel pivot tables. With rollup is useful for showing the hierarchical structure of attributes. (For example, attributes state, county, city can be “rolled up” into state.)

For large-scale datasets, using summary tables from the cube can be orders of magnitude faster than using data tables themselves.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: