Skip to content
December 12, 2011 / ecschroder

Notes: Relational Databases, XML Data

Phew, just finished the database class that Stanford offered free! online! this semester! Good times. A few notes…


Data manipulation languages (DML) are used to query and modify databases. They let you do ad-hoc queries in high-level languages.

Queries are performed on relations, and they return relations. (In other words, they are compositional and closed.)

Query languages include SQL and XPath. SQL is based on the principles of relational algebra. We could say that relational algebra is a formal query language, while SQL is an actual/implemented query language.

Relational Databases

A database consists of a set of relations (or tables). Each relation has a set of named attributes (or columns). Each tuple (or row) has a value for each attribute. (The value could be NULL, or empty.)

A data schema is the structure of the database; an instance is the data itself.

NULL is a special type of value. It means “unknown” or “undefined.” Be careful when querying data sets with null values; null values don’t satisfy the normal query rules. In other words, neither of these statements is true:

* NULL > 3.5

* NULL ≤ 3.5

If you were looking for records of a database which satisfy the condition GPA > 3.5, you won’t find Sally Sue, who has an unknown GPA (NULL). But she won’t turn up when you query for GPA ≤ 3.5, either.

A key is an attribute for which every value is unique. Or, to be more precise, a key is a set of values, for which every combination is unique. Lots of database systems will automatically add an ID attribute to serve as the key for a relation.

XML Data

XML: Introduction, Well-Formed XML

XML is a standard format for data representation and exchange. XML also has a streaming format. The formal spec for XML is enormous, so this course will only cover the most important concepts.

The basic constructs of XML are: tagged elements, attributes, text (a.k.a. “leaf elements of the tree”).

Here’s how the Relational Model and XML compare:

Relational Model XML
Timeline Has been around for a while. A lot newer than the relational model.
Structure Tables Hierarchy
Queries Simple, nice languages (well-developed over time) Less so…
Ordering None Implied
Implementation Native (mature systems handle it natively) Add-on (hasn’t been around very long)

Well-formed XML adheres to basic structural requirements:
– single root element
– matched tags, proper nesting
– unique attributes within elements

Displaying XML is a snap. Use rule-based language to translate it into HTML. Then apply styles via:
– cascading style sheets (CSS)
– extensible stylesheet language (XSL)

DTDs, IDs and IDREFs

DTDs can be written in the same document as the XML, or it can be kept in a separate file.

XML documents must meet some criteria to be Well-formed or Valid:

Well-formed XML Valid XML
Single root element Meets criteria of well-formed XML, plus…
Matched tags, proper nesting
Unique attributes within elements
Adheres to a content-specific spec (such as DTD or XSD)

Document Type Descriptors (DTD) are a grammar-like language used for specifying elements, attributes, nesting, ordering and the allowed number of occurrences of certain elements.

Reasons to use DTD/XSD (= benefits of typing):
– programs can assume that there’s a structure to the data
– CSS/XSL can assume that there’s a structure to the data
– there will be a specification for data exchange
– it makes documentation easier

Reasons not to use DTD/XSD (= benefits of no typing):
– flexibility, ease of change
– DTDs can be messy, irregular
– XDS can be very messy

Here’s an example of a DTD:
<!DOCTYPE Bookstore [
<!ELEMENT Bookstore (Book*, Author*)>
<!ELEMENT Book(Title, Remark?)>
<!ELEMENT Remark (#PCDATA | BookRef)*>

There are some differences between XML schemas and DTD. For example, in XML schema, you can have typed pointers. Not so in DTD.

XML Schema

XSD is written in XML, with special tags. It’s an extensive language. Like DTDs, XSD can specify elements, attributes, nesting, ordering and the allowed number of occurrences of a certain element.

XSD is always kept in a separate file. When using an XSD, you’ll need two files: a schema file and a data file. On the other hand, DTD can be written in the same file (at the top) or in a separate file.

There are 4 features of XSD that are not found in DTD:
– typed values
– key declarations
– references
– occurrence constraints


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: