## Chapter 6: Relational Algebra (yum!)

*plan: expect midterm posted Monday or Tuesday; due Monday following spring break. Open note, open Net….*

SQL is based on relational calculus, which asks for results but doesn’t specify *an order of operations*. Relational algebra is the basic set of operations for the relational model, more prescriptive, spelling out the steps. We’re looking at RA here because that’s what query optimizers (Chapter 15) use to perform the transformations necessary.

All objects in RA are relations: no duplicate tuples (records)! SQL doesn’t mind duplicates, eliminates them with UNIQUE or DISTINCT. Sorting to removing duplicates is expensive! In the old days, we didn’t want to incur that expense, so the guys who came up with SQL let duplicates slide.

sigma σ = SELECT

- subscript like WHERE clause
- horizontal partition: selects rows
- results always have fewer or same number of tuples as original R
- commutative

PROJECT = pi π

- vertical partition: selects columns
- note! this is like a straight SQL SELECT, without WHERE
- results must be set of tuples, no duplicates
- results always have fewer or same number of tuples as original R
- not commutative

You can write RA ops in a big nested expression, or you can step it and name the intermediate results

RENAME = rho ρ

- renames table and/or columns

UNION ( ∪ )

- the two relations we unite must be “type compatible”: same number of attributes, and each pair of corresponding attributes must be type compatible (have same or compatible domains: strings, CHARS, integers, etc…. it’s the datatype that matters, not the field name: default is to take on the field name of the first table)
- Note that usually what you’ll do is PROJECT two disparate tables, then UNION the results, since we don’t leave a bunch of type-compatible tables sitting around in the database

Type compatibility is a requirement for all three set operations, UNION, INTERSECTION ( ∩ ), and DIFFERENCE ( – )

UNION and INTERSECTION are commutative

CARTESIAN/CROSS PRODUCT ( × )

- has lots of tuples: multiply tuple count from each relation
- add attribute count from each relation
- usually not a meaningful relation in itself!

THETA JOIN, EQUIJOIN, NATURAL JOIN (NJ removes duplicate attributes)

SQL.SELECT = PROJECT

SQL.FROM = JOIN

SQL.WHERE = SELECT

We used to use rule-based query optimization, rules of thumb; now we use statistics-based query optimization: we have stats on the tables that inform our choices (see Chaper 15).

Relational databases took off because we can mathematically prove that stuff will work, using relational algebra! It is not the most efficient model (JOINs take time!), but that’s outweighed by the formal definitions that make stuff work.