Skip to main content

Relational Algebra: Fundamental and Extra Operations Explained

·763 words·4 mins
Author
Alessandro Ferrini

We will get a concise overview of the seven core relational algebra operations: Select, Projection, Union, Intersection, Difference, Product, and Join and also of the five extra operations that are used inside all Relational Database Management Systems.

This article is based on the lessons I teach about DBMS at Fondazione PIN in Prato in 2025.

The Fundamental Primitives
#

We can consider relational algebra to be a procedural language, meaning it describes the procedure to follow in order to obtain a result. It defines seven fundamental primitives, which are the basic operations that allow us to manipulate and query data in a relational database.

Each operator takes one or more relations as input and produces a new relation as output, allowing us to chain operations together.

σSelect
πProjection
Union
Intersection
Set Difference
×Cartesian Product
Join

Select (σ)
#

Select (σ) filters the tuples of a relation based on a condition.

Example: σ (age > 30) (People) - Selects only the people whose age is greater than 30.

Projection (π)
#

Projection (π) selects specific attributes of a relation. It can reorder attributes, remove unwanted ones, or manipulate values.

Example: π (FirstName, LastName) (People) - Returns only the first names and last names from the People relation.

Union (∪)
#

Union (∪) produces a relation that combines two compatible relations (with the same schema), removing duplicates.

Example: Students ∪ Teachers - Merges students and teachers into a single relation.

Intersection (∩)
#

Returns only the tuples present in both relations.

Example: Employees ∩ Managers - Finds those who are both employees and managers.

Set Difference (−)
#

Returns the tuples that are in one relation but not in the other.

Example: Students − Graduates - Students who have not graduated.

Cartesian Product (×)
#

Combines every tuple of one relation with every tuple of another.

Example: Students × Courses - Associates every student with every possible course.

Join (⨝)
#

Combines two relations based on a common attribute.

Example: Students ⨝ Registrations - Joins students and registrations on the ID column.

Other Operations
#

In addition to the seven fundamental primitives, there are other operations that we will briefly cover:

Rename (ρ)
#

Changes the name of a relation or its attributes.

Example: ρ (New_Students, Name → Student) (Students) - Rename the relation to New_Students and the attribute Name to Student.

Assignment (←)
#

Stores the result of a relational expression in a temporary variable.
Example: Temp ← σ (age > 30) (People) - Store in Temp the people older than 30.

Duplicate Elimination (δ)
#

Removes duplicate tuples from a relation.
Example: δ (π (Name) (Students)) - Returns only unique student names.

Aggregation (G)
#

Calculates aggregate values such as COUNT, SUM, AVG, MIN, MAX on groups of tuples.
Example: G (Class) COUNT(*) (Students) - Counts the number of students for each class.

Sorting (τ)
#

Sorts the tuples of a relation based on an attribute.
Example: τ (LastName ASC) (Students) - Sorts students by last name in ascending order.

Division (÷)
#

Finds the elements of one relation that are associated with all the elements of another relation.
Example: Students_Who_Attended_All_Courses ← Students ÷ Required_Courses - Finds the students who have attended all the required courses.

Relational Algebra and RDBMS
#

All RDBMSs (Relational Database Management Systems) implement the seven fundamental primitives and the first five extra operations.

Relational algebra is considered a procedural language; even if we represent it with mathematical notation, we can easily see how to implement it. It clearly defines the steps we want the DBMS to perform to fulfill our request. For example, consider these two expressions:

  • σ (age > 30) (R ⨝ S)
  • (R ⨝ (σ (age > 30) (S)))

We can do a join between two relations and then filter for age > 30, or filter relation S for age > 30 first and then do the join.

These two queries will have a clear performance difference, depending on how our data is structured and how much data there is. Which approach do you think is better, and why?

  1. σ (age > 30) (R ⨝ S)
  2. (R ⨝ (σ (age > 30) (S)))

The second is better; it is generally wise to filter as much as possible before performing the join so you have fewer tuples to compare. N.B. For relational algebra itself, there is no difference; it only describes the way things must happen.

The relational model is independent of the implementation of the query language. SQL is the de facto standard, although it exists in various dialects.