Relational Databases Are All About Maths
Updated: Jun 2, 2018
Here is an interesting fact, relational databases are named after the mathematical relation model SQL is based on and not
relationships created by foreign keys. Why is this imortant you ask? Well, knowing the relational database maths behind SQL will help you write better SQL queries, trust me.
When I was initially taught about databases and SQL there was this one paragraph in the course book about how it all came about because of mathematics. And that was it, an important bit of how all this works summed up in one paragraph.
This is also the reason most people think relational databases are named after the relations between the tables. All this math behind it is never really discussed or taught. Ok, so let’s get into it.
Sounds a bit fishy
Relational databases are based on the relational model created by Edgar F. Codd in 1969. I know, it sounds a bit fishy and yes, all pun intended. The model is based on the mathematical branches of set theory and predicate logic. Click on the links if you want know the finer details. I’ll try and explain it as simple as possible, but hey, it is maths, i.e. not that simple.
The relational model uses relations that are represented by a set of headers and a set of tuples. In database terms this relates to tables with columns and rows. Set theory comes in with the headers and tuples being seen as a whole set. In maths terms tables = set theory (see what I did there with the maths bit).
A predicate is an expression that when attributed to some object, makes a proposition either true or false. For example, “age greater than 50” is a predicate. The relational model uses predicates to enforce integrity and filter data. In maths terms queries = keys = predicate logic (Just did it again, add an ‘oops’ and you can call me Britney).
There are a bunch of rules around set theory and predicate logic and understanding how they work will help you write better SQL. Here are two important ones:
A set should not contain duplicate rows or columns.
There is no relevance to the order of the elements in a set
Following the relational model
You should write SQL in terms of the relational model and not procedural concepts. Using procedural concepts usually leads to logic that uses iterations. Using the relational model you write relational operations that return relational results. Refer back to the two rules above.
Simple little example, if you need to get all the employees who have been working for a company for longer than 5 years you write a query to return the employees you don’t iterate over all the rows to find all the employees.
Although this is a simple example I often see iterative logic used. I can only guess because a developer didn’t take the time to figure out the SQL or took the time to design a relational model that works. How many times has a SQL query become a huge headache 6 months down the line as the number of rows started to increase? We have all heard the horror stories.
I leave you with this interesting question. What do you think is the biggest mistake developers make when writing SQL queries?