Nearly anyone who goes in to IT or computer science must learn about databases… but what about every other person in this world? Databases are actually extremely common, yet so few people know about them. This blog series will not only help you understand completely what a relational Database is, but will allow you to design your very own.

Let’s get started!
The role of a database is to store data (singular = datum). A datum is a fact, statistic, or a piece of information. Data is a collection of facts, statistics, or pieces of information. What does this mean in the practical world? Data can be anything from a record of users and passwords, shipping addresses, documentation of sales, information about students and colleges, etc. Data can be anything!

If you are reading this I am assuming you are either considering building a database for

  1. Your own business or service,
  2. For another’s business or service (maybe you’re looking for a new job),
  3. You are in school and are required to learn about databases.

There are a multitude of reasons for you to build a database, but a database is not for everyone. By the time you are done reading this article you will know if a database is what you need to learn about.

When it comes to the practical world, a database is what stores information for websites, colleges, businesses, etc… A database is usually accessed through a programming language (we will be discussing this in future blogs). The data in a database is stored on a hard drive on a server and using a database allows us to see it logically even if files are randomly stored on hard drive sectors. It takes the randomly stored files on a hard drive and converts it to an understandable format.

A visual representation of a database.

The database stores information on a hard drive but presents it in a readable way (tables).

Now that we know what a database is, let’s look at a relational database. A relational database is a specific type of database (there are other kinds of databases). The thing that sets a relational database apart from every other kind of database is that it stores everything in relations.

A relation shows how two or more things are connected. The term relation comes from mathematics. We won’t be getting too much in to the mathematics part so don’t freak out, but here is the basics of a relation in mathematics:

Relations in Mathematics

First we take all possibilities of combining two sets of numbers. for example, here are two sets, X and Y:

X = {5, 3},

Y = {2, 4}.

X multiplied by Y = {(5,2), (5,4), {3,2), (3,4)}. This is called the Cartesian product. An individual parenthesis is called a tuple, we have 4 tuples:

  • (5,2)
  • (5,4)
  • (3,2)
  • (3,4)

When we multiply two sets (such as X and Y), sometimes each parentheses group, such as (5,2), is called an ordered pair and is in the format (X, Y).You can see that the first number in each ordered pair is from the X set (5 or 3) and the second number in each ordered pair is from the Y set (2 or 4).

Here is the easiest example of a cartesian product.  We simply connect every number from the first parenthesis to every number in the second parenthesis.  You may here this referred to as multiplying the sets, but we are not multiplying any of the individual numbers within the set.  We would NOT multiple 5 x 2 to get 10, for example.

Here is the easiest example of a cartesian product. We simply connect every number from the first parenthesis to every number in the second parenthesis. You may hear this referred to as multiplying the sets, but we are not multiplying any of the individual numbers within the set. We would NOT multiple 5 x 2 to get 10, for example.

Any subset of the Cartesian product is known as a relation:

R = {(5,2), (5,4)} A relation consists of 1 or more tuples from the Cartesian product, in this case, 2.

Relations in Database Design and Development

In database design, rather than having X and Y, we have entities and attributes. An entity is anything we store data about. Each attribute is a fact we store about the entity. If we had a database that stored customers’ IDs, names, addresses, and phone numbers, the customer would be the entity and the ID, name, and address would each be an attribute:

Entity (what we are storing information about): Attributes (facts about the entity):
Customer
  1. Customer ID
  2. First name
  3. Last name
  4. phone number

 

Applying the simplified concept of relations to database design, we get:

Customers = {(ID 1), (ID 2), (ID 4), (ID 8), (ID 5)} (random numbers I picked)

Attributes of the customers = {The customer’s ID, the customer’s first name, the customer’s last name, the customer’s phone number}

Customers multiplied by Attributes of the customers =

  • { (1, John, Johnson, 123-234-3454),
  • (2, Billy, Billison, 321-432-5434),
  • (4, Cletus, Billy, 123-764-2314),
  • (8, Caleb, Curry, 800-432-1241),
  • (5, See, Squared, 542-123-1322) }

Just like in mathematics, each point above is a known as a tuple, giving us a relation with 5 tuples having 4 attributes for each tuple :

Here is a classification of how a relationship works.  We have a relation consisting of multiple tuples and each tuple has multiple attributes about that tuple.

Here is a classification of how a relation works. We have a relation consisting of multiple tuples and each tuple has multiple attributes about that tuple.

Here you can see what our relation looks like when we insert our actual values for the tuples and attributes. You can also see that we use the customer ID as a way to name each tuple.  This is because the customer ID will always be unique and is the best way to know which tuple we are describing.  Imagine if we used full names to describe each tuple, we could potentially have two tuples or more if multiple people have the same name, but if we gave each person a unique ID and used it to refer to each person we would not have to worry about this problem.

Here you can see what our relation looks like when we insert our actual values for the tuples and attributes. You can also see that we use the customer ID as a way to name each tuple. This is because the customer ID will always be unique and is the best way to know which tuple we are describing. Imagine if we used full names to describe each tuple, we could potentially have two tuples or more if multiple people have the same name, but if we gave each person a unique ID and used it to refer to each person we would not have to worry about this problem.

Database relations differ from mathematics relations in that the order of the attributes does not matter. In mathematics, the values, (5,2) and (2,5) are completely different. In Database design, having the values (Caleb, Curry) and (Curry, Caleb) is completely acceptable because they are both talking about the same person (in this example, the person with the ID of 8…we use an ID because more than one person can have the name Caleb Curry).

A relation is visually expressed through tables:

Customer ID First name Last name Phone number
1 John Johnson 123-234-3454
2 Billy Billison 321-432-5434
4 Cletus Billy 123-764-2314
8 Caleb Curry 800-435-1241
5 See Squared 542-123-1322

 

A table, once again, is a physical representation of a relation…it shows us all of the data in what looks like a spread sheet form. The column headers tell us the attributes and travel up and down. The rows show us an individual tuple. Memorize this:

  • Relation = table
  • Tuple = row
  • Attribute = column

Here is a relation for student information:

Student ID First Name Last Name
642 Caleb Curry
647 Caleb Basal

 

If you study this relation, you will see that we have two students, Caleb Curry (ID 642) , and Caleb Basal (ID 647).

The use of relations sets relational databases apart from the rest because you can now have multiple relations in a database. These relations are connected in someway, but they are not about the same entity. Let’s look at this:

Instructor:

Instructor ID First Name Last Name
6546 Matt Martin
6654 Jake Johnson

 

Class:

Class ID Class Title Instructor
1253 Biology 101 6546
4326 Biology 201 6654

 

Now we have an additional two relations, Instructor and Class. Using this information we can figure out that the class with the ID of 1253 has the instructor with the ID of 6546. This tells us that Matt Martin teaches Biology 101 (see bold below):

Instructor ID First Name Last Name
6546 Matt Martin
6654 Jake Johnson

 

Class:

Class ID Class Title Instructor
1253 Biology 101 6546
4326 Biology 201 6654

 

In addition to these two tables, we still have the students table.

Once you build your database, you can run queries (singular = query). A query is an action on a database that returns results, such as returning every professor who is teaching more than 5 classes who has been an employee for more than 10 years. You can do these complex things with a Relational Database Management System (RDBMS), which we will be talking about in the next blog post. Because we have to learn programming techniques, a database requires more training than learning how to fill out a spread sheet. The plus side is that we can figure out complex things very easily once we learn enough about running queries.

Here are some examples of what we can obtain from this data using relational database queries:

  • What classes each student is taking
  • How many students and which students are taking a specific class
  • Return what classes a specific instructor teaches
  • Every professor who has been an employee for 10 years with atleast a 95% attendance rating who teaches atleast 3 classes where the class average GPA is atleast 3.5…
  • so forth!

This is all done by the RDBMS! We don’t have to sit there and run statistics with a pen and pencil for 20 hours. These examples are all related in some way because we are talking about a school. What we just did was show very basic database design for the database “school.” For your knowledge’s sake, this blog was written almost entirely in English, in future blogs I will be discussing common naming and design conventions (capitalization, spaces, plural/singular titles, etc…).

A database is often combined with a programming language to separate the front end of a process with the back end.

  • Front end – Collects data. This is what the user sees.
  • Back end – Stores and manipulates that data.

Doing this allows customization in who can access data (adding a layer of security), and the database can easily be changed without changing the front end of an application.

A database requires a lot of design to prevent anomalies (out of the ordinary and different than what you expected), repeating data, and null (empty) values. Before basic database designs are finalized and programmed, they are normalized to get the absolute best database structure. Normalization is the process of removing potential structural problems in a database. This can be a lot of work if you are a beginner but greatly improves productivity and saves money and energy trying to fix integrity problems that come with spreadsheets (and yes…we will be talking about all of this in future blogs).

Do I need a Database?

If you are storing a large amount of information with a lot of different categories, you definitely need a database. Using spreadsheets or some weak application to store everything you need will ultimately hurt you in the long run…especially when you decide you need a database and have to spend millions upon millions of hours transferring all of your information to a database (okay…maybe not millions, but you get my point).

If you are starting an online business using your own back end system, then yes you need a database.

If you are looking for a way to automate a lot of storage and statistical problems, then yes you need a database.

If you want to speed up your storage systems, then yes you need a database.

If you want to save a lot of space, then yes, you need to get a database! Always remember to have backups though!!!

If you are tired of losing papers and sheets and charts and statistics, then yes you need a database.

If you are looking for layers of security where only certain people can access certain information, then get a database!!!!!

if you…yes, you need a database.

If you are are running a small business only storing the phone numbers of your customers and you are using only a small fraction of your time with information gathering, then spreadsheets or pre-made applications are what you need! This is assuming that you will never grow to a larger business. Do not waste your time and energy building a database to store the information of 10 clients or customers.

Conclusion

  • A relational database is what you are looking for if you are looking for flexibility, complexity, and sophisticated queries.
  • A relational database is called a relational database because it is built using relations.
  • A relation is physically represented by a table. It lists all of the attributes of tuples.
  • IDs keep each tuple unique.
  • relations are often called tables, tuples are often called rows, and attributes are often called columns.
  • Relations are connected using IDs.
  • Databases are designed and then normalized to improve structure and prevent anomalies.

Further Study:

 

This site is powered by DreamHost, offering world class web hosting and support!

Incredible Web Hosting

When you sign up through clicking this image, you will be supporting this website.