Why Normalise Database Tables?

Many GUI and web developers I have spoken to, do not fully understand the value to be gained from normalising their database designs. It is, after all, easier to build the tables to look exactly like the form or web page they are building.

The question that usually arises when we as database “specialists” question them is: but what is wrong with that design? my code works, doesn’t it?

The obvious answer to this would be “yes, your code works”, but does it really?

Designing tables in this manner will eventually have serious drawbacks, including, but not limited to, redundancy, wasted space and loss of data integrity.

As an example, we will use an extract of one table, from a failed company (the reason for it’s eventual failure will become apparent in due course).

Definition of Order Table

Order Table Definition

Now the referenced programmers defend their design, stating that it is normalised, as it does have a primary key, so the rows are uniquely identifyable.

OK, let us look at that statement.

Yes, there is a primary key. Now, does having a primary key make a table Normalised?

The anser is yes, provided it also complies to First Normal Form (1NF) and Second Normal Form (2NF) rules. This table does not comply to 1NF rules, which state that:

> the table is a faithful representation of a relation
> it is free of repeating groups

(see here for a definition of 1NF)

So, How do we go about normalising this data set?

> First, let us eliminate the repeating columns, by placing all these columns in their own table, which will look as follows once complete:

Remove repeating columns

Remove repeating columns

Now, looking at the resulting tables, we can see they comply to 1NF.

Now, let us check that they comply to 2NF, which states:

> Identify a candidate key;
> Ensure those records are supportive of said candidate key, i.e. those that are dependent on the candidate key, with no partial dependencies.

for Orders, a good candidate key would be OrderNo (which we will make into a Primary Key, as it must, of needs, remain unique) and for PartsPerOrder, OrderNo and Part make a good choice for candidate key (we will create a surrogate key (viz. PartPerOrderCode) as the primary key, with a non clustered index covering the candidate key(s) (along with the respective foreign keys)

Thus the structure now complies to 2NF .

(see here for a definition of 2NF)

Now we need to go further. Are there any fields in the tables that do not belong (i.e. do not describe the record in the table)?

The PartPerOrder table still has a field for the Sales Person’s mobile phone. If the salesperson changes their Mobile number, this will need to be updated in each and every order that person was involved in. That could (and quite possibly will) lead to data anomolies.

How do we now eliminate this data?
By applying the rules for Third Normal Form (3NF).

We can ask if this data may already exist in another table. Shouldn’t the salesperson contact information therefore be stored in the Staff table?

Checking the Staff table, we do indeed find a column labelled “MobilePhone”, which should, presuming it has been completed correctly, contain the person’s correct mobile number.

Removing this and using the column in the Staff table gives us the following:

Now there are no redundant data columns in the tables, which gives a much cleaner design (see here for a definition of 3NF).

Is it possible to improve the data?
Yes, we can proceed to the next level of Normalisation.
This next level of normalisation is termed Boyce-Codd Normal Form (BCNF) (also called 3.5 Normal Form (3.5NF) by some theorists).

The aim here is to ask if a column really describes the table and, if not, move it to another table (and, in so doing, probably eliminate existing nullable columns).

In the Staff table, there are columns that do not explicitly describe the staff member as a person, viz. their contact details (Extention and MobilePhone).

We therefore create a new table to carry the staff member’s contact information, and remove them from the Staff table. This could be called StaffContactDetails and will appear as follows:

Boyce-Codd Normal Form

Now there are really no redundant data columns visible, and the number of nullable columns is greatly reduced (see here for a definition of BCNF).

The nullable columns that are left add value where they are, as they do describe the tables and will, eventually, be filled in (e.g. when an order is finalised and paid).

We could, however, normalise this design further, but we need to ask ourselves if this will add value in terms of both storage space and performance (for details of the other levels of normalisation, refer to Fourth Normal Form (4NF), Fifth Normal Form (5NF) and Sixth Normal Form (6NF)).

The programmer would now scream and shout, stating that his code will not work, he cannot represent this “shatterred” data in his front-end, he will have to start all over again, etc., etc. ad nausium.

Whereupon we, as the database “guys” will simply answer: “Haven’t you ever heard of a view or stored procedure?”.

In any event, using our example, why did the company fail?
> what happens when a client has a large order, wherein they request numerous different parts (substantially more that 5)?
– There will be one order generated and paid for for each 5 parts required. If the client required 100 different parts, there will be 20 orders completed, each one have to be settled independently.
– This is expensive and time-consuming, not to mention frustrating.

– Most people will rather go to a competitor, who has a simpler and more streamlined process (with a well designed database), who can proces all parts in one order, with one payment.

When the developer was asked why it was done this way, he replied that management assured him no-one would order more that 5 parts at a time – this points back to:
> not understanding the business nor the environment in which the client operates, and
> poor requirement gathering skills.

Requirement elicitation will be handled in a future post to this blog…..

 

This entry was posted in Database and tagged , . Bookmark the permalink.