In this article, we will see what is Normalization in DBMS and what is the need of Normalization.
Let us take one relational table called student and see the problems in that table or relational, also we will see how to resolve those problems, then automatically you will be able to understand what is the need for Normalization in database.
Let us see the practical example.
Here I am taking student table (student relation/ or student relational schema) .
Here there are duplicate entries in the department information.
I have highlighted the data where it has redundancy.
Here we have 8 records in our table what if we got millions of records?
It is a problem right?
Let us say we have 100 students in our table for the CSE department, for each row we have to repeat the same department information.
But what is the point of repeating information again and again?
This is what redundancy.
Why this problem occurs because here I have taken larger schema and stored more information in only one table.
In this table we have 6 attributes but assume that there are other attributes related to the department like HOD and HOD phone number etc, we need to repeat that information as well for all the topples.
What is that? That is called redundancy.
Problem is what ? It is a larger schema .
Now If we decompose our table into smaller tables then maybe we can reduce the redundancy.
The redundancy is the main cause of all the problems.
Data Inconsistency, data Insertion problems, data deletion problems, data update problems and all the further problems will arise because of redundancy.
If you reduce or remove the redundancy what will happen?
Remember I am not saying that we can remove the redundancy completely using Normalization in SQL Server or any other database.
It can reduce the redundancy, means it can reduce the duplicate data.
So if we are able to reduce the redundancy, obviously we are able to reduce the problems that will occur because of redundant data.
What problems can occur if we have redundant data. There are three problems which may occur because of redundancy.
What we call is Insert Anomaly, Update Anomaly and Delete Anomaly.
Anomaly is what?
When data has multiple copies at one place there are chances that we update the data at one place and we may forget to update the same data in another place.
If this is the scenario, we can’t say which data is correct. In one place we have some value and other places we have different values for the same data.
For suppose, I want to insert department information in the student table something like the department name, mechanical, building B1, and Room Number 120.
Here other columns we have credits, name, and sid. We have to insert all these records as null values right?
All other columns are fine. What about SID? It is the primary key.
As we know we cannot insert the null value on the primary key. Hence the insertion is not possible here.
In other words, we cannot insert department information unless we have at least one student enrolled in that department.
Why so? We are taking all the information in one table only. That is insertion anomaly.
This is a problematic situation maybe students are not there, students are not yet enrolled in that department but we need department information.
Here we are not able to insert the department information in this student table because student ID is a primary key.
We want to insert some information in a table but we cannot able to insert the required information. Why so? For inserting that information we need some extra information.
If you don’t have that extra information, you cannot insert the information in the student table. That is wrong.
I hope you got what is insertion anomaly.
Consider that the CSE department is shifted from B1 to C1 and the room number is 301. You have to update this information in all the topples where the department is CSE.
Why so? While creating this table or this relational schema we have to insert the information of this department once for each student of that department is present, that is how the table has been designed.
In our example we got 3 topples, but in the real world we may have millions of records.
In that case it is very difficult to update each topple.
By chance, we have updated 2 topple and we forgot to update the other topple then that would be inconsistency and which leads to inconsistency problems.
So now the data is in an inconsistent state this is not a good database design. This problem is called update anomaly.
What is delete anomaly? let us take one example.
Suppose I want to delete one student’s information. In our example let us say I want to delete record number 6.
If you want to delete the SUMA record then obviously you have to delete the complete record (topple). When you are going to delete the SUMA record obviously the department associated with SUMA will be deleted.
We have only one student who is there in this department, obviously when you are deleting a student means the student has left the college it doesn’t mean that we are going to delete the department from the college right?
Obviously that is not the case. We have to delete the ESE department and there is no other students are associated with this department then it is a problem?
This is called Delete Anomaly.
How we can solve the Anomaly problems? What is the solution for Anomaly?
Suppose we decompose the student table into two different tables and see.
I have divided the student table into two tables, one is a student and the other one is the department.
The student table is having information related to students and the department table is having information related to the department.
Now we can say we have removed all the problems.
Why so? We have removed the redundancy?
If you can observe the department table, Here we are storing the department table information only once. Whereas in the student table, it is repeated for each student.
These two tables students and departments are related to each other. Here we are working with a relational database management system all the information is stored in the form of relation and each relation is related to other relation.
How these two tables are related? with the help of primary key and foreign key relation.
In the student table, we have a department name and the department table we have a department name.
If we are divided the larger schema into a smaller schema or we can say smaller tables then obviously we have to write down some rules.
In other words, the department name is the primary key in the department table for this relation and the department name in the student table is a foreign key.
Now here if we apply the Anomalies.
Insert anomaly, Obviously you can insert ME record in the department table. This means no need to insert any other information related to ME .
But it is not like that we have divided the two tables we can insert any record in any table or we can delete any record in any table.
Here we will also have some referential integrity constraints that will be applied on the foreign keys.
Update Anomaly problem, As discussed I want to update building B1 as C1 and room 101 as 301 for CSE department.
We can update it very well right? why because we have only one department name exists in the department table. Here we have to update only one topple.
Not all the topples like other case. So we have solved the update anomaly problem.
Delete Anomaly Problem:
When you want to delete the information related to a specific student you also need to delete the associated department information for that student which is not good and we called it as delete anomaly.
If you take an example of OLTP and OLAP databases.
In the OLTP case, we need normalized data because insert, update, and deletions are very frequent.
So, these queries are very frequent obviously we need normalization to avoid anomalies.
That is why in OLTP system we need normalized data.
In the OLAP system data warehouse, we store huge data or historical data and we need to perform analytical queries, which means here we perform complex queries is in which we need to join multiple tables to get the results.
Why in this case we need de-normalized data?
Here basically data is stored in the de-normalized form to make the performance better.
But the OLTP data would be in the normalized form to get the performance better.
So hope you got what is the use of normalization?
What are the benefits of normalization?
The first point we can say to reduce the redundancy from the table. If you reduce the redundancy then obviously it will compact the data. So the second point is we can save space.
We can also reduce anomalies.
Normalization reduces null values. Let us say in our example I have student information and I don’t have department information obviously I need to store null values right?
Normalization will simplify the queries.
It is important for OLTP systems where insert, delete, and update queries exists very frequently by the end-users.
It simplifies the database structure. In our example, by looking at the structure of the student and department table we can easily understand what is the table all about.
For suppose you have one large table having 30 to 50 attributes like student, department, and HOD and faculty information.
By looking at that table and that schema it is difficult to understand.
If you divide that into four tables like student, department, faculty, and HOD as separate tables then by looking at those tables we can easily understand tables and schema.
It will simplifies the queries you can write down simple queries on the table .
Sorting and indexes are simple after applying the normalization.
So, these are all some important points about normalization.
Another important thing is that the decomposition of this table is not soo easy to find the right decomposition is a very tuff process.
In our example student table is a very simple table and by looking at the table itself we can understand separate the table as two different tables like students and department that is very easy.
But sometimes it is not so easy to find the right decomposition. Also, there would be some lossless decomposition.
So that lose decomposition we don’t want. It is also very important how to decompose the table.
Normalization is also having multiple levels.
Normal forms like 1st, 2nd, 3rd, BCNF, 4th, 5th, and 6th normal form. Many normal forms are there.
I hope this article is useful to you.
Please leave your comment below in case of any queries or suggestion.