Let us look at a sensible example.
Here I'm taking a student table (the student relation, or student relational schema).
Here there are duplicate entries in the department information.
I have highlighted the data where it has redundancy.
Here we have 8 records in our table. What if we had many more records?
It becomes a drag, right?
Let us say we have 100 students in the CSE department; for every one of those rows we have to repeat the same department information.
But what is the purpose of repeating the same information again and again?
This is what redundancy is.
Why does this problem occur? Because here I have taken a larger schema and stored all the information in just one table.
In this table we have 6 attributes, but assume there are other attributes associated with the department, like the HOD and the HOD's telephone number; we would have to repeat that information as well in every tuple.
What is that called? Redundancy.
And the root of the problem? It's a bigger schema.
Now, if we decompose our table into smaller tables, we can reduce the redundancy.
Redundancy is the main cause of all the issues:
data inconsistency, insertion problems, deletion problems, and update problems all arise because of redundancy.
If you reduce or remove the redundancy, what happens?
Remember, I am not saying that we can remove redundancy completely using normalization in SQL Server or any other database.
Normalization can reduce the redundancy, meaning it can reduce the duplicate data.
So if we are able to reduce the redundancy, we are obviously able to reduce the issues that occur due to redundant data.
What problems can occur if we have redundant data? There are three problems that may occur because of redundancy.
We call them the Insert Anomaly, the Update Anomaly, and the Delete Anomaly.
What is an anomaly?
When the same data has multiple copies, there is a chance that we update it in one place and forget to update it in another place.
If that happens, we cannot say which copy is correct: one place has one value and other places have different values for the same data.
Suppose I want to insert department information into the student table, something like department name Mechanical, building B1, and room number 120.
For the other columns (credits, name, and sid) we would have to insert null values, right?
All the other columns are fine. But what about SID? It is the primary key.
As we know, we cannot insert a null value into a primary key column. Hence the insertion is not possible.
In other words, we cannot insert department information unless we have at least one student enrolled in that department.
Why? Because we are keeping all the information in one table only. That is the insertion anomaly.
This is a problematic situation: students may not yet be enrolled in a department, but we still need the department information.
Here we are not able to insert the department information into this student table because the student ID is the primary key.
We want to insert some information into a table but we cannot. Why? Because inserting that information requires some extra, unrelated information.
If you don't have that extra information, you cannot insert anything into the student table. That is wrong.
I hope you understand what an insertion anomaly is.
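The insertion anomaly above can be reproduced directly. This is a minimal sketch using SQLite as a stand-in database (the post mentions SQL Server, but the behavior is the same); the column names are assumed from the example, and note that SQLite requires an explicit NOT NULL on a TEXT primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide table holding both student and department information.
conn.execute("""
    CREATE TABLE student (
        sid      TEXT NOT NULL PRIMARY KEY,  -- SQLite needs the explicit NOT NULL
        name     TEXT,
        credits  INTEGER,
        dept     TEXT,
        building TEXT,
        room     INTEGER
    )
""")

# Try to record the Mechanical department before any student enrolls:
# the only option is a row whose primary key is NULL, which is rejected.
error = None
try:
    conn.execute(
        "INSERT INTO student (sid, name, credits, dept, building, room) "
        "VALUES (NULL, NULL, NULL, 'Mechanical', 'B1', 120)"
    )
except sqlite3.IntegrityError as exc:
    error = exc
print("insert anomaly:", error)
```

The database itself blocks the insert, so the department information simply cannot be stored in this design.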
Consider that the CSE department is shifted from B1 to C1 and the room number becomes 301. You have to update this information in all the tuples where the department is CSE.
Why? Because of how the table was designed: the department information is repeated once for every student of that department.
In our example we have 3 such tuples, but in the real world we may have millions of records.
In that case it is very difficult to update every tuple.
If by chance we update 2 tuples and forget the third, the data becomes inconsistent, and that leads to inconsistency problems.
The data is now in an inconsistent state, and this is not a good database design. This problem is called the update anomaly.
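Here is a small sketch of the update anomaly in the same single-table design (student names and values are illustrative, not from the original table). An update that misses even one tuple leaves the same department with two different addresses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT,"
    " dept TEXT, building TEXT, room INTEGER)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (1, "RAVI",  "CSE", "B1", 101),
    (2, "ANU",   "CSE", "B1", 101),
    (3, "KIRAN", "CSE", "B1", 101),
])

# A careless update that reaches only two of the three CSE tuples
# (imagine a batch job that stopped halfway through):
conn.execute(
    "UPDATE student SET building = 'C1', room = 301 "
    "WHERE dept = 'CSE' AND sid <= 2"
)

# The same department now reports two different addresses -> inconsistency.
versions = conn.execute(
    "SELECT DISTINCT building, room FROM student WHERE dept = 'CSE'"
).fetchall()
print(versions)
```

With millions of rows, guaranteeing that every copy gets updated is exactly the burden normalization removes.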
What is the delete anomaly? Let us take an example.
Suppose I want to delete one student's information; in our example, say I want to delete record number 6.
If you want to delete the SUMA record, you have to delete the complete record (tuple), and the department information stored with SUMA will be deleted along with it.
Suppose SUMA is the only student in that department. Deleting a student just means the student has left the college; it does not mean we are removing the department from the college, right?
Obviously not. But because no other student row carries that department's information, deleting SUMA's record wipes out the department information too. That is the problem.
This is called the Delete Anomaly.
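The delete anomaly can be sketched the same way (the sid values and department names here are illustrative). Deleting the only student of a department silently deletes the department's information as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT,"
    " dept TEXT, building TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (5, "RAVI", "CSE", "B1"),
    (6, "SUMA", "ECE", "B2"),   # SUMA is the only student in her department
])

conn.execute("DELETE FROM student WHERE sid = 6")   # SUMA leaves the college

# Her department's information disappeared along with her record.
depts = [row[0] for row in conn.execute("SELECT DISTINCT dept FROM student")]
print(depts)
```

The college still has that department, but the database no longer knows about it.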
How can we solve these anomaly problems? What is the solution?
Suppose we decompose the student table into two different tables and see.
I have divided the student table into two tables: one is student and the other is department.
The student table has the information related to students, and the department table has the information related to the department.
Now can we say we have removed all the issues?
Why? Because we have removed the redundancy.
If you observe the department table, we are storing the department information only once, whereas in the original student table it was repeated for every student.
These two tables, student and department, are related to each other. We are working with a relational database management system: all the information is stored in the form of relations, and each relation can be related to other relations.
How are these two tables related? With the help of a primary key and foreign key relationship.
The student table has a department name column, and the department table also has a department name column.
If we divide the larger schema into smaller schemas, or we can say smaller tables, then obviously we have to write down some rules.
In other words, the department name is the primary key in the department table, and the department name in the student table is a foreign key referencing it.
Now let us apply the anomalies to this design.
Insert anomaly: obviously you can insert an ME record in the department table directly. There is no need to insert any other student information along with it.
But it is not the case that, having divided into two tables, we can insert any record in any table or delete any record in any table freely.
Here we will also have referential integrity constraints applied on the foreign keys.
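The decomposed design with its primary key / foreign key rule can be sketched like this (table and column names are assumptions based on the example; SQLite enforces foreign keys only after the PRAGMA is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce FKs

conn.execute(
    "CREATE TABLE department (dname TEXT PRIMARY KEY, building TEXT, room INTEGER)"
)
conn.execute("""
    CREATE TABLE student (
        sid     INTEGER PRIMARY KEY,
        name    TEXT,
        credits INTEGER,
        dname   TEXT REFERENCES department(dname)  -- foreign key
    )
""")

# Insert anomaly solved: a department can exist with no students at all.
conn.execute("INSERT INTO department VALUES ('ME', 'B3', 210)")

# But we cannot insert arbitrary rows either: a student pointing at a
# department that does not exist violates referential integrity.
error = None
try:
    conn.execute("INSERT INTO student VALUES (1, 'RAVI', 20, 'CIVIL')")
except sqlite3.IntegrityError as exc:
    error = exc
print("referential integrity:", error)
```

So the split solves the insert anomaly while the foreign key constraint keeps the two tables consistent with each other.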
Update anomaly problem: as discussed, I want to update building B1 to C1 and room 101 to 301 for the CSE department.
Can we update it? Yes, because the department name exists only once in the department table. Here we have to update just one tuple,
not all the tuples as in the earlier case. So we have solved the update anomaly problem.
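In the decomposed design the fix is a single-row update; this sketch (same assumed table layout as above) shows that exactly one tuple is touched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dname TEXT PRIMARY KEY, building TEXT, room INTEGER)"
)
conn.execute("INSERT INTO department VALUES ('CSE', 'B1', 101)")
conn.execute("INSERT INTO department VALUES ('ECE', 'B2', 102)")

# The department's address lives in exactly one tuple, so one UPDATE
# fixes it everywhere; no copy can be "forgotten".
cur = conn.execute(
    "UPDATE department SET building = 'C1', room = 301 WHERE dname = 'CSE'"
)
print("rows touched:", cur.rowcount)
row = conn.execute(
    "SELECT building, room FROM department WHERE dname = 'CSE'"
).fetchone()
print(row)
```

Every student row that references CSE now sees the new address through the join, with no chance of partial updates.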
Delete Anomaly Problem:
Earlier, when you deleted the information related to a particular student, you also had to delete the associated department information for that student, which was not good; that was the delete anomaly. Now the department information lives in its own table, so deleting a student row no longer removes the department information.
Take the example of OLTP and OLAP databases.
In the OLTP case, we need normalized data because inserts, updates, and deletions are very frequent.
Since these queries are so frequent, we obviously need normalization to avoid anomalies.
That is why an OLTP system needs normalized data.
In an OLAP system, a data warehouse, we store huge amounts of historical data and we need to perform analytical queries, which means complex queries in which we need to join multiple tables to get the results.
Why do we need de-normalized data in this case?
Here the data is basically stored in de-normalized form to make the read performance better.
The OLTP data, on the other hand, is kept in normalized form to get better performance for its frequent writes.
So I hope you understand what normalization is used for.
What are the advantages of normalization?
The first point: it reduces the redundancy in the table. If you reduce the redundancy, the data becomes more compact, so the second point is that we save space.
We can also reduce anomalies.
Normalization reduces null values. Let us say, in our example, I have student information but I don't have department information; in the single-table design I would obviously need to store null values, right?
Normalization also simplifies queries.
It simplifies the database structure. In our example, by looking at the structure of the student and department tables we can easily understand what each table is about.
Suppose instead you have one large table with 30 to 50 attributes covering student, department, HOD, and faculty information.
By looking at that table, it is difficult to understand the schema.
If you divide it into four tables (student, department, faculty, and HOD as separate tables), then by looking at those tables we can easily understand the tables and the schema.
It also simplifies the queries: you can write simple queries on each table.
Sorting and indexing are simpler after applying normalization.
So, these are some details about normalization.
Another important thing is that decomposing a table is not so easy; finding the right decomposition is a very tough process.
In our example the student table is very simple, and just by looking at it we can understand how to separate it into two tables, student and department. That is very easy.
But sometimes it is not easy to find the right decomposition. Also, a decomposition can be lossy.
We don't want a lossy decomposition; we want a lossless one, so how we decompose the table is also very important.
Normalization also has multiple levels, called normal forms:
1st, 2nd, 3rd, BCNF, 4th, 5th, and 6th normal form. There are many normal forms.