default-coding

Habits for Faster Quality

Data Schema and Access

Today’s applications are powered by data, just like machines are powered by energy.

For example, a cab-booking application relies on data such as your location, map information, driver availability, dynamic pricing and payment details.

A relational database models data in terms of entities and relationships. Here are some examples of entities:

If you aren’t familiar with SQL, you can quickly get a flavor of it with the w3schools SQL interpreter.
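As a rough sketch of entities and relationships, the cab-booking example could be modeled with two tables linked by a foreign key. The table names (`driver`, `booking`), their columns and the sample values are hypothetical and chosen only for illustration. The SQL runs through Python's built-in `sqlite3` module.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two hypothetical entities: a driver, and a booking that refers to a driver.
conn.executescript("""
CREATE TABLE driver (
    driver_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE booking (
    booking_id INTEGER PRIMARY KEY,
    driver_id  INTEGER NOT NULL REFERENCES driver(driver_id),
    pickup     TEXT NOT NULL
);
""")

# Illustrative sample rows (not from the original text).
conn.execute("INSERT INTO driver VALUES (1, 'Asha')")
conn.execute("INSERT INTO booking VALUES (10, 1, 'Airport')")

# A booking that points at a non-existent driver breaks the relationship,
# so the database rejects it.
try:
    conn.execute("INSERT INTO booking VALUES (11, 99, 'Station')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

booking_count = conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0]
print(rejected, booking_count)
```

The foreign key is what expresses the relationship: every booking must refer to a driver that exists.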

Normalization

Consider the following data:

| CustomerID | CustomerName | Pin code | City | Country |
|---|---|---|---|---|
| 300 | Arijit | 110005 | Delhi | India |
| 301 | John | 02111 | Boston | USA |
| 302 | Mary | 02115 | Boston | USA |
| 303 | Sonu | 560001 | Bangalore | India |
| 304 | Vijay | 560002 | Bangalore | India |
| 305 | Latha | 110001 | Delhi | India |

Observe that there is some redundancy here. For example, ‘India’ is repeated in the row of every customer in India. This repetition is unnecessary, because we can infer the city from the pin code and the country from the city.

Normalization is the process of removing such redundancy by splitting the data into related tables, so that each fact is stored only once. Here is a normalized representation of the above data:

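| CustomerID | CustomerName | Pin code |
|---|---|---|
| 300 | Arijit | 110005 |
| 301 | John | 02111 |
| 302 | Mary | 02115 |
| 303 | Sonu | 560001 |
| 304 | Vijay | 560002 |
| 305 | Latha | 110001 |

| Pin code prefix | City |
|---|---|
| 1100 | Delhi |
| 5600 | Bangalore |
| 0211 | Boston |

| City | Country |
|---|---|
| Delhi | India |
| Bangalore | India |
| Boston | USA |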
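The normalized tables above could be created in SQL roughly as follows. This is a minimal sketch that runs through Python's built-in `sqlite3` module. The table and column names (`customer`, `pin_prefix_city`, `city_country`) are assumptions made for the example. Pin codes are stored as text so that leading zeros, as in `02111`, are preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed table and column names; the data matches the tables in the text.
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    pin_code      TEXT NOT NULL      -- text keeps the leading zero of 02111
);
CREATE TABLE pin_prefix_city (
    pin_prefix TEXT PRIMARY KEY,
    city       TEXT NOT NULL
);
CREATE TABLE city_country (
    city    TEXT PRIMARY KEY,
    country TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (300, "Arijit", "110005"), (301, "John", "02111"), (302, "Mary", "02115"),
    (303, "Sonu", "560001"), (304, "Vijay", "560002"), (305, "Latha", "110001"),
])
conn.executemany("INSERT INTO pin_prefix_city VALUES (?, ?)", [
    ("1100", "Delhi"), ("5600", "Bangalore"), ("0211", "Boston"),
])
conn.executemany("INSERT INTO city_country VALUES (?, ?)", [
    ("Delhi", "India"), ("Bangalore", "India"), ("Boston", "USA"),
])

# 'India' is now stored once per city rather than once per customer.
india_rows = conn.execute(
    "SELECT COUNT(*) FROM city_country WHERE country = 'India'"
).fetchone()[0]
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
john_pin = conn.execute(
    "SELECT pin_code FROM customer WHERE customer_id = 301"
).fetchone()[0]
print(india_rows, customer_count, john_pin)
```

With this layout, adding more customers in India adds no further copies of the country name.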

Denormalization

There are situations where redundancy is actually desirable. An example is healthcare data. Every medical image needs to be accompanied by all the patient’s details for effective diagnosis.

Similarly, machine learning algorithms usually need the relevant data together in one place to build effective models.

Normalized data can be denormalized using ‘join’ operations, which combine rows from related tables on a shared column. The w3schools tutorial mentioned above covers joins.
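As an illustration, the sketch below rebuilds the original five-column customer table from the three normalized tables with two joins. It again uses Python's built-in `sqlite3` module. The table and column names are assumptions made for the example, and the first four characters of the pin code are taken as the prefix, as in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized tables (assumed names), filled with the data from the text.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                       customer_name TEXT, pin_code TEXT);
CREATE TABLE pin_prefix_city (pin_prefix TEXT PRIMARY KEY, city TEXT);
CREATE TABLE city_country (city TEXT PRIMARY KEY, country TEXT);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (300, "Arijit", "110005"), (301, "John", "02111"), (302, "Mary", "02115"),
    (303, "Sonu", "560001"), (304, "Vijay", "560002"), (305, "Latha", "110001"),
])
conn.executemany("INSERT INTO pin_prefix_city VALUES (?, ?)", [
    ("1100", "Delhi"), ("5600", "Bangalore"), ("0211", "Boston"),
])
conn.executemany("INSERT INTO city_country VALUES (?, ?)", [
    ("Delhi", "India"), ("Bangalore", "India"), ("Boston", "USA"),
])

# Join customer -> city (via pin code prefix) -> country.
# substr(x, 1, 4) returns the first four characters of x in SQLite.
rows = conn.execute("""
SELECT c.customer_id, c.customer_name, c.pin_code, p.city, cc.country
FROM customer AS c
JOIN pin_prefix_city AS p ON substr(c.pin_code, 1, 4) = p.pin_prefix
JOIN city_country AS cc ON cc.city = p.city
ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Each resulting row carries the city and country again, which is the denormalized form that a diagnosis screen or a training pipeline would consume.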