What Are Slowly Changing Dimensions (SCDs)?
Before progressing to the topic of how to create an ETL package in SSIS, we have to first learn some basic concepts of Slowly Changing Dimensions (SCDs).
Introduction
Slowly Changing Dimensions allows empowering the historic aspect of data in an analytical system in the data warehouse. As you know that the data warehouse is used to analyze historical data, and it is essential to store the different states of data.
To store data in data warehousing, we utilize fact and dimension tables. The Fact table includes measures, metrics, and facts about a business process, whereas the Dimension table, which provides descriptive qualities, is a partner to the Fact table. Data is initiated at operational databases in a data environment, and data is extracted-transformed-loaded (ETL) to the data warehouse to meet the analytical environment.
Since we need to maintain the history, we can't just simply overwrite the data so we need to implement some special techniques where SCDs come into play.
6 Slowly Changing Dimensions Types:
Here is the table of the different slowly changing dimensions as shown below.
SCD Type 0
There are times when you should overlook any changes. For example, when a student joins the university, there are attributes such as Address, Student Name, Joined Date, etc. that should not change over time.
SCD Type 1
Here, you simply overwrite data in dimensions. There can be situations where you don’t have the entire data when the record is initiated in the dimension. The following example is given by ChatGPT, explaining SCD Type 1.
SCD Type 2
Slowly Changing Dimensions Type 2 is the most popular dimension used in the data warehouse where some columns are added to the table.
- IsCurrent Flag
- Start Date
- End Date
This approach is extremely strong since it keeps track of the history of the whole record and allows for easy change-over-time analysis. However, it comes with higher maintenance costs, additional storage space requirements, and possible performance drawbacks when employed on very big dimensions. Below is the example of SCD Type 2 given exclusively by ChatGPT.
SCD Type 3
The historical aspects of the data are kept in an additional column but this method is not scalable as it only keeps the last version of history. Let's suppose a person has changed his address so his new address will be updated and his previous address will be added to a new column "Previous Address".
SCD Type 4
A rapidly changing column is moved out of the dimension and is moved to a new dimension table. This new dimension is linked to the fact table. This technique reduces complexity and improves performance as well as eliminates unnecessary volume in the main dimension.
SCD Type 4 example from SQLShack |
SCD Type 6
This hybrid method is a combination of Type 2 and Type 3 SCDs hence having both columns & rows in its implementation.
Thanks for reading!
To conclude, We have looked into the detail of SCDs. In the next blog, we will learn about CDC (Change Data Capture). Make sure to subscribe to my blog to get upcoming updates.
If any query occurs feel free to ask in the comment section.
See you next time,
@TechAE
No comments:
Post a Comment
Thank you for submitting your comment! We appreciate your feedback and will review it as soon as possible. Please note that all comments are moderated and may take some time to appear on the site. We ask that you please keep your comments respectful and refrain from using offensive language or making personal attacks. Thank you for contributing to the conversation!