Slowly Changing Dimensions in SSAS

SCDs are a key component of a well-designed data warehouse

One important topic that seems to be omitted from many Business Intelligence training courses is that of Slowly Changing Dimensions, or SCDs. When I teach a BI class, I make a point of introducing the topic, and it probably gets students more excited than any other. Believe me, that is easier said than done. Excitement and Business Intelligence don’t always fit in the same sentence…

OK, so we know that a Data Warehouse consists of a relational design involving Facts, Dimensions, Attributes and Hierarchies, arranged in a “Star” or “Snowflake” schema. Facts are generally the numerical values we want to use to evaluate the business, such as Sales Revenue, Cost, Profit Margin and so on. If Facts are what we want to measure, then Dimensions are how we want to analyze the Facts; for example, Revenue by Year, by Quarter, by Month, by Customer or by Region. Attributes are added to the dimension tables to flesh out a dimension and give it more meaning, such as Customer demographics like Age or Salary Range. Hierarchies are multiple dimensions that are related to each other, such as Year/Quarter/Month or Customer/Region.

So what is a Slowly Changing Dimension? As the name suggests, it’s a dimension that changes slowly and predictably. The big decision we have to make is: do we care? When a customer moves from one region to another, what should happen to their previous orders? If we are not careful, those orders will end up appearing under the new region and go missing from the old one. Not bad if you are a sales rep for the new region, but disastrous if you are the sales rep who actually made those sales. Another example comes up when analyzing schools: when a student moves to another school, should their previous exam results apply to the new school? Of course not. (Depends on whether they are a good or bad student, I hear you say? Shame on you…) We want to be accurate in all such cases, but with standard dimensions we may fall into the trap of losing sight of that history.

At the dimension attribute level there are at least 3 types of SCD. Type 1 means the attribute is a “changing attribute” and we only care about the most current value. Type 2 means it is a “historical attribute” and we very much care about maintaining historical accuracy. Type 3 is for the rare attribute where we only care about the original and current values but not the changes in between, sometimes called “First and Last”. There are some other types, but then we are getting too academic for this time of the day…

If we do nothing about our dimension design, we will end up with all Type 1 attributes. This might be OK for Customer Last Name: it may change, but as long as we don’t need to analyze how many “Smiths” bought a particular product, Type 1 should be just fine.

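To make the difference between the three types concrete, here is a minimal Python sketch of how each one might treat a customer's change of region. It is only an illustration under my own assumptions: `CustomerRow`, its field names and the three `apply_type*` helpers are hypothetical and are not part of SSAS or SSIS.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One row of a hypothetical customer dimension table."""
    customer_id: str                        # business key from the operational system
    region: str                             # the attribute that changes over time
    original_region: Optional[str] = None   # only populated by Type 3
    start_date: Optional[date] = None       # only meaningful for Type 2
    end_date: Optional[date] = None         # None means "this is the current row"


def apply_type1(rows: List[CustomerRow], customer_id: str, new_region: str) -> List[CustomerRow]:
    """Type 1: overwrite the value in place, so the old region is lost."""
    return [replace(r, region=new_region) if r.customer_id == customer_id else r
            for r in rows]


def apply_type2(rows: List[CustomerRow], customer_id: str, new_region: str,
                change_date: date) -> List[CustomerRow]:
    """Type 2: close the current row and add a new current row."""
    out = []
    for r in rows:
        if r.customer_id == customer_id and r.end_date is None:
            out.append(replace(r, end_date=change_date))           # history kept
            out.append(replace(r, region=new_region,
                               start_date=change_date, end_date=None))
        else:
            out.append(r)
    return out


def apply_type3(rows: List[CustomerRow], customer_id: str, new_region: str) -> List[CustomerRow]:
    """Type 3: keep only the first and the latest value ("First and Last")."""
    out = []
    for r in rows:
        if r.customer_id == customer_id:
            first = r.original_region if r.original_region is not None else r.region
            out.append(replace(r, region=new_region, original_region=first))
        else:
            out.append(r)
    return out
```

Starting from a single row for customer `C1` in region `North`, a move to `South` leaves one row under Type 1 and Type 3 but two rows under Type 2, which is exactly why Type 2 needs a little more design work.
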
Sales by Region or Scores by School are classic examples where Type 2 SCD design is definitely needed. The trick that enables this design is both simple and ingenious. For the customer who moves from one region to another, or the student who moves to a new school, we need to create a row in the dimension table for each move, with a corresponding Start Date and End Date to indicate the period they were there. However, a primary key must be unique for every row, so the original business key can no longer serve as the primary key once a customer has several rows. Enter the concept of the “Surrogate Key”. Within the Data Warehouse design, the original business key from the operational system becomes just another attribute. Then we add a surrogate key as the primary key and have the system generate it to be unique. Now we can have multiple rows for a customer or student, with accurate start and end dates to record the history of the moves. The single row without an End Date value indicates the current location. SSIS has a Slowly Changing Dimension transformation which helps us with the incremental load.

And here is the ingenious part: we only need to join to the fact table using the surrogate key. As long as our incremental load processes load the data correctly, the end user will be able to analyze Customer by Region or Scores by School with confidence in the results, even when analyzing over time periods. The “magic” is in the data load. Don’t worry; I was skeptical too, until I tested it out. It’s a “eureka moment” when you see it working for the first time.

The Business Analysts then perform their analysis as normal, using high-performing cubes or multi-dimensional databases and their “drag and drop” GUI in Excel, oblivious to the underlying complexity. It’s “Business Intelligence as Usual”, except with deadly accuracy.

Cheers, Brian.
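
P.S. For anyone who wants to see the surrogate-key idea working before trying it in SSIS, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the warehouse. Everything in it is an assumption made for illustration: the table names `dim_customer` and `fact_sales`, the column names, the sample dates and the `load_sale` helper are hypothetical, and the lookup is a hand-written substitute for what an incremental load process would do, not the SSIS transformation itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, generated by the warehouse
    customer_id  TEXT NOT NULL,        -- business key, now just another attribute
    region       TEXT NOT NULL,
    start_date   TEXT NOT NULL,        -- ISO dates, so string comparison sorts correctly
    end_date     TEXT                  -- NULL marks the current row
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    order_date   TEXT NOT NULL,
    revenue      REAL NOT NULL
);
""")

# Customer C1 lived in North, then moved to South on 2011-06-01: two Type 2 rows.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?)",
    [(1, "C1", "North", "2010-01-01", "2011-06-01"),
     (2, "C1", "South", "2011-06-01", None)],
)


def load_sale(customer_id: str, order_date: str, revenue: float) -> None:
    """Find the dimension row valid on order_date and store its surrogate key."""
    row = conn.execute(
        """SELECT customer_key FROM dim_customer
           WHERE customer_id = ?
             AND start_date <= ?
             AND (end_date IS NULL OR ? < end_date)""",
        (customer_id, order_date, order_date),
    ).fetchone()
    if row is None:
        raise LookupError(f"no dimension row for {customer_id} on {order_date}")
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 (row[0], order_date, revenue))


load_sale("C1", "2010-03-15", 100.0)  # sold while in North
load_sale("C1", "2011-02-10", 50.0)   # sold while in North
load_sale("C1", "2011-09-01", 70.0)   # sold after the move to South

# The analysis query joins on the surrogate key only; no date logic is needed here.
revenue_by_region = dict(conn.execute(
    """SELECT d.region, SUM(f.revenue)
       FROM fact_sales f
       JOIN dim_customer d ON d.customer_key = f.customer_key
       GROUP BY d.region"""
).fetchall())

print(revenue_by_region)  # North: 150.0, South: 70.0
```

Notice that the date logic lives entirely in the load step. The reporting query is a plain join on `customer_key`, yet the North sales stay with North after the customer moves.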


Copyright © 2011
