April 26, 2018

Data Warehouse - Important QnA 3


Question. Provide brief answers to the following:

Data warehouse is subject oriented. For following companies what will be critical business subjects? 

i. Manufacturing company   
ii. Insurance company
iii. Bank
iv. University

Give an example of each to state the condition when to treat Customer as Fact and when as a Dimension. Why does Kimball advise construction of first-level data marts before embarking on consolidated data marts? What are the key and cardinality constraints between fact and dimension tables.
Discuss a 'Conformed Dimension' with example.
Ans. A: Business subjects for the below companies:  
       

Ans B. Example of treating Customer as Fact and Dimension:  This is basically depend on your business requirements.
  • If you want to get the customer's data in a descriptive format so make it customer as a dimension table. In this it will get the column name as : Customer Key , Customer name, Customer Code, Address, State. 
  • If you want to get the customer's data as in a summarize format so make it customer as a fact table. In this it will have the column name as : All the Dimensions Key, Total Experience, Count of Customers, Proficiency.
Ans C.  First-level data marts before embarking on consolidated data marts: 
  • It is prudent to focus on the first-level data marts as dimensional building blocks before tackling the task of consolidating and Kimball advised the same because they minimize the risk of signing up for an implementation that is too ambitious as well as Difficult to integrate if the overall requirements are not considered in the beginning.
  • Once it is time to begin a data mart development project, it is recommended to start the actual implementation with first-level data marts. Most of the overall risk of failure comes from biting off too much of the extract transformation-load (ETL) data staging design and development effort. Once we’ve fully enumerated the list of first-level data marts, then we can identify more complex multi source marts as a second step. These data marts are referred as consolidated data marts because they typically cross business processes.
Ans D. Constraints between Fact and Dimension tables: Following are the constraints in order to differentiate the fact and dimension table.
  • A dimension table usually contains a collection of surrogate keys, natural keys, and properties. In the struggle for this, a fact table will have a foreign key, measure / quantity, and dissolved dimensions.
  • Dimension tables deliver expressive or circumstantial information for the measurement of a fact table. In the case of fact tables, they provide the measurements of an initiative.
  • The dimension table has to be loaded first. However when loading the fact tables, one must have a look at the dimension table. Reason being the fact table has foreign keys, measures, facts, and the foreign keys of fact table are the primary keys in the dimension table.
  • The primary key in the dimension table is the foreign key in the fact table. Each record in a dimension table can describe many records in the fact table, making the join cardinality of dimension tables to fact tables one-to-many.
Ans E.  Conformed Dimension:  
  • A conformed dimension has exactly an equivalent significance and content when being mentioned from different fact tables. 
  • A conformed dimension can refer to multiple tables in multiple data marts within the same organization. 
  • If two dimension tables are needed to be considered as conformed, they must either be same or one must be a subdivision/subset of another. 
  • For instance, two dimension tables that are precisely identical with the exception of the primary key are not considered conformed dimensions.
  • The time dimension is a communal conformed dimension in an association or business. 
Question. Mention the approaches of Top-down (Inmon) and Bottom-up (Kimball) for Enterprise Data Warehouse development. What are the advantages of each approach? 

Ans.
Top-down Approach(Inmon): Inmon defines a data warehouse as a centralized repository for the entire enterprise. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form

Bottom-up Approach(Kimball)In a bottom-up approach, Data warehouse is the conglomerate(Group) of all data marts within the enterprise. Information is always stored in the dimensional model. Unlike the top-down approach, Data marts contain both atomic and summary that users may want or need in the future. Data is modeled in a star schema design to optimize usability and query performance.

Advantages: 





Question. A bank management wants to analyze the customer transactions based on the branch, customer and account type. They monitor transactions using counts and amounts. Draw a star schema to satisfy their need.
In the same above scenario, it was noticed that the customer table was having a lot of redundant values in terms of the city, state and country. Modify the above schema to handle redundancy. Also mention any demerits, if any, with the same.


Ans. Star Schema for Customer Transactions: 



Star Schema for Customer Transactions to reduce redundancy: