Showing posts sorted by relevance for query data warehouse. Sort by date Show all posts
Showing posts sorted by relevance for query data warehouse. Sort by date Show all posts

May 06, 2018

Data Warehouse - MCQS

Question.
The data Warehouse is______.
A. read write only.
B. write only.
C. read only.
D. none.
ANSWER: A

Question.
__________ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management decisions.
A. Web Mining.
B. Data Warehousing.
C. Data Mining.
D. Text Mining.

ANSWER: B

Question.
If you have to summarize data as per hierarchy in dimension attributes, which is appropriate SQL clause?

Select one:
SUM
HAVING
WHERE
ROLLUP

Answer: ROLLUP

Question.
Which of the following is Apache data warehouse?

Select one:
Hbase
Sqoop
HDFS
Hive

Answer: Hive

Question.
A Data Mart whose source is directly from transactional systems, legacy applications, or external data feeds

Select one:
Dependent Data Mart
Intra-entry Data Mart
Independent Data Mart
Inter-entry Data Mart

Answer: Independent Data Mart

Question.
In a star schema, the number of rows in a dimension table is

Select one:
1. Larger than the fact table
2. Smaller than the fact table
3. Same as in the fact table

4. None of the options


AnswerSmaller than the fact table

Question.
The difference between ROLLUP and CUBE in SQL is

Select one:
CUBE summarizes for all combinations while ROLLUP does along hierarchy
CUBE summarizes along a hierarchy while ROLLUP does for all combinations
CUBE and ROLLUP handle NULLs differently
CUBE and ROLLUP are the same

Answer: CUBE summarizes for all combinations while ROLLUP does along hierarchy

Question.
Data scrubbing is which of the following?

Select one:
A process to load the data in the data warehouse and to create the necessary indexes
A process to upgrade the quality of data before it is moved into a data warehouse
A process to aggregate data after it is moved into a data warehouse
A process to reject data from the data warehouse and to create the necessary indexes


Answer: A process to upgrade the quality of data before it is moved into a data warehouse

Question.
Which of the following is NOT a kind of data warehouse application?

Select one:
Data mining
Analytical processing
Information processing
Transaction processing

AnswerTransaction processing

Question.
Which one of the following is NOT a characteristic of metadata?

Select one:
Self-describing
Includes user data
Data about data
Helps user to build ad-hoc queries

Answer: Includes user data

Question.
A _______ is a table on disk that contains the result set of a query

Select one:
Materialized View
Pivot
View
Table

AnswerMaterialized View

Question.
Which one is NOT the Codd’s rule for OLAP?

Select one:
Restricted cross dimensional operation
Transparency
Consistent Reporting Performance
Intuitive Data Manipulation

Answer: Restricted cross dimensional operation

Question.
A bitmap index is most appropriate on which of these columns

Select one:
Salary
Gender
Date of Birth
Name

Answer: Gender

Question.
SQL Windowing is based on

Select one:
Partitions defined for the table
Indexes
Materialized Views
Partitions defined in query

Answer: Partitions defined in query

Question.
The SQL analytical function to divide records into equal sized groups is

Select one:
PERCENTILE
GROUP BY
NTILE
QUARTILE

Answer: NTILE

Question.
The operation of moving from coarser granularity data to a finer granularity  is called

Select one:
Drill down
Dice
Rollup
Slice


AnswerDrill down

Question.
Views in SQL are of the type

Select one:
Always Physical
Sometimes physical
Always Virtual
Always up-to-date

AnswerSometimes physical

Question.
MOLAP stores aggregate values in which of the following structure?

Select one:
Table
Index
View
Array

AnswerArray

Question.
Which of the following OLAP techniques offers best latency?

Select one:
ROLAP
MOLAP
DOLAP
HOLAP

Answer: ROLAP

Question.
Aggregate navigator is based on

Select one:
Indexes
Summary tables
Logical Views
Materialized Views

AnswerMaterialized Views

Question.
Summarization for a table is to be done using 4 attributes. How many SQL statements using GROUP BY are required to produce equivalent results?

Select one:
4
8
64
16

Answer: 16

Question.
Difference between Rank and Sort in SQL is

Select one:
Sort allows aggregates while Rank does not
Rank gives a separate sequence of numbers
Sort is based on attribute while Rank is not
They are same

AnswerRank gives a separate sequence of numbers

Question.
Dimensions D1 and D2 are not considered conforming if

Select one:
D1 columns are subset of D2
D1 identical to D2
D1 and D2 are present on the same server
D1 is rolled up subset of D2

AnswerD1 and D2 are present on the same server

Question.
In SQL, what is the difference between GROUP BY and PARTITION BY clauses ?

Select one:
They are same.
SELECT clause
Type of attributes
WHERE clause

AnswerWHERE clause

Question.
When were analytic functions introduced in SQL?

Select one:
2011
1999
1992
2003


Answer2003

Question.
What is the difference between Rank and Dense_Rank in SQL?

Select one:
Rank and Dense_Rank are the same
Dense_Rank uses consecutive numbers
Dense_Rank hides some results
Dense_Rank rejects duplicates


Answer: Dense_Rank uses consecutive numbers

Question.
The type of relationship in star schema from dimension to fact is

Select one:
Many-to-Many
One-to-Many
One-to-One
Many-to-one 


Answer: one-to-many 

Question.
A table can not be partitioned by

Select one:
Hash
List
Range
Rank

AnswerRank

Question.
A snowflake schema is which of the following type of table?

Select one:
1. Fact
2. Normalized Dimension
3. Factless Dimension
4. All of the above

Answer: All of the above

Question
Enterprise Data Warehouse is a _________________warehouse.

Select one:
1. Public
2. Non Centralized
3. Centralized
4. Private

Answer: Centralized

Question
Which of the following is not a data component of a data warehouse?

Select one:
1. Component Key.
2. Metadata
3. Current detail data.

4. Lightly summarized data.

Answer: Component Key

Question
Data warehouses commonly use a _______________key to uniquely identify an entity for various purposes.

Select one:
1. Composite
2. Primary
3. Super
4. Surrogate

Answer: Surrogate

April 26, 2018

Data Warehouse - Important QnA 3


Question. Provide brief answers to the following:

Data warehouse is subject oriented. For following companies what will be critical business subjects? 

i. Manufacturing company   
ii. Insurance company
iii. Bank
iv. University

Give an example of each to state the condition when to treat Customer as Fact and when as a Dimension. Why does Kimball advise construction of first-level data marts before embarking on consolidated data marts? What are the key and cardinality constraints between fact and dimension tables.
Discuss a 'Conformed Dimension' with example.
Ans. A: Business subjects for the below companies:  
       

Ans B. Example of treating Customer as Fact and Dimension:  This is basically depend on your business requirements.
  • If you want to get the customer's data in a descriptive format so make it customer as a dimension table. In this it will get the column name as : Customer Key , Customer name, Customer Code, Address, State. 
  • If you want to get the customer's data as in a summarize format so make it customer as a fact table. In this it will have the column name as : All the Dimensions Key, Total Experience, Count of Customers, Proficiency.
Ans C.  First-level data marts before embarking on consolidated data marts: 
  • It is prudent to focus on the first-level data marts as dimensional building blocks before tackling the task of consolidating and Kimball advised the same because they minimize the risk of signing up for an implementation that is too ambitious as well as Difficult to integrate if the overall requirements are not considered in the beginning.
  • Once it is time to begin a data mart development project, it is recommended to start the actual implementation with first-level data marts. Most of the overall risk of failure comes from biting off too much of the extract transformation-load (ETL) data staging design and development effort. Once we’ve fully enumerated the list of first-level data marts, then we can identify more complex multi source marts as a second step. These data marts are referred as consolidated data marts because they typically cross business processes.
Ans D. Constraints between Fact and Dimension tables: Following are the constraints in order to differentiate the fact and dimension table.
  • A dimension table usually contains a collection of surrogate keys, natural keys, and properties. In the struggle for this, a fact table will have a foreign key, measure / quantity, and dissolved dimensions.
  • Dimension tables deliver expressive or circumstantial information for the measurement of a fact table. In the case of fact tables, they provide the measurements of an initiative.
  • The dimension table has to be loaded first. However when loading the fact tables, one must have a look at the dimension table. Reason being the fact table has foreign keys, measures, facts, and the foreign keys of fact table are the primary keys in the dimension table.
  • The primary key in the dimension table is the foreign key in the fact table. Each record in a dimension table can describe many records in the fact table, making the join cardinality of dimension tables to fact tables one-to-many.
Ans E.  Conformed Dimension:  
  • A conformed dimension has exactly an equivalent significance and content when being mentioned from different fact tables. 
  • A conformed dimension can refer to multiple tables in multiple data marts within the same organization. 
  • If two dimension tables are needed to be considered as conformed, they must either be same or one must be a subdivision/subset of another. 
  • For instance, two dimension tables that are precisely identical with the exception of the primary key are not considered conformed dimensions.
  • The time dimension is a communal conformed dimension in an association or business. 
Question. Mention the approaches of Top-down (Inmon) and Bottom-up (Kimball) for Enterprise Data Warehouse development. What are the advantages of each approach? 

Ans.
Top-down Approach(Inmon): Inmon defines a data warehouse as a centralized repository for the entire enterprise. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form

Bottom-up Approach(Kimball)In a bottom-up approach, Data warehouse is the conglomerate(Group) of all data marts within the enterprise. Information is always stored in the dimensional model. Unlike the top-down approach, Data marts contain both atomic and summary that users may want or need in the future. Data is modeled in a star schema design to optimize usability and query performance.

Advantages: 





Question. A bank management wants to analyze the customer transactions based on the branch, customer and account type. They monitor transactions using counts and amounts. Draw a star schema to satisfy their need.
In the same above scenario, it was noticed that the customer table was having a lot of redundant values in terms of the city, state and country. Modify the above schema to handle redundancy. Also mention any demerits, if any, with the same.


Ans. Star Schema for Customer Transactions: 



Star Schema for Customer Transactions to reduce redundancy:


August 04, 2018

Data Warehouse Reference - QnA

Question.
How can you apply the data to the warehouse? What are the modes?
Answer:
Data may be applied in the following four different modes: load, append, destructive merge, and constructive merge. Let us understanding of the effect of applying data in each of these four modes:

Load: If the target table to be loaded already exists and data exists in the table, the load process wipes out the existing data and applies the data from the incoming file. If the table is already empty before loading, the load process simply applies the data from the incoming file.

Append:You may think of the append as an extension of the load. If data already exists in the table, the append process unconditionally adds the incoming data, preserving the existing data in the target table. When an incoming record is a duplicate of an already existing record, you may define how to handle an incoming duplicate. The incoming record may be allowed to be added as a duplicate. In the other option, the incoming duplicate record may be rejected during the append process.

Destructive Merge : Merge In this mode, you apply the incoming data to the target data. If the primary key of an incoming record matches with the key of an existing record, update the matching target record. If the incoming record is a new record without a match with any existing record, add the incoming record to the target table.

Constructive Merge: This mode is slightly different from the destructive merge. If the primary key of an incoming record matches with the key of an existing record, leave the existing record, add the incoming record, and mark the added record as superseding the old record.

Question.
Let's say that the data warehouse for Big_University consists of four dimension students, courses, semesters and trainers, and there are two measurements and avg_grade. At the lowest ideological level (eg, for a given student, curriculum, semester and trainer combination), avg_grade measures the student's actual course grade. At higher conceptual levels, avg_grade stores the average grade for the given combination. Draw a snowflake schema diagram.

Answer:
http://www.waseian.com/2018/08/data-warehouse-comprehensive2015-16.html
Question.
Based on current trends in technology need to design information systems . Explain the points to be taken care with respective traditional operational systems and the newer informational systems that need to be built?
Answer:
The essential reason for the lack of ability to provide strategic facts is that we have been trying all along to provide strategic facts from the operational systems. These operational systems such as command processing, record control, dues and claims processing, casualty billing, and so on are not planned or intended to deliver strategic information. If we need the skill to provide strategic data and information, we must get the information from overall different types of systems. Specially designed decision care systems or informational systems can deliver strategic information.
  We find that in order to provide strategic information we need to build informational systems that are different from the operational systems we have been building to run the basic business. It will be worthless to continue to dip into the operational systems for strategic information as we have been doing in the past. As companies face fiercer competition and businesses become more complex, continuing the past practices will only lead to disaster.
  • Watching the wheels of business turn
  • Show me the top-selling products
  • Show me the problem regions
  • Tell me why (drill down)
  • Let me see other data (drill across)
  • Show the highest margins
  • Alert me when a district sells below target
http://www.waseian.com/2018/08/data-warehouse-comprehensive2015-16.html
We need to design and build informational systems
  • That serve different purposes
  • Whose scopes are different
  • Whose data content is different
  • Where the data usage patterns are different
  • Where the data access types are different
Question.
2-D data pulled out from the data cube.


Product ID
Location ID
Number Sold
1
1
10
1
3
6
2
1
5
2
2
22

Represent the above into 3-D format, focussing majorly  on product-id and sales


Answer:
Product ID
Location ID
Total Sold
1
10
-
6
16
2
5
22
-
27
Total
15
22
6
43











Question.5
What is a  OLAP cube?                                                                                                        


Answer
An OLAP data cube is a representation of data in multiple dimensions, using facts and dimensions. It is characterized by the combination of information according to it’s relationship. It can consist in a collection of 0 to many dimensions, representing specific data. 
There are five basic operation to perform on these kind of data cubes: 
  1. Slicing
  2. Dicing
  3. Roll-Up
  4. Drill-Up
  5. Drill-Down
  6. Pivoting
Question
Why is dimensional normalization not required?
Answer
Dimensional normalization allows to solve database related problems. It is used to remove unnecessary features which are used as De-normalized dimensions. Dimensions have sub-dimensions which are added together. Due to this fact dimensional generalization is not used:
  • Data structure is more complex and which can cause performance to be degraded because it needs to be included in tables and relationships are retained
  • Query Performance suffers while collecting or retrieving multiple dimensional values It requires proper analysis and operational reports.
  • Space is not used properly and more space is needed.
Question.
What are the steps involved in creating dimensional modeling process?
Answer:
The business process of the dimensional modeling includes:

(a) Choose The Business Process: In this, 4-step design method is followed that helps to provide the usability of the dimensional model. This allows the business process to be more systematic in representation and more helpful in explaining it as well. It includes the use of Business Process Modelling Notation (BPMN) or Unified Modelling Language (UML).

(b)Declaring The Grain: After choosing the business process, the declaration of the model comes that consists of grains. The grain of the model provides the accurate description of the dimensional model and allows the focus should be shifted there.

(c)Identify The Dimensions:In this phase, the dimension is identified in the dimensional model. Dimensions are defined in cereals which are defined in the declaration part above. Dimensions acts as a foundation of the fact table where the data gets collected that comes under the fact. 

(d) Identify The Facts: Defining the dimensions provides a way to create a table in which the fact data can be stored. These facts are populated on the basis of the numerical figures and facts.

Question.
Consider a data warehouse, where the fact data is calculated to be 36GB of data per year, and 4 years’ worth of data are to be kept online. The data is to be partitioned by month and four concurrent queries are to be allowed.
Compute the partition size, Temporary Space and Space Required for this scenario. 
Answer:
Partition size P = 36GB per year / 12 = 3 GB
T = (2n +1)P = [(2 x 4) + 1]3 = 27 GB
F = 36GB X 4 years = 144 GB
Space Required = 3.5F + T = 3.5 X 144 + 27 = 531 GB

Question.
Discuss the merits and demerits of using views from the perspective of security of data warehouse.
Answer:
Views are easier option to define security initially. Later it will cause challenges.
Some of the common restrictions that may apply to the handling of views are:
  •     restricted data manipulation language (DML) operations,
  •     lost query optimization paths,
  •     restrictions on parallel processing of view projections.
The use of views to enforce security will impose a maintenance overhead. In particular, if views are used to enforce restricted access to data tables and aggregations, as these changes, the views may also change.
Question.
 For following statements, indicate True or False with proper justification:

A.    It is a good practice to drop the indexes before the initial load.
True.  Index entry creations during mass loads can be too time-consuming. So drop the indexes prior to the loads to make the loads go quicker. You may rebuild or regenerate the indexes when the loads are complete 

B.    The choice of index type depends on cardinality.
True. Bit-map index can be used only for low cardinality data

C.    The importance of metadata is the same for data warehouse and an operational system.
False.  In an operational system, users get information thru predefined screens and reports. In DW, users seek information thru ad-hoc queries.

D.    Backing up the data warehouse is not necessary because you can recover data from the source systems.
False. Information in DW is accumulated over long periods and elaborately preprocessed
 
E.    MPP is a shared-memory parallel hardware configuration.
False.  MPP is a share-nothing hardware architecture.