
October 23, 2018

Data Mining - MCQS



Question

Which of the following activities is NOT a data mining task?
Select one:
a. Monitoring the heart rate of a patient for abnormalities
b. Monitoring and predicting failures in a hydropower plant
c. Predicting the future stock price of a company using historical records
d. Extracting the frequencies of a sound wave
The correct answer is: Extracting the frequencies of a sound wave

Question
Which of the following is not a data mining task?

Select one:
a. Feature Subset Detection
b. Association Rule Discovery
c. Regression
d. Sequential Pattern Discovery

The correct answer is: Feature Subset Detection
Question
Value set {poor, average, good, excellent} is an example of
Select one:
a. Nominal attribute
b. Numeric attribute
c. Continuous attribute
d. Ordinal attribute
The correct answer is: Ordinal attribute

Question
Which data mining task can be used for predicting wind velocities as a function of temperature, humidity, air pressure, etc.?

Select one:
a. Cluster Analysis
b. Regression
c. Classification
d. Sequential pattern discovery

The correct answer is: Regression

Question
Identify the example of sequence data

Select one:
a. weather forecast
b. data matrix
c. market basket data
d. genomic data

The correct answer is: genomic data

Question
In a data mining task where it is not clear what type of patterns could be interesting, the data mining system should
Select one:
a. handle different granularities of data and patterns
b. perform all possible data mining tasks
c. allow interaction with the user to guide the mining process 
d. perform both descriptive and predictive tasks
The correct answer is: allow interaction with the user to guide the mining process
Question 
Removing duplicate records is a data mining process called________
Select one:
a. data isolation
b. recovery
c. data pruning
d. data cleaning 
The correct answer is: data cleaning
Question 
Various visualization techniques are used in ___________ step of KDD
Select one:
a. selection
b. interpretation 
c. transformation
d. data mining
The correct answer is: interpretation
Question 
Which of the following is not a Visualization Method?
Select one:
a. Hierarchical visualization technique
b. Tuple based visualization Technique
c. Icon based visualization techniques
d. Pixel oriented visualization technique 
The correct answer is: Tuple based visualization Technique

Question
Data set {brown, black, blue, green, red} is an example of

Select one:
a. Continuous attribute
b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute

The correct answer is: Nominal attribute

Question
Which of the following is NOT a data quality related issue?

Select one:
a. Attribute value range
b. Outlier records
c. Missing values
d. Duplicate records

The correct answer is: Attribute value range

Question
To detect fraudulent usage of credit cards, the following data mining task should be used

Select one:
a. Outlier analysis
b. prediction
c. association analysis
d. feature selection

The correct answer is: Outlier analysis
Question 
Which of the following is NOT an example of an ordinal attribute?
Select one:
a. Ordered numbers
b. Military ranks
c. Zip codes
d. Movie ratings 
The correct answer is: Zip codes
Question 
Which of the following is not a data pre-processing method?
Select one:
a. Data Cleaning
b. Data Visualization 
c. Data Discretization
d. Data Reduction
The correct answer is: Data Visualization
Question 
Incorrect or invalid data is known as _________
Select one:
a. Outlier
b. Missing data
c. Changing data
d. Noisy data 
The correct answer is: Noisy data

Question
Which of the following is an Entity identification problem?

Select one:
a. One person with different email addresses
b. One person's name written in different ways
c. Title for person
d. One person with multiple phone numbers

The correct answer is: One person's name written in different ways
Question 
Data Visualization in mining cannot be done using
Select one:
a. Graphs
b. Information Graphics
c. Charts
d. Photos 
The correct answer is: Photos
Question
Nominal and ordinal attributes can be collectively referred to as_________ attributes
Select one:
a. perfect
b. consistent
c. qualitative
d. optimized
The correct answer is: qualitative
Question 
The number of itemsets of cardinality 4 that can be formed from the item list {A, B, C, D, E} is
Select one:
a. 20
b. 2 
c. 10
d. 5
The correct answer is: 5
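
The count can be checked by enumeration. The following is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]

# Enumerate every itemset that contains exactly 4 of the 5 items.
itemsets_of_size_4 = list(combinations(items, 4))

for itemset in itemsets_of_size_4:
    print(itemset)

# The closed form C(5, 4) gives the same count as the enumeration.
print(len(itemsets_of_size_4), comb(len(items), 4))
```

Choosing 4 items out of 5 is the same as choosing the 1 item to leave out, which is why the count equals the number of items.
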
Question 
Identify the example of Nominal attribute
Select one:
a. Salary
b. Temperature 
c. Gender
d. Mass
The correct answer is: Gender
Question 
Which of the following are descriptive data mining activities?
Select one:
a. Clustering
b. Deviation detection 
c. Regression
d. Classification
The correct answer is: Clustering

Question 
Which statement is not TRUE regarding a data mining task?
Select one:
a. Deviation detection is a predictive data mining task
b. Classification is a predictive data mining task 
c. Clustering is a descriptive data mining task
d. Regression is a descriptive data mining task
The correct answer is: Regression is a descriptive data mining task
Question 
Correlation analysis is used for
Select one:
a. identifying redundant attributes
b. eliminating noise
c. handling missing values
d. handling different data formats 
The correct answer is: identifying redundant attributes
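
As a rough illustration of how correlation exposes a redundant attribute, the sketch below computes the Pearson correlation coefficient by hand. The attribute names and sample values are hypothetical and are not taken from the quiz.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical attributes: a price in dollars, the same price in cents,
# and an unrelated rating.
price_usd = [4.0, 8.0, 9.0, 15.0, 21.0]
price_cents = [400.0, 800.0, 900.0, 1500.0, 2100.0]
rating = [3.0, 1.0, 4.0, 1.0, 5.0]

r_redundant = pearson(price_usd, price_cents)  # one attribute is a rescaling of the other
r_unrelated = pearson(price_usd, rating)

print(round(r_redundant, 3), round(r_unrelated, 3))
```

A coefficient close to +1 or -1 suggests that one attribute can be derived from the other, so one of the two is a candidate for removal. Where to draw the cutoff is a judgment call that depends on the data.
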
Question 
In binning, we first sort the data and partition it into (equal-frequency) bins. Which of the following is not a valid next step?
Select one:
a. smooth by bin boundaries
b. smooth by bin median 
c. smooth by bin values
d. smooth by bin means
The correct answer is: smooth by bin values

Question 
Which of the following is NOT a data mining efficiency/scalability issue?
Select one:
a. The running time of a data mining algorithm
b. Incremental execution
c. Data partitioning 
d. Easy to use user interface
The correct answer is: Easy to use user interface
Question 
Synonym for data mining is
Select one:
a. Data Warehouse
b. Knowledge discovery in databases
c. Business intelligence
d. OLAP
The correct answer is: Knowledge discovery in databases
Question 
Data scrubbing can be defined as
Select one:
a. Check field overloading
b. Delete redundant tuples
c. Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
d. Analyzing data to discover rules and relationship to detect violators
The correct answer is: Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
Question 
Dimensionality reduction reduces the data set size by removing _________
Select one:
a. irrelevant attributes 
b. composite attributes
c. derived attributes
d. relevant attributes
The correct answer is: irrelevant attributes
Question 
In an asymmetric attribute
Select one:
a. Range of values is important
b. No value is considered important over other values
c. Only non-zero value is important
d. All values are equal
The correct answer is: Only non-zero value is important
Question 
Which of the following is not a data mining task?
Select one:
a. Feature Subset Detection 
b. Regression
c. Sequential Pattern Discovery
d. Association Rule Discovery
The correct answer is: Feature Subset Detection
Question 
Which of the following is NOT an example of a data quality related issue?
Select one:
a. Using a field for different purposes 
b. Contradicting values
c. Noise
d. Multiple date formats 
The correct answer is: Multiple date formats
Question 
Similarity is a numerical measure whose value is
Select one:
a. Higher when objects are more alike
b. Lower when objects are more alike
c. Increases with Minkowski distance
d. Higher when objects are not alike 
The correct answer is: Higher when objects are more alike
Question 
The dissimilarity between two data objects is
Select one:
a. Lower when objects are more alike
b. Higher when objects are more alike
c. Lower when objects are not alike
d. Applies only to categorical attributes
The correct answer is: Lower when objects are more alike
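
The relationship between the two measures can be sketched in a few lines of Python. The Minkowski distance serves as the dissimilarity. The conversion s = 1 / (1 + d) is only one common choice and is an assumption here, as are the sample points.

```python
def minkowski(p, q, h=2):
    """Minkowski distance of order h (h=1 is Manhattan, h=2 is Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(p, q)) ** (1 / h)

def similarity(p, q, h=2):
    """Assumed conversion from distance to similarity: s = 1 / (1 + d)."""
    return 1 / (1 + minkowski(p, q, h))

# Hypothetical two-attribute objects.
x = (1.0, 2.0)
near = (1.0, 3.0)
far = (4.0, 6.0)

print(minkowski(x, near), minkowski(x, far))    # dissimilarity grows with distance
print(similarity(x, near), similarity(x, far))  # similarity shrinks with distance
```
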
Question 
The important characteristics of structured data are

Select one:
a. Resolution, Distribution, Dimensionality, Objects
b. Sparsity, Centroid, Distribution, Dimensionality
c. Dimensionality, Sparsity, Resolution, Distribution 
d. Sparsity, Resolution, Distribution, Tuples

The correct answer is: Dimensionality, Sparsity, Resolution, Distribution

Question 
Which of the following statements is not TRUE for a Tag Cloud?

Select one:
a. Tag cloud is a visualization of statistics of user-generated tags 
b. Tag cloud can be used for numeric data only
c. The importance of a tag is indicated by font size or color
d. Tags may be listed alphabetically in a tag cloud

The correct answer is: Tag cloud can be used for numeric data only

Question 
Which of the following data mining tasks is known as Market Basket Analysis?

Select one:
a. Classification
b. Regression
c. Association Analysis 
d. Outlier Analysis

The correct answer is: Association Analysis

Question 
Which of the following is not a Data discretization Method?

Select one:
a. Histogram analysis
b. Cluster Analysis
c. Data compression
d. Binning

The correct answer is: Data compression

Question 
Which of the following activities is a data mining task?

Select one:
a. Monitoring the heart rate of a patient for abnormalities
b. Dividing the customers of a company according to their profitability 
c. Extracting the frequencies of a sound wave
d. Predicting the outcomes of tossing a (fair) pair of dice

The correct answer is: Monitoring the heart rate of a patient for abnormalities

Question 
Sorted data (attribute values ) for price are: 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34. Identify which is NOT a bin smoothed by boundaries?

Select one:
a. Bin 2: 21, 21, 25, 25
b. Bin 1: 4, 4, 4, 15 
c. Bin 1: 4, 4, 15, 15
d. Bin 3: 26, 26, 26, 34

The correct answer is: Bin 1: 4, 4, 15, 15
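
The answer can be verified with a short Python sketch. It assumes three equal-frequency bins of four values each, as the answer options imply, and it sends ties to the lower boundary, which is an arbitrary choice that this data never exercises.

```python
def equal_frequency_bins(sorted_values, bin_size):
    """Split already-sorted values into consecutive bins of bin_size items."""
    return [sorted_values[i:i + bin_size]
            for i in range(0, len(sorted_values), bin_size)]

def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's min and max."""
    low, high = bin_values[0], bin_values[-1]
    # Ties go to the lower boundary (an assumption, not part of the question).
    return [low if v - low <= high - v else high for v in bin_values]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(prices, 4)
smoothed = [smooth_by_boundaries(b) for b in bins]

for number, values in enumerate(smoothed, start=1):
    print(f"Bin {number}: {values}")
```

In Bin 1, the value 9 is 5 away from 4 and 6 away from 15, so it moves to 4. That is why "4, 4, 15, 15" cannot be the smoothed bin.
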

Question
The difference between supervised learning and unsupervised learning is given by

Select one:
a. unlike unsupervised learning, supervised learning needs labeled data
b. unlike unsupervised learning, supervised learning can be used to detect outliers
c. unlike supervised learning, unsupervised learning can form new classes
d. there is no difference

The correct answer is: unlike unsupervised learning, supervised learning needs labeled data

Question 
The Data Sets are made up of

Select one:
a. Data Objects
b. Attributes 
c. Dimensions
d. Database

The correct answer is: Data Objects

August 04, 2018

Data Warehouse Reference - QnA

Question.
How can you apply the data to the warehouse? What are the modes?
Answer:
Data may be applied in the following four modes: load, append, destructive merge, and constructive merge. Let us look at the effect of applying data in each of these four modes:

Load: If the target table to be loaded already exists and data exists in the table, the load process wipes out the existing data and applies the data from the incoming file. If the table is already empty before loading, the load process simply applies the data from the incoming file.

Append: You may think of the append as an extension of the load. If data already exists in the table, the append process unconditionally adds the incoming data, preserving the existing data in the target table. When an incoming record is a duplicate of an already existing record, you may define how to handle it: either the incoming record is allowed to be added as a duplicate, or the incoming duplicate is rejected during the append process.

Destructive Merge: In this mode, you apply the incoming data to the target data. If the primary key of an incoming record matches the key of an existing record, update the matching target record. If the incoming record is a new record without a match with any existing record, add the incoming record to the target table.

Constructive Merge: This mode is slightly different from the destructive merge. If the primary key of an incoming record matches with the key of an existing record, leave the existing record, add the incoming record, and mark the added record as superseding the old record.
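
The four modes can be sketched in Python with a table modelled as a list of dictionaries. This is only an illustration: the function names, the "id" key and the "supersedes_old" marker are hypothetical, and a real warehouse would do this with its load utility or SQL rather than in application code.

```python
def load(target, incoming):
    """Load: wipe whatever is in the target and apply the incoming records."""
    return [dict(r) for r in incoming]

def append(target, incoming, allow_duplicates=False):
    """Append: keep existing rows, add incoming ones, optionally reject duplicates."""
    result = [dict(r) for r in target]
    for rec in incoming:
        if allow_duplicates or rec not in result:
            result.append(dict(rec))
    return result

def destructive_merge(target, incoming, key="id"):
    """Destructive merge: update on a key match, insert otherwise."""
    merged = {r[key]: dict(r) for r in target}
    for rec in incoming:
        merged[rec[key]] = dict(rec)
    return list(merged.values())

def constructive_merge(target, incoming, key="id"):
    """Constructive merge: keep the old row and add the new one, marked as superseding."""
    result = [dict(r) for r in target]
    existing_keys = {r[key] for r in target}
    for rec in incoming:
        new_rec = dict(rec)
        # Hypothetical marker column; unmatched records are simply added.
        new_rec["supersedes_old"] = rec[key] in existing_keys
        result.append(new_rec)
    return result

# Hypothetical sample data.
target = [{"id": 1, "value": "old-1"}, {"id": 2, "value": "old-2"}]
incoming = [{"id": 2, "value": "new-2"}, {"id": 3, "value": "new-3"}]

print(load(target, incoming))
print(append(target, incoming))
print(destructive_merge(target, incoming))
print(constructive_merge(target, incoming))
```

With this data, the destructive merge ends up with three rows and loses the old value for id 2, while the constructive merge ends up with four rows and keeps the history.
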

Question.
Suppose that the data warehouse for Big_University consists of four dimensions (students, courses, semesters and trainers) and two measures, one of which is avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester and trainer combination), avg_grade stores the student's actual course grade. At higher conceptual levels, avg_grade stores the average grade for the given combination. Draw a snowflake schema diagram.

Answer:
http://www.waseian.com/2018/08/data-warehouse-comprehensive2015-16.html
Question.
Given current trends in technology, we need to design informational systems. Explain the points to consider with respect to traditional operational systems and the newer informational systems that need to be built.
Answer:
The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide it from the operational systems. These operational systems, such as order processing, record control, dues and claims processing, casualty billing, and so on, are not designed or intended to deliver strategic information. If we need the ability to provide strategic information, we must get it from altogether different types of systems. Only specially designed decision-support systems or informational systems can deliver strategic information.
  We find that in order to provide strategic information we need to build informational systems that are different from the operational systems we have been building to run the basic business. It will be futile to continue to dip into the operational systems for strategic information as we have been doing in the past. As companies face fiercer competition and businesses become more complex, continuing the past practices will only lead to disaster.
Informational systems are for watching the wheels of business turn. They answer requests such as:
  • Show me the top-selling products
  • Show me the problem regions
  • Tell me why (drill down)
  • Let me see other data (drill across)
  • Show the highest margins
  • Alert me when a district sells below target
We need to design and build informational systems
  • That serve different purposes
  • Whose scopes are different
  • Whose data content is different
  • Where the data usage patterns are different
  • Where the data access types are different
Question.
2-D data pulled out from the data cube.


Product ID   Location ID   Number Sold
1            1             10
1            3             6
2            1             5
2            2             22

Represent the above in 3-D format, focusing mainly on product ID and sales.


Answer:
               Location ID
Product ID     1     2     3     Total Sold
1              10    -     6     16
2              5     22    -     27
Total          15    22    6     43
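
The cross-tabulation above can be reproduced with a few lines of Python. This is a minimal sketch on the four rows from the question; missing product/location combinations are kept as None, which corresponds to the "-" cells.

```python
# (product_id, location_id, number_sold) rows from the question.
rows = [(1, 1, 10), (1, 3, 6), (2, 1, 5), (2, 2, 22)]

products = sorted({p for p, _, _ in rows})
locations = sorted({l for _, l, _ in rows})
cell = {(p, l): n for p, l, n in rows}

# One row per product: a count per location, then the row total.
table = {}
for p in products:
    counts = [cell.get((p, l)) for l in locations]
    table[p] = counts + [sum(c for c in counts if c is not None)]

# Totals per location and the grand total.
column_totals = [sum(cell.get((p, l), 0) for p in products) for l in locations]
grand_total = sum(n for _, _, n in rows)

for p in products:
    print(p, table[p])
print("Total", column_totals, grand_total)
```
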

Question.
What is an OLAP cube?


Answer
An OLAP data cube is a representation of data in multiple dimensions, using facts and dimensions. It is characterized by the combination of information according to its relationships. It can consist of a collection of zero to many dimensions, each representing specific data.
There are five basic operations that can be performed on this kind of data cube:
  1. Slicing
  2. Dicing
  3. Roll-up (also called drill-up)
  4. Drill-down
  5. Pivoting
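
To make the operations concrete, here is a small Python sketch over a toy cube held as a list of fact rows. The data, the city-to-country hierarchy and the function names are all hypothetical. Real OLAP engines precompute and index these aggregates rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: product, city, country, quarter, units sold.
FIELDS = ("product", "city", "country", "quarter", "units")
facts = [
    ("pen", "Pune", "India", "Q1", 10),
    ("pen", "Delhi", "India", "Q1", 6),
    ("pen", "Paris", "France", "Q2", 4),
    ("book", "Pune", "India", "Q1", 5),
    ("book", "Paris", "France", "Q2", 22),
]
cube = [dict(zip(FIELDS, row)) for row in facts]

def slice_cube(cube, dim, value):
    """Slice: fix a single value on one dimension."""
    return [r for r in cube if r[dim] == value]

def dice_cube(cube, conditions):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    return [r for r in cube
            if all(r[d] in allowed for d, allowed in conditions.items())]

def aggregate(cube, dims):
    """Sum units grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in cube:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)

q1 = slice_cube(cube, "quarter", "Q1")
india_pens = dice_cube(cube, {"product": {"pen"}, "country": {"India"}})

# Roll-up climbs the location hierarchy (city -> country);
# drill-down goes back to the finer city level.
by_country = aggregate(cube, ["country"])
by_city = aggregate(cube, ["country", "city"])

# Pivot: same numbers, with the two axes swapped.
by_product_quarter = aggregate(cube, ["product", "quarter"])
pivoted = {(q, p): v for (p, q), v in by_product_quarter.items()}

print(by_country)
print(by_city)
print(pivoted)
```

Roll-up and drill-down are the same aggregation run at different levels of a dimension hierarchy, which is why a single aggregate helper covers both.
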
Question
Why is dimensional normalization not required?
Answer
Dimensional normalization removes the redundant attributes that appear in de-normalized dimensions, splitting a dimension into sub-dimensions that are joined together. It is usually not applied, for the following reasons:
  • The data structure becomes more complex, and performance can degrade because tables must be joined and relationships maintained.
  • Query performance suffers when aggregating or retrieving many dimensional values, so proper analysis and operational reports are required.
  • Space is not used properly and more space is needed.
Question.
What are the steps involved in the dimensional modeling process?
Answer:
The dimensional modeling process follows a 4-step design method:

(a) Choose the Business Process: Following the 4-step design method helps keep the dimensional model usable. The business process can be described systematically using Business Process Modelling Notation (BPMN) or Unified Modelling Language (UML), which also helps in explaining it.

(b) Declare the Grain: After choosing the business process, declare the grain of the model. The grain describes exactly what a single row of the fact table represents, and the rest of the design should stay focused on it.

(c) Identify the Dimensions: In this phase, the dimensions of the model are identified. Dimensions are defined within the grain declared in the previous step. They act as the foundation of the fact table, providing the context in which the fact data is collected.

(d) Identify the Facts: Once the dimensions are defined, the fact table that stores the fact data can be created. The facts are populated with numeric measures.

Question.
Consider a data warehouse, where the fact data is calculated to be 36GB of data per year, and 4 years’ worth of data are to be kept online. The data is to be partitioned by month and four concurrent queries are to be allowed.
Compute the partition size, Temporary Space and Space Required for this scenario. 
Answer:
Partition size P = 36 GB per year / 12 months = 3 GB
Temporary space T = (2n + 1)P = [(2 × 4) + 1] × 3 = 27 GB, where n = 4 is the number of concurrent queries
Fact data F = 36 GB × 4 years = 144 GB
Space required = 3.5F + T = 3.5 × 144 + 27 = 531 GB
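
The same arithmetic can be wrapped in a small Python helper so that other scenarios can be tried. The function and parameter names are illustrative. The formulas T = (2n + 1)P and 3.5F + T are taken directly from the answer above.

```python
def warehouse_space_gb(fact_gb_per_year, years_online,
                       partitions_per_year, concurrent_queries):
    """Return (partition size, temporary space, fact data, space required) in GB."""
    partition = fact_gb_per_year / partitions_per_year
    temporary = (2 * concurrent_queries + 1) * partition  # T = (2n + 1)P
    fact_total = fact_gb_per_year * years_online          # F
    required = 3.5 * fact_total + temporary               # 3.5F + T
    return partition, temporary, fact_total, required

p, t, f, space = warehouse_space_gb(
    fact_gb_per_year=36, years_online=4,
    partitions_per_year=12, concurrent_queries=4)
print(p, t, f, space)
```
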

Question.
Discuss the merits and demerits of using views from the perspective of security of data warehouse.
Answer:
Views are an easy option for defining security initially, but they cause challenges later.
Some of the common restrictions that may apply to the handling of views are:
  •     restricted data manipulation language (DML) operations,
  •     lost query optimization paths,
  •     restrictions on parallel processing of view projections.
The use of views to enforce security also imposes a maintenance overhead. In particular, if views are used to enforce restricted access to data tables and aggregations, then as these tables and aggregations change, the views may have to change as well.
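
Both sides of this trade-off can be seen in a small experiment with Python's built-in sqlite3 module. This is only a sketch: the table, view and column names are hypothetical, and SQLite has no user accounts or GRANT statements, so it illustrates the view mechanism rather than a complete security setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "c1", 100.0), ("north", "c2", 50.0), ("south", "c3", 75.0)],
)

# Merit: the view exposes only one region and hides the customer column.
conn.execute(
    "CREATE VIEW north_sales AS "
    "SELECT region, amount FROM sales WHERE region = 'north'"
)
visible = conn.execute(
    "SELECT region, amount FROM north_sales ORDER BY amount"
).fetchall()
print(visible)

# Demerit: DML through the view is restricted. SQLite views are read-only,
# so this INSERT is rejected.
try:
    conn.execute("INSERT INTO north_sales VALUES ('north', 10.0)")
    dml_allowed = True
except sqlite3.OperationalError as exc:
    dml_allowed = False
    print("insert rejected:", exc)

conn.close()
```

Other database systems allow updates through some views under specific conditions, so the exact restrictions depend on the product in use.
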
Question.
For the following statements, indicate True or False with proper justification:

A.    It is a good practice to drop the indexes before the initial load.
True. Creating index entries during mass loads can be too time-consuming, so drop the indexes prior to the loads to make the loads go quicker. You may rebuild or regenerate the indexes when the loads are complete.

B.    The choice of index type depends on cardinality.
True. A bitmap index is suitable only for low-cardinality data.

C.    The importance of metadata is the same for data warehouse and an operational system.
False. In an operational system, users get information through predefined screens and reports. In a DW, users seek information through ad-hoc queries.

D.    Backing up the data warehouse is not necessary because you can recover data from the source systems.
False. Information in a DW is accumulated over long periods and elaborately preprocessed.

E.    MPP is a shared-memory parallel hardware configuration.
False. MPP is a shared-nothing hardware architecture.