WASEIAN: Search results for data warehouse


October 23, 2018

Data Mining - MCQS

Waseian Data Mining , DM

Question

Which of the following activities is NOT a data mining task?

Select one:

a. Monitoring the heart rate of a patient for abnormalities

b. Monitoring and predicting failures in a hydropower plant

c. Predicting the future stock price of a company using historical records

d. Extracting the frequencies of a sound wave

The correct answer is: Extracting the frequencies of a sound wave

Question
Which of the following is not a data mining task?

Select one:
a. Feature Subset Detection
b. Association Rule Discovery
c. Regression
d. Sequential Pattern Discovery

The correct answer is: Feature Subset Detection

Question

Value set {poor, average, good, excellent} is an example of

Select one:

a. Nominal attribute

b. Numeric attribute

c. Continuous attribute

d. Ordinal attribute

The correct answer is: Ordinal attribute

Question
Which data mining task can be used for predicting wind velocities as a function of temperature, humidity, air pressure, etc.?

Select one:
a. Cluster Analysis
b. Regression
c. Classification
d. Sequential pattern discovery

The correct answer is: Regression

Question
Identify the example of sequence data

Select one:
a. weather forecast
b. data matrix
c. market basket data
d. genomic data

The correct answer is: genomic data

Question

In a data mining task where it is not clear what type of patterns could be interesting, the data mining system should

Select one:

a. handle different granularities of data and patterns

b. perform all possible data mining tasks

c. allow interaction with the user to guide the mining process

d. perform both descriptive and predictive tasks

The correct answer is: allow interaction with the user to guide the mining process

Question

Removing duplicate records is a data mining process called ________

Select one:

a. data isolation

b. recovery

c. data pruning

d. data cleaning

The correct answer is: data cleaning

Question

Various visualization techniques are used in the ___________ step of KDD

Select one:

a. selection

b. interpretation

c. transformation

d. data mining

The correct answer is: interpretation

Question

Which of the following is not a Visualization Method?

Select one:

a. Hierarchical visualization technique

b. Tuple based visualization Technique

c. Icon based visualization techniques

d. Pixel oriented visualization technique

The correct answer is: Tuple based visualization Technique

Question
Data set {brown, black, blue, green, red} is an example of

Select one:
a. Continuous attribute
b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute

The correct answer is: Nominal attribute

Question
Which of the following is NOT a data quality related issue?

Select one:
a. Attribute value range
b. Outlier records
c. Missing values
d. Duplicate records

The correct answer is: Attribute value range

Question
To detect fraudulent usage of credit cards, the following data mining task should be used

Select one:
a. Outlier analysis
b. prediction
c. association analysis
d. feature selection

The correct answer is: Outlier analysis

Question

Which of the following is NOT an example of an ordinal attribute?

Select one:

a. Ordered numbers

b. Military ranks

c. Zip codes

d. Movie ratings

The correct answer is: Zip codes

Question

Which of the following is not a data pre-processing method?

Select one:

a. Data Cleaning

b. Data Visualization

c. Data Discretization

d. Data Reduction

The correct answer is: Data Visualization

Question

Incorrect or invalid data is known as _________

Select one:

a. Outlier

b. Missing data

c. Changing data

d. Noisy data

The correct answer is: Noisy data

Question
Which of the following is an Entity identification problem?

Select one:
a. One person with different email addresses
b. One person's name written in different ways
c. Title for person
d. One person with multiple phone numbers

The correct answer is: One person's name written in different ways

Question

Data Visualization in mining cannot be done using

Select one:

a. Graphs

b. Information Graphics

c. Charts

d. Photos

The correct answer is: Photos

Question

Nominal and ordinal attributes can be collectively referred to as _________ attributes

Select one:

a. perfect

b. consistent

c. qualitative

d. optimized

The correct answer is: qualitative

Question

The number of item sets of cardinality 4 that can be formed from the item list {A, B, C, D, E} is

Select one:

a. 20

b. 2

c. 10

d. 5

The correct answer is: 5
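
The count above can be checked by enumeration. The following is a minimal sketch, not part of the original quiz; the variable names are illustrative. It lists every 4-item subset and compares the count with the binomial coefficient C(5, 4).

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]

# Enumerate every subset of exactly 4 items.
itemsets = [set(c) for c in combinations(items, 4)]
for s in itemsets:
    print(sorted(s))

# The number of subsets equals the binomial coefficient C(5, 4).
print(len(itemsets), comb(len(items), 4))  # prints "5 5"
```

Each 4-item set simply leaves out one of the five items, which is why there are exactly five of them.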

Question

Identify the example of Nominal attribute

Select one:

a. Salary

b. Temperature

c. Gender

d. Mass

The correct answer is: Gender

Question

Which of the following are descriptive data mining activities?

Select one:

a. Clustering

b. Deviation detection

c. Regression

d. Classification

The correct answer is: Clustering

Question

Which statement is not TRUE regarding a data mining task?

Select one:

a. Deviation detection is a predictive data mining task

b. Classification is a predictive data mining task

c. Clustering is a descriptive data mining task

d. Regression is a descriptive data mining task

The correct answer is: Regression is a descriptive data mining task

Question

Correlation analysis is used for

Select one:

a. identifying redundant attributes

b. eliminating noise

c. handling missing values

d. handling different data formats

The correct answer is: identifying redundant attributes

Question

In binning, we first sort the data and partition it into (equal-frequency) bins. Which of the following is then NOT a valid smoothing step?

Select one:

a. smooth by bin boundaries

b. smooth by bin median

c. smooth by bin values

d. smooth by bin means

The correct answer is: smooth by bin values

Question

Which of the following is NOT a data mining efficiency/scalability issue?

Select one:

a. The running time of a data mining algorithm

b. Incremental execution

c. Data partitioning

d. Easy to use user interface

The correct answer is: Easy to use user interface

Question

A synonym for data mining is

Select one:

a. Data Warehouse

b. Knowledge discovery in databases

c. Business intelligence

d. OLAP

The correct answer is: Knowledge discovery in databases

Question

Data scrubbing can be defined as

Select one:

a. Check field overloading

b. Delete redundant tuples

c. Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections

d. Analyzing data to discover rules and relationships to detect violators

The correct answer is: Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections

Question

Dimensionality reduction reduces the data set size by removing _________

Select one:

a. irrelevant attributes

b. composite attributes

c. derived attributes

d. relevant attributes

The correct answer is: irrelevant attributes

Question

In an asymmetric attribute

Select one:

a. Range of values is important

b. No value is considered important over other values

c. Only non-zero value is important

d. All values are equal

The correct answer is: Only non-zero value is important

Question

Which of the following is not a data mining task?

Select one:

a. Feature Subset Detection

b. Regression

c. Sequential Pattern Discovery

d. Association Rule Discovery

The correct answer is: Feature Subset Detection

Question

Which of the following is NOT an example of data quality related issue?

Select one:

a. Using a field for different purposes

b. Contradicting values

c. Noise

d. Multiple date formats

The correct answer is: Multiple date formats

Question

Similarity is a numerical measure whose value is

Select one:

a. Higher when objects are more alike

b. Lower when objects are more alike

c. Increases with Minkowski distance

d. Higher when objects are not alike

The correct answer is: Higher when objects are more alike

Question

The dissimilarity between two data objects is

Select one:

a. Lower when objects are more alike

b. Higher when objects are more alike

c. Lower when objects are not alike

d. Applies only to categorical attributes

The correct answer is: Lower when objects are more alike
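
The two answers above can be illustrated numerically. This is a minimal sketch rather than anything from the original quiz: it uses the Minkowski distance as the dissimilarity, and it assumes one common conversion, 1 / (1 + d), to turn a distance into a similarity. The function names and sample points are hypothetical.

```python
def minkowski(x, y, r=2):
    """Minkowski distance of order r between two equal-length numeric vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)


def similarity(x, y, r=2):
    """Assumed conversion from a distance to a similarity in (0, 1]."""
    return 1 / (1 + minkowski(x, y, r))


p = (1, 2)
q_near = (1, 3)   # close to p
q_far = (6, 9)    # far from p

# Dissimilarity is lower for the more alike pair; similarity is higher.
print(minkowski(p, q_near), minkowski(p, q_far))
print(similarity(p, q_near), similarity(p, q_far))
```

With r=1 the same function gives the Manhattan distance, and with r=2 the Euclidean distance.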

Question

The important characteristics of structured data are

Select one:

a. Resolution, Distribution, Dimensionality, Objects

b. Sparsity, Centroid, Distribution, Dimensionality

c. Dimensionality, Sparsity, Resolution, Distribution

d. Sparsity, Resolution, Distribution, Tuples

The correct answer is: Dimensionality, Sparsity, Resolution, Distribution

Question

Which of the following statements is not TRUE for a Tag Cloud?

Select one:

a. Tag cloud is a visualization of statistics of user-generated tags

b. Tag cloud can be used for numeric data only

c. The importance of a tag is indicated by font size or color

d. Tags may be listed alphabetically in a tag cloud

The correct answer is: Tag cloud can be used for numeric data only

Question

Which of the following data mining tasks is known as Market Basket Analysis?

Select one:

a. Classification

b. Regression

c. Association Analysis

d. Outlier Analysis

The correct answer is: Association Analysis

Question

Which of the following is not a Data discretization Method?

Select one:

a. Histogram analysis

b. Cluster Analysis

c. Data compression

d. Binning

The correct answer is: Data compression

Question

Which of the following activities is a data mining task?

Select one:

a. Monitoring the heart rate of a patient for abnormalities

b. Dividing the customers of a company according to their profitability

c. Extracting the frequencies of a sound wave

d. Predicting the outcomes of tossing a (fair) pair of dice

The correct answer is: Monitoring the heart rate of a patient for abnormalities

Question

The sorted data (attribute values) for price are: 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34. Which of the following is NOT a bin smoothed by boundaries?

Select one:

a. Bin 2: 21, 21, 25, 25

b. Bin 1: 4, 4, 4, 15

c. Bin 1: 4, 4, 15, 15

d. Bin 3: 26, 26, 26, 34

The correct answer is: Bin 1: 4, 4, 15, 15
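
The answer above can be reproduced with a short script. This is a minimal sketch, assuming three equal-frequency bins of four values each (the bin size is implied by the answer options, not stated in the question) and assuming that ties go to the lower boundary. The function names are illustrative.

```python
def equal_frequency_bins(sorted_values, bin_size):
    """Split already-sorted values into consecutive bins of bin_size items."""
    return [sorted_values[i:i + bin_size]
            for i in range(0, len(sorted_values), bin_size)]


def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's minimum and maximum.

    Ties go to the lower boundary (an assumption; conventions vary).
    """
    low, high = bin_values[0], bin_values[-1]
    return [low if v - low <= high - v else high for v in bin_values]


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
smoothed = [smooth_by_boundaries(b) for b in equal_frequency_bins(prices, 4)]

for number, values in enumerate(smoothed, start=1):
    print(f"Bin {number}: {values}")
# Bin 1: [4, 4, 4, 15]
# Bin 2: [21, 21, 25, 25]
# Bin 3: [26, 26, 26, 34]
```

In Bin 1, the value 9 is 5 away from 4 and 6 away from 15, so it is replaced by 4. That is why "4, 4, 15, 15" is not a valid result.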

Question
The difference between supervised learning and unsupervised learning is given by

Select one:
a. unlike unsupervised learning, supervised learning needs labeled data
b. unlike unsupervised learning, supervised learning can be used to detect outliers
c. unlike supervised learning, unsupervised learning can form new classes
d. there is no difference

The correct answer is: unlike unsupervised learning, supervised learning needs labeled data

Question

The Data Sets are made up of

Select one:

a. Data Objects

b. Attributes

c. Dimensions

d. Database

The correct answer is: Data Objects

Data Warehouse Reference - QnA

Waseian Comprehensive , Data Warehouse

Question.

How can you apply the data to the warehouse? What are the modes?
Answer:
Data may be applied in four different modes: load, append, destructive merge, and constructive merge. Let us look at the effect of applying data in each of these modes:

Load: If the target table to be loaded already exists and data exists in the table, the load process wipes out the existing data and applies the data from the incoming file. If the table is already empty before loading, the load process simply applies the data from the incoming file.

Append: You may think of the append as an extension of the load. If data already exists in the table, the append process unconditionally adds the incoming data, preserving the existing data in the target table. When an incoming record is a duplicate of an existing record, you may define how to handle it: either the incoming record is added as a duplicate, or it is rejected during the append process.

Destructive Merge: In this mode, you apply the incoming data to the target data. If the primary key of an incoming record matches the key of an existing record, update the matching target record. If the incoming record is a new record without a match with any existing record, add the incoming record to the target table.

Constructive Merge: This mode is slightly different from the destructive merge. If the primary key of an incoming record matches with the key of an existing record, leave the existing record, add the incoming record, and mark the added record as superseding the old record.
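
The four modes can be sketched in a few lines. This is an illustration only, not how any particular load utility works: a table is modeled as a list of dict rows, the primary key column is assumed to be called "id", and the "supersedes_old" flag is a hypothetical way of marking superseding records.

```python
def load(target, incoming):
    """Load: wipe out any existing rows, then apply the incoming rows."""
    return list(incoming)


def append(target, incoming, allow_duplicates=False):
    """Append: keep existing rows and add the incoming ones.

    An exact duplicate is either added or rejected, depending on the flag.
    """
    result = list(target)
    for row in incoming:
        if allow_duplicates or row not in result:
            result.append(row)
    return result


def destructive_merge(target, incoming, key="id"):
    """Destructive merge: update rows whose key matches, add the rest."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row  # overwrite on a match, insert otherwise
    return list(merged.values())


def constructive_merge(target, incoming, key="id"):
    """Constructive merge: keep old rows, add incoming rows, and mark an
    added row as superseding the old one when the keys match."""
    existing_keys = {row[key] for row in target}
    result = [dict(row) for row in target]
    for row in incoming:
        result.append({**row, "supersedes_old": row[key] in existing_keys})
    return result


target = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
incoming = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]

print(load(target, incoming))
print(append(target, incoming))
print(destructive_merge(target, incoming))
print(constructive_merge(target, incoming))
```

The practical difference between the two merges is history: the destructive merge keeps only the latest version of record 2, while the constructive merge keeps both versions and flags the newer one.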

Question.

Suppose that the data warehouse for Big_University consists of four dimensions (student, course, semester, and trainer) and two measures, count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and trainer combination), the avg_grade measure stores the student's actual course grade. At higher conceptual levels, avg_grade stores the average grade for the given combination. Draw a snowflake schema diagram for the data warehouse.

Answer:

http://www.waseian.com/2018/08/data-warehouse-comprehensive2015-16.html

Question.

Current trends in technology require us to design informational systems. Explain the points to be considered with respect to traditional operational systems and the newer informational systems that need to be built.

Answer:

The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide it from the operational systems. These operational systems, such as order processing, inventory control, claims processing, outpatient billing, and so on, are not designed or intended to provide strategic information. If we need the ability to provide strategic information, we must get it from altogether different types of systems. Only specially designed decision support systems or informational systems can provide strategic information.

We find that in order to provide strategic information we need to build informational systems that are different from the operational systems we have been building to run the basic business. It will be worthless to continue to dip into the operational systems for strategic information as we have been doing in the past. As companies face fiercer competition and businesses become more complex, continuing the past practices will only lead to disaster.

Informational systems let users watch the wheels of business turn:
Show me the top-selling products
Show me the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Show the highest margins
Alert me when a district sells below target

We need to design and build informational systems

That serve different purposes
Whose scopes are different
Whose data content is different
Where the data usage patterns are different
Where the data access types are different

Question.

The following 2-D data is pulled out from a data cube.

Product ID	Location ID	Number Sold
1	1	10
1	3	6
2	1	5
2	2	22

Represent the above in 3-D format, focusing mainly on product ID and sales.

Answer:

Product ID	Location 1	Location 2	Location 3	Total Sold
1	10	-	6	16
2	5	22	-	27
Total	15	22	6	43
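
The cross-tabulation in the answer can be derived from the fact rows mechanically. The following is a minimal sketch using only the four rows given in the question; the variable names are illustrative.

```python
# Fact rows from the question: (product_id, location_id, number_sold).
fact_rows = [
    (1, 1, 10),
    (1, 3, 6),
    (2, 1, 5),
    (2, 2, 22),
]

products = sorted({p for p, _, _ in fact_rows})
locations = sorted({loc for _, loc, _ in fact_rows})

# One cell per (product, location) pair that actually has sales.
cells = {(p, loc): n for p, loc, n in fact_rows}

# Marginal totals: per product (rows), per location (columns), and overall.
row_totals = {p: sum(cells.get((p, loc), 0) for loc in locations) for p in products}
col_totals = {loc: sum(cells.get((p, loc), 0) for p in products) for loc in locations}
grand_total = sum(row_totals.values())

print("Product ID\t" + "\t".join(f"Location {loc}" for loc in locations) + "\tTotal Sold")
for p in products:
    row = [str(cells.get((p, loc), "-")) for loc in locations]
    print(f"{p}\t" + "\t".join(row) + f"\t{row_totals[p]}")
print("Total\t" + "\t".join(str(col_totals[loc]) for loc in locations) + f"\t{grand_total}")
```

Empty cells are printed as "-" because the fact table has no row for that product and location combination.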

Question.
What is an OLAP cube?

Answer:

An OLAP data cube is a representation of data in multiple dimensions, using facts and dimensions. It combines information according to its relationships, and it can consist of any number of dimensions (zero or more), each representing a specific aspect of the data.

There are five basic operations that can be performed on this kind of data cube:

Slicing
Dicing
Roll-Up (also called Drill-Up)
Drill-Down
Pivoting
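
The operations listed above can be sketched on a toy cube. This is an illustration under stated assumptions, not a real OLAP engine: the cube is a plain dict keyed by (product, location, quarter), and all data values and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube: (product, location, quarter) -> units sold.
cube = {
    ("P1", "North", "Q1"): 10, ("P1", "North", "Q2"): 4,
    ("P1", "South", "Q1"): 6, ("P2", "North", "Q1"): 5,
    ("P2", "South", "Q2"): 22,
}
DIMS = ("product", "location", "quarter")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed value sets."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


def roll_up(cube, drop_dim):
    """Roll-up: aggregate away one dimension by summing over it."""
    i = DIMS.index(drop_dim)
    out = defaultdict(int)
    for k, v in cube.items():
        out[k[:i] + k[i + 1:]] += v
    return dict(out)


def pivot(table_2d):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table_2d.items()}


q1 = slice_cube(cube, "quarter", "Q1")
p1_north = dice_cube(cube, product={"P1"}, location={"North"})
by_product_location = roll_up(cube, "quarter")
by_location_product = pivot(by_product_location)
print(q1, p1_north, by_product_location, by_location_product, sep="\n")
```

Drill-down is the reverse of roll-up: it moves from the summarized view (`by_product_location`) back to the more detailed cells (`cube`), so it needs the detailed data rather than a separate function.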

Question

Why is dimensional normalization not required?
Answer:

Dimensional normalization removes the redundant attributes found in de-normalized dimensions by splitting each dimension into sub-dimensions that are joined together. It is generally not used, for the following reasons:

The data structure becomes more complex, which can degrade performance because more tables must be joined and more relationships maintained.
Query performance suffers when aggregating or retrieving many dimensional values, which analysis and operational reports both require.
Space is not used efficiently, and more space is needed.

Question.

What are the steps involved in the dimensional modeling process?

Answer:

The dimensional modeling process follows a four-step design method:

(a) Choose the business process: Select the business process to be modeled. Describing it systematically, for example with Business Process Modelling Notation (BPMN) or Unified Modelling Language (UML), makes the model easier to represent and to explain.

(b) Declare the grain: After choosing the business process, declare the grain of the model. The grain states exactly what a single fact table row represents, and it keeps the rest of the design focused on that level of detail.

(c) Identify the dimensions: In this step, the dimensions of the model are identified. They must be consistent with the grain declared in the previous step. Dimensions are the foundation of the fact table, giving context to the data collected as facts.

(d) Identify the facts: Once the dimensions are defined, the fact table that stores the fact data can be created. The facts that populate it are numeric measurements.

Question.

Consider a data warehouse, where the fact data is calculated to be 36GB of data per year, and 4 years’ worth of data are to be kept online. The data is to be partitioned by month and four concurrent queries are to be allowed.
Compute the partition size, Temporary Space and Space Required for this scenario.

Answer:

Partition size P = 36 GB per year / 12 months = 3 GB
Temporary space T = (2n + 1)P = [(2 x 4) + 1] x 3 = 27 GB, where n is the number of concurrent queries
Fact data F = 36 GB x 4 years = 144 GB
Space required = 3.5F + T = (3.5 x 144) + 27 = 531 GB
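
The same arithmetic can be wrapped in a small function to try other scenarios. This is a minimal sketch: it encodes only the formulas used in the answer above (T = (2n + 1)P and 3.5F + T), and the function and parameter names are illustrative.

```python
def warehouse_space(fact_gb_per_year, years_online, concurrent_queries,
                    partitions_per_year=12, overhead_factor=3.5):
    """Return (partition size, temporary space, fact size, total space) in GB."""
    partition = fact_gb_per_year / partitions_per_year    # P
    temp = (2 * concurrent_queries + 1) * partition       # T = (2n + 1)P
    fact = fact_gb_per_year * years_online                # F
    total = overhead_factor * fact + temp                 # 3.5F + T
    return partition, temp, fact, total


p, t, f, total = warehouse_space(36, years_online=4, concurrent_queries=4)
print(f"P = {p} GB, T = {t} GB, F = {f} GB, space required = {total} GB")
# P = 3.0 GB, T = 27.0 GB, F = 144 GB, space required = 531.0 GB
```

Note that the temporary space depends only on the partition size and the number of concurrent queries, not on how many years of data are kept online.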

Question.

Discuss the merits and demerits of using views from the perspective of security of data warehouse.

Answer:

Views are an easy way to define security initially, but they cause challenges later.
Some of the common restrictions that may apply to the handling of views are:

restricted data manipulation language (DML) operations,
lost query optimization paths,
restrictions on parallel processing of view projections.

Using views to enforce security also imposes a maintenance overhead. In particular, if views are used to restrict access to data tables and aggregations, the views may need to change whenever those tables and aggregations change.

Question.

For following statements, indicate True or False with proper justification:

A. It is a good practice to drop the indexes before the initial load.
True. Creating index entries during mass loads can be too time-consuming, so drop the indexes before the load to make it go quicker. You may rebuild or regenerate the indexes when the load is complete.

B. The choice of index type depends on cardinality.
True. For example, a bitmap index is suitable only for low-cardinality data.

C. The importance of metadata is the same for data warehouse and an operational system.
False. In an operational system, users get information through predefined screens and reports. In a data warehouse, users seek information through ad hoc queries, so they depend on metadata far more.

D. Backing up the data warehouse is not necessary because you can recover data from the source systems.
False. Information in a data warehouse is accumulated over long periods and elaborately preprocessed, so rebuilding it from the source systems would be slow and often impractical.

E. MPP is a shared-memory parallel hardware configuration.
False. MPP is a shared-nothing hardware architecture.

August 04, 2018

Data Warehouse Reference - QnA
