May 20, 2018

Multimedia Computing MID Sem Solution


Question.  List three pattern substitution based compression algorithms. For each algorithm, give one application where the method is used with respect
to multimedia data.

Answer:
  • Repetitive Sequence Suppression.
  • Run-length Encoding.
  • Pattern Substitution.
Repetitive Sequence Suppression Example: 1 from Silence suppression in audio, ‘white space’ in text, simple uniform backgrounds in images.

Run-length Encoding : 1 from Computer graphics generated images, Faxes, part of JPEG (latter stage) pipeline.

Pattern Substitution : 1 from Pattern recognition/token substitution, Entropy coding (Huffman), LZW/GIF, vector quantization.

Question. What is the basic concept used in defining an Information Theoretic approach to data compression?

Answer:
The entropy of an information source S, defined as:
,
is the basis Information Theoretic compression algorithms.

Question: What advantages does the arithmetic coding algorithm offer over Huffman coding algorithm with respect to data compression?

Answer:
  • Good compression ratio (better than Huffman coding), entropy around the Shannon Ideal value. – Huffman coding uses an integer number of bits for each symbol,
    • hence k is never less than 1.
    • – Use decimal number of bits 
Disadvantages with the arithmetic coding algorithm: 
  • Memory: potentially large symbol tables needed.
  • Speed due possibly complex computations due to large symbol tables.
Question: Given the following Differential Pulse Code Modulated (DPCM) Sequence reconstruct the original signal.
+4 + 2 + 3 - 2 + 3 - 1 + 1 + 1

Answer
DPCM decoding: Simply start with accumulator zero for each number add the value to current accumulator, output accumulator value.
So solution is:
4 6 9 7 10 9 10 11

Question: Given the following Run Length Encoded (RLE) Sequence reconstruct the
original 2D 8x8 (binary) data array.
(0, 8),
(0, 1), (1, 1), (0, 4), (1, 1), (0, 1),
(0, 1), (1, 2), (0, 2), (1, 2), (0, 1),
(0, 1), (1, 6), (0, 1),
(0, 2), (1, 4), (0, 2),
(0, 3), (1, 2), (0, 3),
(0, 2), (1, 1), (0, 2), (1, 1), (0, 2),
(0, 1), (1, 1), (0, 4), (1, 1), (0, 1)

Answer:
The format of RLE is for each pair (colour, length) so just ‘parse’ each row, to expand colour to length number of values to get the solution:

Question:
Calculate the uncompressed digital output if a video signal is sampled using the following values: 25 frames per second, 160 x 120 pixels, True (Full) color depth.

Answer:
True color = 24 bits (3 bytes) per pixel

So  number of bytes per second is
24*160*120*25 =  11,520,000 bits

Question:
If a suitable CD stereo quality audio signal is included with the video signal in part (c), how long would it take for the signal to be transmitted on a 128 kbps channel?

Answer:
CD quality audio = 22,050 Hz. Sampling rate = 44,100 Hz.
Stereo  audio = 44100*2 * 16 bits  =1,411,200 bits per second

So uncompressed bytes stream is 11,520,000 + 1,411,200 =  12,931,200 bits per second. Channel is 128 kbps. Hence time taken = 101 seconds

Question:
For compressing an image for power point slide, which compression would you prefer-regular JPEG or JPEG 2000? Why?

Answer:
JPEG – 2000 – handles computer generated images (sharp edges etc.) better. Compression ratio is 20% higher than original JPEG

Question: What are the key differences between the JPEG and MPEG I-Frame compression
pipeline?

Answer:
Four main differences:
  • JPEG uses YIQ whilst MPEG use YUV (YCrCb) colour space.
  • MPEG used larger block size DCT windows 16 even 32 as opposed to JPEG’s 8.
  • Different quantization— MPEG usually uses a constant quantizationvalue.
  • Only Discrete Pulse Code Modulation (DPCM) on DC component in JPEG on zig zag scan. AC (JEPG) and complete zig zag scan get RLE.
Question: What processes give rise to the lossy nature of JPEG/MPEG video compression?

Answer:
Lossy steps:
  • Colour space subsampling in IQ or UV components.
  • Quantization reduces bits needed for DCT components. 
Question
(A) In MPEG audio compression, what is frequency masking?

Answer
When an audio signal consists of multiple frequencies the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than the other close frequency then the second frequency may not be heard.

(B) Briefly describe the cause of frequency masking in the human auditory system?

Answer:
Frequency Masking:
  • Stereocilia in inner ear get excited as fluid pressure waves flow over them.
  • Stereocilia of different length and tightness on Basilar membrane so resonate in sympathy to different frequencies of fluid waves (banks of stereocilia at each frequency band).
  • Stereocilia already excited by a frequency cannot be further excited by a lower amplitude near frequency wave.
(C) In MPEG audio compression, what is temporal masking?

Answer:
After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency.

(D) Briefly describe the cause of temporal masking in the human auditory system?

Answer:
  • (Like frequency masking) Stereocilia in inner ear get excited as fluid pressure waves flow over them and respond to different frequencies.
  • Stereocilia already excited by a certain frequency will take a while to return to rest state, as inner ear is a closed fluid chamber and pressure waves will eventually dampen down.
  • Similar to frequency masking Stereocilia in a ’dampening state’ may not respond to a a lower amplitude near frequency wave.
(E) Briefly describe, using a suitable diagram if necessary, the MPEG-1 audio compression algorithm, outlining how frequency masking and temporal masking are encoded.

Answer:
MPEG audio compression basically works by:
  • Dividing the audio signal up into a set of frequency subbands (Filtering)
  • Use filter banks to achieve this.
  • Subbands approximate critical bands.
  • Each band quantised according to the audibility of quantization noise.
Frequency masking and temporal masking are encoded by:

Frequency Masking: MPEG Audio encodes this by quantising each filter bank with adaptive values from neighbouring bands energy, defined by a look up table.

Temporal Masking: Not so easy to model as frequency masking. MP3 achieves this with a 50% overlap between successive transform windows gives window sizes of 36 or 12 and applies basic frequency masking as above.

Question: Construct coding tree for ROADROADIES by Shannon-Fano Algorithm and calculate the entropy. Get the Compression Ratio also for the same.

Answer:
Shannon-Fano Algorithm: 
  1. Sort the symbols according to the frequency count of their occurrences.
  2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all the parts contain only 1 symbol.
  • Below are the letters with their occurrence counts.[occurrence counts should be in descending form]



log2(1/pi) is calculated as:
Ex: For symbol R:
pi=Count/total number of counts = 2/11=0.18181
log2(1/0.18181) = 2.459 approx.

Code is calculated as:
Ex: Traversing the tree from top to down.

No. of bits used is calculated as:
Ex: For symbol R:> count*code[number of bits/digits]= 2*1=2
For symbol I:> count*code[number of bits]= 1*5=5

Calculate Entropy:



=2.7312

Hence, 2.7312 number of bits are required to represent each symbol. [It can be asked as, What is the average number of bits needed for each character?]

Compression Ratio:  This can be found as:


Default ASCII encoding: 8 bits per symbol
Result after Shannon Fano Coding: 2.7312 bits per symbol

Compression Ratio = (11*8)/(11*2.7312) = 2.929

Question:
Consider the following uncompressed sequence of character bytes: AAAABBBBCAAAAACCCCCAAAAABBBBBBDDDDDD. Encode this sequence with Run-Length Coding. Also calculate the compression ratio.

Solution:
A4B4CA5C5A5B6D6


Compression ratio = 36/15=2.4

Question: Construct Huffman tree to encode the table below. What is the average number of bits needed for each character?                                                                                           
Symbol
A
B
C
D
E
F
G
Frequency
28
10
20
13
5
15
9

Answer: Huffman CodingHuffman coding is a lossless data compression algorithm. The most frequent character gets the smallest code and the least frequent character gets the largest code.

Steps to build Huffman Tree

Step 1. Build a min heap that contains 7 nodes where each node represents root of a tree with single node.

Symbol    Frequency
    E           5
    G           9
    B           10
    D           13
    F           15
    C           20
    A          28

Step 2. Extract two minimum frequency nodes from min heap. Add a new internal node with frequency 5 + 9 = 14.

Now min heap contains 6 nodes where 5 nodes are roots of trees with single element each, and one heap node is root of tree with 3 elements

B           10
D           13
Node     14
F           15
C           20
A           28

Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 10 + 13 = 23

Now min heap contains 5 nodes where 3 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

Node     14
F           15
C           20
Node     23
A           28

Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 15 = 29

Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

C           20
Node     23
A           28
Node     29

Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 20 + 23 = 43

Now min heap contains 3 nodes.

A      28
Node 29
Node 43

Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 28 + 29 = 57

Now min heap contains 2 nodes.

Node 43
Node 57

Step 7: Extract two minimum frequency nodes. Add a new internal node with frequency 43 + 57 = 100


Now min heap contains only one node.

Node 100

Since the heap contains only one node, the algorithm stops here.

Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array


The codes are as follows:


Symbol   code
    C          0
    B          10
    D          11
    A          10
    E          1100
    G          1101
    F          111

Average number of bits needed for each character

Entropy





2.6282 average number of bits are required to represent each symbol.

Question:
Suppose eight characters have a distribution A:(1), B:(1), C:(1), D:(2), E:(3), F:(5), G:(5), H:(10). Draw a Huffman tree for this distribution.

Solution:

Question:
Perform run length encoding for the following and calculate compression ratio AC0000AAAAB00000000A0AAA00000.

Solution:
Run length encoding
Before Encoding : AC0000AAAAB00000000A0AAA00000
After Encoding :   1A1C404A1B801A103A50
     
Compression Ratio
 Compression Ratio = Total number of bits before encoding/Total number of bits after encoding

Compression Ratio  = 29/20 = 1.45

Question:
Answer the following

a. Why is the display order and encoding order of MPEG frames different ? Given the display order of frames, find the coding order.  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) I(11)
b. State the difference between frequency masking and temporal masking in MPEG audio. How are these two encoded in MPEG audio?
c. If the block size for a 2D DCT transform is 8 X8, and we use only the DC components to create a thumbnail image, what fraction of the original pixels would we be using?

Solution:

a.
P and B frames are much more complicated, since they depend on other frames.   So we have to buffer a previous frame and a future frame to decode P and B frames. The name "future frame" indicates that the frame should be displayed after the current frame.  However, it is actually encoded before the current frame. So the display order of frames in a MPEG sequence is different from the decoding order. Coding order : I(1)P(4)B(2)B(3)P(7)B(5)B(6)P(10)B(7)B(8)I(11)

b.
frequency masking:
When an audio signal consists of multiple frequencies the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than the other close frequency then the second frequency may not be heard.

temporal masking:

  • After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. MPEG Audio encodes frequency masking by quantizing each filter bank with adaptive values from neighboring bands, defined by a look up table. 
  • Achieved with a 50% overlap between successive transform windows and then applying frequency masking

c. 1/64, because each 8 x8 block only has one DC

May 13, 2018

Multimedia Computing - MCQS


Which one of the following is the characteristic of a multimedia system?
A. high data rates
B. high storage
C. none of the mentioned
D. both (a) and (b)
ANSWER: both (a) and (b)

Color, Coordinate, Normal, Orientation, Position and scalar in animations are called as
A. Tools
B. Animators
C. Specifiers
D. Interpolators

The correct answer is: Interpolators

In graphical system, Hardware used to store Bitmap is
A. Processor
B. Frame buffer
C. Memory
D. None

The correct answer is: Frame buffer

On a black and white system with one bit per pixel, the frame buffer is commonly called as
A. Bitmap
B. Multi map
C. Pix map
D. Grayscale

The correct answer is: Bitmap

The process used to calculate patterns of dots such that values from 0 to 255 correspond to patterns that are more and more filled at darker pixel values is known as

A. Dithering
B. Halftone
C. Colour Matrix
D. Aliasing

The correct answer is: Dithering

Alpha Channel is not supported by
A. PNG
B. GIF
C. TIFF
D. BMP

The correct answer is: BMP

Bar graph representing the distribution of numerical data and an estimate of the probability distribution of a continuous variable
A. CLUT
B. Palette
C. Histogram
D. Gamut

The correct answer is: Histogram

This law states that "Colour matching is linear" and this means that linear combinations of lights made up of three primaries are just the linear set of weights used to make the combination times those primaries.
A. Weber's Law
B. White Point Correction Law
C. Grassman's Law
D. Gamma Correction Law

The correct answer is: Grassman's Law

The CIE colour model was developed to be 
A. completely independent of any device or other means of emission or reproduction and is based as closely     as possible on how humans perceive colour.
B. partially independent of any device or other means of emission or reproduction and is based as rarely as                possible on how humans reject colour
C. completely dependent of any device or other means of emission or reproduction and is based as rarely                     as possible on how humans reject colour.
D. completely dependent of any device or other means of emission or reproduction and is based as closely                   as possible on how humans perceive colour

The correct answer is: completely independent of any device or other means of emission or reproduction and is based as closely as possible on how humans perceive colour.


For mapping tri-stimulus values or chromaticity coordinates in image capture, encoding, or reproduction
A. Black Point Correction is used
B. White Point Correction is used
C. Gamma Correction is used
D. CIE Correction is used

The correct answer is: White Point Correction is used

Luma in a video refers to the

A. Select one:
B. Resolution
C. Brightness
D. Wavelength

The correct answer is: Brightness

-----Is any combination of text, graphics art, sounds, animation and video delivered to you by computer or electronic device?
A. network
B. Hyper media
C. Multimedia
D. Visual media

The correct answer is: Multimedia

I in HSI Colour Coordinate Scheme stands for
A. Independent
B. Interactive
C. Intensity
D. Image

The correct answer is: Intensity

which algorithm uses character frequency for compression
None
A. Huffman coding
B. Arithmetic coding
C. LZW

The correct answer is: Huffman coding

Which technique performs decorrelation of input signall in a data independent manner

A. Huffman
B. Quantization
C. None
D. Discrete Cosine Transform

The correct answer is: Discrete Cosine Transform


Which algorithm uses fixed length codewords to represent variable length strings of symbols/ characters
A. LZW
B. Huffman
C. Shannon-Fan
D. Arithmetic

The correct answer is: LZW


A _______ can be added to your presentation and then used to go to a variety of locations ---- for example, a web address, an e-mail address, a custom show or document, just to name a few.

A. menulink
B. slidelink
C. hyperlink

The correct answer is: hyperlink


A smaller version of an image is called a:

A. thumbnail
B. clipart
C. portable network graphic
D. bitmap

The correct answer is: thumbnail

Compression ratio is the ratio of :
A. None of these
B. compressed file size to the original file size
C. the number of pixels in a frame of the original size to those in a frame of the compressed file
D. the original file size to the size of the compressed file

The correct answer is: the original file size to the size of the compressed file

Lossy techniques provide ___________ when compared to lossless techniques
A. None of these
B. lower compression ratios
C. much higher compression ratios
D. similar compression ratios

The correct answer is: much higher compression ratios

The basic idea behind Huffman coding is to
A. compress data by using fewer bits to encode fewer frequently repeating characters
B. expand data by using fewer bits to encode more frequently repeating characters
C.compress data by using fewer bits to encode more frequently repeating characters
D. compress data by using more bits to encode more frequently repeating characters

The correct answer is: compress data by using fewer bits to encode more frequently repeating characters

A Huffman code: A = 1, B = 000, C = 001, D = 01, P(A) = 0.4, P(B) = 0.1, P(C) = 0.2, P(D) = 0.3 The average number of bits per letter is
A. 2.0 bit
B. 2.1 bit
C. 8.0 bit
D. 1.9 bit

The correct answer is: 1.9 bit

XML stands for

A. EXtra Multi Language
B. EXtensible Making Language
C. EXprimental Markup Language
D. EXtensible Markup Language

The correct answer is: EXtensible Markup Language

The idea with wavelets is to represent a complicated function by
A. lines
B. square functions
C. simple basic functions
D. sinus functions

The correct answer is: simple basic functions

Multimedia elements are typically sewn together into a project using ____
A. Authoring tools
B. Audio tools
C. Multimedia tools
D. Video tool

The correct answer is: Authoring tools

---------Is used for building multimedia-enabled websites as well asIntrenet applications in HTML,XML,and other formats
A. Lingo script
B. VRML
C. Macromedia flash
D. Dreamweaver

The correct answer is: Dreamweaver

----- allows you to trigger events such as moving to a different keyframe or requiring the movie to stop

A. 3D sprite
B. Tweening
C. Action Script
D. Macromedia director

The correct answer is: Action Script

-----is sophisticated Pc based program for editing WAV files
A. Cubase
B. Sound forge
C. Pro Tools
D. Cool edit

The correct answer is: Sound forge

This permits specifying temporarily scripted interaction among any media types and user input
A. SMIL Correct
B. HTML
C. XML
D. XSL

The correct answer is: SMIL

This is a powerful modeling,animation,and rendering package for animation and special effects in films and games.
A. Maya
B. Render Man
C. 3D studio max
D. Softimage XSI

The correct answer is: Softimage XSI

Question.
The sample rate of Au file format is

Select one:
a. 5700Hz originally headerless
b. 8KHz originally headerless
c. 12KHz originally with header
d. 10000Hz originally with header

Answer: 8KHz originally headerless

Question.
In which type of streaming multimedia file is delivered to the client, but not shared?

Select one:
a. Compression
b. Real-time streaming
c. Progressive download
d. None

Answer: Real-time streaming

Question.
Images included in many software titles are called _________.

Select one:
a. .jpg files
b. .tiff files
c. Popups
d. Clipart

Answer: Clipart

Question.
Hardware that creates sound from a mathematical representation

Select one:
a. Synthesizer
b. Speaker
c. Stampers
d. None

Answer: Synthesizer

Question.
DPCM result for the following series +2    +4    -3    -2    +9    +7    -6    -5 is

Select one:
a. +2    +6    +9    +1    +10    +16    +11    +6
b. +4   +6    +3    +1    +10    +17    +11    +6
c. +2    +6    +3    +1    -10    +17    +11    +6
d. +2    +6    +3    +1    +10    +17    +11    +6

Answer: +2    +6    +3    +1    +10    +17    +11    +6

Question.
H.323 uses G.71 or G.723.1 for

Select one:
a. Compression
b. Communication
c. Conferencing
d. Controlling

Answer: Compression

Question.
CD-I (Compact Disc-Interactive) standard defines the storage of interactive multimedia data while follows

Select one:
a. The Green Book
b. The Red Book
c. The Blue Book
d. The Black Book

Answer: The Green Book

Question.
----- is the  process  of applying a bank of band-pass filters to the analog signal

Select one:
a. Superband filtering
b. None
c. Low-pass filter
d. Subband filtering

Answer: Subband filtering

Question.
Find the Coding and transmission order of IBBPBBPBBI Display order

Select one:
a. IPBBBPBIBB
b. IPBBPBBPBB
c. IPBBPBBIBB
d. IPBBPBBBBI

Answer: IPBBPBBIBB

Question.
Nyquist rate defined for correct sampling is

Select one:
a. Sampling rate equal to at least half the maximum frequency
b. Sampling rate equal to at least equal the maximum frequency
c. Sampling rate equal to at least thrice the maximum frequency
d. Sampling rate equal to at least twice the maximum frequency

Answer: Sampling rate equal to at least twice the maximum frequency

Question.
Multimedia system require hard real time scheduling

Select one:
a. for security
b. to ensure critical tasks will be serviced within timing deadlines
c. to minimize the delay
d. to deliver the media file to the client

Answer: to ensure critical tasks will be serviced within timing deadlines

Question.
In Composite Video, colour and intensity signals are mixed into

Select one:
a. Multiple carrier wave
b. Double carrier wave
c. A single carrier wave
d. Mixed carrier wave

Answer: A single carrier wave

Question.
Find out the correct statement

Select one:
a. Temporal redundancy is included in P frame coding as well as I frame coding
b. Temporal redundancy is included in P frame coding whereas I frame coding only includes Spatial redundancy removal
c. Temporal redundancy is included in I frame coding whereas P frame coding only includes Spatial redundancy removal
d. Spatial redundancy is included in P frame coding as well as I frame coding

Answer: Temporal redundancy is included in P frame coding whereas I frame coding only includes Spatial redundancy removal

Question.
Interleaving the audio and video segments of a video clip together in a data file is:

Select one:
a. Flattening
b. Helical scan
c. Flare
d. None

Answer: Flattening

Question.
A video consists of a sequence of

Select one:
a. Slots
b. Frames
c. Signals
d. Packets

Answer: Frames

Question.
Adding _________ to objects on your slides not only controls the flow of information, but adds interest to your presentation

Select one:
a. Background
b. Animation
c. Transition
d. Popups

Answer: Animation

Question.
CD-XA allows the storage of

Select one:
a. Digital Audio, Text, Graphics and Video
b. Only Audio Data
c. Only Text Data
d. Only Video Data

Answer: Digital Audio, Text, Graphics and Video

Question.
The text color in a presentation should contrast with the ________ color.

Select one:
a. CPU
b. Frame
c. Stack
d. Background

Answer: Background

Question.
Component Video uses three separate video signals for the

Select one:
a. Red, Yellow and Blue Planes
b. Red, Green and Blue Planes
c. Cyan, Green and Blue Planes
d. Red, Green and Magenta Planes

Answer: Red, Green and Blue Planes

Question.
If frames are displayed on screen fast enough, we get an impression of

Select one:
a. Motion
b. Signals
c. Bits
d. Packets

Answer: Motion

Question.
To receive signal, a translator is needed to decode signal and encode it again at a

A.High Quality.
B.Lower Quality.
C.Same Quality.
D.Bad Quality.

Answer: Lower Quality

Question.
Session Initiation Protocol (SIP), is very

A.Independent.
B.Flexible.
C.Important.
D.Layered.

Answer: Flexible

Question.
Establishing a session in Session Initiation Protocol (SIP), requires a three-way

A.Protocols.
B.System.
C.Ports.
D.Handshake.

Answer: Handshake

Question.
Moving Picture Experts Group (MPEG) is used to compress

A.Frames.
B.Images.
C.Audio.
D.Video.

Answer: Video

Question.
A combination of an encryption algorithm and a decryption algorithm is called a

A.plain text.
B.cipher.
C.original text.
D.shift cipher.

Answer: cipher

Question.
Most common compression technique that is used to create CD-quality audio is based on perceptual encoding technique is called

A.Predictive Encoding.
B.Perceptual Encoding.
C.MPEG.
D.JPEG.

Answer: Perceptual Encoding.

Question.
In Audio and Video Compression, each frame is divided into small grids, called picture elements or

A.Frame.
B.Packets.
C.Pixels.
D.Mega Pixels.

Answer: Pixels

Question.
 ----- is the  process  of applying a bank of band-pass filters to the analog signal

Select one:
a. Superband filtering
b. Subband filtering
c. Low-pass filter
d. None

Answer: Subband filtering

May 06, 2018

Data Warehouse - MCQS

Question.
The data Warehouse is______.
A. read write only.
B. write only.
C. read only.
D. none.
ANSWER: A

Question.
__________ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management decisions.
A. Web Mining.
B. Data Warehousing.
C. Data Mining.
D. Text Mining.

ANSWER: B

Question.
If you have to summarize data as per hierarchy in dimension attributes, which is appropriate SQL clause?

Select one:
SUM
HAVING
WHERE
ROLLUP

Answer: ROLLUP

Question.
Which of the following is Apache data warehouse?

Select one:
Hbase
Sqoop
HDFS
Hive

Answer: Hive

Question.
A Data Mart whose source is directly from transactional systems, legacy applications, or external data feeds

Select one:
Dependent Data Mart
Intra-entry Data Mart
Independent Data Mart
Inter-entry Data Mart

Answer: Independent Data Mart

Question.
In a star schema, the number of rows in a dimension table is

Select one:
1. Larger than the fact table
2. Smaller than the fact table
3. Same as in the fact table

4. None of the options


AnswerSmaller than the fact table

Question.
The difference between ROLLUP and CUBE in SQL is

Select one:
CUBE summarizes for all combinations while ROLLUP does along hierarchy
CUBE summarizes along a hierarchy while ROLLUP does for all combinations
CUBE and ROLLUP handle NULLs differently
CUBE and ROLLUP are the same

Answer: CUBE summarizes for all combinations while ROLLUP does along hierarchy

Question.
Data scrubbing is which of the following?

Select one:
A process to load the data in the data warehouse and to create the necessary indexes
A process to upgrade the quality of data before it is moved into a data warehouse
A process to aggregate data after it is moved into a data warehouse
A process to reject data from the data warehouse and to create the necessary indexes


Answer: A process to upgrade the quality of data before it is moved into a data warehouse

Question.
Which of the following is NOT a kind of data warehouse application?

Select one:
Data mining
Analytical processing
Information processing
Transaction processing

AnswerTransaction processing

Question.
Which one of the following is NOT a characteristic of metadata?

Select one:
Self-describing
Includes user data
Data about data
Helps user to build ad-hoc queries

Answer: Includes user data

Question.
A _______ is a table on disk that contains the result set of a query

Select one:
Materialized View
Pivot
View
Table

AnswerMaterialized View

Question.
Which one is NOT the Codd’s rule for OLAP?

Select one:
Restricted cross dimensional operation
Transparency
Consistent Reporting Performance
Intuitive Data Manipulation

Answer: Restricted cross dimensional operation

Question.
A bitmap index is most appropriate on which of these columns

Select one:
Salary
Gender
Date of Birth
Name

Answer: Gender

Question.
SQL Windowing is based on

Select one:
Partitions defined for the table
Indexes
Materialized Views
Partitions defined in query

Answer: Partitions defined in query

Question.
The SQL analytical function to divide records into equal sized groups is

Select one:
PERCENTILE
GROUP BY
NTILE
QUARTILE

Answer: NTILE

Question.
The operation of moving from coarser granularity data to a finer granularity  is called

Select one:
Drill down
Dice
Rollup
Slice


AnswerDrill down

Question.
Views in SQL are of the type

Select one:
Always Physical
Sometimes physical
Always Virtual
Always up-to-date

AnswerSometimes physical

Question.
MOLAP stores aggregate values in which of the following structure?

Select one:
Table
Index
View
Array

AnswerArray

Question.
Which of the following OLAP techniques offers best latency?

Select one:
ROLAP
MOLAP
DOLAP
HOLAP

Answer: ROLAP

Question.
Aggregate navigator is based on

Select one:
Indexes
Summary tables
Logical Views
Materialized Views

AnswerMaterialized Views

Question.
Summarization for a table is to be done using 4 attributes. How many SQL statements using GROUP BY are required to produce equivalent results?

Select one:
4
8
64
16

Answer: 16

Question.
Difference between Rank and Sort in SQL is

Select one:
Sort allows aggregates while Rank does not
Rank gives a separate sequence of numbers
Sort is based on attribute while Rank is not
They are same

AnswerRank gives a separate sequence of numbers

Question.
Dimensions D1 and D2 are not considered conforming if

Select one:
D1 columns are subset of D2
D1 identical to D2
D1 and D2 are present on the same server
D1 is rolled up subset of D2

AnswerD1 and D2 are present on the same server

Question.
In SQL, what is the difference between GROUP BY and PARTITION BY clauses ?

Select one:
They are same.
SELECT clause
Type of attributes
WHERE clause

AnswerWHERE clause

Question.
When were analytic functions introduced in SQL?

Select one:
2011
1999
1992
2003


Answer2003

Question.
What is the difference between Rank and Dense_Rank in SQL?

Select one:
Rank and Dense_Rank are the same
Dense_Rank uses consecutive numbers
Dense_Rank hides some results
Dense_Rank rejects duplicates


Answer: Dense_Rank uses consecutive numbers

Question.
The type of relationship in star schema from dimension to fact is

Select one:
Many-to-Many
One-to-Many
One-to-One
Many-to-one 


Answer: one-to-many 

Question.
A table can not be partitioned by

Select one:
Hash
List
Range
Rank

AnswerRank

Question.
A snowflake schema is which of the following type of table?

Select one:
1. Fact
2. Normalized Dimension
3. Factless Dimension
4. All of the above

Answer: All of the above

Question
Enterprise Data Warehouse is a _________________warehouse.

Select one:
1. Public
2. Non Centralized
3. Centralized
4. Private

Answer: Centralized

Question
Which of the following is not a data component of a data warehouse?

Select one:
1. Component Key.
2. Metadata
3. Current detail data.

4. Lightly summarized data.

Answer: Component Key

Question
Data warehouses commonly use a _______________key to uniquely identify an entity for various purposes.

Select one:
1. Composite
2. Primary
3. Super
4. Surrogate

Answer: Surrogate

April 26, 2018

Important QnA on Data Warehouse - Mid Sem Excercise


Text Books
T1 Ponniah P, “Data Warehousing Fundamentals”, Wiley Student Edition, 2012
T2 Kimball R, “The Data Warehouse Toolkit”, 3e, John Wiley, 2013

Question.1
Match the following :

Answer:
Nonvolatile data
Read only
Dual data granularity
Details and summary
Dependent data mart
Data from main DW
Disparate data
Because of multiple sources
Decision support
Subject-oriented
Data staging
Workbench for data Integration
Data mining
Knowledge discovery
Meta data
Roadmap for users
Operational systems
Application flavor
Internal data
Private spreadsheets

Question.2
Classify metadata for following as Data Acquisition, Data Storage, or Information Delivery metadata.

1) Data Archival Rules
2) Predefined queries and reports
3) Source system from which data come
4) Information about last successful update
5) Types of aggregations
6) Information about data transformation rules

Answer
1) Data Archival Rules : Data Storage
2) Predefined queries and reports : Information Delivery 
3) Source system from which data come : Data Acquisition 
4) Information about last successful update: Data Storage
5) Types of aggregations: Data Acquisition
6) Information about data transformation rules : Data Acquisition 

Question.3
Provide short answers to the following : 

a. How do you differentiate a Data Warehouse from an application database?
b. What is a conformed dimension?
c. Differentiate initial load, incremental load, and full refresh
d. How can MapReduce processing help DW?
e. What is the difference between WINDOW and GROUP BY clauses in SQL SELECT statements?

Answer
A. Differentiate a Data Warehouse from an application database:

Database:
  • It’s normally used for online transaction processing (OLTP) however it can be used for other purposes such as data warehousing. It records data from the user for history.
  • Tables and joins are complex because they are generalized (for RDMS). This is done to reduce unnecessary data and save storage space.
  • Optimized for write operation.
Data Warehouse:
  • It’s normally used for used for Online Analytical Processing (OLAP). It reads the historical data for the Users for business decisions.
  • Optimized for read operations.
  • Tables and joins are simple since they are De-normalized. It is done to reduce the response time for analytical queries.
Answer
BConformed dimension:
  • A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. 
  • It can refer to multiple tables in multiple data marts within within a single organization.
  • Example, Except for the primary dimension, two dimension tables are exactly the same, which are not considered to be conformed dimensions.
Answer
C. Differentiate between initial load, full refresh and incremental load

Initial Load : It is the process of populating data warehousing tables for the very first time.

full refresh : When loading the data for the first time, all set records are loaded on a stretch based on the volume. It wipes and reloads all the contents of tables with fresh data.

Incremental Load : In this load we apply dynamic changes when needed in a specific period. The schedule is prescribed for each period (to get the latest record)

Answer
D
MapReduce programming paradigm is very powerful and flexible, It has been widely used for parallel computing in data-intensive areas due to its good scalability.

How MapReduce processing help DW:

* MapReduce makes it easier to write a distributed computing program by providing
  • inter-process communication
  • fault-tolerance
  • load balancing
  • and task scheduling.
* MapReduce integration allows fully parallel, distributed jobs against your data with locality awareness[Batch processing].
* This reduces difficult programming work, for the user only has to create a configuration file with some lines of code to declare the dimensions and fact objects and necessary change functions.

Answer
E. Difference between WINDOW and GROUP BY clauses in SQL SELECT statements:
  • Group by is an aggregate whereas over() is a window function.
  • When a group by clause is used all the columns in the select list should either be in group by or should be in an aggregate function. 
  • While in over() clause, we don't have any such restriction. Aggregates or other functions can be calculated using over function along-with other columns in the select list.
Question.4
For following statements, indicate true or false with proper  justification:

A. It is a good practice to drop the indexes before the initial load.                    
B. It is equally important to include unstructured data along with structured data in a data
Warehouse
C. Technical architecture consists of the vendor tools.
D. The importance of metadata is the same in a data warehouse as it is in an operational
system.
E. Backing up the data warehouse is not necessary because you can recover data from the source systems.
F. Metadata standards facilitate metadata interchange among tools.

Answer:
A. True: Index entry creation during mass load can be very time consuming. Therefore, to speed up load, drop the index before loading. You can rebuild or reproduce the index when the load is completed.

B. True: Both types of data are vital in the modern digital enterprise, but they must be managed differently. Because most of the business-relevant information originates in unstructured form.

C.True: If the vendor's investments in new technologies can be leveraged, their customers do not have to solve these same problems. Buying the new package is to take the advantages of the lessons the vendor have learned in how to apply the new technologies.

D. True: In an operational system, users get information thru predefined screens and reports. In DW, users seek information thru ad-hoc queries.

E. False: Information in DW is accumulated over long periods and elaborately prepossessed

F. True: We may find that the system vendor suggests selecting parts of different schemas and constructing a pack and mix schema of your own. This is known as pragmatic approach, we need to understand of data loss through truncation and mapping. in this situation Metadata standard facilitate the interchange between system rather than describe a form of storage.

Question.5
While designing fact table it is being advised to use non-intelligent keys; but while designing summary table it is advised to use intelligent key. Why ?

Answer:
Unlike fact tables, summary tables are re-created on a regular basis, possibly every time new data is loaded. The purpose of the summary is to speed up queries, and reduce data joins with dimension data takes significant amounts of time.

Question. 6
List some challenges (at least 5) in ETL testing.

Answer:
ETL: Stands for Extract-Transform-Load and a typical process of loading data from a source system to the actual data warehouse/other data integration projects.

ETL Testing: It is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate.

ETL Testing Challenges
  1. Lack of proper flow of business information  
  2. Existence of many ambiguous software requirements.
  3. Loss of data might be there during the ETL process
  4. Production sample data is not a true representation of all possible business processes.
  5.  Existence of apparent trouble acquiring and building test data
  6. It requires SQL programming: This has become a major issue for the testers because they are manual testers and have limited SQL coding skills, thus making data testing very difficult.
Question.7
Given a SALES table with attributes(Product, Region, Sales), where each row represents sales of a product in a region. Using WINDOW functions, Write SQL queries that show

A. Sale of each Product in each region along with average sales over all regions, products in table
B. Sale of each product in each region along with average sales of the product among all regions.

Answer:
Window functions:
Window functions are similar to aggregate functions, but there is one important difference. When we use aggregate functions with the GROUP BY clause, we “lose” the individual rows. The function is performed on the rows as an entire group. This is not the case when we use window functions, we can generate a result set with some attributes of an individual row together with the results of the window function.

A. Sale of each Product in each region along with average sales over all regions, products in table

SELECT Product,
       Region,
       AVG(Sales) OVER(PARTITION BY Region) AS Average_Sales
  FROM SALES;

B. Sale of each product in each region along with average sales of the product among all regions.

SELECT Product,
       Region,
       AVG(Sales) OVER(PARTITION BY Product) AS Average_Sales
  FROM SALES;

Question.8
Consider a retail store data cube with dimensions Product, Date, and Store, and the measure attribute Sales
a. Draw the lattice of cuboids for this data cube without considering hierarchies.
b. Suppose that the dimensional attributes have the following hierarchies defined. How many cuboids would the data cube contain for this set of hierarchies?

Answer:

A.
Drawing a lattice of cuboids for the dimension


B. Note: we are note sure about answer. Kindly post in comment box, if you have a correct answer.

Question.9
Consider the following relation on Stock Trading.    

Record Id
Security Name
Trading Volume
Closing Price
Exchange
1
SBI
55075000
275.5
NSE
2
ICICI
23500000
285.25
BSE
3
Axis Bank
4567895
345.45
BSE
4
Bank of Baroda
32700
178
BSE
5
Yes Bank
670000
875.75
BSE
6
Kotak Mahindra Bank
50750000
945.7
NSE
7
HDFC Bank
45730000
955.9
BSE
8
Allahabad Bank
45000
156.3
BSE
9
Karnataka Bank
5600
145.3
BSE
10
Syndicate Bank
746000
135.7
NSE
11
Canara Bank
34670
245.6
NSE
12
Indusnd Bank
44550000
456.8
NSE

a. Construct a bitmap index for the attributes Exchange and Trading Volume for this table. 
b. Indicate how these two bitmap-indices can be used to answer the query: Select all those tuples which traded in NSE and that have a trading volume of more than 30 million shares.

Answer:

a.
Bitmap on Exchange
NSE
1
0
0
0
0
1
0
0
0
1
1
1
BSE
0
1
1
1
1
0
1
1
1
0
0
0

Bitmap on Trading Volume

55075000
1
0
0
0
0
0
0
0
0
0
0
0
23500000
0
1
0
0
0
0
0
0
0
0
0
0
4567895
0
0
1
0
0
0
0
0
0
0
0
0
32700
0
0
0
1
0
0
0
0
0
0
0
0
670000
0
0
0
0
1
0
0
0
0
0
0
0
50750000
0
0
0
0
0
1
0
0
0
0
0
0
45730000
0
0
0
0
0
0
1
0
0
0
0
0
45000
0
0
0
0
0
0
0
1
0
0
0
0
5600
0
0
0
0
0
0
0
0
1
0
0
0
746000
0
0
0
0
0
0
0
0
0
1
0
0
34670
0
0
0
0
0
0
0
0
0
0
1
0
44550000
0
0
0
0
0
0
0
0
0
0
0
1

b.
NSE: (1 0 0 0 0 1 0 0 0 1 1 1)
TRADING VOLUME: (1 0 0 0 0 1 1 0 0 0 0 1)

We notice that the 1st, 6th and 12th bit in our question are traded at both the condition. We simply AND both bitmaps together.

Solution:
NSE: (1 0 0 0 0 1 0 0 0 1 1 1)
TdV: (1 0 0 0 0 1 1 0 0 0 0 1) AND
------------------------------------------
         (1 0 0 0 0 1 0 0 0 0 0 1)

Given that the 1st, the 6th and the 12th bits of the resulting bitmap are set to 1, we can know that these stocks are traded in NSE and that have a trading volume of more than 30 million shares.

NOTE: BELOW IS THE EXAMPLE FOR THE SAME QUESTION, UNDERSTAND AND GET THE SOLUTIONS FOR THE SAME[THIS IS JUST FOR UNDERSTANDING]

Bitmap Index:
Bitmap indexes are used by DBMSs to accelerate decision support queries. The main advantage of the bitmap index is that complex logical selection operations can be done very quickly by minimizing the search space before low cost Boolean operations like OR, AND, and NOT, before moving on to primary source data.

A Bitmap index: consists of one bitmap for each value that an attribute can take.

We will illustrate our discussion with a simple example about stock trading. Let us see the following table on stocks traded at different stock exchanges and consider the last column (Exchange).
Example:


We notice that stocks are traded at two different stock exchanges at NASDAQ and at NYSE. Indeed, the attribute Exchange has two distinct attribute values. In addition, we see that our stock examples include 12 different stocks, which are uniquely identified by their record ID given in the first column.
Simple bitmap for Stocks and their corresponding trading values:

NASDAQ: (1 0 0 0 0 1 0 0 0 1 1 1)
NYSE: (0 1 1 1 1 0 1 1 1 0 0 0)

We need two bitmaps because we have two distinct attribute values for the stock exchange. Each bitmap consists of 12 bit values since our example comprises 12 different stocks.

For instance, the last bit of the bitmap of NYSE is set to 0 because the last stock is not traded at NYSE. However, the second stock is traded at NYSE and, thus, the bit 2 is set to 1. In general, a bit is set to 1 if the stock is traded at the particular stock exchange, and it is set to 0 otherwise. Therefore, we have a straightforward way of describing the stock exchange by means of bitmaps.