Showing posts with label Multimedia Computing. Show all posts
Showing posts with label Multimedia Computing. Show all posts

July 18, 2018

Important QnA on Multimedia Computing - Comprehensive


Text Books

T1 Ze-Nian Li & Mark S. Drew, "Fundamentals of Multimedia", Pearson Education, 2004


Question.1 
Explain the difference between Temporal and Frequency masking.

Answer:
Frequency Masking: When an audio signal consists of multiple frequencies the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than the other close frequency then the second frequency may not be heard. 

Temporal masking
  • After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency.
  • Temporal Masking accomplished 50% overlap between consecutive transform windows and then relating frequency masking.
  • MPEG Audio encrypts frequency masking by quantizing each filter bank with adaptive values from nearby bands, defined by a look up table.
Question. 2
Consider the following set of protocols (SIP, RTSP, RSVP, RTCP, RTP on top of UDP. If you want to design a protocol stack (control and data plane) for Video‐On‐Demand (VOD) service between client and server, (a) which protocols would you use and why, and (b) in which order would you apply your selected protocols? Explain how the protocol stack of selected protocols would be used ?

Answer:
To design the Video‐On‐Demand service,
RTP protocol : for transmission of the video . RTP/UDP enables real‐time transmission. 
RTCP : The accompanying control protocol for RTP is the RTCP that would allow the receiver to provide feedback to the sender if some parameters need to be adjusted during the streaming session. 
RTSP : Since VOD will use commands such as play, stop, pause, fast forward, rewind, so anyone should use the RTSP protocol because it specifies signaling for completing these control tasks between the client and the server.
RSVP: If an IP level also requires the built-in reservation capabilities of VOD traffic, it is important to enable the RSVP protocol to start integrated Guaranteed Services using the RSVP Reservation Protocol.
 
RSVP to reserve bandwidth for the VOD session. 

Once the resources are reserved, VOD traffic should be transmitted via RTP which is on top of UDP/IP. 


Along with this, RTP RTCP and RTSP should provide (1) RTP (Control) to provide control feedback about the status of traffic and receiver, and (2) controlling the channel through signaling (RTSP) control , Stop, and otherwise control the VOD playback.

Question. 3
Compress the string “DEAF!DEFEATED” using LZW algorithm. Find compression ratio by

supposing that originally the characters are represented in 8-bit (B=66, A = 65, ,., Z = 90, Y = 89, , ! = 33).

Answer:
Input: DEAF!DEFEATED

Previous
Current
Output
Code
String

065
A
068
D
069
E
070
F
084
T
033
!
D
E
068
256
DE
E
A
069
257
EA
A
F
065
258
AF
F
!
070
259
F!
!
D
033
260
!D
D
E



DE
F
256
261
DEF
F
E
070
262
FE
E
A



EA
T
257
263
EAT
T
E
084
264
TE
E
D
069
265
ED
D
End Of File
068



The compressed output is 068 069 065 070 033 256 070 257 084 069 068
Compression ratio = 13/11 = 1.18181

Question:
Compress the string “PROPER_PROPERTY” using LZW algorithm by using codeword of 10 bit. Find compression ratio by assuming that originally the characters are represented in 8-bit (A = 65, B=66... Y = 89, Z = 90,  _ = 95).

Answer

Input: PROPER_PROPERTY
                                               
S
C
Output
Code
String

069
E
079
O
080
P
082
R
084
T
089
Y
_
95
P
R
080
256
PR
R
O
082
257
RO
O
P
079
258
OP
P
E
080
259
PE
E
R
069
260
ER
R
_
082
261
R_
_
P
095
262
_P
P
R



PR
O
256
263
PRO
O
P



OP
E
258
264
OPE
E
R



ER
T
260
265
ERT
T
Y
084
266
TY
Y
NULL
089




The compressed output is 080 082 079 080 069 082 095 256 258 260 084 089

May 20, 2018

Multimedia Computing MID Sem Solution


Question.  List three pattern substitution based compression algorithms. For each algorithm, give one application where the method is used with respect
to multimedia data.

Answer:
  • Repetitive Sequence Suppression.
  • Run-length Encoding.
  • Pattern Substitution.
Repetitive Sequence Suppression Example: 1 from Silence suppression in audio, ‘white space’ in text, simple uniform backgrounds in images.

Run-length Encoding : 1 from Computer graphics generated images, Faxes, part of JPEG (latter stage) pipeline.

Pattern Substitution : 1 from Pattern recognition/token substitution, Entropy coding (Huffman), LZW/GIF, vector quantization.

Question. What is the basic concept used in defining an Information Theoretic approach to data compression?

Answer:
The entropy of an information source S, defined as:
,
is the basis Information Theoretic compression algorithms.

Question: What advantages does the arithmetic coding algorithm offer over Huffman coding algorithm with respect to data compression?

Answer:
  • Good compression ratio (better than Huffman coding), entropy around the Shannon Ideal value. – Huffman coding uses an integer number of bits for each symbol,
    • hence k is never less than 1.
    • – Use decimal number of bits 
Disadvantages with the arithmetic coding algorithm: 
  • Memory: potentially large symbol tables needed.
  • Speed due possibly complex computations due to large symbol tables.
Question: Given the following Differential Pulse Code Modulated (DPCM) Sequence reconstruct the original signal.
+4 + 2 + 3 - 2 + 3 - 1 + 1 + 1

Answer
DPCM decoding: Simply start with accumulator zero for each number add the value to current accumulator, output accumulator value.
So solution is:
4 6 9 7 10 9 10 11

Question: Given the following Run Length Encoded (RLE) Sequence reconstruct the
original 2D 8x8 (binary) data array.
(0, 8),
(0, 1), (1, 1), (0, 4), (1, 1), (0, 1),
(0, 1), (1, 2), (0, 2), (1, 2), (0, 1),
(0, 1), (1, 6), (0, 1),
(0, 2), (1, 4), (0, 2),
(0, 3), (1, 2), (0, 3),
(0, 2), (1, 1), (0, 2), (1, 1), (0, 2),
(0, 1), (1, 1), (0, 4), (1, 1), (0, 1)

Answer:
The format of RLE is for each pair (colour, length) so just ‘parse’ each row, to expand colour to length number of values to get the solution:

Question:
Calculate the uncompressed digital output if a video signal is sampled using the following values: 25 frames per second, 160 x 120 pixels, True (Full) color depth.

Answer:
True color = 24 bits (3 bytes) per pixel

So  number of bytes per second is
24*160*120*25 =  11,520,000 bits

Question:
If a suitable CD stereo quality audio signal is included with the video signal in part (c), how long would it take for the signal to be transmitted on a 128 kbps channel?

Answer:
CD quality audio = 22,050 Hz. Sampling rate = 44,100 Hz.
Stereo  audio = 44100*2 * 16 bits  =1,411,200 bits per second

So uncompressed bytes stream is 11,520,000 + 1,411,200 =  12,931,200 bits per second. Channel is 128 kbps. Hence time taken = 101 seconds

Question:
For compressing an image for power point slide, which compression would you prefer-regular JPEG or JPEG 2000? Why?

Answer:
JPEG – 2000 – handles computer generated images (sharp edges etc.) better. Compression ratio is 20% higher than original JPEG

Question: What are the key differences between the JPEG and MPEG I-Frame compression
pipeline?

Answer:
Four main differences:
  • JPEG uses YIQ whilst MPEG use YUV (YCrCb) colour space.
  • MPEG used larger block size DCT windows 16 even 32 as opposed to JPEG’s 8.
  • Different quantization— MPEG usually uses a constant quantizationvalue.
  • Only Discrete Pulse Code Modulation (DPCM) on DC component in JPEG on zig zag scan. AC (JEPG) and complete zig zag scan get RLE.
Question: What processes give rise to the lossy nature of JPEG/MPEG video compression?

Answer:
Lossy steps:
  • Colour space subsampling in IQ or UV components.
  • Quantization reduces bits needed for DCT components. 
Question
(A) In MPEG audio compression, what is frequency masking?

Answer
When an audio signal consists of multiple frequencies the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than the other close frequency then the second frequency may not be heard.

(B) Briefly describe the cause of frequency masking in the human auditory system?

Answer:
Frequency Masking:
  • Stereocilia in inner ear get excited as fluid pressure waves flow over them.
  • Stereocilia of different length and tightness on Basilar membrane so resonate in sympathy to different frequencies of fluid waves (banks of stereocilia at each frequency band).
  • Stereocilia already excited by a frequency cannot be further excited by a lower amplitude near frequency wave.
(C) In MPEG audio compression, what is temporal masking?

Answer:
After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency.

(D) Briefly describe the cause of temporal masking in the human auditory system?

Answer:
  • (Like frequency masking) Stereocilia in inner ear get excited as fluid pressure waves flow over them and respond to different frequencies.
  • Stereocilia already excited by a certain frequency will take a while to return to rest state, as inner ear is a closed fluid chamber and pressure waves will eventually dampen down.
  • Similar to frequency masking Stereocilia in a ’dampening state’ may not respond to a a lower amplitude near frequency wave.
(E) Briefly describe, using a suitable diagram if necessary, the MPEG-1 audio compression algorithm, outlining how frequency masking and temporal masking are encoded.

Answer:
MPEG audio compression basically works by:
  • Dividing the audio signal up into a set of frequency subbands (Filtering)
  • Use filter banks to achieve this.
  • Subbands approximate critical bands.
  • Each band quantised according to the audibility of quantization noise.
Frequency masking and temporal masking are encoded by:

Frequency Masking: MPEG Audio encodes this by quantising each filter bank with adaptive values from neighbouring bands energy, defined by a look up table.

Temporal Masking: Not so easy to model as frequency masking. MP3 achieves this with a 50% overlap between successive transform windows gives window sizes of 36 or 12 and applies basic frequency masking as above.

Question: Construct coding tree for ROADROADIES by Shannon-Fano Algorithm and calculate the entropy. Get the Compression Ratio also for the same.

Answer:
Shannon-Fano Algorithm: 
  1. Sort the symbols according to the frequency count of their occurrences.
  2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all the parts contain only 1 symbol.
  • Below are the letters with their occurrence counts.[occurrence counts should be in descending form]



log2(1/pi) is calculated as:
Ex: For symbol R:
pi=Count/total number of counts = 2/11=0.18181
log2(1/0.18181) = 2.459 approx.

Code is calculated as:
Ex: Traversing the tree from top to down.

No. of bits used is calculated as:
Ex: For symbol R:> count*code[number of bits/digits]= 2*1=2
For symbol I:> count*code[number of bits]= 1*5=5

Calculate Entropy:



=2.7312

Hence, 2.7312 number of bits are required to represent each symbol. [It can be asked as, What is the average number of bits needed for each character?]

Compression Ratio:  This can be found as:


Default ASCII encoding: 8 bits per symbol
Result after Shannon Fano Coding: 2.7312 bits per symbol

Compression Ratio = (11*8)/(11*2.7312) = 2.929

Question:
Consider the following uncompressed sequence of character bytes: AAAABBBBCAAAAACCCCCAAAAABBBBBBDDDDDD. Encode this sequence with Run-Length Coding. Also calculate the compression ratio.

Solution:
A4B4CA5C5A5B6D6


Compression ratio = 36/15=2.4

Question: Construct Huffman tree to encode the table below. What is the average number of bits needed for each character?                                                                                           
Symbol
A
B
C
D
E
F
G
Frequency
28
10
20
13
5
15
9

Answer: Huffman CodingHuffman coding is a lossless data compression algorithm. The most frequent character gets the smallest code and the least frequent character gets the largest code.

Steps to build Huffman Tree

Step 1. Build a min heap that contains 7 nodes where each node represents root of a tree with single node.

Symbol    Frequency
    E           5
    G           9
    B           10
    D           13
    F           15
    C           20
    A          28

Step 2. Extract two minimum frequency nodes from min heap. Add a new internal node with frequency 5 + 9 = 14.

Now min heap contains 6 nodes where 5 nodes are roots of trees with single element each, and one heap node is root of tree with 3 elements

B           10
D           13
Node     14
F           15
C           20
A           28

Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 10 + 13 = 23

Now min heap contains 5 nodes where 3 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

Node     14
F           15
C           20
Node     23
A           28

Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 15 = 29

Now min heap contains 4 nodes where 2 nodes are roots of trees with single element each, and two heap nodes are root of tree with more than one nodes.

C           20
Node     23
A           28
Node     29

Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 20 + 23 = 43

Now min heap contains 3 nodes.

A      28
Node 29
Node 43

Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 28 + 29 = 57

Now min heap contains 2 nodes.

Node 43
Node 57

Step 7: Extract two minimum frequency nodes. Add a new internal node with frequency 43 + 57 = 100


Now min heap contains only one node.

Node 100

Since the heap contains only one node, the algorithm stops here.

Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left child, write 0 to the array. While moving to the right child, write 1 to the array


The codes are as follows:


Symbol   code
    C          0
    B          10
    D          11
    A          10
    E          1100
    G          1101
    F          111

Average number of bits needed for each character

Entropy





2.6282 average number of bits are required to represent each symbol.

Question:
Suppose eight characters have a distribution A:(1), B:(1), C:(1), D:(2), E:(3), F:(5), G:(5), H:(10). Draw a Huffman tree for this distribution.

Solution:

Question:
Perform run length encoding for the following and calculate compression ratio AC0000AAAAB00000000A0AAA00000.

Solution:
Run length encoding
Before Encoding : AC0000AAAAB00000000A0AAA00000
After Encoding :   1A1C404A1B801A103A50
     
Compression Ratio
 Compression Ratio = Total number of bits before encoding/Total number of bits after encoding

Compression Ratio  = 29/20 = 1.45

Question:
Answer the following

a. Why is the display order and encoding order of MPEG frames different ? Given the display order of frames, find the coding order.  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) I(11)
b. State the difference between frequency masking and temporal masking in MPEG audio. How are these two encoded in MPEG audio?
c. If the block size for a 2D DCT transform is 8 X8, and we use only the DC components to create a thumbnail image, what fraction of the original pixels would we be using?

Solution:

a.
P and B frames are much more complicated, since they depend on other frames.   So we have to buffer a previous frame and a future frame to decode P and B frames. The name "future frame" indicates that the frame should be displayed after the current frame.  However, it is actually encoded before the current frame. So the display order of frames in a MPEG sequence is different from the decoding order. Coding order : I(1)P(4)B(2)B(3)P(7)B(5)B(6)P(10)B(7)B(8)I(11)

b.
frequency masking:
When an audio signal consists of multiple frequencies the sensitivity of the ear changes with the relative amplitude of the signals. If the frequencies are close and the amplitude of one is less than the other close frequency then the second frequency may not be heard.

temporal masking:

  • After the ear hears a loud sound, consisting of multiple frequencies, it takes a further short while before it can hear a quieter sound close in frequency. MPEG Audio encodes frequency masking by quantizing each filter bank with adaptive values from neighboring bands, defined by a look up table. 
  • Achieved with a 50% overlap between successive transform windows and then applying frequency masking

c. 1/64, because each 8 x8 block only has one DC