October 23, 2018

Data Mining - MCQS



Question

Which of the following activities is NOT a data mining task?
Select one:
a. Monitoring the heart rate of a patient for abnormalities
b. Monitoring and predicting failures in a hydropower plant
c. Predicting the future stock price of a company using historical records
d. Extracting the frequencies of a sound wave
The correct answer is: Extracting the frequencies of a sound wave

Question
Which of the following is not a data mining task?

Select one:
a. Feature Subset Detection
b. Association Rule Discovery
c. Regression
d. Sequential Pattern Discovery

The correct answer is: Feature Subset Detection
Question
Value set {poor, average, good, excellent} is an example of
Select one:
a. Nominal attribute
b. Numeric attribute
c. Continuous attribute
d. Ordinal attribute
The correct answer is: Ordinal attribute

Question
Which data mining task can be used for predicting wind velocities as a function of temperature, humidity, air pressure, etc.?

Select one:
a. Cluster Analysis
b. Regression
c. Clasification
d. Sequential pattern discovery

The correct answer is: Regression

Question
Identify the example of sequence data

Select one:
a. weather forecast
b. data matrix
c. market basket data
d. genomic data

The correct answer is: genomic data

Question
In a data mining task where it is not clear what type of patterns could be interesting, the data mining system should
Select one:
a. handle different granularities of data and patterns
b. perform all possible data mining tasks
c. allow interaction with the user to guide the mining process 
d. perform both descriptive and predictive tasks
The correct answer is: allow interaction with the user to guide the mining process
Question 
Removing duplicate records is a data mining process called________
Select one:
a. data isolation
b. recovery
c. data pruning
d. data cleaning 
The correct answer is: data cleaning
Question 
Various visualization techniques are used in ___________ step of KDD
Select one:
a. selection
b. interpretation 
c. transformation
d. data mining
The correct answer is: interpretation
Question 
Which of the following is not a Visualization Method?
Select one:
a. Hierarchical visualization technique
b. Tuple based visualization Technique
c. Icon based visualization techniques
d. Pixel oriented visualization technique 
The correct answer is: Tuple based visualization Technique

Question
Data set {brown, black, blue, green , red} is example of

Select one:
a. Continuous attribute
b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute

The correct answer is: Nominal attribute

Question
Which of the following is NOT a data quality related issue?

Select one:
a. Attribute value range
b. Outlier records
c. Missing values
d. Duplicate records

The correct answer is: Attribute value range

Question
To detect fraudulent usage of credit cards, the following data mining task should be used

Select one:
a. Outlier analysis
b. prediction
c. association analysis
d. feature selection

The correct answer is: Outlier analysis
Question 
Which of the following is NOT example of ordinal attributes?
Select one:
a. Ordered numbers
b. Military ranks
c. Zip codes
d. Movie ratings 
The correct answer is: Zip codes
Question 
Which of the following is not a data pre-processing methods
Select one:
a. Data Cleaning
b. Data Visualization 
c. Data Discretization
d. Data Reduction
The correct answer is: Data Visualization
Question 
Incorrect or invalid data is known as _________
Select one:
a. Outlier
b. Missing data
c. Changing data
d. Noisy data 
The correct answer is: Noisy data

Question
Which of the following is an Entity identification problem?

Select one:
a. One person with different email address
b. One person's name written in different way
c. Title for person
d. One person with multiple phone numbers

The correct answer is: One person's name written in different way
Question 
Data Visualization in mining cannot be done using
Select one:
a. Graphs
b. Information Graphics
c. Charts
d. Photos 
The correct answer is: Photos
Question
Nominal and ordinal attributes can be collectively referred to as_________ attributes
Select one:
a. perfect
b. consistent
c. qualitative
d. optimized
The correct answer is: qualitative
Question 
The number of item sets of cardinality 4 from the items lists {A, B, C, D, E}
Select one:
a. 20
b. 2 
c. 10
d. 5
The correct answer is: 5
Question 
Identify the example of Nominal attribute
Select one:
a. Salary
b. Temperature 
c. Gender
d. Mass
The correct answer is: Gender
Question 
Which of the following are descriptive data mining activities?
Select one:
a. Clustering
b. Deviation detection 
c. Regression
d. Classification
The correct answer is: Clustering

Question 
Which statement is not TRUE regarding a data mining task?
Select one:
a. Deviation detection is a predictive data mining task
b. Classification is a predictive data mining task 
c. Clustering is a descriptive data mining task
d. Regression is a descriptive data mining task
The correct answer is: Regression is a descriptive data mining task
Question 
Correlation analysis is used for
Select one:
a. identifying redundant attributes
b. eliminating noise
c. handling missing values
d. handling different data formats 
The correct answer is: identifying redundant attributes
Question 
In Binning, we first sort data and partition into (equal-frequency) bins and then which of the following is not a valid step
Select one:
a. smooth by bin boundaries
b. smooth by bin median 
c. smooth by bin values
d. smooth by bin means
The correct answer is: smooth by bin values

Question 
Which of the following is NOT data mining efficiency/scalability issue?
Select one:
a. The running time of a data mining algorithm
b. Incremental execution
c. Data partitioning 
d. Easy to use user interface
The correct answer is: Easy to use user interface
Question 
Synonym for data mining is
Select one:
a. Data Warehouse
b. Knowledge discovery in database
c. Business intelligence
d. OLAP 
The correct answer is: Knowledge discovery in database
Question 
Data scrubbing can be defined as
Select one:
a. Check field overloading
b. Delete redundant tuples
c. Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make ions
d. Analyzing data to discover rules and relationship to detect violators 
The correct answer is: Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make ions
Question 
Dimensionality reduction reduces the data set size by removing _________
Select one:
a. irrelevant attributes 
b. composite attributes
c. derived attributes
d. relevant attributes
The correct answer is: irrelevant attributes
Question 
In asymmetric attibute
Select one:
a. Range of values is important
b. No value is considered important over other values
c. Only non-zero value is important
d. All values are equals 
The correct answer is: Only non-zero value is important
Question 
Which of the following is not a data mining task?
Select one:
a. Feature Subset Detection 
b. Regression
c. Sequential Pattern Discovery
d. Association Rule Discovery
The correct answer is: Feature Subset Detection
Question 
Which of the following is NOT an example of data quality related issue?
Select one:
a. Using a field for different purposes 
b. Contradicting values
c. Noise
d. Multiple date formats 
The correct answer is: Multiple date formats
Question 
Similarity is a numerical measure whose value is
Select one:
a. Higher when objects are more alike
b. Lower when objects are more alike
c. Increases with Minkowski distance
d. Higher when objects are not alike 
The correct answer is: Higher when objects are more alike
Question 
The dissimilarity between two data objects is
Select one:
a. Lower when objects are more alike
b. Higher when objects are more alike
c. Lower when objects are  not alike
d. Applies only categorical attributes 
The correct answer is: Lower when objects are more alike
Question 
The important characteristics of structured data are

Select one:
a. Resolution, Distribution, Dimensionality ,Objects
b. Sparsity, Centroid, Distribution , Dimensionality
c. Dimensionality, Sparsity, Resolution, Distribution 
d. Sparsity, Resolution, Distribution, Tuples

The correct answer is: Dimensionality, Sparsity, Resolution, Distribution

Question 
Which of the following statement is not TRUE for a Tag Cloud

Select one:
a. Tag cloud is a visualization of statistics of user-generated tags 
b. Tag cloud can be used for numeric data only
c. The importance of a tag is indicated by font size or color
d. Tags may be listed alphabetically in a tag cloud

The correct answer is: Tag cloud can be used for numeric data only

Question 
Which of the following data mining task is known as Market Basket Analysis?

Select one:
a. Clasification
b. Regression
c. Association Analysis 
d. Outlier Analysis

The correct answer is: Association Analysis

Question 
Which of the following is not a Data discretization Method?

Select one:
a. Histogram analysis
b. Cluster Analysis
c. Data compression
d. Binning

The correct answer is: Data compression

Question 
Which of the following activities is a data mining task?

Select one:
a. Monitoring the heart rate of a patient for abnormalities
b. Dividing the customers of a company according to their profitability 
c. Extracting the frequencies of a sound wave
d. Predicting the outcomes of tossing a (fair) pair of dice

The correct answer is: Monitoring the heart rate of a patient for abnormalities

Question 
Sorted data (attribute values ) for price are: 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34. Identify which is NOT a bin smoothed by boundaries?

Select one:
a. Bin 2: 21, 21, 25, 25
b. Bin 1: 4, 4, 4, 15 
c. Bin 1: 4, 4, 15, 15
d. Bin 3: 26, 26, 26, 34

The correct answer is: Bin 1: 4, 4, 15, 15

Question
The difference between supervised learning and unsupervised learning is given by

Select one:
a. unlike unsupervised learning, supervised learning needs labeled data
b. unlike unsupervised learning, supervised learning can be used to detect outliers
c. unlike supervised leaning, unsupervised learning can form new classes
d. there is no difference

The Correct answer is: unlike unsupervised learning, supervised learning needs labeled data

Question 
The Data Sets are made up of

Select one:
a. Data Objects
b. Attributes 
c. Dimensions
d. Database

The correct answer is: Data Objects

Design and Analysis of Algorithms - MCQS

Question

If one was to apply Master theorem to recurrence equation T(n)=3.T(n/2)+n^2, what would be the values of a and b?

Select one:
a. a=3,b=3
b. a=3,b=2
c. A=2,b=2
d. a=2,b=3

The correct answer is: a=3,b=2

Question 
Time complexity of knapsack 0/1

where n is the number of items and W is the capacity of knapsack.

Select one:
a. O(nW)
b. O(n)
c. O(W)

The correct answer is: O(nW)

Question 
The complexity of searching an element from a set of n elements using Binary search algorithm is

Select one:
a. O(n log n)
b. O(n2)
c. O(n)
d. O(log n)

The correct answer is: O(log n)

Question 
Time complexity of matrix chain multiplication

Select one:
a. O(n2)
b. O(nlogn)
c. O(n)
d. O(n3)

The correct answer is: O(n3)

Question 
Time complexity of LCS

Select one:
a. O(n!)
b. O(m!)
c. O(mn)

The correct answer is: O(mn)

Question 
Apply Master theorem to T(n)=3.T(n/2)+n^2 and write what is f(n)

Select one:
a. f(n)=n/2
b. f(n)=n/2+n^2
c. f(n)=n^2
d. f(n)=3n/2

The correct answer is: f(n)=n^2

Question
Data Structure used for the Merge Sort

Select one:
a. Two Pointers and an Extra Array
b. Two Pointers
c. 2N/2 pointers and N/2 Extra Arrays
d. Two pointers and N Extra Arrays

The correct answer is: Two Pointers and an Extra Array

Question 
Steps of Divide and Conquer approach

Select one:
a. Combine, Divide and Conquer
b. Divide, Combine and Conquer
c. Combine, Conquer and Divide
d. Divide, Conquer and Combine

The correct answer is: Divide, Conquer and Combine

Question 
Which of the following sorting algorithms does not have a worst case running time of O(n2) ?

Select one:
a. Quick sort
b. Bubble sort
c. Insertion sort
d. Merge sort

The correct answer is: Merge sort

Question 
Data Structure used for the Merge Sort

Select one:
a. Two pointers and N Extra Arrays
b. Two Pointers and an Extra Array
c. Two Pointers
d. 2N/2 pointers and N/2 Extra Arrays

The correct answer is: Two Pointers and an Extra Array

Question
RANDOMIZE-IN-PLACE(A)
n=A.length
For i=1 to n
Swap A[i] with A[RANDOM(I,n.]
The above procedure RANDOMIZE-IN-PLACE(A) computes

Select one:
a. a different deliberate permutation
b. a uniform deliberate permutation
c. a uniform random permutation
d. a different random permutation

The correct answer is: a uniform random permutation

Question 
Time complexities of three algorithms are given. Which should execute the slowest for large values of N?

Select one:
a. O(N ½)
b. O(N)
c. O(log n)

The correct answer is: O(N)

Question 
If length of the rod is 8 and the values of different pieces are given as following, then the maximum obtainable value is 22 (by cutting in two pieces of lengths 2 and 6.
length  | 1   2   3   4   5   6   7   8
--------------------------------------------
price    | 1   5   8   9  10  17  17  20
What is the worst case running time for the above problem

Select one:
a. O(log n)
b. O(n2)
c. O(n log n)
d. O(2n)

The correct answer is: O(n2)

Question 

71. PERMUTE-BY-SHORTING(A)
n=A.length
Let P[1…n] be a new array
For i=1 to n
    P[i]=RANDOM(1,n3)
Sort A, using P as sort keys
The time complexity of above algorithm is

Select one:
a. T(n ln n)
b. T(n)
c. T(n3) In
d. T(n2)

The correct answer is: T(n ln n)

Question 
 ______ is a condition that is always true at a particular point in an algorithm.

Select one:
a. assertion
b. exception
c. constant
d. invariant

The correct answer is: invariant

Question 
RANDOMIZE-IN-PLACE(A)
n=A.length
For i=1 to n
Swap A[i] with A[RANDOM(I,n.]
The time complexity of above algorithm is

Select one:
a. O(n)
b. O(n2)
c. O(n ln n)
d. O(n3)

The correct answer is: O(n)

Question 
Merge Sort divides the list in

Select one:
a. Two parts, may not be equal
b. N parts, may not be equal
c. N equal parts
d. Two equal parts

The correct answer is: Two equal parts

Question 
Time Complexity of Optimal binary search tree.

Select one:
a. O(logn)
b. O(n!) In
c. O(n)

The correct answer is: O(logn)

Question 
Division Pattern of Problems in Divide and Conquer approach

Select one:
a. Random
b. Parallel
c. Recursive
d. Iterative

The correct answer is: Recursive

Question
In dynamic programming, the output to stage n become the input to

Select one:
a. stage n itself
b. stage n-1
c. stage n-2
d. stage n+1 In

The correct answer is: stage n-1

Question 
RANDOMIZE-IN-PLACE(A)
n=A.length
For i=1 to n
Swap A[i] with A[RANDOM(I,n)]
The above procedure RANDOMIZE-IN-PLACE(A) permutation occurs with probability

Select one:
a. Probability n
b. Probability n2
c. Probability n!
d. Probability 1/n!

The correct answer is: Probability 1/n!

Question 
Which case of Master’s theorem is applicable in the recurrence relation T(n)=0.5*T(n/2)+1/n?

Select one:
a. Case 3
b. Case 2
c. Case 1
d. Master’s theorem is not applicable

The correct answer is: Master’s theorem is not applicable

Question 
In the development of dynamic programming the value of an optimal solution is computed in

Select one:
a. Top up fashion
b. Bottom up fashion
c. In any way

The correct answer is: Bottom up fashion

Question 
Run Time of Merge Sort is

Select one:
a. BIG O of N log N
b. Theta of N log N
c. Gamma of n log N
d. Omega of N log N

The correct answer is: Theta of N log N

Question 
The number of operations in Matrix multiplications M1, M2, M3, M4 and M5 of sizes 5X10, 10X100, 100X2, 2X20 and 20X50

Select one:
a. 4600
b. 12890
c. 6900
d. 5830

The correct answer is: 4600

Question 
Which of the given options provides the increasing order of asymptotic complexity of functions f1, f2, f3 and f4?
  f1(n) = 2^n
  f2(n) = n^(3/2)
  f3(n) = nLogn
  f4(n) = n^(Logn)

Select one:
a. f2, f3, f1, f4
b. f3, f2, f1, f4
c. f3, f2, f4, f1
d. f2, f3, f4, f1

The correct answer is: f3, f2, f4, f1

Question 
The running time of quick sort depends on the selection of.

Select one:
a. Number of input
b. Arrangements of the elements
c. Selection of pivot elements
d. Number of passes

The correct answer is: Selection of pivot elements

Question 
RANDOMIZED-HIRE – ASSISTANT (n)
Randomly permute the list of candidates
Best=0
For i=1 to n
    interview candidate i
    If candidate I is better than candidate best
        best=i
        hire candidates i
    The expected hiring cost of the procedure is.

Select one:
a. O( n2)
b. O(n log n)
c. O(ln n)
d. O( n)

The correct answer is: O(ln n)

Question 
In dynamic programming, the output to stage n become the input to

Select one:
a. Objective function
b. Optimum solution
c. Feasible solution
d. Decision stages

The correct answer is: Decision stages

Question 
Master theorem applies to recurrences of the form (a=1 and b>1) are two constants.

Select one:
a. T(n)=n.T(n/2)+b.f(n)
b. T(n)=n.T(n-3)+b
c. T(n)=a.T(n-1)+b
d. T(n)=a.T(n/b)+f(n)

The correct answer is: T(n)=a.T(n/b)+f(n)

Question 
A sort which relatively passes through a list to exchange the first element with any elementless than it and then repeats with a new first element is called________.

Select one:
a. Insertion sort
b. Bubble sort
c. heap sort
d. Quick sort

The correct answer is: Insertion sort

August 14, 2018

Important QnA on Database Management Systems (DBMS) - Comprehensive


Question) Draw an ER diagram that captures the following information:
The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of medicine if you design its database. Assume the rising cost of health care, you approve. Here’s the information that you gather:
Patients are recognized by an SSN, and their names, addresses, and ages must be logged.
Doctors are known by an SSN. For individual doctor, the name, specialty and field, and years of experience need to be logged.Each pharmaceutical company is recognized by name and has a phone number.
For each drug, the trade name and formula must be noted. Each drug is traded by a certain  pharmaceutical company, and the trade name identifies a drug distinctively from amongst the products of that company. If a pharmaceutical company is removed, you must not keep track of its products any longer.
Each pharmacy has a name, address, and phone number.Every patient has a primary physician. Each doctor has at least one patient.Each pharmacy vends several drugs and has a certain price for each. A drug might be sold at numerous pharmacies, and the value could differ from one pharmacy to other.
Doctors prescribe drugs for patients. A doctor can prescribe one or additional drugs for quite a few patients, and a patient can acquire prescriptions from numerous doctors. Each prescription has a date and a quantity associated with it. You can accept that, if a doctor recommends the identical drug for the same patient more than once, only the last such medicament needs to be kept.
Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical company can bond with quite a lot of pharmacies, and a pharmacy can deal with a number of pharmaceutical companies. For every contract/bond or deal, you have to collect  a start date, an end date, and the text of the contract.
Pharmacies appoint a supervisor for each contract. There should constantly be a supervisor for each contract, but the contract supervisor or the managing authority may change over the period of the contract.
Answer)
Key points for ER diagram:
  • Entities correctly identified.
  • Attributes correctly identified.
  • Primary keys correctly identified.
  • Relationships and cardinality correctly identified.



Question)A Consider the following relational schema and translate the following SQL-query into an expression of the relational algebra.
• Student (snum, sname, major, level, age) 
• Class (name, meets at, room, fid) 
• Enrolled (snum, cname) 
• Faculty (fid, fname, deptid) 
a) 
                      SELECT S.sname 
                      FROM Student S 
                      WHERE S.snum 
                      NOT IN (
                      SELECT E.snum 
                      FROM 
                      Enrolled E)
Answer)


b)   
                              SELECT C.name 
                              FROM Class C 
                              WHERE C.room = ’R128’ OR C.name IN 
                      (SELECT E.cname 
                      FROM Enrolled E 
                     GROUP BY E.cname 
                     HAVING COUNT(*) >= 5)
Answer)


Question) 
EmpMaster (eid: number, ename: varchar, sal: number, age: number, deptid: number)   
Dept(deptid: number, budgetAmt: number, floor: number, mgreid: number)
Salaries range from Rs.10,000 to Rs.100,000, ages vary from 20 to 80, each department has about five employees on average, there are 10 floors, and budgets vary from RS.10,000 to Rs 100,000. You can assume uniform distribution of values.
Query: Print ename, age, and sal for all employees.
Query: Find the deptids of departments that are on the 10th floor and have a budget of less than Rs25,000     
a) For each of the following queries, which index would you choose to speed up the query? 
b)If your database system does not consider index-only plans (i.e.,data records are always retrieved even if enough information is available in the index entry), how would your answer change? 
Explain briefly.

Answer)
a)We can build an unclustered hash index on ename, age, sal fields of EmpMaster as then we could do “On index only” test. If our system does not comprise of” index only “plans then we should not make index for this query. Subsequently this query needs us to access all the Emp records, an index won’t help us, and so we can access the records using a filescan.

b)For the second scenario we can create a clustered dense B+tree index on floor, budget fields of Dept, since the records would be well-ordered on these fields then. So when performing this query, the first record with floor=10 must be fetched, and then the other records with floor =10 can be read in ascending order of budget. This plan, which is the best for this query, is not an index-only plan.

Question) Modern disk drives store more sectors on the outer tracks than the inner tracks.
As the rotation speed is persistent, the sequential data transferal rate is also greater on the outer tracks.The seek time and rotational delay are unchanged. Given this information, explain good strategies for placing files with the following kinds of access patterns:
a. Frequent, random accesses to a small file (e.g., catalog relations).
b. Sequential scans of a large file (e.g., selection from a relation with no index).
c. Random accesses to a large file via an index (e.g., selection from a relation via the index).
d. Sequential scans of a small file.

Answer)
a.Place the file in the middle tracks. Sequential speed is not an issue due to the small size of the file, and the seek time is minimized by placing files in the center.
b. Place the file in the outer tracks. Sequential speed is most important and outer tracks maximize it. 
c.   Place the file and index on the inner tracks. The DBMS will alternately access pages of the index and of the file, and so the two should reside in close proximity to reduce seek times. When we place the file and the index on the inner tracks we save valuable space as well on the faster (outer) tracks for additional files that are retrieved successively.
d. Place small files in the inner half of the disk. A scan of a minor  file is effectually random I/O because the cost is conquered by the cost of the first initial seek to the commencement of the file.

Question) What are checkpoints and why are they important? List the actions taken by the recovery manager during checkpoints? 

Answer) Checkpoint is a type of entry in the log. A checkpoint record is printed into the log from time to time at that point when the system prints out to the database on disk all DBMS buffers that have been reformed. As a result of this, all transactions that have their [commit, T] entries in the log before a checkpoint entry do not need to have their WRITE operations redone in case of a system crash, since all their updates will be recorded in the database on disk during check pointing. Actions taken by the recovery manager during Checkpoint The recovery manager of a DBMS must decide at what intervals to take a checkpoint. The interval may be measured in time say, every m minutes or in the number t of committed transactions since the last checkpoint, where the values of m or t are system parameters.  
Checkpoints taken by recovery manager consists of the following actions:
 1. Suspend execution of transactions temporarily. 
2. Force-write all main memory buffers that have been modified to disk..
3. Write a [checkpoint] record to the log, and force-write the log to disk. 
4. Resume executing transactions. As a consequence of Step 2, a checkpoint record in the log may also include additional information, such as a list of active transaction ids, and the locations (addresses) of the first and most recent (last) records in the log for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.

Question) Consider an empty B+-tree with n=4, insert the following record keys in the specified order: 
80, 25, 69, 14, 58, 3, 47, 91, 36, 81. 
Answer)



Question)  Consider the following partial Schedule S involving two transactions T1 and T2. Only the read and the write operations have been shown. The read action on data item P is signified by read (P) and the write action on data item P is signified by write (P).  What would the result when the transaction T1 fails instantly after time instance 9.
Answer)
Schedule S is non-recoverable and cannot ensure transaction atomicity because T2 reads value of 'A' which is written by T1 and T2 is committed before T1. 

Question) Consider two transactions:
                               T1: BEGIN A=A+100, B=B-100 END
                               T2: BEGIN A=1.06*A, B=1.06*B END
• 1st TXN transfers $100 from B’s account to A’s
• 2nd TXN credits both accounts with 6% interest.
Assume at first A and B each have $1000. What are the authorized outcomes of running T1 and T2???

Answer) There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. But, the net effect must be equal to these two transactions running consecutively in some random order.
Legal outcomes: A=1166, B=954 or A=1160, B=960

Question) Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R.                                                 
(a) Identify the candidate key(s) for R. 
(b) Identify the best normal form that R fulfills (1NF, 2NF, 3NF, or BCNF).
(c) If R is not in BCNF, decompose it into a set of BCNF relations that preserve the dependencies. 
 Perform the above tasks for the following set of functional Dependencies:         
1. C → D, C → A, B → C
2. B → C, D → A
3. ABC → D, D → A
4. A → B, BC → D, A → C
5. AB → C, AB → D, C → A, D → B

Answer)
1. (a) Candidate keys: B  
   (b) R is in 2NF but not 3NF.
   (c) C → D and C → A both cause violations of BCNF. One way to obtain a (lossless) join preserving decomposition is to decompose R into AC, BC, and CD.

2. (a) Candidate keys: BD  
    (b) R is in 1NF form and not 2NF.
    (c) Equally B → C and D → A cause BCNF desecration or violation. The decomposition: AD, BC, BD (achieved by first decomposing to AD, BCD) is BCNF and lossless and join-preserving.

3. (a) Candidate keys: ABC, BCD  
    (b) R is in 3NF and not BCNF.
    (c) ABCD is not in BCNF form because D → A and D is not a key for it. However when we split up R as AD, BCD we cannot reserve the dependence ABC → D. Hence there is no BCNF decomposition.

4. (a) Candidate keys: A
    (b) R is in 2NF and not 3NF (as FD: BC → D).
    (c) BC → D violates BCNF since BC does not comprise a key. Therefore we divide up R as in: BCD, ABC.

5. (a) Candidate keys: AB, BC, CD, AD
    (b) R is in 3NF and is not BCNF (as FD: C → A). 
    (c) C → A and D → B both cause violations. Hence we decompose into: AC, BCD but this does not preserve AB → C   and AB → D, and BCD is yet still not BCNF because D → B. Therefore we need to decompose more into: AC, BD, CD. However, when we try to revive the lost functional dependencies by adding ABC and ABD,  these relations are not in BCNF form. Therefore, we can say there is no BCNF decomposition.

August 13, 2018

Important QnA On Operating System


Question )Deadlock prevention algorithms prevent deadlocks by restraining how requests can be made. A different method for avoiding deadlocks is to require additional information about how resources are to be requested.
For example, consider the following snapshot.                

a. How many resources are there of type (A, B, C)?
b. What are the contents of the Need matrix?
c. Is this system currently in a safe state?  Justify your answer.
d. If  a  request  from  P3  reaches  for (1, 0, 0 ),  can  the  request be
      safely granted immediately?
Answer)
a. (3,14,11) resources = total + avail
b. Need = Max - Allocation.
A B C
 P0 0 0 0
 P1 0 7 5
 P2 1 0 0
 P3 0 0 2
c.Yes, because the processes can be implemented and processed in the order P0, P2, P1, P3.
d.

Question )Consider a system in which memory contains the following hole size in memory direction order:
 20K, 10K, 18K, 9K, 12K and 15K.          
Which hole is taken for successive segment request of
a. 6 KB
b. 10 KB
c. 12 KB
d. 9 KB
for the First fit, Best Fit, and Worst Fit.  Clearly show the memory allocation using a diagram. Which algorithm makes the most efficient use of memory?
 Answer)
       First fit: (a) 20K (b) remainder of 20K (c) 18K (d) 10K           
       Best fit: (a) 9K (b) 10K (c) 12K (d) 15K 
       Worst fit: (a) 20K (b) 18K (c) 15K (d) remainder of 20K 

Question) Suppose two processes enter the ready queue with the following properties: Process 1 has a total of 8 units of work to perform, but after every 2 units of work, it must perform 1 unit of I/O (so the minimum completion time of this process is 12 units). Assume that there is no work to be done following the last I/O operation. Process 2 has a total of 20 units of work to perform. This process arrives just behind P1. Show the resultant schedule for the shortest-job-first (preemptive) and the round-robin algorithms. Assume a time slice of 4 units of RR. What is the completion time of each process under each algorithm? 
Answer)
SJF:
Completion time of process p1 = 12 unit
Completion time of process p2 = 28 unit
Round- robin (R R) algorithm:
Completion time of process p1 = 20 unit
Completion time of process p2 = 28 unit

Question) A data of 700GB is to be stored in disks of 200GB each. 
a. How many such disks are required if the following RAID structures are implemented? Justify your answers with explanation and diagrams.
i. RAID  1
ii. RAID  0 
iii. RAID  6 
iv. RAID  5
b.If the system implements RAID 4, how many I/O operations are required for a write operation?
Answer)
a)1) 8 disks     2)    4 disks     3)    4+2=6 disks    4)  4+1= 5 disks
(with explanation and diagrams)

b)5 I/o operations

Question) Consider a system with a 90% hit ratio, 40 nano-seconds times to search the associative registers, 600 nano-seconds times to access memory. Find the time to access a page :       
a. When the page number is in associative memory? 
b. When the time to access a page when not in associative memory?
c. Find the effective memory access time? 
Answer)
(a) The time needed is 40 nano seconds to fetch the page from the associative memory and 600 nano seconds to read the preferred word from memory.
Time = 40+600 = 640 nano seconds

 (b)The time when the page is not in associative memory
Time=40+600+600=1240 nano seconds
One memory access extra is required to read the page table from memory.

 (c)Effective memory access time = page number in associative memory + page number, not in associative memory.
Page number in associative memory = 0.9x640
Page number not in associative memory=0.1x1240
Effective memory access time = 0.9 x 640 + 0.1 x 1240 = 700 nano seconds

Question) Consider a physical memory of size 264 KB which is managed by buddies’ scheme.  Suppose a set of five processes namely P1, P2, P3, P4 and P5demand memory requirement of size 2028KB, 112KB, 288 KB, 598KB and 25KB respectively. What would be the size of the memory blocks allocated to these four programs? Is it a fair allocation? Justify.
Answer)
As Buddy system allocates contiguous blocks of fixed size memory this is the power factor of 2. Therefore, if a process request memory which doesn’t fit into the appropriately sized units, the next highest power of 2 sizes of memory is allocated. In our case, allocated block size are as follows,
Process P1 which request 2028KB, the nearest power of 2 is 2048KB (211KB) which could satisfy the request. Therefore, the buddy scheme allocates a contiguous memory block of size 211 KB. Similarly, the process P2 gets memory block of size 27  KB, process P3 gets memory block of size 29 KB and process p4 gets memory block of size 210 KB and P5 gets 25.
Justification: The serious drawback of this approach is that it is extremely inefficient in terms of memory utilization. The reason is that this scheme services all the requests by always allocating memory which must be rounded up to a power of 2. This results in the substantial amount of internal fragmentation, since, on an average, it allocates almost double the size of requested memory which is simply a waste on the part of memory usage that often summarily leads to nearly half of the memory remained unutilized. However, the distinct advantage of this scheme is, that deallocation is quite fast, and subsequent coalescing of released memory to form a larger free contiguous memory block for subsequent use can be carried out at much faster speed than its other competitors. 

Question) Linux’s security model is closely related to typical UNIX security mechanisms. Linux performs access control by assigning objects a protection mask that specifies which access modes are to be granted to processes. Give any four examples of file protection modes with allowed file accesses.
Answer) Any four with an explanation 
Question) Consider a logical address space of eight pages of 1024 bytes each, mapped onto a physical memory of 32 frames.   
a. How many bits are there in the logical address? Identify in the logical address the bits used as an offset and the bits used as the page number.
b. How many bits are there in the physical address? Identify in the physical address the bits used as an offset and the bits used as the frame number.
Answer)
(a) Logical address: 13 bits (8 pages X 1024 bytes/page = 8196 bytes – requires 213 addresses). Bits for offset: Bits 0 to 9 (10 bits for offset into 1024, 210 byte page), 3 bits identify page number (23= 8 pages)   
(b) Physical address: 15 bits (32 frames X 1024 byes/frame = 32768 bytes, requires 215 addresses). Bits for offset: Bits 0 to 9 (10 bits for offset into 1024, 210 byte frame), 5 bits identify frame number (25= 32 frames)   

Question) What are the differences between user-level threads and kernel level threads?
Answer)
 (1) User-level threads are unidentified by the kernel, whereas the kernel is conscious of kernel threads. 
(2) On systems with either M:1 or M:N mapping, user threads are organized and planned by the thread library and the kernel plans kernel threads. 
(3) Kernel threads may not be associated with a process, however, every single user thread fits a specific process. Kernel threads are usually more costly to be maintained than user threads as they need to be denoted with a kernel data structure. 

Question) Describe the actions taken by a kernel to context-switch between kernel-level threads.
Answer)
Context switching between kernel threads typically requires saving the value of the CPU registers from the thread being switched out and restoring the CPU registers of the new thread being scheduled

Question) In demand paging, a victim frame is selected using the page replacement algorithm, if there is no free frame in the memory. In general, the page replacement algorithm with the lowest page fault rate is selected as the scheme. Consider the following page-reference string in a 3-frames system 
1  2  3  2  1  5  2  1  6  2  5  6  3 1  3  6  1  2   4   3         
a. How many page faults would occur with FIFO page replacement? 
b.How many page faults would occur with LRU page replacement?                  
c.How many page faults would occur with OPT page replacement?  
d.Check for Belady’s anomaly considering 4 frames for FIFO.       
Answer)
               FIFO -14 9 Page faults       
       LRU – 11 9 Page faults     
       OPTIMAL – 9 Page faults
               Checking for Belady’s anomaly

Question) Given the following queue 95, 180, 34, 119, 11, 123, 62, 64 with the Read-Write head initially at the track 50 and the tail track being at 199. Apply the following algorithms and find the total head movements. Which algorithm results in lesser head movement so as to achieve a faster seek time? 
a.First Come-First Serve (FCFS)
b.Shortest Seek Time First (SSTF)
c.Elevator (SCAN) 
d.Circular SCAN (C-SCAN)
e.C-LOOK 
Answer)
(a)FCFS - For this case, it went from 50 to 95 to 180 and so on. From 50 to 95 it moved 45 tracks. In this example, it had a total head movement of 640 tracks. 

(b)SSTF- In this case request is serviced according to next shortest distance. Starting at 50, the following shortest distance would be 62 and not 34 since it is merely 12 tracks far from 62 and 16 tracks far from 34. The process would continue until all the process is taken care of. For instance, the subsequent case would be to move from 62 to 64 and not 34 since there are only 2 tracks among them and not even 18 if it were to go the other way around. 
50, 62, 64, 34, 11, 95, 119, 123, 180
Total Head movements - 236 tracks

(c)SCAN - This approach works like an elevator does. It tests down to the closest end and then when it hits the bottommost it tests up examining the requests that it didn't get while going down. If a request arises in when it is scanned it will not be serviced up until the process originates back down or moves back up.
50, 34, 11, 0, 62, 64, 95, 119, 123, 180 
Total Head movements - 230 tracks

(d)C-SCAN - Circular scanning works just like the elevator to some extent. It begins its scan toward the closest end and works it all the way to the final end of the system. Once it hits the bottom or top it jumps to the other end and moves in the same direction. Keep in mind that the huge jump doesn't count as a head movement. 
The total head movement for this algorithm -187 track
50, 34, 11, 0, 180, 123, 119, 95, 64, 62 

(e)C-LOOK - This is just an enhanced version of C-SCAN. In this, the scanning doesn't go past the last request in the direction that it is moving. It too jumps to the other end but not all the way to the end. Just to the furthest request. 
C-LOOK total head movements - 157 tracks.
50, 34, 11, 180, 123, 119, 95, 64, 62 

August 04, 2018

OOAD - Case Studies

Case Studies for Exam Preparation:

Question.
Give two examples of information about a problem domain that can be captured in UML Activity Diagrams, and two ways in which these diagrams can be useful for Requirements Analysis.

Answer:
Examples of info represented:
  1. Activity diagrams show how tasks depend on one another in a business process. For example, they can show that some tasks can be carried out in parallel, while other tasks have to be synchronized so that one doesn’t start until the tasks on which it depends are complete.
  2. Activity diagrams show the flow of activity between different business units, using swimlanes. In other words, they show how different business units must work together to carry out some process.
Activity diagrams are useful in requirements analysis:
  1. To understand the existing business processes, showing how an organization currently solves the problem.
  2. For sketching proposed solutions to the stakeholders to show how a proposed solution would change the way in which the organization works.
Question.
Draw a UML Class Diagram representing the following elements from the problem domain for a hockey league. A hockey league is made up of at least four hockey teams.  Each hockey team is composed of six to twelve players, and one player captains the team. A team has a name and a record.  Players have a number and a position.  Hockey teams play games against each other.  Each game has a score and a location.  Teams are sometimes lead by a coach. A coach has a level of accreditation and a number of years of experience, and can coach multiple teams.  Coaches and players are people, and people have names and addresses. Draw a class diagram for this information, and be sure to label all associations with appropriate multiplicities.

Answer:
http://www.waseian.com/2018/08/object-oriented-analysis-and-design_4.html

Notes: Captain could alternatively be represented as a second, named association between player and team. Assumptions: each player only plays on one team, each captain only captains one team, each team only plays in one league.

Question.
Do you know that it costs a lot of money to get a ’Certified Java Programmer’ certificate? It could cost you thousands of euros. Let’s say we will create a browser-based training scheme to support people to get ready for a certification exam. A user can appeal a quiz for the system. The system chooses a collection of questions from its database, and constitutes them together to create a quiz. It charges the user’s answers, and provides hints if the customer desires it.
Additionally, we also have professors who deliver questions and tip-offs. And similarly examinators who must declare problems to make sure they are not also trivial, and are sensical. Draw a use case diagram to create this classic system. Work out more or less of your use cases. As we don’t have real stake holders now, you are allowed to fill in facts you think is sensical for this case.


Answer:
http://www.waseian.com/2018/08/object-oriented-analysis-and-design_4.html

We’ll assume multiple choice quiz.
Use case: Make quiz.
Primary actor: User
Secondary actors: Pre-condition: The system has at least 10 questions.
Post-condition:
Main flow:
1. The use-case is activated when the user requests it.
2. The user specifies the difficulty level.
3. The system chooses 10 questions, and proposes them as a quiz to the user.
4. The system starts a regulated timer.
5. For every question: 5a. The user selects an answer, or skip. [Extension point]
6. If the user has done the quiz, or the timer turns out, the quiz is finished.
Use case: Provide hint
Primary actor: User
Secondary actors: Pre-condition: The user requests for a hint. Post-condition:
Main flow:
1. The system provides a hint. The verbosity of the hint is determined by the difficulty level set previously by the user.
2. Return to to Make quiz’ main flow.

Question.
Suppose we want to develop software for an alarm clock. The clock shows the time of day. By means of buttons, the user can customize the hours and minutes fields independently, and choose among 12 and 24-hour display. It is possible to set one or two alarms. When an alarm fires, it will sound some noise. The user can turn it off, or choose to ’snooze’. If the user does not answer back at all, the alarm will turn off itself later after 2 minutes. ’Snoozing’ means to turn off the sound, but the alarm will ring again after few minutes of delay. This ’snoozing time’ can be pre-adjusted.
Identify the top-level functional requirement for the clock, and model it with a use case diagram.

Answer:
http://www.waseian.com/2018/08/object-oriented-analysis-and-design_4.html
Use case: Snooze.
Primary actor: User
Secondary actors: Pre-condition: An alarm is firing.
Post-condition: Main flow:
1. The use-case is activated when the user hits the snooze button.
2. The alarm is turned off.
3. Wait for snooze time.
4. Include use case ’Make sound’

Use case: Make sound
Primary actor: System
Secondary actors: Pre-condition:
Post-condition:
Main flow: The use case starts when it is called.
What it does is to just make some noisy sound.

Question.
Draw an UML state diagram corresponding to the computer keyboard state machine. Also draw an extended state diagram to model the behavior of the keyboard which depends on the number of characters typed on it so far and that after, say, 1,000 keystrokes, the keyboard breaks down and enters the final state.(Assume normal function of the key board of a personal computer which you use regularly)

Answer:
http://www.waseian.com/2018/08/object-oriented-analysis-and-design_4.html

state diagram representing the computer keyboard state machine

Question.
The Gang of Four says that the intent of the Adapter pattern is to "change the interface of a class into another interface that the clients expect. Adapter lets classes work together with each other that could not otherwise as of mismatched interfaces."
What does this imply? Give an example.

Answer:
This says that I have a class that desires to interact with a different class through a assured set of method calls. When the interface of that different class does not offer these method calls, the Adapter creates up a fresh interface to do the interpretation. For instance a reporting application that wishes to pull data from two diverse database systems. My application needs to use a "GetDate" method to attract info from the database, but the database systems don't deliver that through their API. Therefore I write an Adapter that gives the GetDate method in its interface and is accountable for pulling the data appropriately.

Question.
The Façade pattern and the Adapter pattern may seem similar. What is the essential difference between the two? Explain with an example.

Answer:
In both circumstances, there is a pre-existing class or classes that have functionality needed. So for both cases, I build an intermediate object with interfaces that my system needs to use and that has duty for mapping that to the pre-existing class. Equally Façade and Adapter are wrappers. The Adapter is used when our client previously has predefined interfaces that it believes to use and when I want to practice polymorphism.