January 02, 2019

Data Mining - MCQS 2


Question
This clustering approach initially assumes that each data instance represents a single cluster.

Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering

The correct answer is: agglomerative clustering
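
The bottom-up idea can be sketched in a few lines of Python. This is only an illustrative sketch, assuming 1-D points and single-linkage distance; the function name `agglomerative` and the `target_clusters` parameter (assumed to be at least 1) are hypothetical and not part of the original question.

```python
def agglomerative(points, target_clusters):
    """Single-linkage agglomerative clustering on 1-D points (illustrative)."""
    # Start with every data instance in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) for the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair into a single cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(agglomerative([1, 2, 10, 11, 20], 3))  # [[1, 2], [10, 11], [20]]
```

With `target_clusters` equal to the number of points, nothing is merged and each instance remains its own cluster, which is the starting assumption the question describes.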

Question
The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?

Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship

The correct answer is: As the value of one attribute decreases the value of the second attribute increases.
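
As a rough illustration, the Pearson correlation coefficient can be computed directly from its definition. The sample data below is made up for the example, and `pearson` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (denominator).
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# As x increases, y tends to decrease, so the coefficient is negative.
print(pearson([1, 2, 3, 4, 5], [10, 7, 8, 4, 2]))
```

A value close to -1, such as -0.85, indicates a strong negative linear relationship: one attribute tends to rise as the other falls.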

Question
Time Complexity of k-means is given by

Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t^2 kn)

The correct answer is: O(tkn)
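
Here t is usually read as the number of iterations, k as the number of clusters, and n as the number of data points. The sketch below, assuming 1-D data and a fixed iteration count, counts distance computations to show where the t * k * n term comes from; `kmeans_1d` is a hypothetical name.

```python
def kmeans_1d(points, centroids, iterations):
    """Lloyd-style k-means on 1-D data.

    Returns the final centroids and the number of distance computations.
    """
    distance_computations = 0
    centroids = list(centroids)
    for _ in range(iterations):              # t iterations
        groups = [[] for _ in centroids]
        for p in points:                     # n points
            dists = []
            for c in centroids:              # k centroids
                dists.append(abs(p - c))
                distance_computations += 1
            # Assign the point to its nearest centroid.
            groups[dists.index(min(dists))].append(p)
        # Move each centroid to the mean of its group (unchanged if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, distance_computations


centroids, count = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5], 3)
print(centroids, count)  # [2.0, 11.0] 36
```

With t = 3, k = 2, and n = 6, the count is 3 * 2 * 6 = 36 distance computations.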

Question
Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that

Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.

The correct answer is: Y is true when X is known to be true.
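
Rule confidence can be estimated from a set of transactions as the fraction of transactions containing X that also contain Y. The following is a minimal sketch; the basket data and the `confidence` function name are invented for illustration.

```python
def confidence(transactions, x, y):
    """Estimate P(Y | X): the share of transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= set(t)]
    if not with_x:
        return None  # the rule never applies, so confidence is undefined
    with_both = [t for t in with_x if y <= set(t)]
    return len(with_both) / len(with_x)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# IF bread THEN milk: 2 of the 3 baskets with bread also contain milk.
print(confidence(baskets, {"bread"}, {"milk"}))
```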

November 14, 2018

Cloud Computing in simple terms - Overview of Cloud Computing

Cloud Computing in simple terms:

Welcome to cloud computing. The first question that comes to mind is: what is cloud computing?

You are probably asking the same thing, so let us jump straight into a scenario-based explanation. There is a great deal of information available from sources such as NIST (National Institute of Standards and Technology), 451 Group, analyst firms like Gartner, and Wikipedia.

Broadly, cloud computing is a large pool of services and data that you access over a network. The software and data you use for your work do not reside on your own computer; they sit on remote servers. This practice of using services hosted on other systems is called cloud computing.
There was a time when many clients wanted to access information from several terminals, but mainframe technology was too costly, so they looked for something cheaper. Back then it was only a hope; today we have the cloud.

In simple terms, cloud computing is a service made available on demand over the network by a provider whose infrastructure is optimized and highly scalable. The name was inspired by the cloud symbol used in network diagrams. The exact definition and origin of the term are still debated, but its core attributes are generally agreed upon.

Accessing Cloud:
We can access the cloud from a variety of devices.
[Diagram: devices accessing the cloud]


November 12, 2018

Data Mining - Mid Sem Solutions


Question:
Give an example for each of the following data problems addressed by preprocessing:
a. Incomplete
b. Inconsistent

Answer:
Data Preprocessing: A data mining technique that transforms raw data into an understandable format. Real-world data is often incomplete, inconsistent, or lacking in certain behaviors or trends, and is likely to contain many errors. Preprocessing is needed to resolve such issues.
"Preprocessing is needed to improve data quality"

A. Incomplete: Lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.
E.g. Many tuples have no recorded value for several attributes, such as
Occupation = “ ” (missing data)

B. Inconsistent: Containing discrepancies in codes or names.
E.g.
Age = “42”, Birthday = “03/07/2010” (the age does not match the birth date)
Rating was “1, 2, 3”, now rating is “A, B, C”
Discrepancies between duplicate records
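
A small check along these lines can flag both kinds of problem in a record. This is a sketch under assumptions: the field names (`occupation`, `age`, `birthday`), the `find_issues` helper, and the reference date are all hypothetical, and the birthday from the example is read as 3 July 2010.

```python
from datetime import date


def find_issues(record, today):
    """Return a list of data-quality issues found in a single record."""
    issues = []
    # Incomplete: any attribute whose value is missing or blank.
    for field, value in record.items():
        if value is None or str(value).strip() == "":
            issues.append(f"incomplete: {field} is missing")
    # Inconsistent: the recorded age must agree with the birthday.
    age, birthday = record.get("age"), record.get("birthday")
    if isinstance(age, int) and isinstance(birthday, date):
        had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
        expected = today.year - birthday.year - (0 if had_birthday else 1)
        if expected != age:
            issues.append(f"inconsistent: age {age} does not match birthday (expected {expected})")
    return issues


record = {"occupation": " ", "age": 42, "birthday": date(2010, 7, 3)}
for issue in find_issues(record, today=date(2018, 11, 12)):
    print(issue)
```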

Software Testing Methodologies - Mid Sem Solutions




Question
One of your friends has written a program to search within a string and has asked you to test the function below using equivalence class partitioning:

function strpos_generic($haystack, $needle, $nth, $insensitive)
The parameters and behavior are defined as follows:
  • $haystack = the string in which you need to search
  • $needle = the character to search for; $needle should be a single character
  • $nth = the occurrence you want to find; the value can be 1, 2, or 3
  • $insensitive = 1 means case insensitive; 0 or any other value means case sensitive
  • Passing Null as $haystack or $needle is not a valid scenario and will return boolean false
  • The function returns a mixed value: either the integer position of the $nth occurrence of $needle in $haystack, or boolean false if it cannot be found
a) Derive the positive and negative domain equivalence class partitioning, with the values in the following format:

$haystack | $needle | $nth | $insensitive

b) Derive test cases for the weak robust variant in the following format, using the variables efficiently and covering all the paths:

Sl.no | $haystack | $needle | $nth | $insensitive | Expected results

Answer:
NOTE: The above question is left as an exercise. Try solving it using the reference question below.

Question
ABC Transportation Company has implemented an online ticket booking system for its buses and has created a simple screen to search for buses. The following business rules are built into the screen:
  • From and To locations must be a minimum of 3 characters and a maximum of 25 characters
  • Date of travel cannot be earlier than today's date
  • Time: Morning, Afternoon, Night
  • Number of passengers can be 1 to 6
  • Service class can be: Premium, Deluxe, Express
a) Identify the positive and negative domains.
b) Write test cases for the weak robust variant.
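
One way to approach part b) is sketched below in Python. In the weak robust variant, each test case uses at most one invalid value while every other input stays valid. The validator `is_valid_search`, the sample locations, and the fixed reference date are assumptions made for illustration; they are not part of the original question, and the case list is a subset rather than a complete answer.

```python
from datetime import date

TIMES = {"Morning", "Afternoon", "Night"}
CLASSES = {"Premium", "Deluxe", "Express"}


def is_valid_search(src, dst, travel_date, time_slot, passengers, service_class, today):
    """Apply the business rules of the search screen (hypothetical validator)."""
    return (
        3 <= len(src) <= 25
        and 3 <= len(dst) <= 25
        and travel_date >= today
        and time_slot in TIMES
        and 1 <= passengers <= 6
        and service_class in CLASSES
    )


TODAY = date(2018, 11, 12)  # fixed reference date so the cases are repeatable
VALID = dict(src="Pune", dst="Mumbai", travel_date=TODAY,
             time_slot="Morning", passengers=2, service_class="Deluxe")

# Each case overrides at most one field of the all-valid input.
CASES = [
    ({}, True),                                    # all inputs valid
    ({"src": "AB"}, False),                        # From too short
    ({"src": "A" * 26}, False),                    # From too long
    ({"dst": "AB"}, False),                        # To too short
    ({"dst": "B" * 26}, False),                    # To too long
    ({"travel_date": date(2018, 11, 11)}, False),  # date before today
    ({"time_slot": "Evening"}, False),             # time not in the allowed list
    ({"passengers": 0}, False),                    # below the passenger range
    ({"passengers": 7}, False),                    # above the passenger range
    ({"service_class": "Economy"}, False),         # unknown service class
]


def run_cases():
    """Return one boolean per case: True when the validator matches the expectation."""
    return [
        is_valid_search(**{**VALID, **override}, today=TODAY) == expected
        for override, expected in CASES
    ]


print(all(run_cases()))
```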

November 05, 2018

Design and Analysis of Algorithms - Mid Sem Solution


Question 1A)
The time factor when determining the efficiency of algorithm is measured by
a) Counting microseconds
b) Counting the number of key operations
c) Counting the number of statements
d) Counting the kilobytes of algorithm

Answer: b
Justification: Counting key operations is independent of hardware and programming language; the other measures depend on the hardware or the software.

1B) The concept of order Big O is important because
a) It can be used to decide the best algorithm that solves a given problem
b) It determines the maximum size of a problem that can be solved in a given  amount of time
c) It is the lower bound of the growth rate of algorithm
d) Both A and B

Answer: a
Justification: Big O notation describes an upper bound on an algorithm's growth rate, so it states the worst the running time can grow as the input size increases. This allows algorithms that solve the same problem to be compared. Big Omega, by contrast, describes a lower bound: the algorithm cannot do better than that.

1C) In the worst case, Quick Sort has order
A) O(n log n)            B) O(n^2/2)                C) O(log n)                         D) O(n^2/4)

Answer: B
Justification: In the worst case, quick sort is O(n^2). When every pivot is the smallest or largest remaining element, partitioning makes (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, which is approximately n^2/2. On average quick sort runs in O(n log n) time and is fast in practice, and it needs little additional space: O(log n) stack space on average.
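
The quadratic worst case can be observed by counting comparisons. The sketch below assumes a simple quick sort that always picks the first element as the pivot, which is one common textbook variant and not necessarily the one the exam had in mind; `quicksort_comparisons` is a hypothetical name.

```python
def quicksort_comparisons(items):
    """Return (sorted list, number of element-versus-pivot comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    comparisons = 0
    for x in rest:
        comparisons += 1  # one comparison against the pivot per element
        if x < pivot:
            smaller.append(x)
        else:
            larger.append(x)
    left, c_left = quicksort_comparisons(smaller)
    right, c_right = quicksort_comparisons(larger)
    return left + [pivot] + right, comparisons + c_left + c_right


# Already sorted input is the worst case for a first-element pivot:
# 9 + 8 + ... + 1 = 45 = n(n-1)/2 comparisons for n = 10.
print(quicksort_comparisons(list(range(10)))[1])  # 45
```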

1D) Consider the following program segment. What is its space complexity?
Begin
i = 0;
s = 0;
s = s + 1;
Return s;
End

Answer: [If i and s are taken to be 4-byte integers] Space Complexity = 4 + (4 * 2) = 12 bytes, where the additional 4 bytes are for the return value.

Space complexity is the amount of memory used by the algorithm (including its input values) to execute and produce the result.

Space Complexity = Auxiliary Space + Input space

Calculating the Space Complexity:

To calculate the space complexity, we need to know how much memory variables of each data type use. These sizes vary across platforms and compilers, but the method for calculating the space complexity remains the same. Typical sizes are:

Type | Size
bool, char, unsigned char, signed char | 1 byte
short, unsigned short | 2 bytes
float, int, unsigned int, long, unsigned long | 4 bytes
double, long double, long long | 8 bytes

1E) What is the time complexity of optimal binary search?
a) O(n)   b) O(1)   c) O(2n/2)   d) O(n^2)

Answer: d
Justification:
An optimal binary search tree is a binary search tree whose nodes are arranged on levels such that the total search cost is minimum. Using dynamic programming with Knuth's optimization, such a tree can be constructed in O(n^2) time.

For Reference:
The complexity of linear search algorithm is
a) O(n)
b) O(log n)
c) O(n^2)
d) O(n log n)

Answer: a
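Justification: In the worst case, linear search compares the target with each of the n elements once, so the running time grows linearly with n. For sorted data this can be reduced by choosing another algorithm, such as binary search.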
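
A minimal sketch of linear search that also counts comparisons makes the linear growth visible. The function name and the returned comparison count are additions for illustration.

```python
def linear_search(items, target):
    """Return (index of target or -1, number of comparisons made)."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


print(linear_search([4, 8, 15, 16], 16))  # (3, 4): found at the last position
print(linear_search([4, 8, 15, 16], 7))   # (-1, 4): all n elements were checked
```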

The complexity of Bubble sort algorithm is
a) O(n)
b) O(log n)
c) O(n^2)
d) O(n log n)

Answer: c
Justification: Bubble sort is a simple sorting algorithm that works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. It needs up to n-1 passes of up to n-1 comparisons each, which gives O(n^2).
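
The following sketch shows a plain bubble sort without the early-exit optimization, with a comparison counter added for illustration. For n items it always makes (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.

```python
def bubble_sort(items):
    """Return (sorted copy of items, number of adjacent comparisons made)."""
    data = list(items)
    comparisons = 0
    # After each pass the largest remaining item has bubbled to position `end`.
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons


print(bubble_sort([5, 1, 4, 2, 8]))  # ([1, 2, 4, 5, 8], 10)
```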

Question
Randomised algorithms are also called __________________, and their behavior depends on _______________ in decision making as part of their logic.

Answer: probabilistic algorithms; randomness

Question:
Match the following:

No. | Problem | Label | Recurrence Equation
1 | Binary Search | A | t_n - t_{n-1} = 1
2 | Merge Sort | B | T(n) = 2T(n/2) + n - 1
3 | Sequential Search | C | t_n = t_{n-1} + n
4 | Factorial | D | T(n) = 7T(n/2) + 18(n/2)^2
5 | Strassen Matrix Multiplication | E | T(n) = T(n/2) + 1
6 | Selection Sort | F | T(n) = T(n/2) + n - 1

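Answer:

T(n) = time complexity for an input of size n

No. | Problem | Label | Recurrence Equation
1 | Binary Search | E | T(n) = T(n/2) + 1
2 | Merge Sort | B | T(n) = 2T(n/2) + n - 1
3 | Sequential Search | A | t_n - t_{n-1} = 1
4 | Factorial | F | T(n) = T(n/2) + n - 1
5 | Strassen Matrix Multiplication | D | T(n) = 7T(n/2) + 18(n/2)^2
6 | Selection Sort | C | t_n = t_{n-1} + n

Note: F is assigned to Factorial by elimination. The usual recursive factorial satisfies t_n - t_{n-1} = 1, the same form as A.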

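Additional Reference:

No. | Problem | Label | Recurrence Equation
7 | Insertion sort (worst case) | - | T(n) = T(n-1) + n - 1
8 | Tree traversal (balanced tree) | - | T(n) = 2T(n/2) + 1
9 | Quicksort (balanced partitions) | - | T(n) = 2T(n/2) + n
10 | Master Theorem (general form) | - | T(n) = aT(n/b) + f(n)
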
Tip:
Merge Sort: The merge sort algorithm deals with the problem of sorting a list of n elements. It sorts a list of n elements in O(n log n) time, which is considerably faster than insertion sort, which takes O(n^2) in the worst case.
Merge sort uses a divide and conquer method:
1. If the length of the list is 1, the list is sorted. Return the list.
2. Otherwise, split the list into two (roughly) equal halves and then recursively merge sort the two halves.
3. Merge the two sorted halves into one sorted list.
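
The three steps above can be sketched in Python as follows. This is one straightforward way to write it, not the only one; the sketch also treats the empty list as already sorted.

```python
def merge_sort(items):
    """Sort a list with merge sort and return a new sorted list."""
    # Step 1: a list of length 0 or 1 is already sorted.
    if len(items) <= 1:
        return list(items)
    # Step 2: split into two roughly equal halves and sort each recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Step 3: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

The merge step does at most n - 1 comparisons on n elements, which is where the n - 1 term in T(n) = 2T(n/2) + n - 1 comes from.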