Question:
For the following vectors x and y, calculate the cosine similarity and Euclidean distance measures:
x = (4,4,4,4), y = (2,2,2,2)
Solution:
Cosine
x ● y = 4*2 + 4*2 + 4*2 + 4*2 = 32
||x|| = sqrt(4*4 + 4*4 + 4*4 + 4*4) = sqrt (64) = 8
||y|| = sqrt(2*2 + 2*2 + 2*2 + 2*2) = sqrt (16) = 4
cos(x,y) = (x ● y) / (||x||*||y||) = (32)/ (8*4)
cos(x,y) = 1
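The cosine calculation above can be checked with a short sketch. The function name `cosine_similarity` is only illustrative, and the code assumes both vectors have the same length and are non-zero.

```python
import math

def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))       # x . y
    norm_x = math.sqrt(sum(a * a for a in x))    # ||x||
    norm_y = math.sqrt(sum(b * b for b in y))    # ||y||
    return dot / (norm_x * norm_y)

x = (4, 4, 4, 4)
y = (2, 2, 2, 2)
print(cosine_similarity(x, y))  # 1.0
```

A value of 1 is expected here because x is a positive multiple of y (x = 2y), so the two vectors point in the same direction.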
Euclidean
d(x, y) = sqrt((4-2)^2 + (4-2)^2 + (4-2)^2 + (4-2)^2) = sqrt(4 + 4 + 4 + 4) = sqrt(16)
Euclidean distance = 4
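The same distance can be reproduced with a minimal sketch. The helper name `euclidean_distance` is hypothetical, and the code assumes both vectors have the same length.

```python
import math

def euclidean_distance(x, y):
    """Return the square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = (4, 4, 4, 4)
y = (2, 2, 2, 2)
print(euclidean_distance(x, y))  # 4.0
```

Although the cosine similarity of these two vectors is 1, their Euclidean distance is not 0, because cosine similarity ignores vector length while Euclidean distance does not.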
Question:
Consider the one-dimensional data set shown in the table below.
X | 0.6 | 3.2 | 4.5 | 4.6 | 4.9 | 5.2 | 5.6 | 5.8 | 7.1 | 9.5
Y | − | − | + | + | + | − | − | + | − | −
Classify the data point x = 5.0 according to its 3- and 9-nearest neighbors (using majority vote).
Answer:
First, find the absolute difference between each data point X and x = 5.0. Refer to the table below.
x | X | Difference (x & X) | Y
5.0 | 0.6 | 4.4 | −
5.0 | 3.2 | 1.8 | −
5.0 | 4.5 | 0.5 | +
5.0 | 4.6 | 0.4 | +
5.0 | 4.9 | 0.1 | +
5.0 | 5.2 | 0.2 | −
5.0 | 5.6 | 0.6 | −
5.0 | 5.8 | 0.8 | +
5.0 | 7.1 | 2.1 | −
5.0 | 9.5 | 4.5 | −
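The differences in the table, and a majority vote over the k closest points, can be sketched as follows. The function name `knn_classify` is illustrative only, and the sketch assumes there are no distance ties and no tied votes (both hold for this data with k = 3 and k = 9).

```python
from collections import Counter

X = [0.6, 3.2, 4.5, 4.6, 4.9, 5.2, 5.6, 5.8, 7.1, 9.5]
Y = ["-", "-", "+", "+", "+", "-", "-", "+", "-", "-"]

def knn_classify(query, xs, ys, k):
    """Label `query` by majority vote among its k nearest points (1-D data)."""
    # Rank every (value, label) pair by absolute difference from the query.
    ranked = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(5.0, X, Y, k=3))  # +
print(knn_classify(5.0, X, Y, k=9))  # -
```

The label flips between k = 3 and k = 9 because a larger neighborhood pulls in the distant "−" points at 3.2, 7.1 and 0.6.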
Using the 3-nearest-neighbors method:
The 3 points closest to x = 5.0 are the ones with the smallest differences -> 4.9, 5.2, 4.6
Classes -> + − +
Using majority vote, 3-nearest neighbor: +
Using the 9-nearest-neighbors method:
The 9 points closest to x = 5.0 are the ones with the smallest differences -> 4.9, 5.2, 4.6, 4.5, 5.6, 5.8, 3.2, 7.1, 0.6
Classes -> + − + + − + − − −
Using majority vote (four + against five −), 9-nearest neighbor: −
Question:
Suppose a group of 12 sales price records has been sorted as follows:
5; 10; 11; 13; 15; 35; 50; 55; 72; 90; 204; 215.
Partition them into three bins by each of the following methods.
(a) equal-frequency partitioning
(b) equal-width partitioning
(c) clustering
Answer:
(a) equal-frequency (equidepth) partitioning:
Partition the data into three equidepth bins of depth 12/3 = 4:
Bin 1: 5, 10, 11, 13
Bin 2: 15, 35, 50, 55
Bin 3: 72, 90, 204, 215
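Equal-frequency binning amounts to slicing the sorted values into consecutive chunks of equal size. A minimal sketch, assuming the number of records divides evenly by the number of bins (the function name `equal_frequency_bins` is hypothetical):

```python
def equal_frequency_bins(values, n_bins):
    """Split the sorted values into n_bins consecutive bins of equal depth."""
    ordered = sorted(values)
    if len(ordered) % n_bins != 0:
        # This sketch only handles the evenly divisible case.
        raise ValueError("number of values must be divisible by n_bins")
    depth = len(ordered) // n_bins
    return [ordered[i * depth:(i + 1) * depth] for i in range(n_bins)]

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
for number, contents in enumerate(equal_frequency_bins(prices, 3), start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13]
# Bin 2: [15, 35, 50, 55]
# Bin 3: [72, 90, 204, 215]
```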
(b) equal-width partitioning:
Partitioning the data into 3 equal-width bins requires the width to be (215−5)/3 = 70.
Starting from the minimum value 5, we get the intervals [5, 75), [75, 145), [145, 215]
Bin 1: 5, 10, 11, 13, 15, 35, 50, 55, 72
Bin 2: 90
Bin 3: 204, 215
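One common convention for equal-width binning starts the first interval at the minimum value, uses half-open intervals of width (max − min)/k, and places the maximum in the last bin. The sketch below follows that convention; the function name `equal_width_bins` is illustrative, and the code assumes the values are not all identical (otherwise the width would be zero).

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins              # (215 - 5) / 3 = 70 for the data below
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # Intervals are [lo, lo + width), [lo + width, lo + 2 * width), ...
        # The maximum would index one past the end, so clamp it into the last bin.
        index = min(int((v - lo) // width), n_bins - 1)
        bins[index].append(v)
    return bins

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
for number, contents in enumerate(equal_width_bins(prices, 3), start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13, 15, 35, 50, 55, 72]
# Bin 2: [90]
# Bin 3: [204, 215]
```

With this convention exactly three bins are produced, covering [5, 75), [75, 145) and [145, 215].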
(c) clustering:
Using K-means clustering with k = 3 to partition the data into three bins, we get:
Bin 1: 5, 10, 11, 13, 15, 35
Bin 2: 50, 55, 72, 90
Bin 3: 204, 215
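K-means results depend on the starting centroids, which the answer above does not state. The sketch below is a plain one-dimensional version of the usual assign-then-update loop; the function name `kmeans_1d` and the starting centroids (first, middle and last sorted value: 5, 50, 215) are assumptions made only for illustration. With that start it settles on the same three bins.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster 1-D values by repeating nearest-centroid assignment and mean update."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:   # no centroid moved, so the assignment is stable
            break
        centroids = updated
    return clusters

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
# Assumed initial centroids: first, middle and last value of the sorted data.
start = [prices[0], prices[len(prices) // 2], prices[-1]]
for number, contents in enumerate(kmeans_1d(prices, start), start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13, 15, 35]
# Bin 2: [50, 55, 72, 90]
# Bin 3: [204, 215]
```

A different initialization may converge to a different partition, so the bins above are one stable K-means solution rather than the only possible one.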