Cosine Similarity¶
Cosine similarity is a function that measures how similar two non-zero vectors are by computing the cosine of the angle between them. It is widely used in data science and machine learning, particularly for text analysis, document retrieval, and recommendation.
Formula¶
$$ \text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} $$
or, written out component by component,
$$ \text{Cosine Similarity} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}} $$
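The two forms above describe the same quantity. A minimal sketch of both, assuming NumPy is available; the function names `cosine_similarity` and `cosine_similarity_sum` and the vectors `u`, `v`, `w` are hypothetical and chosen only for illustration:

```python
import numpy as np


def cosine_similarity(a, b):
    """Vector form: (A . B) / (||A|| * ||B||)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_similarity_sum(a, b):
    """Summation form, written out term by term."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
    return numerator / denominator


# Hypothetical vectors, not taken from the notebook
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, twice the length
w = [3.0, 0.0, -1.0]  # perpendicular to u (their dot product is 0)

print(cosine_similarity(u, v))  # approximately 1: same direction
print(cosine_similarity(u, w))  # 0: perpendicular
# Both forms should agree up to floating-point rounding
print(abs(cosine_similarity(u, v) - cosine_similarity_sum(u, v)) < 1e-12)
```

Neither function guards against a zero vector, for which the denominator is 0 and the similarity is undefined.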
Example¶
Suppose the words pet, dog, and lion have the embeddings [2.0, 0.1, 1.9], [1.5, 0.1, 1.2], and [13.5, 13.5, 13.5], respectively.
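As a rough hand check of the formula, take $\mathbf{A}$ = pet = [2.0, 0.1, 1.9] and $\mathbf{B}$ = dog = [1.5, 0.1, 1.2] (the values used in the code cell that follows). With intermediate results rounded to four decimal places, the arithmetic works out as:

$$ \mathbf{A} \cdot \mathbf{B} = (2.0)(1.5) + (0.1)(0.1) + (1.9)(1.2) = 3.0 + 0.01 + 2.28 = 5.29 $$
$$ \|\mathbf{A}\| = \sqrt{2.0^2 + 0.1^2 + 1.9^2} = \sqrt{7.62} \approx 2.7604 $$
$$ \|\mathbf{B}\| = \sqrt{1.5^2 + 0.1^2 + 1.2^2} = \sqrt{3.70} \approx 1.9235 $$
$$ \text{Cosine Similarity} \approx \frac{5.29}{2.7604 \times 1.9235} \approx 0.9963 $$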
In [3]:
import math as m

import matplotlib.pyplot as plt
import numpy as np


def length_mag(data):
    # Euclidean length (magnitude) of a vector: sqrt(sum of squared components)
    return m.sqrt(sum(pow(data, 2)))


embedding_pet = np.array([2.0, 0.1, 1.9])
embedding_dog = np.array([1.5, 0.1, 1.2])
embedding_lion = np.array([13.5, 13.5, 13.5])

# Element-wise products A_i * B_i; summing them below gives the dot product
product_a_dot_b = embedding_pet * embedding_dog
# Vector magnitudes ||A|| and ||B||
product_pow_pet = length_mag(embedding_pet)
product_pow_dog = length_mag(embedding_dog)
# Divide each term by ||A|| * ||B||, then add the terms up
cosine_similarity = sum(product_a_dot_b / (product_pow_dog * product_pow_pet))
print(cosine_similarity)
0.9962706226617222
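The cell above defines `embedding_lion` but only compares pet with dog. As a hedged sketch, the remaining pairs can be compared the same way; the helper `cosine_similarity` below is an assumption for illustration, built on `np.dot` and `np.linalg.norm`, and is not part of the original notebook:

```python
import numpy as np


def cosine_similarity(a, b):
    # (A . B) / (||A|| * ||B||); np.linalg.norm is the Euclidean length
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Same embeddings as the cell above
embedding_pet = np.array([2.0, 0.1, 1.9])
embedding_dog = np.array([1.5, 0.1, 1.2])
embedding_lion = np.array([13.5, 13.5, 13.5])

pet_dog = cosine_similarity(embedding_pet, embedding_dog)
pet_lion = cosine_similarity(embedding_pet, embedding_lion)
dog_lion = cosine_similarity(embedding_dog, embedding_lion)

print(f"pet vs dog:  {pet_dog:.4f}")
print(f"pet vs lion: {pet_lion:.4f}")
print(f"dog vs lion: {dog_lion:.4f}")

# Rescaling a vector leaves the cosine unchanged: only direction matters
rescaled_lion = embedding_lion / 13.5
print(np.isclose(pet_lion, cosine_similarity(embedding_pet, rescaled_lion)))
```

With these values, both lion comparisons come out at roughly 0.84, well below the pet and dog score. The large components of the lion vector play no role in that gap, because cosine similarity ignores magnitude.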
In [4]:
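fig, ax = plt.subplots()
ax.set_title("Similarity between Pet and Dog")
# Plot each pet component (x-axis) against the matching dog component (y-axis)
ax.plot(embedding_pet, embedding_dog, marker="o", linestyle="--", color="red")
ax.set_xlabel("pet embedding components")
ax.set_ylabel("dog embedding components")
plt.show()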