00 cosine similarity
Cosine Similarity¶
Cosine similarity is a function that measures the similarity of two non-zero vectors by computing the cosine of the angle between them. It is widely used in data science and machine learning, especially for text analysis, document retrieval, and recommendation systems.
Formula¶
$$ \text{Cosine Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} $$
or
$$ \text{Cosine Similarity} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}} $$
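The summation form above can be transcribed directly into plain Python. This is a minimal sketch (the helper name `cosine_similarity` is illustrative, not used elsewhere in this notebook):

```python
import math

def cosine_similarity(a, b):
    # numerator: sum of A_i * B_i
    dot = sum(x * y for x, y in zip(a, b))
    # denominator: sqrt(sum A_i^2) * sqrt(sum B_i^2)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([2.0, 0.1, 1.9], [1.5, 0.1, 1.2]))  # ≈ 0.99627
```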
Example¶
Suppose the words pet, dog, and lion have the embeddings [2.0, 0.1, 1.9], [1.5, 0.1, 1.2], and [13.5, 13.5, 13.5] respectively.
In [1]:
import numpy as np
import math as m
import matplotlib.pyplot as plt
import pandas as pd

def length_mag(data):
    # Euclidean norm (magnitude) of a vector
    return m.sqrt(sum(pow(data, 2)))

embedding_pet = np.array([2.0, 0.1, 1.9])
embedding_dog = np.array([1.5, 0.1, 1.2])
embedding_lion = np.array([13.5, 13.5, 13.5])

# element-wise products A_i * B_i; summed later to form the dot product
product_a_dot_b = embedding_pet * embedding_dog
product_pow_pet = length_mag(embedding_pet)
product_pow_dog = length_mag(embedding_dog)
In [2]:
data = {'embd_pet': embedding_pet, 'embd_dog': embedding_dog, 'dot_product_pet_and_dog': product_a_dot_b}
df = pd.DataFrame(data)
df
Out[2]:
|   | embd_pet | embd_dog | dot_product_pet_and_dog |
|---|---|---|---|
| 0 | 2.0 | 1.5 | 3.00 |
| 1 | 0.1 | 0.1 | 0.01 |
| 2 | 1.9 | 1.2 | 2.28 |
In [3]:
# sum of element-wise products divided by the product of the two norms
cosine_similarity = sum(product_a_dot_b / (product_pow_dog * product_pow_pet))
print(f"product_a_dot_b {product_a_dot_b}")
print(f"product_pow_pet {product_pow_pet}")
print(f"product_pow_dog {product_pow_dog}")
print("-"*10)
print(f"Cosine Similarity {cosine_similarity}")
product_a_dot_b [3.   0.01 2.28]
product_pow_pet 2.760434748368452
product_pow_dog 1.9235384061671343
----------
Cosine Similarity 0.9962706226617222
In [4]:
fig, ax = plt.subplots()
ax.set_title("Similarity between Pet and Dog")
# plot each dimension of the dog embedding against the pet embedding
ax.plot(embedding_pet, embedding_dog, marker="o", linestyle='--', color='red')
plt.show()
Or we can utilize NumPy's built-in functions¶
In [5]:
cosine_similarity_np = np.dot(embedding_pet, embedding_dog) / (
np.linalg.norm(embedding_pet) * np.linalg.norm(embedding_dog)
)
print(cosine_similarity_np)
0.9962706226617221
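The `embedding_lion` vector defined earlier is never actually compared. As a quick sketch, applying the same NumPy one-liner to pet and lion gives a noticeably lower score, since lion points in a different direction than pet:

```python
import numpy as np

embedding_pet = np.array([2.0, 0.1, 1.9])
embedding_lion = np.array([13.5, 13.5, 13.5])

# same formula: dot product divided by the product of the norms
similarity_pet_lion = np.dot(embedding_pet, embedding_lion) / (
    np.linalg.norm(embedding_pet) * np.linalg.norm(embedding_lion)
)
print(similarity_pet_lion)  # ≈ 0.8366, lower than the pet-dog score of ≈ 0.9963
```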