Text Similarity in NLP
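You want to measure how similar two texts are. I tested several ways to do this.
Cosine similarity worked best for me. It turns each text into a vector and looks at the angle between the two vectors. With word-count or TF-IDF vectors, 0 means the texts share no terms. 1 means the vectors point in the same direction, so the texts use the same words in the same proportions.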
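Here is a minimal sketch of that idea, assuming plain word counts as the vectors. The function name `cosine_similarity` and the simple `\w+` tokenizer are my own choices for illustration, not something from the original post. A real pipeline might use TF-IDF weights or embeddings instead.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    # Assumption: lowercase word tokens are a good enough vector basis.
    vec_a = Counter(re.findall(r"\w+", text_a.lower()))
    vec_b = Counter(re.findall(r"\w+", text_b.lower()))

    # Dot product only needs the words that appear in both texts.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())

    # Euclidean length of each count vector.
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))

    # An empty text has no direction, so treat it as "no similarity".
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two texts with no words in common score 0, and a text compared with itself scores 1 up to floating-point rounding.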
Use dynamic programming when you need custom matching logic. It helps when specific characters need their own rules, for example in an edit-distance style comparison.
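One way to picture this is a weighted edit distance, where the dynamic programming table stays the same and only the cost rules change. This is a sketch under my own assumptions: the names `weighted_edit_distance` and `case_insensitive_cost`, and the 0.25 cost for a case-only difference, are hypothetical and not taken from the original post.

```python
def weighted_edit_distance(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance between a and b with a pluggable substitution cost."""
    if sub_cost is None:
        # Default rule: plain Levenshtein, any mismatch costs 1.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = cheapest way to turn a[:i] into b[:j].
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                          # delete a[i-1]
                dp[i][j - 1] + ins_cost,                          # insert b[j-1]
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute
            )
    return dp[-1][-1]


def case_insensitive_cost(x, y):
    """Hypothetical custom rule: a case-only difference is a cheap edit."""
    if x == y:
        return 0.0
    if x.lower() == y.lower():
        return 0.25
    return 1.0
```

Passing `case_insensitive_cost` as `sub_cost` makes "Cat" and "cat" almost identical, while the default rule counts the capital letter as a full edit.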
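Use ROUGE to get a confidence score based on overlap. It counts the n-grams that both strings share. You track three metrics:
- Recall: Overlapping n-grams divided by the n-grams in the original (reference) text.
- Precision: Overlapping n-grams divided by the n-grams in the generated text.
- F1 Score: The harmonic mean of precision and recall.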
Focus on precision when you want close matches. Balance it with recall when the match should cover the whole original text.
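The three ROUGE metrics above can be sketched by hand for ROUGE-N. This is a simplified illustration that assumes whitespace tokenization and lowercasing. The function names `ngrams` and `rouge_n` are mine, and a maintained ROUGE package would handle stemming and other details that this sketch skips.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """ROUGE-N recall, precision and F1 between a reference and a candidate."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}
```

For the reference "the cat sat on the mat" and the candidate "the cat sat", every candidate word appears in the reference, so precision is 1.0, while recall is 0.5 because only half of the reference is covered.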
I also tried regex patterns. I used rewards and penalties for wildcards. This improved the results.
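The post does not spell out the scoring scheme, so the following is only a guess at what rewards and penalties for wildcards could look like. Everything here is an assumption for illustration: the `?` wildcard syntax, the function name `wildcard_score`, and the default `reward` and `penalty` values.

```python
import re


def wildcard_score(pattern: str, text: str,
                   reward: float = 1.0, penalty: float = 0.5) -> float:
    """Score a full match: literals earn a reward, wildcards cost a penalty."""
    if not pattern:
        return 0.0

    # Hypothetical syntax: '?' matches any single character, all else is literal.
    regex = "".join("." if ch == "?" else re.escape(ch) for ch in pattern)
    if re.fullmatch(regex, text) is None:
        return 0.0  # no match, no score

    literals = sum(1 for ch in pattern if ch != "?")
    wildcards = len(pattern) - literals

    # Normalize so an all-literal match scores 1.0 and never go below 0.
    raw = reward * literals - penalty * wildcards
    return max(raw, 0.0) / (reward * len(pattern))
```

With these defaults an exact pattern such as "cat" scores 1.0 against "cat", while "c?t" still matches "cut" but scores lower because one position was matched by a wildcard.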
Source: https://dev.to/sirisha_chiruvolu_f5136d5/finding-similarity-scores-between-text-in-natural-language-processing-239n