This is a great algorithm to find similarity coefficient of the two words.
Author is Viterbi or Bellman.I didn't find anything about it in the Internet.
For example exists dictionary: mother, uncle, cat, dog. And you typed smth like
"maaather", as the result will be "maaather" -> mother.
This algorithm possible to use in spell checking,sound recognition etc.
Algorithm description will be later
Here is the source code:
Author is Viterbi or Bellman.I didn't find anything about it in the Internet.
For example exists dictionary: mother, uncle, cat, dog. And you typed smth like
"maaather", as the result will be "maaather" -> mother.
This algorithm possible to use in spell checking,sound recognition etc.
Algorithm description will be later
Here is the source code:
/** * * This class is for finding coefficient of similarity of the two words * using Viterbi(Bellman) algorithm * @author Sergey */ public class Analyzer { private double init[][]; private double result[][]; private int n; private int m; private void calculateASCIIDifference(String str1, String str2) { try{ n = str1.length(); m = str2.length(); init = new double[n][m]; for (int i = n - 1,i2 = 0; i >= 0; i--, i2++) { for (int j = 0; j < m; j++) { init[i2][j] = Math.abs((int) str1.charAt(i) - (int) str2.charAt(j)); } } } catch(Exception e){ System.out.println("calculateASCIIDifference exception: "+e.toString()); } } private double[][] getASCIIDifference() { return init; } private void setASCIIDifference(double[][] a,int n, int m) { this.n = n; this.m = m; this.init = a; } /** * * Calculates similarity index, minimal value is more similar * </br> * you should always compare source with template word like this: * </br> * calculateSimilarity(iphone 5, iphone) </br> * calculateSimilarity(nokia, iphone) </br> * calculateSimilarity(inone, iphone) </br> * * @param str1 - source word * @param str2 - template word * @return */ public double calculateSimilarity(String str1, String str2) { calculateASCIIDifference(str1,str2); try{ result = new double[n][m]; double res, val1, val2, val3; //initialize first element result[n - 1][0] = init[n - 1][0]; //initialize first column for (int i = n - 2; i >= 0; i--) { result[i][0] = result[i + 1][0] + init[i][0]; } //initialize first row(from down) for (int j = 1; j < m; j++) { result[n - 1][j] = result[n - 1][j - 1] + init[n - 1][j]; } //initialize others for (int i = n - 2; i >= 0; i--) { for (int j = 1; j < m; j++) { val1 = result[i][j - 1] + init[i][j]; val2 = result[i + 1][j] + init[i][j]; val3 = result[i + 1][j - 1] + (init[i][j] * 2); //minimum of the 3 val's res = val1 < val2 ? val1 : val2; res = res < val3 ? res : val3; result[i][j] = res; } } return getSimilarityValue(); } catch(Exception e){ System.out.println("calculateSimilarity exception: "+e.toString()); } return -1; } //getting result of Similarity private double getSimilarityValue() { //need to devide if m != n if(m != n) return result[0][m - 1] / (m + n); else return result[0][m - 1]; } //getting result array private double[][] getSimilarityArray() { return result; } //to test public static void main(String[] args) { double result1,result2; Analyzer an = new Analyzer(); result1 = an.calculateSimilarity("mather","mother"); result2 = an.calculateSimilarity("father","mother"); //result1 will be less than result2 => result1 is more similar System.out.println(result1 + " " + result2); } }
Good links:
Related stackoverflow question
Levenshtein distance, soundex and metaphone
This is not accurate enough
ReplyDeletefor example:
"iphone" and "iphone": Similarity is 0
"iphone" and "iphone5": Similarity is 3.69
"iphone" and "iphone 5": Similarity is 8.36
and
"iphone" and "casemate": Similarity is 3.57 ????
Why?
maybe the program does not compute similarity but their distance....so there is obvious to find that the first case have distance equal 0
Deleteyou should always compare with same word like this:
Deleteiphone -> iphone 5 (gives 8)
casemate -> iphone5 (gives 160)
8 < 160
Why this is a fuzzy search?
ReplyDeleteToru
Delete