This is a great algorithm to find similarity coefficient of the two words.

Author is Viterbi or Bellman.I didn't find anything about it in the Internet.

For example exists dictionary: mother, uncle, cat, dog. And you typed smth like

"maaather", as the result will be "maaather" -> mother.

This algorithm possible to use in spell checking,sound recognition etc.

Algorithm description will be later

Author is Viterbi or Bellman.I didn't find anything about it in the Internet.

For example exists dictionary: mother, uncle, cat, dog. And you typed smth like

"maaather", as the result will be "maaather" -> mother.

This algorithm possible to use in spell checking,sound recognition etc.

Algorithm description will be later

**Here is the source code:**/** * * This class is for finding coefficient of similarity of the two words * using Viterbi(Bellman) algorithm * @author Sergey */ public class Analyzer { private double init[][]; private double result[][]; private int n; private int m; private void calculateASCIIDifference(String str1, String str2) { try{ n = str1.length(); m = str2.length(); init = new double[n][m]; for (int i = n - 1,i2 = 0; i >= 0; i--, i2++) { for (int j = 0; j < m; j++) { init[i2][j] = Math.abs((int) str1.charAt(i) - (int) str2.charAt(j)); } } } catch(Exception e){ System.out.println("calculateASCIIDifference exception: "+e.toString()); } } private double[][] getASCIIDifference() { return init; } private void setASCIIDifference(double[][] a,int n, int m) { this.n = n; this.m = m; this.init = a; } /** * * Calculates similarity index, minimal value is more similar * </br> * you should always compare source with template word like this: * </br> * calculateSimilarity(iphone 5, iphone) </br> * calculateSimilarity(nokia, iphone) </br> * calculateSimilarity(inone, iphone) </br> * * @param str1 - source word * @param str2 - template word * @return */ public double calculateSimilarity(String str1, String str2) { calculateASCIIDifference(str1,str2); try{ result = new double[n][m]; double res, val1, val2, val3; //initialize first element result[n - 1][0] = init[n - 1][0]; //initialize first column for (int i = n - 2; i >= 0; i--) { result[i][0] = result[i + 1][0] + init[i][0]; } //initialize first row(from down) for (int j = 1; j < m; j++) { result[n - 1][j] = result[n - 1][j - 1] + init[n - 1][j]; } //initialize others for (int i = n - 2; i >= 0; i--) { for (int j = 1; j < m; j++) { val1 = result[i][j - 1] + init[i][j]; val2 = result[i + 1][j] + init[i][j]; val3 = result[i + 1][j - 1] + (init[i][j] * 2); //minimum of the 3 val's res = val1 < val2 ? val1 : val2; res = res < val3 ? res : val3; result[i][j] = res; } } return getSimilarityValue(); } catch(Exception e){ System.out.println("calculateSimilarity exception: "+e.toString()); } return -1; } //getting result of Similarity private double getSimilarityValue() { //need to devide if m != n if(m != n) return result[0][m - 1] / (m + n); else return result[0][m - 1]; } //getting result array private double[][] getSimilarityArray() { return result; } //to test public static void main(String[] args) { double result1,result2; Analyzer an = new Analyzer(); result1 = an.calculateSimilarity("mather","mother"); result2 = an.calculateSimilarity("father","mother"); //result1 will be less than result2 => result1 is more similar System.out.println(result1 + " " + result2); } }

**Good links:**

Related stackoverflow question

Levenshtein distance, soundex and metaphone

This is not accurate enough

ReplyDeletefor example:

"iphone" and "iphone": Similarity is 0

"iphone" and "iphone5": Similarity is 3.69

"iphone" and "iphone 5": Similarity is 8.36

and

"iphone" and "casemate": Similarity is 3.57 ????

Why?

maybe the program does not compute similarity but their distance....so there is obvious to find that the first case have distance equal 0

Deleteyou should always compare with same word like this:

Deleteiphone -> iphone 5 (gives 8)

casemate -> iphone5 (gives 160)

8 < 160

Why this is a fuzzy search?

ReplyDeleteToru

Delete