I have two datasets, dataset1 and dataset2, which share a common column called SAX containing strings (dtype object).
dataset1=
SAX
0 gangsyu
1 zicobgm
2 eerptow
3 cqbsynt
4 zvmqben
.. ...
475 rfikekw
476 bnbzvqx
477 rsuhgax
478 ckhloio
479 lbzujtw
480 rows × 2 columns
and
dataset2=
SAX
0 gdmgsyu
1 zifgbgm
2 esdftow
3 cqtjgnt
4 znweben
.. ...
475 rfikekw
476 bnbzvqx
477 rsuhgax
478 ckhloio
479 lbzujtw
480 rows × 2 columns
I need the output to be a column holding, for each row, the total number of edits/changes it takes for SAX in dataset1 to become SAX in dataset2. By an "edit/change" I mean the variation between two letters, i.e. how many steps apart they are in the alphabet (example shown below).
Take the first row of SAX from dataset1 and dataset2 and compare them:
"gangsyu" and "gdmgsyu"
The first character, "g", matches, so move on.
The second character does not match: it takes 3 edits to change "a" to "d" (a → b → c → d).
The third character does not match: it takes 1 edit to change "n" to "m".
The rest of the characters match.
I want the column to hold the sum of these edits, here 3 + 1 = 4 (shown below).
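A minimal sketch of the per-row calculation, assuming an "edit" is the absolute distance between two letters in the alphabet (so "n" to "m" counts as 1, the same as "m" to "n") and that both SAX words have the same length. The function name `sax_edit_distance` is hypothetical:

```python
def sax_edit_distance(a: str, b: str) -> int:
    """Sum of per-position alphabet distances between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("SAX words must have the same length")
    # abs(ord(x) - ord(y)) is how many letters apart x and y are,
    # e.g. 'a' -> 'd' is 3 and 'n' -> 'm' is 1; matching letters contribute 0.
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


print(sax_edit_distance("gangsyu", "gdmgsyu"))  # 4
```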
dataset3=
sum_edits
0 4 (for the example shown right above)
1 0
2 1
3 2
4 0
.. ...
475 3
476 0
477 8
478 1
479 4
480 rows × 2 columns
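To build dataset3, a helper like the hypothetical `sax_edit_distance` can be applied row by row. This sketch assumes the two frames are in the same row order, so rows are paired by position rather than by index label; the two-row frames are small stand-ins built from rows of the post, not the real data:

```python
import pandas as pd


def sax_edit_distance(a: str, b: str) -> int:
    # Absolute alphabet distance, summed over positions (assumes equal lengths).
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


# Stand-ins for the real 480-row frames, using two rows from the post.
dataset1 = pd.DataFrame({"SAX": ["gangsyu", "rfikekw"]})
dataset2 = pd.DataFrame({"SAX": ["gdmgsyu", "rfikekw"]})

# Pair rows by position (assumes both frames are in the same row order)
# and keep dataset1's index labels on the result.
dataset3 = pd.DataFrame(
    {"sum_edits": [sax_edit_distance(a, b)
                   for a, b in zip(dataset1["SAX"], dataset2["SAX"])]},
    index=dataset1.index,
)
print(dataset3)
```

If the frames were ordered differently, they would need to be aligned first (for example on a shared key column) before pairing rows.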
Is there a function or method to accomplish this? I would really appreciate any help.
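For larger frames, a vectorized alternative with NumPy is possible, under the extra assumption that every SAX word is plain ASCII and all words share one fixed length (typical for SAX, but not confirmed in the post). The function name `sum_edits` is hypothetical:

```python
import numpy as np
import pandas as pd


def sum_edits(s1: pd.Series, s2: pd.Series) -> pd.Series:
    """Vectorized sum of per-position alphabet distances (hypothetical helper)."""
    width = len(s1.iloc[0])
    # Assumption: equal row counts, ASCII words, and one fixed word length.
    same_width = (s1.str.len() == width).all() and (s2.str.len() == width).all()
    if len(s1) != len(s2) or not same_width:
        raise ValueError("expected equal row counts and fixed-length SAX words")
    # Concatenate the words and view the bytes as a (rows, width) array of codes.
    codes1 = np.frombuffer("".join(s1).encode("ascii"), dtype=np.uint8).reshape(-1, width)
    codes2 = np.frombuffer("".join(s2).encode("ascii"), dtype=np.uint8).reshape(-1, width)
    # Cast to a signed type first so the subtraction cannot wrap around.
    diffs = np.abs(codes1.astype(np.int16) - codes2.astype(np.int16))
    return pd.Series(diffs.sum(axis=1), index=s1.index, name="sum_edits")


s1 = pd.Series(["gangsyu", "rfikekw"])
s2 = pd.Series(["gdmgsyu", "rfikekw"])
print(sum_edits(s1, s2).tolist())  # [4, 0]
```

With the real frames, `sum_edits(dataset1["SAX"], dataset2["SAX"]).to_frame()` would give a one-column frame named sum_edits, again pairing rows by position.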
Thanks.