Channel: Data Science, Analytics and Big Data discussions - Latest topics

Need help with removing duplicate characters with Regex

@mohitlearns wrote:

Hello,

I understand the following code only in parts and need your help demystifying the whole block.
The following code helps remove repeated characters in a word.
For example, applied repeatedly, it can convert “wooooowwww” to “wow” and “Yesssss” to “Yes”.

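import re

old_word = 'finalllyyy'
# (\w)\2 matches one character followed immediately by the same character again
repeat_pattern = re.compile(r'(\w*)(\w)\2(\w*)')
# rebuild the match from the three groups, leaving out the repeated copy
match_substitution = r'\1\2\3'
new_word = repeat_pattern.sub(match_substitution, old_word)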

This is how I understand it.

repeat_pattern = re.compile(r'(\w*)(\w)\2(\w*)') 

Group 1 captures a set of zero or more word characters. Group 2 captures a single character and also captures a back reference to it. Finally, group 3 captures another set of characters. Is my understanding correct?
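One way to check this reading is to print what each group actually holds for the sample word. The snippet below is only a minimal sketch using the same pattern; the variable name `m` is introduced here for illustration. Note that `\2` is not a capture of its own: it is a backreference that matches the same text group 2 has just captured, so the pattern has three groups, not four.

```python
import re

repeat_pattern = re.compile(r'(\w*)(\w)\2(\w*)')

# `m` is an illustrative name, not part of the original snippet
m = repeat_pattern.match('finalllyyy')

# Group 1 is greedy, so it backs off only as far as the LAST doubled character
print(m.groups())   # ('finallly', 'y', '')
print(m.group(0))   # finalllyyy  (the whole match, including the repeated 'y')
```

Because group 1 is greedy, the doubled character found is the last one in the word, and the text matched by `\2` (the second 'y' of that pair) is the only part of the match that belongs to no group.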

new_word = repeat_pattern.sub(match_substitution, old_word)

The above snippet is doing some sort of replacement but I am not sure how. Can someone explain?
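To see what the replacement does, it may help to run it once and then repeatedly. The sketch below assumes the same pattern and replacement string; `collapse_repeats` is a hypothetical helper added only for illustration and is not part of the original code. `sub` replaces the whole match with groups 1, 2 and 3 joined together, so the character matched by `\2` is dropped, and a single call shortens the word by one character.

```python
import re

repeat_pattern = re.compile(r'(\w*)(\w)\2(\w*)')
match_substitution = r'\1\2\3'

# A single call drops one repeated character (the one matched by \2)
once = repeat_pattern.sub(match_substitution, 'finalllyyy')
print(once)  # finalllyy

def collapse_repeats(word):
    """Hypothetical helper: re-apply the substitution until nothing changes."""
    while True:
        new_word = repeat_pattern.sub(match_substitution, word)
        if new_word == word:
            return word
        word = new_word

print(collapse_repeats('wooooowwww'))  # wow
print(collapse_repeats('finalllyyy'))  # finaly
```

Looping until the word stops changing also collapses legitimate double letters ('finally' becomes 'finaly'), so a practical version would presumably need some stopping condition, such as checking each intermediate word against a dictionary. That check is an assumption and is not shown in the code above.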

Thanks

Mohit

Posts: 2

Participants: 2
