Perl Weekly Challenge: Good Old-Fashioned Replacement Boy?

Perl Weekly Challenge 328 is about replacing strings, and calling one of the problems “Good” made me think of a song my wife claims is mostly forgotten by mainstream fans of Queen: Good Old-Fashioned Lover Boy.

So let’s crank out some good old-fashioned code.

Task 1: Replace all ?

You are given a string containing only lower case English letters and ?.

Write a script to replace all ? in the given string so that the string doesn’t contain consecutive repeating characters.

Example 1

Input: $str = "a?z"
Output: "abz"

There can be many strings, one of them is "abz".
The choices are 'a' to 'z' but we can't use either 'a' or 'z' to replace the '?'.

Example 2

Input: $str = "pe?k"
Output: "peak"

Example 3

Input: $str = "gra?te"
Output: "grabte"

Approach

This calls for a regular expression: we match all ? characters in the string along with the characters before them and after them. For each ? we match, we start with a list of lowercase English letters, remove the characters before and after from the list, and then grab the first letter that’s left.

One of the things that’s not directly addressed in the problem definition (or any of the examples) is what to do with runs of multiple ? characters. What I’m going to do is keep replacing each of the characters so they don’t produce consecutive repeating characters.

Raku

Things I had to remember in coding the Raku solution:

  • Raku parameters are read-only, and is rw will not work because I’m passing a string constant, so I have to use is copy.
  • I decided I didn’t want my replacements to look like strings of ababab, so I pulled .pick() out of my memory!
  • I had to look up how Raku does named captures.
  • And using recursion to do any looping will make this a lot easier when I port it to Elixir.
sub replaceAllQuestion($str is copy) {
  # return the unmodified string if there are no ? characters
  return $str unless $str ~~ /$<before> = . \?+ $<after> = ./;

  # since we matched something, let's replace the first ? with a
  # character that won't produce a consecutive repeating character
  my $replace = ('a' .. 'z')
              .grep({$_ ne $<before> && $_ ne $<after> })
              .pick(); # let's not always use the first character

  # we're EXPLICITLY only replacing the first ?
  $str ~~ s/\?/$replace/;

  # recursively call this function to replace any remaining ?s
  return replaceAllQuestion($str);
}

View the entire Raku script for this task on GitHub.

$ raku/ch-1.raku
Example 1:
Input: $str = "a?z"
Output: "ayz"

Example 2:
Input: $str = "pe?k"
Output: "pejk"

Example 3:
Input: $str = "gra?te"
Output: "graxte"

Example 4:
Input: $str = "so?c?r"
Output: "soicbr"

Example 5:
Input: $str = "mi??i??i??i"
Output: "miraiycibzi"

Example 6:
Input: $str = "i??????????????????n"
Output: "ifhrifxyabeaxzuebdon"

Perl

Making the Raku solution be a Perl solution was easy: I had to look up named captures because it’s not a feature I use often for work, and I remembered that I used shuffle for PWC 289, but everything else was pretty much the standard Raku ⇒ Perl translation.

use List::AllUtils qw( shuffle );

sub replaceAllQuestion($str) {
  # return the unmodified string if there are no ? characters
  return $str unless $str =~ /(?<before> .) \?+ (?<after> .)/x;

  # since we matched something, let's replace the first ? with a
  # character that won't produce a consecutive repeating character
  my $replace = (
    shuffle ( # let's not always use the first character
      grep {$_ ne $+{before} && $_ ne $+{after} } ('a' .. 'z')
    )
  )[0]; # but pull the first character off the shuffled list

  # we're EXPLICITLY only replacing the first ?
  $str =~ s/\?/$replace/;

  # recursively call this function to replace any remaining ?s
  return replaceAllQuestion($str);
}

View the entire Perl script for this task on GitHub.

Python

When I first ran this, I wound up with runs of the same character, and I had to actually read the documentation for re.sub so I could see there was a count=1 parameter I needed to pass so it only replaced the first occurrence of ?.

from   random import choice
import re
from   string import ascii_lowercase

hasQuestion = re.compile(
  r'(?P<before> . ) \?+ (?P<after> . )',
  re.X # ignore whitespace
)

def replaceAllQuestion(str):
  # return the unmodified string if there are no ? characters
  m = re.search(hasQuestion, str)
  if not m: return str

  # since we matched something, let's replace the first ? with a
  # character that won't produce a consecutive repeating character
  replace = choice([ # let's not always use the first character
    c for c in ascii_lowercase
      if c != m.group('before') and c != m.group('after')
  ])

  # we're EXPLICITLY only replacing the first ?
  str = re.sub(r'\?', replace, str, count=1)

  # recursively call this function to replace any remaining ?s
  return replaceAllQuestion(str)

View the entire Python script for this task on GitHub.

Elixir

As I said up in my Raku solution, using a recursive algorithm meant that I wouldn’t need to re-think anything when it came time to implement it in Elixir. Generating the range of characters was a challenge at first

$ iex
Erlang/OTP 27 [erts-15.2.7] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [dtrace]

Interactive Elixir (1.18.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> "a".."z"
** (ArgumentError) ranges (first..last//step) expect both sides to be integers, got: "a".."z"
    (elixir 1.18.3) lib/kernel.ex:4160: Kernel.validate_range!/2
    (elixir 1.18.3) expanding macro: Kernel.".."/2
    iex:1: (file)

But then I read deeper and saw that using a ? in front of a character literal reveals its code point, and that led me to find out that ?a..?z generates a list of characters from a – z. Enum.take_random/2 yields a random character out of that list. But when I tried to use that character in a replacement, I got an error:

$ elixir/ch-1.exs
Example 1:
Input: $str = "a?z"
** (FunctionClauseError) no function clause matching in Regex.precompile_replacement/1

    The following arguments were given to Regex.precompile_replacement/1:

        # 1
        ~c"n"

    Attempted function clauses (showing 6 out of 6):

        defp precompile_replacement(replacement) when is_function(replacement)
        defp precompile_replacement("")
        defp precompile_replacement(<<92, 103, 123, rest::binary>>) when byte_size(rest) > 0
        defp precompile_replacement(<<92, 92, rest::binary>>)
        defp precompile_replacement(<<92, x, rest::binary>>) when is_integer(x) and x >= 48 and x <= 57
        defp precompile_replacement(<<x, rest::binary>>)

    (elixir 1.18.3) lib/regex.ex:744: Regex.precompile_replacement/1
    (elixir 1.18.3) lib/regex.ex:739: Regex.replace/4
    elixir/ch-1.exs:14: PWC.replaceAllQuestion/1
    elixir/ch-1.exs:25: PWC.solution/1
    elixir/ch-1.exs:30: (file)

Ah, because it’s a character, and Regex.replace/4 wants a string as its third argument. So even though I learned to read the documentation thoroughly enough to know I had to use global: false as a fourth parameter in order to only replace the first ?, I didn’t think about the ramifications of the third argument being String.t until I ran into the error (and, to be fair, once I got the error, I immediately knew what the problem was).

def replace_all_question(str) do
  match = Regex.named_captures(
    ~r/(?<before>.)\?+(?<after>.)/,
    str
  )
  if match do
    # since we matched something, let's replace the first ?
    # with a character that won't produce a consecutive
    # repeating character
    replace = ?a..?z |> Enum.take_random(1) |> List.to_string
    str = Regex.replace(~r/\?/, str, replace, global: false)
    # recursively call this function to replace any remaining ?s
    replace_all_question(str)
  else
    # return the unmodified string if there are no ? characters
    str
  end
end

View the entire Elixir script for this task on GitHub.


Task 2: Good String

You are given a string made up of lower and upper case English letters only.

Write a script to return the good string of the given string. A string is called good string if it doesn’t have two adjacent same characters, one in upper case and other is lower case.

Just to be explicit, you can only remove pair if they are same characters, one in lower case and other in upper case, order is not important.

Example 1

Input: $str = "WeEeekly"
Output: "Weekly"

We can remove either, "eE" or "Ee" to make it good.

Example 2

Input: $str = "abBAdD"
Output: ""

We remove "bB" first: "aAdD"
Then we remove "aA": "dD"
Finally remove "dD".

Example 3

Input: $str = "abc"
Output: "abc"

Approach

Again, this smacks of regular expression and recursion, especially because example 2 shows us iterating through the string and removing pairs, some of which weren’t adjacent characters before the first removal. What we want to do is capture a character that’s either lower case or upper case, and then swap the case on the character and match to see if that character is present.

Raku

I knew what I wanted to do, but getting the syntax of the regular expression correct in Raku was my biggest challenge. I went through a couple iterations inspired by this code in the Raku Regexes manual page before I finally landed on the code I used. Remember that <lower> and <upper> are Raku’s pre-defined character classes to match lower and upper case letters.

sub goodString($str is copy) {
  my $c;
  # return the unmodified string if there are no matches
  return $str
    unless $str ~~ /$<match> = ( (<lower>) {$c = $0.uc;} $c ||
                                 (<upper>) {$c = $0.lc;} $c )/;

  # replace the match with the empty string
  my $replace = $<match>;
  # when I try to use $<match> in the replacement, it doesn't work
  $str ~~ s/$replace//;

  # recursively call to eliminate additional pairs
  goodString($str);
}

View the entire Raku script for this task on GitHub.

$ raku/ch-2.raku
Example 1:
Input: $str = "WeEeekly"
Output: "Weekly"

Example 2:
Input: $str = "abBAdD"
Output: ""

Example 3:
Input: $str = "abc"
Output: "abc"

Perl

I tried doing the Perl regular expression the way I did the Raku regular expression, but I wasn’t getting what I wanted. For example,

$str =~ /(?<match> (\p{Ll}) (?{ $c = uc($^N) }) $c |
                   (\p{Lu}) (?{ $c = lc($^N) }) $c )/x;

wound up matching the first W in WeEeekly and nothing else. I was just about ready to give up and iterate over characters manually when I saw the following in perlre:

(??{ code })
This is a “postponed” regular subexpression. It behaves in exactly the same way as a (?{ code }) code block as described above, except that its return value, rather than being assigned to $^R, is treated as a pattern, compiled if it’s a string (or used as-is if it’s a qr// object), then matched as if it were inserted instead of this construct.

Wait! That’s exactly what I want! Instead of assigning a variable in the code block, the return value of the code block is matched in the string! In a way, this is even cleaner than the Raku version:

sub goodString($str) {
  # return the unmodified string if there are no matches
  return $str
    unless $str =~ /(?<match> (\p{Ll}) (??{ uc($^N) }) |
                              (\p{Lu}) (??{ lc($^N) }) )/x;

  # replace the match with the empty string
  my $replace = $+{match};
  # when I try to use $<match> in the replacement, it doesn't work
  $str =~ s/$replace//;

  # recursively call to eliminate additional pairs
  goodString($str);
}

I feel it would be a good thing to link to the unicode properties \p{Ll} and \p{Lu} and where the heck $^N came from for documentation.

View the entire Perl script for this task on GitHub.

Python

However, if I’m going to use unicode character properties, I can’t use Python’s re, I have to install the regex module (which I did before back in PWC 289). But complicating things even more is that Python doesn’t support interpolating Python in a regular expression, so we can’t do the trick we did in Perl.

Not to worry, though! There are more tricks we can pull. What we can do is use a regex to search for a sequence of two letters, then programmatically check to see if those letters are of differing case. If they are, return the match. If they’re not, check the string after the match to see if there are any more sequences of two letters.

Once we have that logic done up in its own function, the rest of the algorithm looks pretty much the same as the Raku and Perl solutions.

import regex

def hasBadMatch(str):
  # sanity check: only check if it's long enough
  if len(str) < 2: return False

  # match a sequence of two of the same character,
  # but be case-insensitive
  m = regex.search(r'(\p{L})\1', str, regex.I)

  # we didn't match, so return false
  if not m: return False

  # get the characters matched
  char1 = m.group(0)[0:1]
  char2 = m.group(0)[1:]

  if (char1 != char2 # it's not the same character, same case
      and 
      # they're the same character with different case!
      (char1 == char2.lower() or char1 == char2.upper())):
    return m

  # the match wasn't of differeing case, so search the
  # rest of the string AFTER the match
  hasBadMatch(str[m.end():])

def goodString(str):
  # see if the string has a sequence of the same character twice,
  # once upper case and once lower case, in any order
  m = hasBadMatch(str)

  # return the unmodified string if there are no matches
  if not m: return str

  # we don't need a regular expression, since we have the
  # character pair in m.group(0), we can use str.replace()
  #
  # recursively call the function with the new string
  return goodString(str.replace(m.group(0), '', count=1))

View the entire Python script for this task on GitHub.

Elixir

And even though Elixir is in the same boat Python is in—interpolating Elixir code into a regular expression—since we already solved that problem in Python, we can do the same thing in Elixir.

One thing a coworker pointed out to me: the Elixir style guide says function names in Elixir “must” be snake case, not camelCase. The compiler accepts it, but those with Elixir experience won’t.

# sanity check: only check if it's long enough
def has_bad_match(str) when length(str) < 2, do: str

def has_bad_match(str) do
  # match a sequence of two of the same character,
  # but be case-insensitive
  match = Regex.run(~r/(\p{L})\1/i, str, capture: :first)
  if match do
    # pull the match out of the list returned by Regex.run/3
    match = match |> List.first
    # in case we need to search later in the string, get
    # the portion of the string AFTER the match
    after_match = String.split(str, match, parts: 2)
                |> List.last
    # get the characters matched
    char1 = match |> String.first
    char2 = match |> String.last
    if char1 != char2 do
      # it's not the same character, same case
      if char1 == String.downcase(char2) or
         char1 == String.upcase(char2) do
        # they're the same character with different case!
        match
      else
        # they're different characters
        has_bad_match(after_match)
      end
    else
      # it IS the same character, same case
      has_bad_match(after_match)
    end
  else
    # we didn't match, so return false
    nil
  end
end

def good_string(str) do
  # see if the string has a sequence of the same character twice,
  # once upper case and once lower case, in any order
  match = has_bad_match(str)

  if match do
    # we don't need a regular expression, since we have the
    # character pair in match, we can use String.replace/4
    #
    # recursively call the function with the new string
    good_string(String.replace(str, match, "", global: false))
  else
    # return the unmodified string if there are no matches
    str
  end
end

View the entire Elixir script for this task on GitHub.


Here’s all my solutions in GItHub: https://github.com/packy/perlweeklychallenge-club/tree/master/challenge-328/packy-anderson

One thought on “Perl Weekly Challenge: Good Old-Fashioned Replacement Boy?

  1. Pingback: Perl Weekly Challenge: It MUST be nice! | Packy’s Place

Comments are closed.