Perl Weekly Challenge: It MUST be nice!

This week’s Perl Weekly Challenge has a task that wants strings to be “nice”, but if musical theater has taught me anything, it’s that nice is different than good (remember “good“?). But then my wife pointed out that if the task wants the string to be nice, then it must be nice.

So let’s get Washington on our side with Perl Weekly Challenge 329.

Task 1: Counter Integers

You are given a string containing only lower case English letters and digits.

Write a script to replace every non-digit character with a space and then return all the distinct integers left.

Example 1

Input: $str = "the1weekly2challenge2"
Output: 1, 2

2 is appeared twice, so we count it one only.

Example 2

Input: $str = "go21od1lu5c7k"
Output: 21, 1, 5, 7

Example 3

Input: $str = "4p3e2r1l"
Output: 4, 3, 2, 1

Approach

The word I needed to see was “distinct”. So we’re replacing all non-digit characters with whitespace (sounds like a regular expression to me) and then grabbing the integers that remain (sounds like splitting on whitespace), and ensuing that they’re distinct (sounds like a Bag!).

Raku

In the Raku solution, I’m using the <lower> regex character class along with the global regex adverb, :g, the .trim method on the Str class with the .= mutator method call infix operator, the .split routine on the Str class and a BagHash class object (not a Bag, because I wanted it to be mutable while I looped through the output.

sub counterIntegers($str is copy) {
  # replace all lower case letters with space
  $str ~~ s:g/<lower>+/ /;

  # get rid of leading and trailing whitespace
  $str.=trim;

  # split on whitespace
  my @parts = $str.split(/\s+/);

  # build return array of distinct values 
  my @output;
  my $seen = BagHash.new;
  for @parts -> $i {
    @output.push($i) if $i$seen;
    $seen{$i}++;
  }

  return @output;
}

View the entire Raku script for this task on GitHub.

$ raku/ch-1.raku
Example 1:
Input: $str = "the1weekly2challenge2"
Output: 1, 2

Example 2:
Input: $str = "go21od1lu5c7k"
Output: 21, 1, 5, 7

Example 3:
Input: $str = "4p3e2r1l"
Output: 4, 3, 2, 1

Perl

In Perl, it becomes using the \p{Lowercase_Letter} unicode character property, since there’s no trim method we have to use a tried-and-true regular expression to trim whitespace from the beginning and end of the string, and then the good old split function. And since Perl doesn’t have a Bag type, we’re just using a good old hash.

sub counterIntegers($str) {
  # replace all lower case letters with space
  $str =~ s/\p{Lowercase_Letter}+/ /g;

  # get rid of leading and trailing whitespace
  $str =~ s/^\s+|\s+$//g;

  # split on whitespace
  my @parts = split /\s+/, $str;

  # build return array of distinct values 
  my @output;
  my %seen;
  foreach my $i ( @parts ) {
    push @output, $i unless exists $seen{$i};
    $seen{$i}++;
  }

  return @output;
}

View the entire Perl script for this task on GitHub.

Python

In Python, we’re able to take advantage of a neat little feature of sequence types like lists, tuples, and range: you can test whether a value is in the sequence by saying x in s. So instead of needing to maintain a Bag (which, in the past, I’ve implemented in Python as a Dict), I can just append elements to a list.

Like last week, I’m using the regex module instead of the re module because of the former’s support for unicode character properties.

import regex

def counterIntegers(strVar):
  # replace all lower case letters with space
  strVar = regex.sub(r'\p{Lowercase_Letter}+', ' ', strVar)

  # get rid of leading and trailing whitespace
  strVar = strVar.strip()

  # split on whitespace
  parts = strVar.split()

  # build return array of distinct values 
  output = []
  seen = []
  for i in parts:
    if i not in seen:
      output.append(i)
    seen.append(i)

  return output

View the entire Python script for this task on GitHub.

Elixir

But I’m exceedingly pleased with the way the Elixir solution turned out, mostly because the result of each step in my algorithm wound up being the input for the very next step, so I could chain them all together using Elixir’s |> pipe operator. To do the work, we used Regex.replace/4, String.trim/1, String.split/1, and my favorite function to implement loops, Enum.map_reduce/3. And back in PWC 327 I learned about Elixir’s MapSet module, so I didn’t need a Bag type.

defp find_distinct(int, {seen, output}) do
  {int,
    if not MapSet.member?(seen, int) do
      # add int to seen and output
      {MapSet.put(seen, int), output ++ [int]}
    else
      {seen, output}
    end
  }
end

def counter_integers(str) do
  # replace lower case letters with space, then
  # get rid of leading and trailing space, then
  # split on whitespace, then
  # build return array of distinct values
  {_, {_, output}}
    =  Regex.replace(~r/\p{L}+/, str, " ")
    |> String.trim
    |> String.split
    |> Enum.map_reduce({MapSet.new([]), []}, &find_distinct/2)

  output
end

View the entire Elixir script for this task on GitHub.


Task 2: Nice String

You are given a string made up of lower and upper case English letters only.

Write a script to return the longest substring of the given string which is nice. A string is nice if, for every letter of the alphabet that the string contains, it appears both in uppercase and lowercase.

Example 1

Input: $str = "YaaAho"
Output: "aaA"

Example 2

Input: $str = "cC"
Output: "cC"

Example 3

Input: $str = "A"
Output: ""

No nice string found.

Approach

There’s the brute force way to do this: write a function to determine if a string is nice, generate all the substrings of the given string, and keep the longest one that’s nice. But I’ keep’m thinking that there’s a faster way to do this.

Of course! This is just like the first problem. If we make a single pass through the string and determine the characters that do not appear in both upper and lower case version, we can then replace those characters with whitespace, split on the whitespace, and then just keep the longest result.

I’ve added my own example string, though. All the given examples have at most one “nice” substring. But the problem description said “the longest substring”, so I wanted to show that the code could pick the longest when there were multiple substrings that met the niceness criterion.

Raku

It occurred to me when I was thinking about the loop between lines 16-19 that I needed a way to say “swap the case of this letter”. A quick perusal of the methods available in class Str showed me there wasn’t a built-in method to do that, so I whipped a quick one of my own (lines 4-6). If making the character lowercase didn’t change the character, then return the uppercased character, otherwise return the lowercased character.

In this case, because the Bag was going to be built once and not modified later, I could use the immutable Bag class. Passing an array of the characters in the string (courtesy Str.comb on line 11) to the constructor populates the Bag. Then I loop over the keys on the Bag once, and if swapCase() of the character is also in the Bag, we know that character appears both in uppercase and lowercase, so we don’t add it to the list of characters we’re removing

Then on line 22 we build a character class of the character’s we’re replacing, and on line 25 we interpolate that character class into a substitution to replace those characters with spaces. You need to be careful here, though. At first, I was trying s:g/$replace/ / but I wasn’t getting the results I wanted, so I went back to the documentation. I had forgotten that there are four different ways of interpolating a variable into a regex as a pattern:

SyntaxDescription
$variableInterpolates stringified contents of variable literally.
$(code)Runs Raku code inside the regex, and interpolates the stringified return value literally.
<$variable>Interpolates stringified contents of variable as a regex.
<{code}>Runs Raku code inside the regex, and interpolates the stringified return value as a regex.

What I was doing was interpolating the characters in $replace literally, so in example 1, when I interpolated <[Y h o]>+ into the regular expression, it wound up actually being \<\["Y h o"\]\>\+, which would only match the literal string <[Y h o]>+. What I wanted was the <$replace> syntax, which interpolated the contents as a regex, thus matching what I wanted it to.

The rest is pretty straightforward. Lines 28 and 31 were lifted from my code for task 1, and lines 34-37 just identify the longest of the substrings we found.

sub swapCase($l) {
  return $l.lc eq $l ?? $l.uc !! $l.lc;
}

sub niceString($str is copy) {
  # convert the string to a Bag so we
  # know which characters are in it
  my $seen = Bag.new: $str.comb;

  # loop through the letters and make a list
  # of those that only appear in one case
  my @notBothCases;
  for $seen.keys -> $l {
    next if swapCase($l) ∈ $seen;
    @notBothCases.push($l);
  }

  # build a regex of the characters we're removing
  my $replace = '<[' ~ @notBothCases.join(' ') ~ ']>+';

  # replace those characters with space
  $str ~~ s:g/<$replace>/ /;

  # get rid of leading and trailing whitespace
  $str.=trim;

  # split on whitespace
  my @substrings = $str.split(/\s+/);

  # loop through the substrings we've found and find the longest
  my $longest = '';
  for @substrings -> $s {
    $longest = $s if $s.chars > $longest.chars;
  }

  return $longest;
} 

View the entire Raku script for this task on GitHub.

$ raku/ch-2.raku
Example 1:
Input: $str = "YaaAho"
Output: "aaA"

Example 2:
Input: $str = "cC"
Output: "cC"

Example 3:
Input: $str = "A"
Output: ""

Example 4:
Input: $str = "AaBcDEFdefGhIiJjKlm"
Output: "DEFdef"

Perl

When I went to translate this into Perl, however, I hit an error. In example 2, I got the error

Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]+/ at perl/ch-2.pl line 25.

I knew immediately what it was telling me: because there were no characters in example 2 that were not part of a nice substring, the regular expression I was interpolating on (then) line 25 was []+. Apparently, in Raku <[]>+ doesn’t pose a problem, but Perl doesn’t like []+. But the solution was easy: I just wrap building the regular expression and replacing the characters using that regular expression in an if (@notBothCases) { ... } condition; if there’s no characters to remove, we don’t need to take those steps, anyway.

Also, in Perl we don’t need to worry about how we interpolate the variable into the substitution regex; a careful reading of perldoc perlre says that the value of a variable interpolated into a regular expression is always interpreted as a regular expression, not as literal characters.

sub swapCase($l) {
  return lc($l) eq $l ? uc($l) : lc($l);
}

sub niceString($str) {
  # convert the string to a Bag so we
  # know which characters are in it
  my %seen = map { $_ => 1 } split //, $str;

  # loop through the letters and make a list
  # of those that only appear in one case
  my @notBothCases;
  foreach my $l ( keys %seen) {
    next if exists $seen{ swapCase($l) };
    push @notBothCases, $l;
  }

 if (@notBothCases) {
    # build a regex of the characters we're removing
    my $replace = '[' . join('', @notBothCases) . ']+';

    # replace those characters with space
    $str =~ s/$replace/ /g;
 }

  # get rid of leading and trailing whitespace
  $str =~ s/^\s+|\s+$//g;

  # split on whitespace
  my @substrings = split /\s+/, $str;

  # loop through the substrings we've found and find the longest
  my $longest = '';
  foreach my $s ( @substrings ) {
    $longest = $s if length($s) > length($longest);
  }

  return $longest;
}

View the entire Perl script for this task on GitHub.

Python

In Python, the biggest thing that tripped me up was having to remember that nothing worked “in-place”; I had to say strVar = replace.sub(' ', strVar) and strVar = strVar.strip() instead of just replace.sub(' ', strVar) and strVar.strip().

from collections import Counter
import re

def swapCase(l):
  return l.upper() if l.lower() == l else l.lower()

def niceString(strVar):
  # convert the string to a Bag, er, Counter
  # so we know which characters are in it
  seen = Counter(list(strVar))

  # loop through the letters and make a list
  # of those that only appear in one case
  notBothCases = []
  for l in list(seen):
    if swapCase(l) not in list(seen):
      notBothCases.append(l)

  if notBothCases:
    # build a regex of the characters we're removing
    replace = re.compile('[' + ''.join(notBothCases) + ']+')

    # replace those characters with space
    strVar = replace.sub(' ', strVar)

  # get rid of leading and trailing whitespace
  strVar = strVar.strip()

  # split on whitespace
  substrings = strVar.split()

  # loop through the substrings we've found and find the longest
  longest = ''
  for s in substrings:
    if len(s) > len(longest):
      longest = s

  return longest

View the entire Python script for this task on GitHub.

Elixir

The Elixir solution took a few more lines than I expected, but that’s probably because I commented the heck out of it. I did have swap_case/1 and find_longest/2 defined this way

def swap_case(l) do
  case do
    String.downcase(l) == l -> String.upcase(l)
    true                    -> String.downcase(l)
  end
end

defp find_longest(string, longest) do
  {
    string,
    case do
      String.length(string) > String.length(longest) -> string
      true                                           -> longest
    end
  }
end

but I wanted to tighten them up a little because it felt overly expansive. As with example 1, I’m using

def swap_case(l) do
  if String.downcase(l) == l,
    do: String.upcase(l),
    else: String.downcase(l)
end

defp find_longest(string, longest) do
  longest = if String.length(string) > String.length(longest),
    do: string, else: longest
  {string, longest}
end

def nice_string(str) do
  # convert the string to a Bag, er, MapSet
  # so we know which characters are in it
  seen = MapSet.new(String.codepoints(str))

  # filter the letters and make a list
  # of those that only appear in one case
  not_both_cases
    =  String.codepoints(str)
    |> Enum.filter(fn l ->
         not MapSet.member?(seen, swap_case(l))
       end)

  # build a regex of the characters we're removing
  {_, replace} = cond do
    # no characters to replace, use a regex of chars
    # we'll never see, like a SPACE
    Enum.empty?(not_both_cases) -> Regex.compile(" ")
    true -> ["["] ++ not_both_cases ++ ["]+"]
      |> Enum.join
      |> Regex.compile
  end

  # replace lower case letters with space, then
  # get rid of leading and trailing space, then
  # split on whitespace, then
  # build return array of distinct values
  {_, output}
    =  Regex.replace(replace, str, " ")
    |> String.trim
    |> String.split
    |> Enum.map_reduce("", &find_longest/2)

  output
end

View the entire Elixir script for this task on GitHub.


Here’s all my solutions in GItHub: https://github.com/packy/perlweeklychallenge-club/tree/master/challenge-329/packy-anderson