Perl Weekly Challenge 370‘s tasks are “Popular Word” and “Scramble String”. I’m sure I could have found some music to do with scrambling, but considering I started pairing these posts with music back when the task was “Seize The Day!” and my mind immediately went to Newsies, I decided I needed to stay in my musical theatre lane and go with the Popular option.

And, speaking of theater, the reason I’m so late this week is because every evening I’ve been running sound for a local production of Julius Cæsar. I had to wait to do my coding on the weekend.

Task 1: Popular Word

You are given a string paragraph and an array of the banned words.

Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned and the answer is unique. The words in paragraph are case-insensitive and the answer should be in lowercase. The words can not contain punctuation symbols.

Example 1

Input: $paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
       @banned = ("hit")
Output: "ball"

After removing punctuation and converting to lowercase, the word "hit" appears 3 times, and "ball" appears 2 times.
Since "hit" is on the banned list, we ignore it.

Example 2

Input: $paragraph = "Apple? apple! Apple, pear, orange, pear, apple, orange."
       @banned = ("apple", "pear")
Output: "orange"

"apple"  appears 4 times.
"pear"   appears 2 times.
"orange" appears 2 times.

"apple" and "pear" are both banned.
Even though "orange" has the same frequency as "pear", it is the only non-banned word with the highest frequency.

Example 3

Input: $paragraph = "A. a, a! A. B. b. b."
       @banned = ("b")
Output: "a"

"a" appears 4 times.
"b" appears 3 times.

The input has mixed casing and heavy punctuation.
The normalised, "a" is the clear winner, since "b" is banned, "a" is the only choice.

Example 4

Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
       @banned = ("ball")
Output: "apple"

Here the punctuation acts as a delimiter.
"ball"   appears 3 times.
"apple"  appears 2 times.
"banana" appears 1 time.

Example 5

Input: $paragraph = "The dog chased the cat, but the dog was faster than the cat."
       @banned = ("the", "dog")
Output: "cat"

"the" appears 4 times.
"dog" appears 2 times.
"cat" appears 2 times.

"chased", "but", "was", "faster", "than" appear 1 time each.
"the" is the most frequent but is banned.
"dog" is the next most frequent but is also banned.
The next most frequent non-banned word is "cat".

Approach

I saw this and immediately thought “BAG!” Of all the built-in classes in Raku, I think my favorite is the Bag class. It’s like a hash, but it makes it really easy to count the occurrences of items.

Basically, we’re going to convert the paragraph into a list of words by removing punctuation from it, lowercasing it, splitting it up on whitespace, filtering out banned words, and then feeding that into a bag, and then pulling out the word that has the highest count in the bag.

Raku

I realized when I ran this that I couldn’t just strip the punctuation, I needed to convert it to whitespace, because example 4 has no whitespace. Fortunately, that was fixed by adding ," " to line 7. Sadly, I couldn’t make it single statement because I needed to make the banned list a Bag in a separate statement.

sub popular($paragraph, @banned) {
  my %banned = @banned.Bag;
  $paragraph
    .subst(/<-[a..zA..Z\s]>/," "):global # remove punctuation
    .lc                            # make lowercase
    .split(/\s+/, :skip-empty)     # split on whitespace
    .grep({ %banned{$_}:!exists }) # filter out banned words
    .Bag                           # count remaining words
    .max(*.value)                  # get word with max count
    .keys                          # convert key => val to key
    .first;                        # get the word from the list
}

sub popular($paragraph, @banned) {
  my %banned = @banned.Bag;
  $paragraph
    .subst(/<-[a..zA..Z\s]>/," "):global # remove punctuation
    .lc                            # make lowercase
    .split(/\s+/, :skip-empty)     # split on whitespace
    .grep({ %banned{$_}:!exists }) # filter out banned words
    .Bag                           # count remaining words
    .max(*.value)                  # get word with max count
    .keys                          # convert key => val to key
    .first;                        # get the word from the list
}

View the entire Raku script for this task on GitHub.

$ raku/ch-1.raku
Example 1:
Input: $paragraph = "Bob hit a ball, the hit BALL flew far
                     after it was hit."
       @banned = ("hit")
Output: "ball"

Example 2:
Input: $paragraph = "Apple? apple! Apple, pear, orange,
                     pear, apple, orange."
       @banned = ("apple", "pear")
Output: "orange"

Example 3:
Input: $paragraph = "A. a, a! A. B. b. b."
       @banned = ("b")
Output: "a"

Example 4:
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
       @banned = ("ball")
Output: "apple"

Example 5:
Input: $paragraph = "The dog chased the cat, but the dog was
                     faster than the cat."
       @banned = ("the", "dog")
Output: "cat"

$ raku/ch-1.raku
Example 1:
Input: $paragraph = "Bob hit a ball, the hit BALL flew far
                     after it was hit."
       @banned = ("hit")
Output: "ball"

Example 2:
Input: $paragraph = "Apple? apple! Apple, pear, orange,
                     pear, apple, orange."
       @banned = ("apple", "pear")
Output: "orange"

Example 3:
Input: $paragraph = "A. a, a! A. B. b. b."
       @banned = ("b")
Output: "a"

Example 4:
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
       @banned = ("ball")
Output: "apple"

Example 5:
Input: $paragraph = "The dog chased the cat, but the dog was
                     faster than the cat."
       @banned = ("the", "dog")
Output: "cat"

Perl

The Perl solution is essentially the same. I’m using List::UtilsBy::max_by to pull out the most frequent word, and I’m following Matthias Muth‘s example by using List::MoreUtils’ frequency to create the bag (like I did back in PWC350). I reworded the comments to reflect that we really have to read the statement that does most of the work back to front.

use List::MoreUtils qw( frequency );
use List::UtilsBy   qw( max_by );

sub popular($paragraph, @banned) {
  my %banned = map { $_ => 1 } @banned;

  $paragraph =~ s/[^a-zA-Z\s]/ /g; # remove punctuation
  my %popular = frequency          # count words after
    grep { !exists $banned{$_}}    # filter out banned words after
    split(" ",                     # split on whitespace after
      lc($paragraph)               # make lowercase
    );

  max_by { $popular{$_} } keys %popular; # get word with max count
}

use List::MoreUtils qw( frequency );
use List::UtilsBy   qw( max_by );

sub popular($paragraph, @banned) {
  my %banned = map { $_ => 1 } @banned;

  $paragraph =~ s/[^a-zA-Z\s]/ /g; # remove punctuation
  my %popular = frequency          # count words after
    grep { !exists $banned{$_}}    # filter out banned words after
    split(" ",                     # split on whitespace after
      lc($paragraph)               # make lowercase
    );

  max_by { $popular{$_} } keys %popular; # get word with max count
}

View the entire Perl script for this task on GitHub.

Python

In Python, however, I was able to get everything back into a single statement. There’s a most_common() method on collections's Counter class, the only snag being that it doesn’t just return a list of the keys of the most common n items in the counter, it returns a list containing tuples of the keys and counts for those items, so I need to unpack it twice.

from collections import Counter
import re

def popular(paragraph, banned):
  return (
    (
      Counter(                     # count remaining words
        [                          # after operations below
          w for w in 
            re.sub(
              r'[^a-zA-Z\s]', ' ', # remove punctuation
              paragraph.lower()    # make lowercase
            ).split()              # split on whitespace
            if w not in banned     # filter out banned words
        ]
      ).most_common(1) # returns the 1 most common element
    )[0] # most_common() returns a LIST of word => count
  )[0] # return just the word

from collections import Counter
import re

def popular(paragraph, banned):
  return (
    (
      Counter(                     # count remaining words
        [                          # after operations below
          w for w in 
            re.sub(
              r'[^a-zA-Z\s]', ' ', # remove punctuation
              paragraph.lower()    # make lowercase
            ).split()              # split on whitespace
            if w not in banned     # filter out banned words
        ]
      ).most_common(1) # returns the 1 most common element
    )[0] # most_common() returns a LIST of word => count
  )[0] # return just the word

View the entire Python script for this task on GitHub.

Elixir

And Elixir let me do it all in one chain of commands as well. Enum.max_by/4 was the key by letting me find the element in the map with the maximum value and returning a tuple of the key and the count.

def popular(paragraph, banned) do
  banned = banned |> Enum.frequencies
  # remove punctuation
  Regex.replace(~r/[^a-zA-Z\s]/, paragraph, " ")
  |> String.downcase # make lowercase
  |> String.split(" ", trim: true) # split on whitespace
   # filter out banned words
  |> Enum.filter(fn w -> ! Map.has_key?(banned, w) end)
  |> Enum.frequencies # count remaining words
  |> Enum.max_by(fn {_k, v} -> v end) # find maximum value
  |> elem(0) # return key of tuple
end

def popular(paragraph, banned) do
  banned = banned |> Enum.frequencies
  # remove punctuation
  Regex.replace(~r/[^a-zA-Z\s]/, paragraph, " ")
  |> String.downcase # make lowercase
  |> String.split(" ", trim: true) # split on whitespace
   # filter out banned words
  |> Enum.filter(fn w -> ! Map.has_key?(banned, w) end)
  |> Enum.frequencies # count remaining words
  |> Enum.max_by(fn {_k, v} -> v end) # find maximum value
  |> elem(0) # return key of tuple
end

View the entire Elixir script for this task on GitHub.

Task 2: Scramble String

You are given two strings A and B of the same length.

Write a script to return true if string B is a scramble of string A otherwise return false.

String B is a scramble of string A if A can be transformed into B by a single (recursive) scramble operation.

A scramble operation is:

- If the string consists of only one character, return the string.
- Divide the string X into two non-empty parts.
- Optionally, exchange the order of those parts.
- Optionally, scramble each of those parts.
- Concatenate the scrambled parts to return a single string.

Example 1

Input: $str1 = "abc", $str2 = "acb"
Output: true

"abc"
split: ["a", "bc"]
split: ["a", ["b", "c"]]
swap: ["a", ["c", "b"]]
concatenate: "acb"

Example 2

Input: $str1 = "abcd", $str2 = "cdba"
Output: true

"abcd"
split: ["ab", "cd"]
swap: ["cd", "ab"]
split: ["cd", ["a", "b"]]
swap: ["cd", ["b", "a"]]
concatenate: "cdba"

Example 3

Input: $str1 = "hello", $str2 = "hiiii"
Output: false

A fundamental rule of scrambled strings is that they must be anagrams.

Example 4

Input: $str1 = "ateer", $str2 = "eater"
Output: true

"ateer"
split: ["ate", "er"]
split: [["at", "e"], "er"]
swap: [["e", "at"], "er"]
concatenate: "eater"

Example 5

Input: $str1 = "abcd", $str2 = "bdac"
Output: false

Approach

Effectively, this is a tree traversal algorithm. The parent node is the $str1, and each time we split the string or swap pieces, we’re creating child nodes of that parent. The trick is to pass along a copy of $str2 each time we make a call so we can tell when we’ve found a stopping point for that branch.

I also found out there are sometimes more ways to get to the same conclusion. I wasn’t able to get my debugging output to look the same say Roger Bell West (the author of this challenge) did, but I did notice that my algorithm got to the same solution for example 2 through a different path:

"abcd"
split: ["a", "bcd"] (c, dba)
swap: ["bcd", "a"]
 split: ["b", "cd"] (c, db)
 swap: ["cd", "b"]
 concatenate: "cdb"
concatenate: "cdba"

Raku

I made a utility function to divide the string into two parts because I wanted that operation to read more easily in the main scramble function.

sub divide($string, $div) {
  my $end = $string.chars - 1;
  return (
    $string.substr(0..$div),
    $string.substr($div+1..$end)
  );
}

sub scramble($source, $target) {
  # If the string consists of only one character,
  # return the string
  return $source if $source.chars == 1;

  # if the string equals the target we're seeking,
  # return the string
  return $source if $source eq $target;

  my ($s1, $s2, $t1, $t2);
  for 0 .. $source.chars - 2 -> $div {
    # Divide the string X into two non-empty parts
    ($s1, $s2) = divide($source, $div);
    ($t1, $t2) = divide($target, $div);

    # exchange the order of those parts unless one part
    # matches the target
    unless ($s1 eq $t1 || $s2 eq $t2) {
      ($s2, $s1) = ($s1, $s2);
      # adjust the target split as well
      ($t1, $t2) = divide($target, $s1.chars-1);
    }

    # scramble each of those parts unless all parts
    # match their targets
    ($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
      unless ($s1 eq $t1 && $s2 eq $t2);

    last if $s1 ~ $s2 eq $target;
  }
  # Concatenate the scrambled parts to return a single string.
  return $s1 ~ $s2;
}

sub is_scrambled($source, $target) {
  scramble($source, $target) eq $target;
}

sub divide($string, $div) {
  my $end = $string.chars - 1;
  return (
    $string.substr(0..$div),
    $string.substr($div+1..$end)
  );
}

sub scramble($source, $target) {
  # If the string consists of only one character,
  # return the string
  return $source if $source.chars == 1;

  # if the string equals the target we're seeking,
  # return the string
  return $source if $source eq $target;

  my ($s1, $s2, $t1, $t2);
  for 0 .. $source.chars - 2 -> $div {
    # Divide the string X into two non-empty parts
    ($s1, $s2) = divide($source, $div);
    ($t1, $t2) = divide($target, $div);

    # exchange the order of those parts unless one part
    # matches the target
    unless ($s1 eq $t1 || $s2 eq $t2) {
      ($s2, $s1) = ($s1, $s2);
      # adjust the target split as well
      ($t1, $t2) = divide($target, $s1.chars-1);
    }

    # scramble each of those parts unless all parts
    # match their targets
    ($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
      unless ($s1 eq $t1 && $s2 eq $t2);

    last if $s1 ~ $s2 eq $target;
  }
  # Concatenate the scrambled parts to return a single string.
  return $s1 ~ $s2;
}

sub is_scrambled($source, $target) {
  scramble($source, $target) eq $target;
}

View the entire Raku script for this task on GitHub.

$ raku/ch-2.raku | less
Example 1:
Input: $str1 = "abc", $str2 = "acb"
Output: True

Example 2:
Input: $str1 = "abcd", $str2 = "cdba"
Output: True

Example 3:
Input: $str1 = "hello", $str2 = "hiiii"
Output: False

Example 4:
Input: $str1 = "ateer", $str2 = "eater"
Output: True

Example 5:
Input: $str1 = "abcd", $str2 = "bdac"
Output: False

$ raku/ch-2.raku | less
Example 1:
Input: $str1 = "abc", $str2 = "acb"
Output: True

Example 2:
Input: $str1 = "abcd", $str2 = "cdba"
Output: True

Example 3:
Input: $str1 = "hello", $str2 = "hiiii"
Output: False

Example 4:
Input: $str1 = "ateer", $str2 = "eater"
Output: True

Example 5:
Input: $str1 = "abcd", $str2 = "bdac"
Output: False

Perl

Making the utility function divide made translating this from Raku to Perl easier, because I only had to worry about the differences between Perl’s substr and Raku’s .substr in one place.

sub divide($string, $div) {
  return (
    substr($string, 0, $div+1),
    substr($string, $div+1)
  );
}

sub scramble($source, $target) {
  # If the string consists of only one character,
  # return the string
  return $source if length($source) == 1;

  # if the string equals the target we're seeking,
  # return the string
  return $source if $source eq $target;

  my ($s1, $s2, $t1, $t2);
  foreach my $div (0 .. length($source) - 2) {
    # Divide the string X into two non-empty parts
    ($s1, $s2) = divide($source, $div);
    ($t1, $t2) = divide($target, $div);

    # exchange the order of those parts unless one part
    # matches the target
    unless ($s1 eq $t1 || $s2 eq $t2) {
      ($s2, $s1) = ($s1, $s2);
      # adjust the target split as well
      ($t1, $t2) = divide($target, length($s1)-1);
    }

    # scramble each of those parts unless all parts
    # match their targets
    ($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
      unless ($s1 eq $t1 && $s2 eq $t2);

    last if $s1 . $s2 eq $target;
  }
  # Concatenate the scrambled parts to return a single string.
  return $s1 . $s2;
}

sub is_scrambled($source, $target) {
  scramble($source, $target) eq $target ? 'True' : 'False';
}

sub divide($string, $div) {
  return (
    substr($string, 0, $div+1),
    substr($string, $div+1)
  );
}

sub scramble($source, $target) {
  # If the string consists of only one character,
  # return the string
  return $source if length($source) == 1;

  # if the string equals the target we're seeking,
  # return the string
  return $source if $source eq $target;

  my ($s1, $s2, $t1, $t2);
  foreach my $div (0 .. length($source) - 2) {
    # Divide the string X into two non-empty parts
    ($s1, $s2) = divide($source, $div);
    ($t1, $t2) = divide($target, $div);

    # exchange the order of those parts unless one part
    # matches the target
    unless ($s1 eq $t1 || $s2 eq $t2) {
      ($s2, $s1) = ($s1, $s2);
      # adjust the target split as well
      ($t1, $t2) = divide($target, length($s1)-1);
    }

    # scramble each of those parts unless all parts
    # match their targets
    ($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
      unless ($s1 eq $t1 && $s2 eq $t2);

    last if $s1 . $s2 eq $target;
  }
  # Concatenate the scrambled parts to return a single string.
  return $s1 . $s2;
}

sub is_scrambled($source, $target) {
  scramble($source, $target) eq $target ? 'True' : 'False';
}

View the entire Perl script for this task on GitHub.

Python

The tricky part going from Raku to Python was remembering all the places I needed to adjust the boundaries: when slicing the string, it was string[div:], not string[div+1:], and I needed to remember range(1, N) produced numbers 1 through N-1. The one that had me the longest was on line 29, where I initially had len(s1)-1 and only example 4 was providing the wrong result.

def divide(string, div):
  return (
    string[0:div],
    string[div:]
  )

def scramble(source, target):
  # If the string consists of only one character,
  # return the string
  if len(source) == 1: return source

  # if the string equals the target we're seeking,
  # return the string
  if source == target: return source

  s1, s2 = "", ""
  for div in range(1, len(source)):
    # Divide the string X into two non-empty parts
    s1, s2 = divide(source, div)
    t1, t2 = divide(target, div)

    # exchange the order of those parts unless one part
    # matches the target
    if not (s1 == t1 or s2 == t2):
      s2, s1 = s1, s2
      # adjust the target split as well
      t1, t2 = divide(target, len(s1))

    # scramble each of those parts unless all parts
    # match their targets
    if not (s1 == t1 and s2 == t2):
      s1, s2 = scramble(s1, t1), scramble(s2, t2)

    if s1 + s2 == target: break

  # Concatenate the scrambled parts to return a single string.
  return s1 + s2

def is_scrambled(source, target):
  return scramble(source, target) == target

def divide(string, div):
  return (
    string[0:div],
    string[div:]
  )

def scramble(source, target):
  # If the string consists of only one character,
  # return the string
  if len(source) == 1: return source

  # if the string equals the target we're seeking,
  # return the string
  if source == target: return source

  s1, s2 = "", ""
  for div in range(1, len(source)):
    # Divide the string X into two non-empty parts
    s1, s2 = divide(source, div)
    t1, t2 = divide(target, div)

    # exchange the order of those parts unless one part
    # matches the target
    if not (s1 == t1 or s2 == t2):
      s2, s1 = s1, s2
      # adjust the target split as well
      t1, t2 = divide(target, len(s1))

    # scramble each of those parts unless all parts
    # match their targets
    if not (s1 == t1 and s2 == t2):
      s1, s2 = scramble(s1, t1), scramble(s2, t2)

    if s1 + s2 == target: break

  # Concatenate the scrambled parts to return a single string.
  return s1 + s2

def is_scrambled(source, target):
  return scramble(source, target) == target

View the entire Python script for this task on GitHub.

Elixir

The fun part about doing this in Elixir was figuring out how to loop from 1 -> length(source) with the ability to bail out of the loop early. As always, this probably means I should have a recursive function doing the loop, and I just don’t call the next iteration if I’ve satisfied my stopping condition.

def divide(string, div) do
  [
    String.slice(string, 0..div-1),
    String.slice(string, div..100)
  ]
end

def divide_at(div, source, len, _) when div == len, do: source

def divide_at(div, source, len, target) do
  # Divide the string X into two non-empty parts
  [s1, s2] = divide(source, div)
  [t1, t2] = divide(target, div)

  # exchange the order of those parts unless one part
  # matches the target
  [s1, s2, t1, t2] = if not (s1 == t1 or s2 == t2) do
    [s2, s1] ++ divide(target, String.length(s2))
  else
    [s1, s2, t1, t2]
  end

  # scramble each of those parts unless all parts
  # match their targets
  {s1, s2} = if not (s1 == t1 and s2 == t2) do
    {
      scramble(s1, String.length(s1), t1),
      scramble(s2, String.length(s2), t2)
    }
  else
    {s1, s2}
  end

  if s1 <> s2 == target do
    s1 <> s2
  else
    divide_at(div+1, source, len, target)
  end
end

# If the string consists of only one character,
# return the string
def scramble(source, len, _) when len == 1, do: source

# if the string equals the target we're seeking,
# return the string
def scramble(s, _, t) when s == t, do: s

def scramble(source, len, target) do
  divide_at(1, source, len, target)
end

def is_scrambled(source, target) do
  scramble(source, String.length(source), target) == target
end

def divide(string, div) do
  [
    String.slice(string, 0..div-1),
    String.slice(string, div..100)
  ]
end

def divide_at(div, source, len, _) when div == len, do: source

def divide_at(div, source, len, target) do
  # Divide the string X into two non-empty parts
  [s1, s2] = divide(source, div)
  [t1, t2] = divide(target, div)

  # exchange the order of those parts unless one part
  # matches the target
  [s1, s2, t1, t2] = if not (s1 == t1 or s2 == t2) do
    [s2, s1] ++ divide(target, String.length(s2))
  else
    [s1, s2, t1, t2]
  end

  # scramble each of those parts unless all parts
  # match their targets
  {s1, s2} = if not (s1 == t1 and s2 == t2) do
    {
      scramble(s1, String.length(s1), t1),
      scramble(s2, String.length(s2), t2)
    }
  else
    {s1, s2}
  end

  if s1 <> s2 == target do
    s1 <> s2
  else
    divide_at(div+1, source, len, target)
  end
end

# If the string consists of only one character,
# return the string
def scramble(source, len, _) when len == 1, do: source

# if the string equals the target we're seeking,
# return the string
def scramble(s, _, t) when s == t, do: s

def scramble(source, len, target) do
  divide_at(1, source, len, target)
end

def is_scrambled(source, target) do
  scramble(source, String.length(source), target) == target
end

View the entire Elixir script for this task on GitHub.

Here’s all my solutions in GitHub: https://github.com/packy/perlweeklychallenge-club/tree/challenge-370-packy-anderson/challenge-370/packy-anderson

Packy’s Place

Packy's just 'zis guy, you know?

Perl Weekly Challenge: The scramble can not stop you / from becoming popu-ler… lar

Task 1: Popular Word

Approach

Raku

Perl

Python

Elixir

Task 2: Scramble String

Approach

Raku

Perl

Python

Elixir

Related