Perl Weekly Challenge 370‘s tasks are “Popular Word” and “Scramble String”. I’m sure I could have found some music to do with scrambling, but considering I started pairing these posts with music back when the task was “Seize The Day!” and my mind immediately went to Newsies, I decided I needed to stay in my musical theatre lane and go with the Popular option.
And, speaking of theater, the reason I’m so late this week is because every evening I’ve been running sound for a local production of Julius Cæsar. I had to wait to do my coding on the weekend.
Task 1: Popular Word
You are given a string paragraph and an array of the banned words.
Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned and the answer is unique. The words in paragraph are case-insensitive and the answer should be in lowercase. The words can not contain punctuation symbols.
Input: $paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
@banned = ("hit")
Output: "ball"
After removing punctuation and converting to lowercase, the word "hit" appears 3 times, and "ball" appears 2 times.
Since "hit" is on the banned list, we ignore it.
Input: $paragraph = "Apple? apple! Apple, pear, orange, pear, apple, orange."
@banned = ("apple", "pear")
Output: "orange"
"apple" appears 4 times.
"pear" appears 2 times.
"orange" appears 2 times.
"apple" and "pear" are both banned.
Even though "orange" has the same frequency as "pear", it is the only non-banned word with the highest frequency.
Input: $paragraph = "A. a, a! A. B. b. b."
@banned = ("b")
Output: "a"
"a" appears 4 times.
"b" appears 3 times.
The input has mixed casing and heavy punctuation.
The normalised, "a" is the clear winner, since "b" is banned, "a" is the only choice.
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
@banned = ("ball")
Output: "apple"
Here the punctuation acts as a delimiter.
"ball" appears 3 times.
"apple" appears 2 times.
"banana" appears 1 time.
Input: $paragraph = "The dog chased the cat, but the dog was faster than the cat."
@banned = ("the", "dog")
Output: "cat"
"the" appears 4 times.
"dog" appears 2 times.
"cat" appears 2 times.
"chased", "but", "was", "faster", "than" appear 1 time each.
"the" is the most frequent but is banned.
"dog" is the next most frequent but is also banned.
The next most frequent non-banned word is "cat".
Approach
I saw this and immediately thought “BAG!” Of all the built-in classes in Raku, I think my favorite is the Bag class. It’s like a hash, but it makes it really easy to count the occurrences of items.
Basically, we’re going to convert the paragraph into a list of words by removing punctuation from it, lowercasing it, splitting it up on whitespace, filtering out banned words, and then feeding that into a bag, and then pulling out the word that has the highest count in the bag.
Raku
I realized when I ran this that I couldn’t just strip the punctuation, I needed to convert it to whitespace, because example 4 has no whitespace. Fortunately, that was fixed by adding ," " to line 7. Sadly, I couldn’t make it single statement because I needed to make the banned list a Bag in a separate statement.
sub popular($paragraph, @banned) {
my %banned = @banned.Bag;
$paragraph
.subst(/<-[a..zA..Z\s]>/," "):global # remove punctuation
.lc # make lowercase
.split(/\s+/, :skip-empty) # split on whitespace
.grep({ %banned{$_}:!exists }) # filter out banned words
.Bag # count remaining words
.max(*.value) # get word with max count
.keys # convert key => val to key
.first; # get the word from the list
}View the entire Raku script for this task on GitHub.
$ raku/ch-1.raku
Example 1:
Input: $paragraph = "Bob hit a ball, the hit BALL flew far
after it was hit."
@banned = ("hit")
Output: "ball"
Example 2:
Input: $paragraph = "Apple? apple! Apple, pear, orange,
pear, apple, orange."
@banned = ("apple", "pear")
Output: "orange"
Example 3:
Input: $paragraph = "A. a, a! A. B. b. b."
@banned = ("b")
Output: "a"
Example 4:
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
@banned = ("ball")
Output: "apple"
Example 5:
Input: $paragraph = "The dog chased the cat, but the dog was
faster than the cat."
@banned = ("the", "dog")
Output: "cat"Perl
The Perl solution is essentially the same. I’m using List::UtilsBy::may_by to pull out the most frequent word, and I’m following Matthias Muth‘s example by using List::MoreUtils’ frequency to create the bag (like I did back in PWC350). I reworded the comments to reflect that we really have to read the statement that does most of the work back to front.
use List::MoreUtils qw( frequency );
use List::UtilsBy qw( max_by );
sub popular($paragraph, @banned) {
my %banned = map { $_ => 1 } @banned;
$paragraph =~ s/[^a-zA-Z\s]/ /g; # remove punctuation
my %popular = frequency # count words after
grep { !exists $banned{$_}} # filter out banned words after
split(" ", # split on whitespace after
lc($paragraph) # make lowercase
);
max_by { $popular{$_} } keys %popular; # get word with max count
}View the entire Perl script for this task on GitHub.
Python
In Python, however, I was able to get everything back into a single statement. There’s a most_common() method on collections's Counter class, the only snag being that it doesn’t just return a list of the keys of the most common n items in the counter, it returns a list containing tuples of the keys and counts for those items, so I need to unpack it twice.
from collections import Counter
import re
def popular(paragraph, banned):
return (
(
Counter( # count remaining words
[ # after operations below
w for w in
re.sub(
r'[^a-zA-Z\s]', ' ', # remove punctuation
paragraph.lower() # make lowercase
).split() # split on whitespace
if w not in banned # filter out banned words
]
).most_common(1) # returns the 1 most common element
)[0] # most_common() returns a LIST of word => count
)[0] # return just the wordView the entire Python script for this task on GitHub.
Elixir
And Elixir let me do it all in one chain of commands as well. Enum.max_by/4 was the key by letting me find the element in the map with the maximum value and returning a tuple of the key and the count.
def popular(paragraph, banned) do
banned = banned |> Enum.frequencies
# remove punctuation
Regex.replace(~r/[^a-zA-Z\s]/, paragraph, " ")
|> String.downcase # make lowercase
|> String.split(" ", trim: true) # split on whitespace
# filter out banned words
|> Enum.filter(fn w -> ! Map.has_key?(banned, w) end)
|> Enum.frequencies # count remaining words
|> Enum.max_by(fn {_k, v} -> v end) # find maximum value
|> elem(0) # return key of tuple
endView the entire Elixir script for this task on GitHub.
Task 2: Scramble String
You are given two strings A and B of the same length.
Write a script to return true if string B is a scramble of string A otherwise return false.
String B is a scramble of string A if A can be transformed into B by a single (recursive) scramble operation.
A scramble operation is:
- If the string consists of only one character, return the string.
- Divide the string X into two non-empty parts.
- Optionally, exchange the order of those parts.
- Optionally, scramble each of those parts.
- Concatenate the scrambled parts to return a single string.
Input: $str1 = "abc", $str2 = "acb"
Output: true
"abc"
split: ["a", "bc"]
split: ["a", ["b", "c"]]
swap: ["a", ["c", "b"]]
concatenate: "acb"
Input: $str1 = "abcd", $str2 = "cdba"
Output: true
"abcd"
split: ["ab", "cd"]
swap: ["cd", "ab"]
split: ["cd", ["a", "b"]]
swap: ["cd", ["b", "a"]]
concatenate: "cdba"
Input: $str1 = "hello", $str2 = "hiiii"
Output: false
A fundamental rule of scrambled strings is that they must be anagrams.
Input: $str1 = "ateer", $str2 = "eater"
Output: true
"ateer"
split: ["ate", "er"]
split: [["at", "e"], "er"]
swap: [["e", "at"], "er"]
concatenate: "eater"
Input: $str1 = "abcd", $str2 = "bdac"
Output: false
Approach
Effectively, this is a tree traversal algorithm. The parent node is the $str1, and each time we split the string or swap pieces, we’re creating child nodes of that parent. The trick is to pass along a copy of $str2 each time we make a call so we can tell when we’ve found a stopping point for that branch.
I also found out there are sometimes more ways to get to the same conclusion. I wasn’t able to get my debugging output to look the same say Roger Bell West (the author of this challenge) did, but I did notice that my algorithm got to the same solution for example 2 through a different path:
"abcd"
split: ["a", "bcd"] (c, dba)
swap: ["bcd", "a"]
split: ["b", "cd"] (c, db)
swap: ["cd", "b"]
concatenate: "cdb"
concatenate: "cdba"
Raku
I made a utility function to divide the string into two parts because I wanted that operation to read more easily in the main scramble function.
sub divide($string, $div) {
my $end = $string.chars - 1;
return (
$string.substr(0..$div),
$string.substr($div+1..$end)
);
}
sub scramble($source, $target) {
# If the string consists of only one character,
# return the string
return $source if $source.chars == 1;
# if the string equals the target we're seeking,
# return the string
return $source if $source eq $target;
my ($s1, $s2, $t1, $t2);
for 0 .. $source.chars - 2 -> $div {
# Divide the string X into two non-empty parts
($s1, $s2) = divide($source, $div);
($t1, $t2) = divide($target, $div);
# exchange the order of those parts unless one part
# matches the target
unless ($s1 eq $t1 || $s2 eq $t2) {
($s2, $s1) = ($s1, $s2);
# adjust the target split as well
($t1, $t2) = divide($target, $s1.chars-1);
}
# scramble each of those parts unless all parts
# match their targets
($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
unless ($s1 eq $t1 && $s2 eq $t2);
last if $s1 ~ $s2 eq $target;
}
# Concatenate the scrambled parts to return a single string.
return $s1 ~ $s2;
}
sub is_scrambled($source, $target) {
scramble($source, $target) eq $target;
}View the entire Raku script for this task on GitHub.
$ raku/ch-2.raku | less
Example 1:
Input: $str1 = "abc", $str2 = "acb"
Output: True
Example 2:
Input: $str1 = "abcd", $str2 = "cdba"
Output: True
Example 3:
Input: $str1 = "hello", $str2 = "hiiii"
Output: False
Example 4:
Input: $str1 = "ateer", $str2 = "eater"
Output: True
Example 5:
Input: $str1 = "abcd", $str2 = "bdac"
Output: FalsePerl
Making the utility function divide made translating this from Raku to Perl easier, because I only had to worry about the differences between Perl’s substr and Raku’s .substr in one place.
sub divide($string, $div) {
return (
substr($string, 0, $div+1),
substr($string, $div+1)
);
}
sub scramble($source, $target) {
# If the string consists of only one character,
# return the string
return $source if length($source) == 1;
# if the string equals the target we're seeking,
# return the string
return $source if $source eq $target;
my ($s1, $s2, $t1, $t2);
foreach my $div (0 .. length($source) - 2) {
# Divide the string X into two non-empty parts
($s1, $s2) = divide($source, $div);
($t1, $t2) = divide($target, $div);
# exchange the order of those parts unless one part
# matches the target
unless ($s1 eq $t1 || $s2 eq $t2) {
($s2, $s1) = ($s1, $s2);
# adjust the target split as well
($t1, $t2) = divide($target, length($s1)-1);
}
# scramble each of those parts unless all parts
# match their targets
($s1, $s2) = (scramble($s1, $t1), scramble($s2, $t2))
unless ($s1 eq $t1 && $s2 eq $t2);
last if $s1 . $s2 eq $target;
}
# Concatenate the scrambled parts to return a single string.
return $s1 . $s2;
}
sub is_scrambled($source, $target) {
scramble($source, $target) eq $target ? 'True' : 'False';
}View the entire Perl script for this task on GitHub.
Python
The tricky part going from Raku to Python was remembering all the places I needed to adjust the boundaries: when slicing the string, it was string[div:], not string[div+1:], and I needed to remember range(1, N) produced numbers 1 through N-1. The one that had me the longest was on line 29, where I initially had len(s1)-1 and only example 4 was providing the wrong result.
def divide(string, div):
return (
string[0:div],
string[div:]
)
def scramble(source, target):
# If the string consists of only one character,
# return the string
if len(source) == 1: return source
# if the string equals the target we're seeking,
# return the string
if source == target: return source
s1, s2 = "", ""
for div in range(1, len(source)):
# Divide the string X into two non-empty parts
s1, s2 = divide(source, div)
t1, t2 = divide(target, div)
# exchange the order of those parts unless one part
# matches the target
if not (s1 == t1 or s2 == t2):
s2, s1 = s1, s2
# adjust the target split as well
t1, t2 = divide(target, len(s1))
# scramble each of those parts unless all parts
# match their targets
if not (s1 == t1 and s2 == t2):
s1, s2 = scramble(s1, t1), scramble(s2, t2)
if s1 + s2 == target: break
# Concatenate the scrambled parts to return a single string.
return s1 + s2
def is_scrambled(source, target):
return scramble(source, target) == targetView the entire Python script for this task on GitHub.
Elixir
The fun part about doing this in Elixir was figuring out how to loop from 1 -> length(source) with the ability to bail out of the loop early. As always, this probably means I should have a recursive function doing the loop, and I just don’t call the next iteration if I’ve satisfied my stopping condition.
def divide(string, div) do
[
String.slice(string, 0..div-1),
String.slice(string, div..100)
]
end
def divide_at(div, source, len, _) when div == len, do: source
def divide_at(div, source, len, target) do
# Divide the string X into two non-empty parts
[s1, s2] = divide(source, div)
[t1, t2] = divide(target, div)
# exchange the order of those parts unless one part
# matches the target
[s1, s2, t1, t2] = if not (s1 == t1 or s2 == t2) do
[s2, s1] ++ divide(target, String.length(s2))
else
[s1, s2, t1, t2]
end
# scramble each of those parts unless all parts
# match their targets
{s1, s2} = if not (s1 == t1 and s2 == t2) do
{
scramble(s1, String.length(s1), t1),
scramble(s2, String.length(s2), t2)
}
else
{s1, s2}
end
if s1 <> s2 == target do
s1 <> s2
else
divide_at(div+1, source, len, target)
end
end
# If the string consists of only one character,
# return the string
def scramble(source, len, _) when len == 1, do: source
# if the string equals the target we're seeking,
# return the string
def scramble(s, _, t) when s == t, do: s
def scramble(source, len, target) do
divide_at(1, source, len, target)
end
def is_scrambled(source, target) do
scramble(source, String.length(source), target) == target
endView the entire Elixir script for this task on GitHub.
Here’s all my solutions in GitHub: https://github.com/packy/perlweeklychallenge-club/tree/challenge-370-packy-anderson/challenge-370/packy-anderson