Perl Weekly Challenge: Odd Char Seems to be the Most Frequent Word

As usual, the title and the musical theme is pretty much word salad from the two problem titles. Tonight, the musical accompaniment is Diana Krall’s cover of Elton John’s Sorry Seems to be the Hardest Word.

So, onward to Perl Weekly Challenge 0xFF… er, 255!

Task 1: Odd Character

You are given two strings, $s and $t. The string $t is generated using the shuffled characters of the string $s with an additional character.

Write a script to find the additional character in the string $t..

Example 1

Input: $s = "Perl" $t = "Preel"
Output: "e"

Example 2

Input: $s = "Weekly" $t = "Weeakly"
Output: "a"

Example 3

Input: $s = "Box" $t = "Boxy"
Output: "y"

Approach

Ok, this is essentially a problem of finding the difference between two sets of characters. One of the thoughts I had was to just plow through the second string comparing characters with the first string, but the problem specifies the second string is “generated using the shuffled characters” of the first, so we need to assume the characters will be in a different order (even though in the examples only one is).

So I figure what I’ll do is this: count each of the characters using a hash. Then when I go through the second string, I’ll check to see if the count for that character exists; if it doesn’t, we’ve found the added character. If it does, we decrement the count for that character and remove it from the hash if the count is now 0, and then move on to the next character. Of course, because the first string is shuffled to create the second string, but both are presented in title case, we want to lower case the characters before we process them.

Raku

Once again, I needed to remind myself that Raku is not Perl. I remembered that I couldn’t test for the existence of entries in a hash with exists and remove entries from a hash with delete the same way I could in Perl; fortunately, I only had to go back to my work on PWC 242 to find the Subscript Adverb :exists and my work on PWC 237 for the Subscript Adverb :delete.

sub oddChar($s, $t) {
  # count the characters in $s
  my %count;
  for $s.lc.split('', :skip-empty) -> $c {
    %count{$c}++;
  }
  # find the character in $t that's been added to $s
  for $t.lc.split('', :skip-empty) -> $c {
    if ( %count{$c}:!exists ) {
      # we found the added character!
      return $c
    }
    %count{$c}--;
    %count{$c}:delete if %count{$c} == 0;
  }
  die "No odd character found!";
}

View the entire Raku script for this task on GitHub.

Perl

But when I went from Raku to Perl, it felt like Raku really was Perl, because other than flipping for to foreach, reformatting the split and lc usage, changing the sigils when accessing hash elements, and changing exists and delete from adverbs to built-in functions, everything else remained the same.

sub oddChar($s, $t) {
  # count the characters in $s
  my %count;
  foreach my $c ( split(//, lc($s)) ) {
    $count{$c}++;
  }
  # find the character in $t that's been added to $s
  foreach my $c ( split(//, lc($t)) ) {
    if ( ! exists $count{$c} ) {
      # we found the added character!
      return $c
    }
    $count{$c}--;
    delete $count{$c} if $count{$c} == 0;
  }
  die "No odd character found!";
}

View the entire Perl script for this task on GitHub.

Python

The Python solution is made easier by being able to just loop over the characters in a string without having to split them up. As I did back in PWC 234, PWC 247, and PWC 249, I’m using Python’s  Counter type in the collections module that lets you autovivify elements in a dictionary by adding to them.

import sys
from collections import Counter

def oddChar(s, t):
    # count the characters in s
    count = Counter()
    for c in s.lower():
        count[c] += 1

    # find the character in t that's been added to s
    for c in t.lower():
        if c not in count:
            # we found the added character!
            return c
        count[c] -= 1
        if count[c] == 0:
            del count[c]

    sys.exit("No odd character found!")

View the entire Python script for this task on GitHub.

In none of the examples do we not find the odd character, but I felt it necessary to throw some kind of exception if we got to the end of oddChar() without having returned anything.


Task 2: Most Frequent Word

You are given a paragraph $p and a banned word $w.

Write a script to return the most frequent word that is not banned.

Example 1

Input: $p = "Joe hit a ball, the hit ball flew far after it was hit."
       $w = "hit"
Output: "ball"

The banned word "hit" occurs 3 times.
The other word "ball" occurs 2 times.

Example 2

Input: $p = "Perl and Raku belong to the same family. Perl is the most popular language in the weekly challenge."
       $w = "the"
Output: "Perl"

The banned word "the" occurs 3 times.
The other word "Perl" occurs 2 times.

Approach

Once again, this challenge is all about counting, and once again, I’m going to use a hash/dictionary.

Raku

While writing this, I noticed that I wasn’t getting the same count in the first example as the sample output, and I realized because it was counting “hit.” separately from “hit”.

Example 1:
Input: $p = "Joe hit a ball, the hit ball flew far after it was hit."
       $w = "hit"
Output: "hit."

The banned word "hit" occurs 2 times.
The other word "hit." occurs 1 time.

So I decided that I needed to remove punctuation from the paragraph before splitting it into words.

The other challenging part was remembering how Raku regexes did substitutions and negated character classes.

sub frequentWord($p, $w) {
  # assign the value to a container so we can modify it
  my $paragraph = $p;

  # first, remove punctuation from the paragraph
  # (but we'll leave apostrophes for contractions)
  $paragraph ~~ s:i:g/<-[a..z'\s]>//;

  # count the words in $paragraph
  my %count;
  for $paragraph.lc.split(' ', :skip-empty) -> $pw {
    %count{$pw}++;
  }

  # generate the output about the banned word
  my $bannedCount = %count{$w} // 0;
  my $output = qq/The banned word "$w" occurs $bannedCount /;
  $output ~= ($bannedCount == 1) ?? 'time' !! 'times';
  $output ~= ".\n";

  # delete the banned word from the word count
  %count{$w}:delete;

  # now find the most frequent word left and report on that
  my @sorted = %count.keys.sort: {
    # sort by count
    %count{$^b} <=> %count{$^a}
  };
  my $freqWord  = @sorted[0];
  my $freqCount = %count{$freqWord};

  $output ~= qq/The other word "$freqWord" /;
  $output ~= qq/occurs $freqCount /;
  $output ~= ($freqCount == 1) ?? 'time.' !! 'times.';

  return $freqWord, $output;
}
Example 1:
Input: $p = "Joe hit a ball, the hit ball flew far after it was hit."
       $w = "hit"
Output: "ball"

The banned word "hit" occurs 3 times.
The other word "ball" occurs 2 times.

View the entire Raku script for this task on GitHub.

Perl

The Perl version was easier, because I’m very familiar with Perl regexes and I didn’t have to worry about reassigning the $p parameter to something I could write to.

sub frequentWord($p, $w) {
  # first, remove punctuation from the paragraph
  # (but we'll leave apostrophes for contractions)
  $p =~ s/[^a-z'\s]//ig;

  # count the words in $paragraph
  my %count;
  foreach my $pw ( split(/\s+/, lc($p)) ) {
    $count{$pw}++;
  }

  # generate the output about the banned word
  my $bannedCount = $count{$w} // 0;
  my $output = qq/The banned word "$w" occurs $bannedCount /;
  $output .= ($bannedCount == 1) ? 'time' : 'times';
  $output .= ".\n";

  # delete the banned word from the word count
  delete $count{$w};

  # now find the most frequent word left and report on that
  my @sorted = sort {
    # sort by count
    $count{$b} <=> $count{$a}
  } keys %count;
  my $freqWord  = $sorted[0];
  my $freqCount = $count{$freqWord};

  $output .= qq/The other word "$freqWord" /;
  $output .= qq/occurs $freqCount /;
  $output .= ($freqCount == 1) ? 'time.' : 'times.';

  return $freqWord, $output;
}

View the entire Perl script for this task on GitHub.

Python

And, just like I did back in PWC 247 and PWC 253, I’d like to point out that Python’s Decorate-Sort-Undecorate idiom is really just a Schwartzian Transformation.

import re
from collections import Counter

def frequentWord(p, w):
    # first, remove punctuation from the paragraph
    # (but we'll leave apostrophes for contractions)
    p = re.sub(r'[^a-z\'\s]', '', p.lower())

    # count the words in $paragraph
    count = Counter()
    for pw in p.split():
        count[pw] += 1

    # generate the output about the banned word
    bannedCount = count[w] if w in count else 0
    output = f'The banned word "{w}" occurs {bannedCount} ';
    output += 'time' if bannedCount == 1 else 'times'
    output += ".\n"

    # delete the banned word from the word count
    del count[w]

    # now find the most frequent word left and report on that
    decorated = [ ( count[w], w ) for w in count.keys() ]
    sorted_tuples = sorted(
        decorated,
        # the - before the first element sorts descending
        key=lambda k: -k[0]
    )
    freqWord = sorted_tuples[0][1]
    freqCount = count[freqWord]

    output +=  f'The other word "{freqWord}" '
    output += f'occurs {freqCount} '
    output += 'time.' if freqCount == 1 else 'times.'

    return (freqWord, output)

View the entire Python script for this task on GitHub.


Here’s all my solutions in GItHub: https://github.com/packy/perlweeklychallenge-club/tree/master/challenge-255/packy-anderson