Perl Weekly Challenge: What’s the Frequency, Kenneth?

There was no Perl Weekly Challenge last week so we jump from 231 to 233!

Both tasks this week deal with accepting lists of items and then manipulating those lists.


Task 1: Similar Words

You are given an array of words made up of alphabets only.

Write a script to find the number of pairs of similar words. Two words are similar if they consist of the same characters.

Example 1

Input: @words = ("aba", "aabb", "abcd", "bac", "aabc")
Output: 2

Pair 1: similar words ("aba", "aabb")
Pair 2: similar words ("bac", "aabc")

Example 2

Input: @words = ("aabb", "ab", "ba")
Output: 3

Pair 1: similar words ("aabb", "ab")
Pair 2: similar words ("aabb", "ba")
Pair 3: similar words ("ab", "ba")

Example 3

Input: @words = ("nba", "cba", "dba")
Output: 0

So what we’re looking for is a way to break down words into a sorted set of the characters that occur in that word so we can use that list to compare whether the words use only the same characters:

sub charsInWord {
    my $word = shift;
    # split the word into characters, then map those characters
    # to a hash
    my %charset = map { $_ => 1 } split //, $word;
    # return the set of characters as a string, sorted
    return join q{}, sort keys %charset;
}

We can then use those character set strings as the keys to a hash. By storing each word in an array referenced in a hash by the character set for that word:

my %similar;
foreach my $word ( @words ) {
  my $charset = charsInWord($word);
  push @{ $similar{$charset} }, $word;
}

Now, you might wonder why I’m not checking to see if $similar{$charset} exists already or has an array reference before just pushing a value there. It’s all through the magic of a feature of Perl called autovivification. When we attempt to access the hash %similar using the key $charset, if that key doesn’t exist, Perl will automatically create it and make it undefined. Similarly, when we try to push a value onto an array reference in a variable that is currently undefined, Perl creates an array reference and populates the variable with it. So when %similar is empty and I say
push @{ $similar{$charset} }, $word; the entry in %similar for $charset winds up containing a reference to an array with one entry: $word.

Autovivification can cause problems in your code, because undefined values in a hash can get autovivified just by referencing their keys, so if I’m ever testing to see if something is defined, I always check to see if the key exists in the array using exists rather than just testing the value of $hash{$key}. However, in this case, I’m putting values into a hash or into an array reference, so if they don’t already exist, I want to create them. If Perl didn’t have autovivification, I’d have to do this:

# if this is the first time we've seen this charset,
# create an empty arrayref to store the word in
$similar{$charset} = [] if ! exists $similar{$charset};

# append the word to the list for this charset
push @{ $similar{$charset} }, $word;

The next bit of the problem that I noticed on carefully reading the examples is it doesn’t just want a list of words using the same character set: it wants pairs of words using the same character set. So we need to take our list of words using the same character set and present it as pairs.

A little thought produced the algorithm for this. Given the list of words A, B, C, D, the list can be broken down into pairs by taking the first word off the list (A), pairing it with each of the remaining words (AB, AC, AD), then repeating the process with the shortened list (B, C, D) until we ran out of words (BC, BD, CD). This is easily done in Perl with a pair of loops:

my @pairs;
while ( scalar(@list) >= 2 ) {
  my $first = shift @list; # remove the first element
  foreach my $second ( @list ) {
    push @pairs, [ $first, $second ];
  }
}

And that pretty much does all the heavy lifting for this problem. The rest is presentation:

#!/usr/bin/env perl
  
use v5.38;

sub charsInWord {
  my $word = shift;
  # split the word into characters, then map those
  # characters to a hash so we only have unique ones
  my %charset = map { $_ => 1 } split //, $word;
  # return the set of characters as a string, sorted
  return join q{}, sort keys %charset;
}

sub findSimilarWordPairs {
  my @words = @_;

  # get the set of characters in each word,
  # store each word in an array reference under
  # the hash key for its character set
  my %similar;
  foreach my $word ( @words ) {
    my $charset = charsInWord($word);
    # if $similar{$charset} is undefined when we
    # try to use it as an array reference to store
    # a value, Perl will "autovivify" a reference
    # to an empty array
    push @{ $similar{$charset} }, $word;
  }

  # filter out character sets that only have one word
  my @multiples = grep {
    # only allow letter sets
    # that have more than one word
    scalar( @{ $similar{$_} } ) > 1
  } keys %similar;

  # make pairs by looping over the list
  # of letter sets that had multiple entries
  my @pairs;
  foreach my $charset ( @multiples ) {
    my @list = @{ $similar{$charset} };

    while ( scalar(@list) >= 2 ) {
      # remove the first word from the list of words
      my $first = shift @list;
      # pair it with each of the remaining words
      foreach my $second ( @list ) {
        push @pairs, [ $first, $second ];
      }
    }
  }
  return @pairs;
}

sub solution {
  my @words = @_;
  say 'Input: @words = ("' . join('", "', @words) . '")';

  my @pairs = findSimilarWordPairs(@words);

  say 'Output: ' . scalar(@pairs);
  my $count = 0;
  foreach my $pair ( @pairs ) {
    say "" if $count == 0;
    say 'Pair ' . ++$count . ': similar words ("'
      . join('", "', @$pair) . '")';
  }
}

say "Example 1:";
solution("aba", "aabb", "abcd", "bac", "aabc");

say "";
say "Example 2:";
solution("aabb", "ab", "ba");

say "";
say "Example 3:";
solution("nba", "cba", "dba");

In the Raku version, some of the language features allowed me to make some different choices:

sub charsInWord(Str $word) {
  # split the word into characters, then use the Raku
  # array method unique to have each character appear once.
  return $word.split('').unique.sort.join;
}

Raku having a .unique method on the array class (really, the Any class) meant I didn’t need to use a hash to get only the unique characters. Autovivification works much the same, however:

my %similar;
for @words -> $word {
  my $charset = charsInWord($word);
  %similar{$charset}.push($word);
}

But then I ran into a problem when I was trying to make the pairs. I wanted to make a copy of the list of similar words so I could modify it, but when I had the assignment my @list = %similar{$charset}, what I got wasn’t what I expected: instead of the elements of the list pointed to by %similar{$charset} being assigned to @list, I got the list itself assigned as the first element of @list. I needed a way to say “return the elements in this list” instead of “return this list”. Unfortunately, the method that feels right for this, .elems, just returns the count of elements, not the elements themselves. I wound up using the .splice method to return a list of all the elements in the array.

So here’s the Raku version:

#!/usr/bin/env raku
  
use v6;

sub charsInWord(Str $word) {
  # split the word into characters, then use the Raku
  # array method unique to have each character appear once.
  return $word.split('').unique.sort.join;
}

sub findSimilarWordPairs(*@words where ($_.all ~~ Str)) {
  my %similar;
  for @words -> $word {
    my $charset = charsInWord($word);
    %similar{$charset}.push($word);
  }

  # filter out character sets that only have one word
  my @multiples = %similar.keys.grep: {
    %similar{$_}.elems > 1
  };

  # make pairs by looping over the list
  # of letter sets that had multiple entries
  my @pairs;
  for @multiples -> $charset {
    # if we assign @list = %similar{$charset}, we get
    # an array with a single element, an array object.
    # By using .splice, I can get all the elements in 
    # the array object assigned to @list
    my @list = %similar{$charset}.splice(0, *);

    while ( @list.elems >= 2 ) {
      # remove the first word from the list of words
      my $first = @list.shift;
      # pair it with each of the remaining words
      for @list -> $second {
        @pairs.push([ $first, $second ]);
      }
    }
  }
  return @pairs;
}

sub solution {
  my @words = @_;
  say 'Input: @words = ("' ~ @words.join('", "') ~ '")';

  my @pairs = findSimilarWordPairs(@words);

  say 'Output: ' ~ @pairs.elems;
  my $count = 0;
  for @pairs -> $pair {
    say "" if $count == 0;
    say 'Pair ' ~ ++$count ~ ': similar words ("'
      ~ $pair.join('", "') ~ '")';
  }
}

say "Example 1:";
solution("aba", "aabb", "abcd", "bac", "aabc");

say "";
say "Example 2:";
solution("aabb", "ab", "ba");

say "";
say "Example 3:";
solution("nba", "cba", "dba");

Task 2: Frequency Sort

You are given an array of integers.

Write a script to sort the given array in increasing order based on the frequency of the values. If multiple values have the same frequency then sort them in decreasing order.

Example 1

Input: @ints = (1,1,2,2,2,3)
Ouput: (3,1,1,2,2,2)

'3' has a frequency of 1
'1' has a frequency of 2
'2' has a frequency of 3

Example 2

Input: @ints = (2,3,1,3,2)
Ouput: (1,3,3,2,2)

'2' and '3' both have a frequency of 2, so they are sorted in decreasing order.

Example 3

Input: @ints = (-1,1,-6,4,5,-6,1,4,1)
Ouput: (5,-1,4,4,-6,-6,1,1,1)

Ok, the lack of explanatory English text at the end of Example 3 and the lack '1' has a frequency of 1 of at the end of Example 2 makes me believe that Mohammad isn’t expecting that text to be part of the output. I’m including it anyway. 😉

This is the simpler of the two problems. First, we just need to count up how often we see each integer:

my %counts;
foreach my $int ( @ints ) {
  $counts{$int}++;
}

Easy-peasy. Then, much like with the last task, we use a hash of arrays to group together integers that have the same frequency:

my %frequency;
foreach my $int ( keys %counts ) {
  push @{ $frequency{ $counts{$int} } }, $int;
}

Then, putting those integers back into an output array in the proper order:

my @output;
foreach my $freq ( sort keys %frequency ) {
  # get each integer for this frequency in descending order
  foreach my $int ( reverse sort @{ $frequency{$freq} } ) {
    # we need to put the integer on the list $freq times
    foreach ( 1 .. $freq ) {
      push @output, $int;
    }
  }
}

Once I add in all the stuff to print the English output after the required array output, and the boilerplate to echo the input, we get this:

#!/usr/bin/env perl
  
use v5.38;

use Lingua::EN::Inflexion qw( wordlist );

sub solution {
  my @ints = @_;
  say 'Input: @ints = (' . join(', ', @ints) . ')';

  # count how often each integer occurs
  my %counts;
  foreach my $int ( @ints ) {
    $counts{$int}++;
  }

  # now create a hash of arrays listing which integers
  # occur at what frequencies
  my %frequency;
  foreach my $int ( keys %counts ) {
    push @{ $frequency{ $counts{$int} } }, $int;
  }

  my @output;
  my $text;
  foreach my $freq ( sort keys %frequency ) {
    my @list = @{ $frequency{$freq} };
    # get each integer for this frequency in descending order
    foreach my $int ( reverse sort @list ) {
      # we need to put the integer on the list $freq times
      foreach ( 1 .. $freq ) {
        push @output, $int;
      }
    }
    # now let's do the English description of the output.
    # have the integers in ascending order in the text,
    # and wrap them in quotes
    @list = map { "'$_'" } sort @list;
    if (@list == 1) {
      $text .= $list[0] . " has a frequency of $freq\n";
    }
    else {
      $text .= wordlist(@list);
      if (@list == 2) {
        $text .= ' both';
      }
      $text .= " have a frequency of $freq, "
            .  "so they are sorted in decreasing order\n";
    }
  }

  say "Output: (" . join(', ', @output) . ")";
  say "\n$text";
}

say "Example 1:";
solution(1,1,2,2,2,3);

say "";
say "Example 2:";
solution(2,3,1,3,2);

say "";
say "Example 3:";
solution(-1,1,-6,4,5,-6,1,4,1);

Producing the output

$ perl/ch-2.pl
Example 1:
Input: @ints = (1, 1, 2, 2, 2, 3)
Output: (3, 1, 1, 2, 2, 2)

'3' has a frequency of 1
'1' has a frequency of 2
'2' has a frequency of 3


Example 2:
Input: @ints = (2, 3, 1, 3, 2)
Output: (1, 3, 3, 2, 2)

'1' has a frequency of 1
'2' and '3' both have a frequency of 2, so they are sorted in decreasing order


Example 3:
Input: @ints = (-1, 1, -6, 4, 5, -6, 1, 4, 1)
Output: (5, -1, 4, 4, -6, -6, 1, 1, 1)

'-1' and '5' both have a frequency of 1, so they are sorted in decreasing order
'-6' and '4' both have a frequency of 2, so they are sorted in decreasing order
'1' has a frequency of 3

The Raku version didn’t change very much:

#!/usr/bin/env raku
  
use v6;

use Lingua::Conjunction;

sub solution (*@ints where {$_.all ~~ Int}) {
  say 'Input: @ints = (' ~ @ints.join(', ') ~ ')';

  # count how often each integer occurs
  my %counts;
  for @ints -> $int {
    %counts{$int}++;
  }

  # now create a hash of arrays listing which integers
  # occur at what frequencies
  my %frequency;
  for %counts.keys -> $int {
    %frequency{ %counts{$int} }.push($int);
  }

  my @output;
  my $text;
  for %frequency.keys.sort -> $freq {
    my @list = %frequency{$freq}.splice(0, *);
    # get each integer for this frequency in descending order
    for @list.sort.reverse -> $int {
      # we need to put the integer on the list $freq times
      @output.append($int xx $freq);
    }
    # now let's do the English description of the output.
    # have the integers in ascending order in the text,
    # and wrap them in quotes
    @list = @list.sort.map: { "'$_'" };
    if (@list.elems == 1) {
      $text ~= @list[0] ~ " has a frequency of $freq\n";
    }
    else {
      $text ~= conjunction @list;
      if (@list.elems == 2) {
        $text ~= ' both';
      }
      $text ~= " have a frequency of $freq, "
            ~  "so they are sorted in decreasing order\n";
    }
  }

  say "Output: (" ~ @output.join(', ') ~ ")";
  say "\n$text";
}

say "Example 1:";
solution(1,1,2,2,2,3);

say "";
say "Example 2:";
solution(2,3,1,3,2);

say "";
say "Example 3:";
solution(-1,1,-6,4,5,-6,1,4,1);

It does, however use the really cool xx operator that does sort of what x does, except for arrays instead of strings. If you execute say 'a' x 5; in Raku (or in Perl), you’ll get the output aaaaa. But if you execute say 'a' xx 5; in Raku, you’ll get (a a a a a).

Also, if I use .push() to put the elements into @output, I’d wind up pushing the arrays themselves into @output and get output like this:
Output: (3, 1 1, 2 2 2)

By using .append(), I was able to append the individual integers to @output and wind up with output like this:
Output: (3, 1, 1, 2, 2, 2)


I’ve also decided that I’m going to start adding more solutions in what the challenge calls “Guest Languages”… namely, anything that isn’t Perl or Raku. This week, I’m adding solutions in another language I know: Python. I want to pick up more languages so I’m more employable, and as I learn them I’ll be adding them to this exercise.


Here’s my solutions in GItHub: https://github.com/packy/perlweeklychallenge-club/tree/master/challenge-233/packy-anderson