Perl Weekly Challenge: Two out of Three Ain’t Lexicographically Bad

Another week, time for another Perl Weekly Challenge!

Task 1: Lexicographic Order

You are given an array of strings.

Write a script to delete element which is not lexicographically sorted (forwards or backwards) and return the count of deletions.

Example 1

Input: @str = ("abc", "bce", "cae")
Output: 1

In the given array "cae" is the only element which is not lexicographically sorted.

Example 2

Input: @str = ("yxz", "cba", "mon")
Output: 2

In the given array "yxz" and "mon" are not lexicographically sorted.

I had to look up what “Lexicographic Order” was on the off chance that it wasn’t what I thought it was. Essentially, it means it’s sorted alphabetically. That was pretty obvious from the examples, so I just dove in:

#!/usr/bin/env perl

use v5.38;

sub quoted_list {
  # given a list, quote the elements and join them with commas
  my @quoted = map { qq{"$_"} } @_;
  return join q{, }, @quoted;
}

sub quoted_english_list {
  # given a list, quote the elements and join them 
  # in a way that makes sense to English speakers
  my @quoted = map { qq{"$_"} } @_;
  my $last = pop @quoted; # last element in array
  if (@quoted == 0) {
    # using an array in a scalar context returns
    # the number of elements in the array

    # there was only one element in the list
    return $last;
  }
  my $joined = join q{, }, @quoted;
  if (@quoted > 1) {
    # if there's more than one element, add an Oxford comma
    $joined .= q{,};
  }
  return "$joined and $last";
}

sub is_lexically_sorted {
  my $input = shift @_;

  # get the characters in the input string
  my @characters = split //, $input;

  # generate a string of the characters sorted ascending
  # (with case folding)
  my $forwards  = join q{}, sort {
    fc($a) cmp fc($b)
  } @characters;

  # generate a string of the characters sorted descending
  # (with case folding)
  my $backwards = join q{}, sort {
    fc($b) cmp fc($a)
  } @characters;

  # if the input string matches either sorted string,
  # then return true
  return( $input eq $forwards || $input eq $backwards );
}

sub solution {
  my @str = @_;
  say "Input: \@str = (" . quoted_list(@str) . ")";

  my @not_lexically_sorted = grep {
    ! is_lexically_sorted($_)
  } @str;

  say "Output: " . scalar(@not_lexically_sorted);
  say "";

  if (@not_lexically_sorted == 0) {
    say "In the given array all elements are"
      . " lexicographically sorted.";
  }
  elsif (@not_lexically_sorted == 1) {
    say "In the given array "
      . quoted_list(@not_lexically_sorted)
      . " is the only element which is not"
      . " lexicographically sorted.";
  }
  else {
    say "In the given array "
      . quoted_english_list(@not_lexically_sorted)
      . " are not lexicographically sorted.";
  }
}

say "Example 1:";
solution("abc", "bce", "cae");

say "";

say "Example 2:";
solution("yxz", "cba", "mon");

I added a bunch of extra subroutines to make the code more readable: quoted_list and quoted_english_list let me just say how I want to display the list, rather than repeating the code every time I want to display it. And the is_lexically_sorted function makes the grep that I’m using to determine which array elements aren’t lexicographically sorted more readable as well. Whether it’s Perl or not, sometimes it’s just good coding practice to pull out pieces of your code that represent a concept and make them their own function, even if they’re used in only one place, because it makes the code conceptually easier to understand.
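Pulling the check out into its own function also makes it easy to swap in a different implementation. Here is a minimal sketch of one alternative that walks the string once and compares neighbouring characters instead of sorting. The name is_sorted_single_pass is hypothetical, and unlike the version above it case-folds the whole string up front, so treat it as an illustration rather than a drop-in replacement:

```perl
use v5.36;

# hypothetical alternative to is_lexically_sorted: compare each
# character with the one before it instead of sorting the string
sub is_sorted_single_pass ($input) {
  my @chars = split //, fc($input);
  my ($ascending, $descending) = (1, 1);
  for my $i ( 1 .. $#chars ) {
    my $cmp = $chars[$i - 1] cmp $chars[$i];
    $ascending  = 0 if $cmp > 0; # previous char sorts after this one
    $descending = 0 if $cmp < 0; # previous char sorts before this one
  }
  return $ascending || $descending;
}

for my $str ( qw( abc bce cae yxz cba mon ) ) {
  say qq{"$str": } . ( is_sorted_single_pass($str) ? "sorted" : "not sorted" );
}
```

The grep in solution() would not need to change at all; only the name of the function it calls would.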

The Raku version

#!/usr/bin/env raku

use v6;

sub quoted_list ( *@list ) {
  # given a list, quote the elements and join them with commas
  my @quoted = @list.map: { qq{"$_"} };
  return @quoted.join(q{, });
}

sub quoted_english_list ( *@list ) {
  # given a list, quote the elements and join them 
  # in a way that makes sense to english speakers
  my @quoted = @list.map: { qq{"$_"} };
  my $last = @quoted.pop(); # last element in array
  if (@quoted == 0) {
    # using an array in a numeric context returns
    # the number of elements in the array

    # there was only one element in the list
    return $last;
  }
  my $joined = join q{, }, @quoted;
  if (@quoted > 1) {
    # if there's more than one element, add an Oxford comma
    $joined ~= q{,};
  }
  return "$joined and $last";
}

sub is_lexically_sorted ($input) {
  # get the characters in the input string
  # putting $input in quotes casts it as a Str
  my @characters = "$input".split("", :skip-empty);

  # sort the characters ascending
  my @forwards  = @characters.sort: { $^a.fc cmp $^b.fc };

  # sort the characters descending
  my @backwards = @characters.sort: { $^b.fc cmp $^a.fc };

  # if the input string matches either sorted string,
  # then return true
  return( $input eq @forwards.join("")
          ||
          $input eq @backwards.join("") );
}

sub solution (*@str) {
  say "Input: \@str = (" ~ quoted_list(@str) ~ ")";

  my @not_lexically_sorted = @str.grep({
    !is_lexically_sorted($_)
  });

  say "Output: " ~ @not_lexically_sorted.elems;
  say "";

  if (@not_lexically_sorted.elems == 0) {
    say "In the given array all elements are"
      ~ " lexicographically sorted.";
  }
  elsif (@not_lexically_sorted.elems == 1) {
    say "In the given array "
      ~ quoted_list(@not_lexically_sorted)
      ~ " is the only element which is not"
      ~ " lexicographically sorted.";
  }
  else {
    say "In the given array "
      ~ quoted_english_list(@not_lexically_sorted)
      ~ " are not lexicographically sorted.";
  }
}

say "Example 1:";
solution("abc", "bce", "cae");

say "";

say "Example 2:";
solution("yxz", "cba", "mon");

This is mostly like the Perl solution above, but I decided to play around a little with slurpy parameters in my function signatures.

Task 2: Two out of Three

You are given three array of integers.

Write a script to return all the elements that are present in at least 2 out of 3 given arrays.

Example 1

Input: @array1 = (1, 1, 2, 4)
       @array2 = (2, 4)
       @array3 = (4)
Output: (2, 4)

Example 2

Input: @array1 = (4, 1)
       @array2 = (2, 4)
       @array3 = (1, 2)
Output: (1, 2, 4)

Perl version:

#!/usr/bin/env perl

use v5.38;

# function to return unique elements in array
use List::Util qw( uniq );

sub display_array {
  return "(" . join(q{, }, @_) . ")";
}

sub solution {
  my @arrays = @_;
  say "Input: \@array1 = " . display_array( @{ $arrays[0] } );
  say "       \@array2 = " . display_array( @{ $arrays[1] } );
  say "       \@array3 = " . display_array( @{ $arrays[2] } );

  # Return all the elements that are present in at least 2 out
  # of 3 given arrays.  In the sample input, there are arrays 
  # where elements appear multiple times in a given
  # array, so we want to examine only UNIQUE elements
  my @unique;
  foreach my $arrayref ( @arrays ) {
    push @unique, [ uniq @$arrayref ];
  }

  # now that we have arrays of only unique elements, let's find
  # elements that occur in more than one array using a hash
  my %occurrences;
  foreach my $arrayref ( @unique ) {
    foreach my $element ( @$arrayref ) {
      $occurrences{$element}++;
    }
  }

  say "Output: " . display_array(
    sort { $a <=> $b } # sort the resulting elements numerically
    grep {
      # only include elements that were counted more than once
      $occurrences{$_} > 1;
    } keys %occurrences
  );
}

say "Example 1:";
solution(
  [1, 1, 2, 4],
  [2, 4],
  [4]
);

say "";

say "Example 2:";
solution(
  [4, 1],
  [2, 4],
  [1, 2]
);

Raku version

#!/usr/bin/env raku

use v6;

sub display_array (@array) {
  return "(" ~ @array.join(q{, }) ~ ")";
}

sub solution (@array1, @array2, @array3) {
  say "Input: \@array1 = " ~ display_array(@array1);
  say "       \@array2 = " ~ display_array(@array2);
  say "       \@array3 = " ~ display_array(@array3);

  # Return all the elements that are present in at least 2 out
  # of 3 given arrays.  In the sample input, there are arrays
  # where elements appear multiple times in a given
  # array, so we want to examine only UNIQUE elements, then
  # find elements that occur in more than one array using
  # a hash
  my %occurrences;
  for ( @array1.unique,
        @array2.unique,
        @array3.unique ).flat -> $element {
    %occurrences{$element}++;
  }

  say "Output: " ~ display_array(
    # only include elements that were counted more than once,
    # then sort the resulting elements numerically
    %occurrences.keys.grep({ %occurrences{$_} > 1 }).sort(+*)
  );
}

say "Example 1:";
solution(
  (1, 1, 2, 4),
  (2, 4),
  (4,)
);

say "";

say "Example 2:";
solution(
  (4, 1),
  (2, 4),
  (1, 2)
);

One thing I want to point out is my discovery of the .flat method, which let me loop over the unique elements of all three arrays as a single list.

Note that in the Perl version, I’m passing around array references to keep the three lists separate, but in Raku, I’m able to make the three different parameters full-on arrays. Also, in Perl I had to pull in a function from a core module to get a list of unique elements in an array, but in Raku, the .unique method is provided on the base class Any.
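To show how little the array references get in the way, here is a condensed sketch of the Perl counting logic as a function that returns the list instead of printing it. The name two_out_of_three is hypothetical; the technique (uniq each array, count in a hash, keep counts above one) is the same as in the script above:

```perl
use v5.36;
use List::Util qw( uniq );

# hypothetical helper: takes any number of array references and
# returns the elements present in at least two of them
sub two_out_of_three (@arrayrefs) {
  my %occurrences;
  # uniq each array first so duplicates within one array count once
  $occurrences{$_}++ for map { uniq @$_ } @arrayrefs;
  return sort { $a <=> $b }
         grep { $occurrences{$_} > 1 }
         keys %occurrences;
}

say "(" . join(q{, }, two_out_of_three([1, 1, 2, 4], [2, 4], [4])) . ")";
say "(" . join(q{, }, two_out_of_three([4, 1], [2, 4], [1, 2])) . ")";
```

Call it in list context, as above; sort does not have a useful meaning in scalar context.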


Here are my solutions on GitHub: https://github.com/manwar/perlweeklychallenge-club/tree/master/challenge-229/packy-anderson

Perl Weekly Challenge: Unique Sums and Empty Arrays

Another week, time for another Perl Weekly Challenge!

Task 1: Unique Sum

You are given an array of integers.

Write a script to find out the sum of unique elements in the given array.

Example 1

Input: @int = (2, 1, 3, 2)
Output: 4

In the given array we have 2 unique elements (1, 3).

Example 2

Input: @int = (1, 1, 1, 1)
Output: 0

In the given array no unique element found.

Example 3

Input: @int = (2, 1, 3, 4)
Output: 10

In the given array every element is unique.

The examples make what this challenge is looking for pretty clear. We find the unique elements in the array, and sum those up. I immediately thought of using a hash to accomplish the task:

# find the unique elements
my %unique;
foreach my $int ( @ints ) {
  $unique{$int}++;
}

# make a list of ONLY the unique ints
my @unique_ints = grep { $unique{$_} == 1 } @ints;

It’s a common use-case in Perl to use a hash to count how many times something occurs, whether it’s to only do something once or to actually count up occurrences.

I guess I could have populated the %unique hash via a map, but I wanted to keep what the code was doing obvious, and sometimes I think using a map just to execute code in the code block and not to return an array/hash can be confusing.

map { $unique{$_}++ } @ints;

The other thing I knew I wanted to do was show off some List::Util functions:

use List::Util qw( sum );

# sum the unique elements
my $sum = sum(@unique_ints) // 0;

Sure, it would be easy enough to say

my $sum = 0;
foreach my $int ( @unique_ints ) {
  $sum += $int;
}

But sum makes it a lot shorter. So here’s the entire script…

#!/usr/bin/env perl

use v5.38;

use List::Util qw( sum );

# just accept the list of integers on the command line
my @ints = @ARGV;

# find the unique elements
my %unique;
foreach my $int ( @ints ) {
  $unique{$int}++;
}

# make a list of ONLY the unique ints
my @unique_ints = grep { $unique{$_} == 1 } @ints;

# sum the unique elements
my $sum = sum(@unique_ints) // 0;

# produce the output
say "Input: \@int = (" . join(', ', @ints) . ")";
say "Output: $sum";
say "";

print "In the given array ";
if ( scalar(@unique_ints) == scalar(@ints) ) {
  say "every element is unique.";
}
elsif ( scalar(@unique_ints) == 0 ) {
  say "no unique element found.";
}
else {
  say "we have " . scalar(@unique_ints) . " unique elements ("
    . join(', ', @unique_ints) . ").";
}
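As an aside, List::Util also exports sum0, which returns 0 for an empty list, so the `// 0` fallback isn't needed. A minimal sketch of the counting-and-summing steps wrapped in a function (unique_sum is a hypothetical name, not something from the script above):

```perl
use v5.36;
use List::Util qw( sum0 );

# hypothetical helper: sum only the values that occur exactly once
sub unique_sum (@ints) {
  my %count;
  $count{$_}++ for @ints;
  # sum0 returns 0 (not undef) when grep finds nothing
  return sum0 grep { $count{$_} == 1 } @ints;
}

say unique_sum(2, 1, 3, 2);
say unique_sum(1, 1, 1, 1);
say unique_sum(2, 1, 3, 4);
```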

As always, I started with my Perl script and made changes to make it valid Raku:

#!/usr/bin/env raku

use v6;

# just accept the list of integers on the command line
my @ints = @*ARGS;

# find the unique elements
my %unique;
for @ints -> $int {
  %unique{$int}++;
}

# make a list of ONLY the unique ints
my @unique_ints = grep { %unique{$_} == 1 }, @ints;

# sum the unique elements
my $sum = [+] @unique_ints;

# produce the output
say "Input: \@int = (" ~ @ints.join(', ') ~ ")";
say "Output: $sum";
say "";

print "In the given array ";
if ( @unique_ints.elems == @ints.elems ) {
  say "every element is unique.";
}
elsif ( @unique_ints.elems == 0 ) {
  say "no unique element found.";
}
else {
  say "we have " ~ @unique_ints.elems ~ " unique elements ("
    ~ @unique_ints.join(', ') ~ ").";
}

Now, the big decision I had to make was how to do the sum. I picked showing off Raku’s Reduction Metaoperator: [ ]. When you put an operator between square brackets and put that in front of a Raku Positional (like an Array), it turns the Positional into a single value by applying the operator to the first two elements, and then applying the operator to the result and the next element, and so on until the Positional has run out of elements. You can multiply all the elements of a Positional using [*], and you can concatenate all the elements of a Positional using [~]. There’s even a max infix operator that given two operands will return the larger of the two, and this can be applied to a Positional to find the largest value using [max].

But I could have used the .sum routine provided by Raku’s List class (which Arrays are a subclass of):

my $sum = @unique_ints.sum;
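For comparison on the Perl side, the closest thing to the reduction metaoperator is probably reduce from List::Util, which folds a list with a block that sees the running result in $a and the next element in $b. This is just a sketch with made-up sample values, not part of my solution:

```perl
use v5.36;
use List::Util qw( reduce );

my @values = (3, 1, 4, 1, 5); # sample data for illustration only

my $sum     = reduce { $a + $b } @values;           # like Raku's [+]
my $product = reduce { $a * $b } @values;           # like Raku's [*]
my $joined  = reduce { $a . $b } @values;           # like Raku's [~]
my $max     = reduce { $a > $b ? $a : $b } @values; # like Raku's [max]

say "sum=$sum product=$product joined=$joined max=$max";
```

One difference worth knowing: reduce returns undef for an empty list, whereas Raku's [+] on an empty list gives 0.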

Task 2: Empty Array

You are given an array of integers in which all elements are unique.

Write a script to perform the following operations until the array is empty and return the total count of operations.

If the first element is the smallest then remove it otherwise move it to the end.

Example 1

Input: @int = (3, 4, 2)
Output: 5

Operation 1: move 3 to the end: (4, 2, 3)
Operation 2: move 4 to the end: (2, 3, 4)
Operation 3: remove element 2: (3, 4)
Operation 4: remove element 3: (4)
Operation 5: remove element 4: ()

Example 2

Input: @int = (1, 2, 3)
Output: 3

Operation 1: remove element 1: (2, 3)
Operation 2: remove element 2: (3)
Operation 3: remove element 3: ()

This time, the List::Util function I wanted to use was min:

#!/usr/bin/env perl

use v5.38;

use List::Util qw( min );

# just accept the list of integers on the command line
my @ints = @ARGV;

my @operations;
my $count = 1;
while ( scalar(@ints) > 0 ) {
  my $min = min @ints;

  # in either case, we're removing the first element from the list
  my $first = shift @ints;

  if ($min == $first) {
    # the first element is the minimum, discard it
    push @operations, "Operation $count: "
                    . "remove element $min: ("
                    . join(', ', @ints) . ")";
  }
  else {
    # the first element is NOT the minimum, add it to the end
    push @ints, $first;
    push @operations, "Operation $count: "
                    . "move $first to the end: ("
                    . join(', ', @ints) . ")";
  }
  $count++;
}

# produce the output
# let's use @ARGV again, since we modify @ints as we go along
say "Input: \@int = (" . join(', ', @ARGV) . ")";
say "Output: " . scalar(@operations);
say "";
say join "\n", @operations;

This also does an excellent job of showing off array operations: shift to remove the first element of an array, and push to append an element to the end of an array (though, I will admit I really like the way PHP allows you to append to the end of an array: $ints[] = $first).

At first, I was using $ints[0] to examine the first element in the array. If it was the minimum value, I used shift to remove it and discard the value; if it wasn’t, I used shift to remove the first value and save it, like this:

if ($min == $ints[0]) {
  shift @ints;
  push @operations, ...;
}
else {
  my $first = shift @ints;
  push @operations, ...;
}

But then I realized that I was shift-ing the value off @ints in either case, and it would just be cleaner to do it before the comparison so I could use $first instead of $ints[0].
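If all you need is the count and not the operation-by-operation log, the same shift-then-decide idea fits in a few lines. A minimal sketch, assuming (as the task states) that all elements are unique; count_operations is a hypothetical name:

```perl
use v5.36;
use List::Util qw( min );

# hypothetical helper: count the operations without logging them
sub count_operations (@ints) {
  my $count = 0;
  while (@ints) {
    # in either case, the first element leaves the front of the list
    my $first = shift @ints;
    # if something smaller remains, $first was not the minimum,
    # so it goes back on the end instead of being discarded
    push @ints, $first if @ints && $first > min(@ints);
    $count++;
  }
  return $count;
}

say count_operations(3, 4, 2);
say count_operations(1, 2, 3);
```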

The Raku version is nothing fancy this time:

#!/usr/bin/env raku

use v6;

# just accept the list of integers on the command line
# (@*ARGS holds strings, so convert them to Ints to make
# .min compare numerically)
my @ints = @*ARGS.map(*.Int);

my @operations;
my $count = 1;
while ( @ints.elems > 0 ) {
  my $min = @ints.min;

  # in either case, we're removing the first element
  # from the list
  my $first = @ints.shift;

  if ($min == $first) {
    # the first element is the minimum, discard it
    push @operations, "Operation $count: "
                    ~ "remove element $min: ("
                    ~ @ints.join(', ') ~ ")";
  }
  else {
    # the first element is NOT the minimum, add it to the end
    push @ints, $first;
    push @operations, "Operation $count: "
                    ~ "move $first to the end: ("
                    ~ @ints.join(', ') ~ ")";
  }
  $count++;
}

# produce the output
# let's use @*ARGS again, since we modify @ints as we go along
say "Input: \@int = (" ~ @*ARGS.join(', ') ~ ")";
say "Output: " ~ @operations.elems;
say "";
say join "\n", @operations;

Perl Weekly Challenge #227

This week’s challenge brought two new tasks: Friday the 13th & Roman Maths.

Task 1: Friday 13th

You are given a year number in the range 1753 to 9999.

Write a script to find out how many dates in the year are Friday 13th, assume that the current Gregorian calendar applies.

Example

Input: $year = 2023
Output: 2

Since there are only 2 Friday 13th in the given year 2023 i.e. 13th Jan and 13th Oct.

This was going to be easy, because I knew there were date manipulation modules in the core distribution: Time::Piece and Time::Seconds. I figured I wanted to start at the first of the year, add one day in a loop until I found the first Friday, and then skip from one Friday to the next by adding seven days, using the wday method of a Time::Piece object to find that first Friday and the mday method to check whether or not each Friday was the 13th. Yes, instantiating the first day of the year would have been easier using the DateTime module, but I wanted to use only core modules if they did what I needed (and, really, how much more complex is Time::Piece->strptime("$year-01-01", "%Y-%m-%d")->truncate(to => 'day'); versus DateTime->new(year => $year, month => 1, day => 1)->truncate(to => 'day');?).

#!/usr/bin/env perl
use v5.38;

# let's use the core modules for date manipulation
use Time::Piece;
use Time::Seconds qw( ONE_DAY );

# get the year from the command line
my $year = shift @ARGV
  or die "usage: $0 year\n";

# do bounds checking as specified in the problem
if ($year < 1753 || $year > 9999) {
  die "Only years between 1753 and 9999 are allowed ($year is out of range)\n";
}

# create an object for Jan 01 of the given year
my $t = Time::Piece->strptime("$year-01-01", "%Y-%m-%d")
                   ->truncate(to => 'day');

# find the first friday
# in Time::Piece->wday, 1 = Sunday, 6 = Friday
while ( $t->wday != 6) {
  $t += ONE_DAY; # add 1 day
}

# now keep adding 7 days to the date until the year changes,
# noting how many times the day of the month is 13
my $thirteen_count = 0;
while ( $t->year == $year ) {
  $thirteen_count++ if $t->mday == 13;
  $t += ONE_DAY * 7;
}

say "Input: \$year = $year";
say "Output: $thirteen_count";
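Walking the Fridays isn't the only way to do this. Since we only care about the 13th of each month, another option is to build those twelve dates directly and check which ones fall on a Friday. Here is a minimal sketch of that alternative, still using only Time::Piece; the function name friday_13th_count is hypothetical:

```perl
use v5.36;
use Time::Piece;

# hypothetical alternative: check the 13th of each month directly
sub friday_13th_count ($year) {
  my $count = 0;
  for my $month ( 1 .. 12 ) {
    my $date = sprintf '%04d-%02d-13', $year, $month;
    my $t    = Time::Piece->strptime($date, '%Y-%m-%d');
    # in Time::Piece->wday, 1 = Sunday, 6 = Friday
    $count++ if $t->wday == 6;
  }
  return $count;
}

say "Input: \$year = 2023";
say "Output: " . friday_13th_count(2023);
```

This always does exactly twelve checks, versus the fifty-odd Fridays the loop above visits, though at this scale the difference is about readability rather than speed.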

Doing this problem in Raku wound up being even easier, because in Raku, Date objects are a native part of the language, and incrementing a Date object increases the value by one day. Even instantiating a Date object was easier: I didn’t need to parse a date format or specify an array with 0-indexed months or years with 1900 subtracted from them; I was able to specify a year, month, and day in my new() call:

#!/usr/bin/env raku

sub MAIN($year) {
  # do bounds checking as specified in the problem
  if ($year < 1753 || $year > 9999) {
    say "Only years between 1753 and 9999 are allowed ($year is out of range)";
    exit 1;
  }

  # create an object for Jan 01 of the given year
  my $t = Date.new($year, 1, 1);

  # find the first friday
  # in Date.day-of-week, 1 = Monday, 5 = Friday, 7 = Sunday
  while ( $t.day-of-week != 5) {
    $t++; # add 1 day
  }

  # now keep adding 7 days to the date until the year changes,
  # noting how many times the day of the month is 13
  my $thirteen_count = 0;
  while ( $t.year == $year ) {
    $thirteen_count++ if $t.day == 13;
    $t += 7;
  }

  say "Input: \$year = $year";
  say "Output: $thirteen_count";
}

Task 2: Roman Maths

Write a script to handle a 2-term arithmetic operation expressed in Roman numeral.

Example

IV + V     => IX
M - I      => CMXCIX
X / II     => V
XI * VI    => LXVI
VII ** III => CCCXLIII
V - V      => nulla (they knew about zero but didn't have a symbol)
V / II     => non potest (they didn't do fractions)
MMM + M    => non potest (they only went up to 3999)
V - X      => non potest (they didn't do negative numbers)

Now, I’m not going to get into how Roman numerals did have ways of expressing fractions or numbers larger than 3,999, because that’s not part of the challenge. Remember, I want to showcase how easy it is to solve problems in Perl & Raku. And I knew just the module to use: Roman. Unfortunately, none of the modules for manipulating Roman numerals are in the core Perl distribution, so I had to use cpanm to install it: $ cpanm Roman (I could have used $ cpan Roman instead, but I like the cpanm tool).

#!/usr/bin/env perl
use v5.38;

use Roman; # there's a module for handling Roman Numerals!

sub do_arithmetic {
  my $line = shift;
  # split the input line into the three parts:
  # the two operands and the infix operator
  my($operand1r, $operator, $operand2r) = split /\s+/, $line;
  unless (defined $operand1r &&
          defined $operator  &&
          defined $operand2r) {
    say q{Lines must be of the form "operand1 operator operand2"};
    say q{where both operands are valid roman numerals and the};
    say q{operator is one of the following:  +  -  *  /  **};
    return;
  }

  my($operand1a, $operand2a);

  # check that the first operand is a roman numeral
  if (isroman($operand1r)) {
    # it is a roman numeral, convert it
    $operand1a = arabic($operand1r);
  }
  else {
    say "'$operand1r' is not a roman numeral!";
    return;
  }

  # check that the second operand is a roman numeral
  if (isroman($operand2r)) {
    # it is a roman numeral, convert it
    $operand2a = arabic($operand2r);
  }
  else {
    say "'$operand2r' is not a roman numeral!";
    return;
  }

  # calculate the results
  my $result;
  if ($operator eq '+') {
    $result = $operand1a + $operand2a;
  }
  elsif ($operator eq '-') {
    $result = $operand1a - $operand2a;
  }
  elsif ($operator eq '*') {
    $result = $operand1a * $operand2a;
  }
  elsif ($operator eq '/') {
    $result = $operand1a / $operand2a;
  }
  elsif ($operator eq '**') {
    $result = $operand1a ** $operand2a;
  }
  else {
    die "Unknown operator '$operator'; valid operators are + - * / **\n";
  }

  # handle all the special output cases
  if ($result == 0) {
    say "$operand1r $operator $operand2r => nulla "
      . "(they knew about zero but didn't have a symbol)";
  }
  elsif (int($result) != $result) {
    say "$operand1r $operator $operand2r => non potest "
      . "(they didn't do fractions)";
  }
  elsif ($result > 3999) {
    say "$operand1r $operator $operand2r => non potest "
      . "(they only went up to 3999)";
  }
  elsif ($result < 0) {
    say "$operand1r $operator $operand2r => non potest "
      . "(they didn't do negative numbers)";
  }
  else {
    say "$operand1r $operator $operand2r => " . uc roman($result);
  }
}

# while we have input on STDIN, process the calculations
while (my $line = <>) {
  chomp $line;
  do_arithmetic($line);
}

At first, I whipped it up as a command-line tool that accepted the two operands and the operator on the command line, but I realized it wouldn’t be easy to produce output as close to the sample as possible doing things this way, so I modified it to read the operations from STDIN. This also allowed me to add a file that could be used by both my Perl and Raku solutions to make the input standardized.

I also wanted to do some extra checking: not just the if/elsif chain at the end of do_arithmetic that handles the special cases called out in the example; I wanted to check for invalid Roman numerals and for input that didn’t have two operands separated by an operator. The unless block right after the split catches malformed lines, the two isroman() checks use the function provided by the Roman module to validate the operands, and the final else in the operator chain makes sure that we generate an error if we’re not passed one of the five operators specified in the requirements.
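The operator chain could also be written as a dispatch table: a hash that maps each operator string to an anonymous sub. I went with if/elsif because it's easier to follow for someone new to Perl, but here is a sketch of the alternative. It works on plain numbers so it doesn't need the Roman module, and the names %operations and calculate are hypothetical:

```perl
use v5.36;

# hypothetical dispatch table: operator string => code to run
my %operations = (
  '+'  => sub ($x, $y) { $x + $y },
  '-'  => sub ($x, $y) { $x - $y },
  '*'  => sub ($x, $y) { $x * $y },
  '/'  => sub ($x, $y) { $x / $y },
  '**' => sub ($x, $y) { $x ** $y },
);

sub calculate ($x, $operator, $y) {
  # an operator that isn't in the table is an error, just like
  # the final else in the if/elsif chain
  my $code = $operations{$operator}
    or die "Unknown operator '$operator'; valid operators are + - * / **\n";
  return $code->($x, $y);
}

say calculate(4, '+', 5);
say calculate(7, '**', 3);
```

Adding a sixth operator would then be a one-line change to the hash rather than another elsif branch.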

The Raku version of this proved slightly more challenging, because the Math::Roman module available for Raku didn’t have a function corresponding to Perl’s Roman module’s isroman() function. So I had to make one:

#!/usr/bin/env raku
use Math::Roman; # it's v0.0.1, but usable

sub isroman ( $var ) {
  # Math::Roman doesn't have a test to see if a string is
  # a Roman numeral, but it does throw an exception if it
  # cannot convert it
  my $result;
  try {
    CATCH {
      default {
        return False;
      }
    }
    $result = Math::Roman.new: $var;
  }
  # Math::Roman also doesn't respect the maximum of 3999
  if ($result.as-arabic > 3999) {
    return False;
  }

  return True;
}

sub do_arithmetic (Str $line) {
  # split the input line into the three parts:
  # the two operands and the infix operator
  my ($operand1, $operator, $operand2) = $line.split(/\s+/);

  unless (defined $operand1 &&
          defined $operator  &&
          defined $operand2) {
    say q{Lines must be of the form "operand1 operator operand2"};
    say q{where both operands are valid roman numerals and the};
    say q{operator is one of the following:  +  -  *  /  **};
    return;
  }

  # check that the first operand is a roman numeral
  if (isroman($operand1)) {
    # it is a roman numeral, convert it
    $operand1 = Math::Roman.new: $operand1;
  }
  else {
    say "'$operand1' is not a roman numeral!";
    return;
  }

  # check that the second operand is a roman numeral
  if (isroman($operand2)) {
    # it is a roman numeral, convert it
    $operand2 = Math::Roman.new: $operand2;
  }
  else {
    say "'$operand2' is not a roman numeral!";
    return;
  }

  # calculate the results
  my $result;
  if ($operator eq '+')     {
    $result = $operand1.as-arabic + $operand2.as-arabic;
  }
  elsif ($operator eq '-')  {
    $result = $operand1.as-arabic - $operand2.as-arabic;
  }
  elsif ($operator eq '*')  {
    $result = $operand1.as-arabic * $operand2.as-arabic;
  }
  elsif ($operator eq '/')  {
    $result = $operand1.as-arabic / $operand2.as-arabic;
  }
  elsif ($operator eq '**') {
    $result = $operand1.as-arabic ** $operand2.as-arabic;
  }
  else {
    die "Unknown operator '$operator'; valid operators are + - * / **\n";
  }

  # handle all the special output cases
  if ($result == 0) {
    say "$operand1 $operator $operand2 => nulla "
      ~ "(they knew about zero but didn't have a symbol)";
  }
  elsif ($result.truncate != $result) {
    say "$operand1 $operator $operand2 => non potest "
      ~ "(they didn't do fractions)";
  }
  elsif ($result > 3999) {
    say "$operand1 $operator $operand2 => non potest "
      ~ "(they only went up to 3999)";
  }
  elsif ($result < 0) {
    say "$operand1 $operator $operand2 => non potest "
      ~ "(they didn't do negative numbers)";
  }
  else {
    $result = Math::Roman.new: value => $result.Int;
    say "$operand1 $operator $operand2 => $result";
  }
}

# while we have input on STDIN, process the calculations
for $*IN.lines -> $line {
  do_arithmetic($line);
}

Perl Weekly Challenge #226

I went to the Perl and Raku Conference in Toronto, ON, two weeks ago. I went because I really wanted to reconnect with the Perl community that I’d fallen out of touch with while I was working at a job where Perl was actively ridiculed.

While I was there, I was talking to one of the people giving talks, Bruce Gray. He suggested that one of the best ways to reconnect would be to do the weekly challenge.

I’d seen the challenge being talked about in emails I subscribed to, but I hadn’t given it much thought. But I wanted to reconnect, keep my Perl chops up to date, and generally start participating in the community again. So when I got the email for Challenge #226, I thought about it a bit. What I realized was that the challenge wasn’t just a way for people to showcase their Perl skills; it was a way for the community to showcase how easy Perl and Raku were to use. So I decided that was the approach I was going to take: not try to be clever, but try to show how easy it is to solve problems in this language I love.

Task 1: Shuffle String

Here’s the description provided in the challenge:

You are given a string and an array of indices of same length as string.
Write a script to return the string after re-arranging the indices in the correct order.

Example 1

Input: $string = 'lacelengh', @indices = (3,2,0,5,4,8,6,7,1)
Output: 'challenge'

Example 2

Input: $string = 'rulepark', @indices = (4,7,3,1,0,5,2,6)
Output: 'perlraku'

I won’t lie, it took me a little while to understand what it wanted me to do. Finally, I realized that the @indices array was showing me where in the output string each character from the input string should be moved to: the first character in the input string should be moved to index 3 in the output, the second character to index 2, the third to index 0, and so on. Once I grokked that requirement, the Perl implementation came easily:

#!/usr/bin/env perl
use v5.36;

sub shuffle_string {
  my($string, $indices) = @_;
  my @chars = split //, $string; # split input string into characters
  my @result;
  foreach my $index ( @$indices ) {
    my $char  = shift @chars;     # get the next character
    $result[$index] = $char;      # put the character at that index in the result
  }
  say "Input: \$string = '$string', \@indices = (" . join(',', @$indices) . ")";
  say "Output: '" . join(q{}, @result) . "'";
}

say "Task 1: Shuffle String";
say "\nExample 1";
shuffle_string('lacelengh', [3,2,0,5,4,8,6,7,1]);
say "\nExample 2";
shuffle_string('rulepark', [4,7,3,1,0,5,2,6]);

Note how Perl makes handling the parts of the problem easy: splitting a string into its component characters is easy, recombining them back into a string is easy, passing the data around is easy.
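Once you're comfortable with Perl, the loop can even be collapsed into a single array slice assignment, since assigning a list to @result[ LIST_OF_INDICES ] puts each value at the corresponding index. A minimal sketch (shuffle_with_slice is a hypothetical name, and it returns the string rather than printing it):

```perl
use v5.36;

# hypothetical variant of shuffle_string using an array slice
sub shuffle_with_slice ($string, $indices) {
  my @result;
  # the Nth character lands at the index given by the Nth entry
  @result[ @$indices ] = split //, $string;
  return join q{}, @result;
}

say shuffle_with_slice('lacelengh', [3,2,0,5,4,8,6,7,1]);
say shuffle_with_slice('rulepark',  [4,7,3,1,0,5,2,6]);
```

I kept the explicit loop in my solution because it reads more clearly to someone who has never seen a slice before.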

Now, I don’t have a lot of experience with Raku; but I want to get better at it, so that’s why I’m doing the challenges in Raku as well. Unfortunately, for the moment my Raku solutions will look a lot like my Perl solutions:

#!/usr/bin/env raku

sub shuffle_string ($string, @indices) {
  my @chars = $string.split("", :skip-empty);
  my @result;
  for @indices -> $index {
    my $char = shift @chars;   # get the next character
    @result[$index] = $char;   # put the character at that index in the result
  }
  say "Input: \$string = '$string', \@indices = (" ~ @indices.join(',') ~ ")";
  say "Output: '" ~ @result.join('') ~ "'";
}

say "Task 1: Shuffle String";
say "\nExample 1";
shuffle_string('lacelengh', (3,2,0,5,4,8,6,7,1));
say "\nExample 2";
shuffle_string('rulepark', (4,7,3,1,0,5,2,6));

Task 2: Zero Array

You are given an array of non-negative integers, @ints.

Write a script to return the minimum number of operations to make every element equal zero.

In each operation, you are required to pick a positive number less than or equal to the smallest element in the array, then subtract that from each positive element in the array.

Example 1:

Input: @ints = (1, 5, 0, 3, 5)
Output: 3

operation 1: pick 1 => (0, 4, 0, 2, 4)
operation 2: pick 2 => (0, 2, 0, 0, 2)
operation 3: pick 2 => (0, 0, 0, 0, 0)

Example 2:

Input: @ints = (0)
Output: 0

Example 3:

Input: @ints = (2, 1, 4, 0, 3)
Output: 4

operation 1: pick 1 => (1, 0, 3, 0, 2)
operation 2: pick 1 => (0, 0, 2, 0, 1)
operation 3: pick 1 => (0, 0, 1, 0, 0)
operation 4: pick 1 => (0, 0, 0, 0, 0)

This one I found a lot easier to understand for some reason.

#!/usr/bin/env perl
use v5.36;

use List::Util qw( min );

sub min_positive {
  my @ints = grep { $_ > 0 } @_; # only consider positive numbers
  return min @ints; # find smallest, undef if empty list
}

sub zero_array {
  my @ints = @_;
  say "Input: \@ints = (" . join(', ', @ints) . ")";
  my @operations;
  while ( my $min = min_positive(@ints) ) {
    my $op_num = scalar(@operations) + 1;
    foreach my $int ( @ints ) {
      $int -= $min if $int > 0;
    }
    push @operations, "operation $op_num: pick $min => (" . join(', ', @ints) . ")";
  }
  say "Output: " . scalar(@operations);
  if (@operations) {
    say "";
    say join "\n", @operations;
  }
}

say "Task 2: Zero Array";
say "\nExample 1";
zero_array(1, 5, 0, 3, 5);

say "\nExample 2";
zero_array(0);

say "\nExample 3";
zero_array(2, 1, 4, 0, 3);

This one I’d like to pull apart a bit more. Picking “a positive number less than or equal to the smallest element in the array” sounded a lot like the min function found in the List::Util module, but that gives us the minimum value, not the minimum non-zero value, so I needed to filter the values equal to zero out of the array first. Initially, I did it like this:

min grep { $_ > 0 } @ints

but then I realized I needed to do that as part of the conditional to a loop, and I decided it would be a lot more readable if I pulled it out into its own function. Remember, I’m trying to express how easy things are in Perl, so I want to make my solutions completely readable and understandable to people who have never used Perl before.

I wanted the output to look exactly like the text in the examples, so I made the minimal extra effort to build an array of operations and put a bit of formatting into that so I could just dump the operations when I’d found out how many operations were necessary.
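
One way to sanity-check the operation count: because each pass of the loop subtracts the smallest positive value, it zeroes out every copy of that value at once, so the number of operations should equal the number of distinct positive values in the input. The sketch below is a cross-check under that assumption, not part of the solution above; count_distinct_positive is a name made up for this illustration, and uniq needs List::Util 1.45 or later.

```perl
use v5.36;
use List::Util qw( min uniq );

# same filter-then-min idea as min_positive above
sub min_positive {
  return min grep { $_ > 0 } @_; # undef if there are no positive numbers
}

# hypothetical helper: count the distinct positive values
sub count_distinct_positive {
  my @distinct = uniq grep { $_ > 0 } @_;
  return scalar @distinct;
}

say min_positive(1, 5, 0, 3, 5);            # 1
say count_distinct_positive(1, 5, 0, 3, 5); # 3 (the values 1, 3 and 5)
say count_distinct_positive(0);             # 0
say count_distinct_positive(2, 1, 4, 0, 3); # 4
```

The three counts match the expected output of the three examples, and an empty result from the grep is what makes the while loop stop.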

Again, my Raku solution looks like my Perl solution with a few syntax tweaks:

#!/usr/bin/env raku

sub min_positive (@ints) {
  my @positive = @ints.grep({ $_ > 0 }); # only consider positive numbers
  return unless @positive.elems;         # return early if no positive nums
  return @positive.reduce(&min);         # find smallest
}

sub zero_array (@ints) {
  say "Input: \@ints = (" ~ @ints.join(', ') ~ ")";
  my @operations;
  while ( my $min = min_positive(@ints) ) {
    my $op_num = @operations.elems + 1;
    for @ints <-> $int {
      $int -= $min if $int > 0;
    }
    @operations.push("operation $op_num: pick $min => (" ~ @ints.join(', ') ~ ")");
  }
  say "Output: " ~ @operations.elems;
  if (@operations) {
    say "";
    say @operations.join("\n");
  }
}

say "Task 2: Zero Array";
say "\nExample 1";
zero_array([1, 5, 0, 3, 5]);

say "\nExample 2";
zero_array([0]);

say "\nExample 3";
zero_array([2, 1, 4, 0, 3]);

If you really want a good example of how Raku can be used to solve this problem, take a look at Bruce Gray’s solution.

And that’s it. I’ve already coded my solutions for Challenge #227, and I’ll be blogging about them soon.

Preview of my next post…

My next project in the “whittling wood”?  Figuring out why XML::RSS::LibXML parses these tags without any problem:

[code language="xml"]
<itunes:category text="News &amp; Politics"/>
<itunes:image href="http://media.npr.org/images/podcasts/2013/primary/hourly_news_summary-c464279737c989a5fbf3049bc229152af3c36b9d.png?s=1400"/>
[/code]

and produces this internal data structure:

[code language="perl"]
category => bless( {
  _attributes => [
    "text"
  ],
  _content => "",
  text => "News & Politics"
}, 'XML::RSS::LibXML::MagicElement' ),
image => bless( {
  _attributes => [
    "href"
  ],
  _content => "",
  href => "http://media.npr.org/images/podcasts/2013/primary/hourly_news_summary-c464279737c989a5fbf3049bc229152af3c36b9d.png?s=1400"
}, 'XML::RSS::LibXML::MagicElement' ),
[/code]

But then doesn’t have these tags anywhere in the re-rendered XML when it spits it back out again. I know it has to do with the fact that these tags have no content (there’s no opening and closing tag, there’s just the one tag closed with a />), but I don’t know why XML::RSS isn’t properly rendering it when it converts the data structure back to XML.

The easy work-around would be to just do some string matching, recognize these tags in the original XML, copy them and re-insert them into the rendered XML afterwards.

The more difficult fix is to figure out what’s wrong with XML::RSS and try to fix it myself.

Guess which road I’m taking?

But what if I’ve still got an itch?

Ok, I wanted to write a follow-up post about my little program and the changes that I needed to make to it about a day later, but I found myself writing more and more code, and not having any time to actually write about writing the code. My wife, Kay, has dubbed it “my whittling wood”. So, let’s run down things that started to bother me about my creation…

The first thing that bothered me was when I was walking out of my office the first night. I checked my podcast app, and I didn’t have the 7PM news podcast yet, and it was 7:15PM already. I knew immediately what happened: NPR had been late updating the feed, and my cron job had run at 7:12 and missed it. So I thought about how to fix that problem (I managed to get the 7PM news podcast at 8:12 because NPR was late with the 8PM episode as well, so I was able to pick up the 7PM episode on that run).

I immediately dismissed the idea of running the script multiple times an hour. It didn’t feel clean to me. What I decided I needed to do was check to see if the episode I was looking at this time was different than the episode I was looking at the last time the script ran–I would need a new table to track this–and, if it was the same episode, sleep for a minute or two and try again, and continue retrying until I either got a new episode or I decided I’d waited long enough (20 attempts seemed to be a good cutoff number).
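
The retry idea can be sketched roughly like this. It’s a simplified illustration, not the code from the script: wait_for_new_episode, the $fetch callback, and the parameter names are all made up here, and the real script tracks the last episode it saw in a database table rather than in a variable.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() until it returns something other
# than $last_seen, sleeping between attempts, and give up after
# $max_attempts tries. Returns the newest id seen and the attempts used.
sub wait_for_new_episode {
    my ($fetch, $last_seen, $max_attempts, $sleep_for) = @_;

    my $current;
    foreach my $attempt (1 .. $max_attempts) {
        $current = $fetch->();

        # a new episode was published; no need to keep waiting
        return ($current, $attempt) if $current ne $last_seen;

        # don't bother sleeping after the final attempt
        sleep $sleep_for if $attempt < $max_attempts;
    }

    # waited long enough; carry on with what we have
    return ($current, $max_attempts);
}

# simulate a feed that is late twice and then updates
my @feed = ('6PM', '6PM', '7PM');
my ($episode, $attempts) =
    wait_for_new_episode(sub { shift @feed }, '6PM', 20, 0);
print "got $episode after $attempts attempts\n"; # got 7PM after 3 attempts
```

Passing the fetch step in as a callback keeps the waiting logic separate from the RSS parsing, which also makes it easy to exercise with a fake feed like the one above.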

Of course, I also decided I wanted to be able to check up on what was happening, so I needed to write a log file. If I was going to be able to see this log file when I wasn’t home, however, I’d have to copy it up to my web server along with the RSS feed XML.

And this brings me to where I was when I wrote my first post. I already had more code in the script, but I blogged about the first draft, wanting to come back to this second draft with a followup blog post.

And that’s when things got crazy.

We had a big filming day coming up for PacKay Productions that weekend, and I had a lot of work to do, some of which I’d already done and blogged about. After the filming was done, I needed to prep for Halloween.  And even with the changes I’d made to this script, things were going wrong with my setup.

One of the things I did wrong was setting up my wrapper shell script to run the perl program.  I’m not really adept at Bourne shell scripting, and I always leave things out. Then, last Thursday night, I was idly wondering how easy it would be to correct the other major annoyance I have with the NPR Hourly News Summary: the inconsistency of the sound levels.

Sometimes, the news summary is recorded at a good level, and I’m able to hear everything just fine.  Other times, the levels are set so low that even with my player’s volume cranked all the way up and my headphones pressed into my ears, I find it impossible to hear what’s being said over the sounds of the street in New York City.

So, of course, I started looking to see if somebody else had already solved my problem.  I ran across this post on the Ask Ubuntu Stack Exchange site, which outlined two solutions: Audacity, an open source visual sound editor I was already intimately familiar with, and SoX, which was billed as “the Swiss Army knife of sound processing programs”.

SoX is a command line tool for processing audio files, and the more I read about it, the more I liked it.  Normalizing an audio file used to be a two-step process in SoX: running a command once in an analysis mode to get the maximum volume of the file, and a second time to boost that volume to the maximum possible without distortion. However, with version 14.3 of SoX, its developers made all of that possible in one single command:

sox --norm infile outfile

I briefly pondered cloning SoX’s git repository and building from source, but I realized that chances were slight that I was going to be making changes to SoX; I just wanted it as a command line tool.  So I turned to one of the most wonderful things you can have on your Mac: Homebrew.

Homebrew is a package manager for OS X that’s all git and ruby under the hood, and it has a beer theme! It installs software in a “Cellar”. It doesn’t have packages, it has “bottles”. It even uses the beer emoji: 🍺

Installing new software with Homebrew is painfully easy:

brew install sox

Once I got SoX installed, modifying my code to use it was dead easy.
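
As a rough idea of what calling SoX from perl can look like, here’s a minimal sketch. The helper names sox_norm_command and normalize are invented for this example, and the path to the binary is an assumption (Homebrew may put it somewhere else on your machine).

```perl
use strict;
use warnings;

# path assumed for this sketch; Homebrew's location can differ by machine
my $sox = '/usr/local/bin/sox';

# build the argument list for: sox --norm infile outfile
sub sox_norm_command {
    my ($infile, $outfile) = @_;
    return ($sox, '--norm', $infile, $outfile);
}

sub normalize {
    my ($infile, $outfile) = @_;
    -x $sox or die "no executable at $sox";

    # the list form of system() bypasses the shell, so filenames with
    # spaces or shell metacharacters are passed through untouched
    my @command = sox_norm_command($infile, $outfile);
    my $status  = system @command;
    return $status == 0; # system() returns 0 when the command succeeds
}

print join(' ', sox_norm_command('in.mp3', 'out.mp3')), "\n";
# /usr/local/bin/sox --norm in.mp3 out.mp3
```

Checking the return value of system() is worth the extra line: if SoX fails, there’s no normalized file to upload.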

Finally, I decided to tackle the big thing that I wasn’t doing in the program itself: copying files up to the webserver. At first I looked at Net::SCP, but for some reason I couldn’t get it to work (it kept telling me that my remote directory didn’t exist).  So I switched over to Net::OpenSSH, and I was able to get the copy working.

I also cleaned up the code a lot, and added a ton of comments.  I want this code to be able to document itself, so it’s really obvious what I’m doing and why. Some would say that once a program is working, it’s done.  But when I’m writing code for myself, it’s not done until I’ve commented the heck out of it, because I know myself: a year later, I’m going to come back to this code and think “What was I smoking when I wrote this?”

I doubt I’ll think that when I come back to this code.

[code language="perl"]
#!/Users/packy/perl5/perlbrew/perls/perl-5.22.0/bin/perl -w

use DBI;
use Data::Dumper::Concise;
use DateTime;
use DateTime::Format::Mail;
use LWP::Simple;
use Net::OpenSSH;
use URI;
use XML::RSS;
use strict;

use feature qw( say state );

# define all the things!
use constant {
URL => 'http://www.npr.org/rss/podcast.php?id=500005',
TITLE_ADD => ' (filtered by packy)',
TITLE_MAX => 40, # characters
SLEEP_FOR => 120, # seconds (2 minutes)
MAX_RETRIES => 10,
KEEP_DAYS => 7,

REMOTE_HOST => 'www.dardan.com',
REMOTE_USER => 'dardanco',
REMOTE_DIR => 'www/packy/',

MEDIA_URL => 'https://packy.dardan.com/npr',

TZ => 'America/New_York',
LOGFILE => '/tmp/npr-news.txt',
XMLFILE => '/tmp/npr-news.xml',
IN_DIR => '/tmp/incoming',
OUT_DIR => '/tmp/outgoing',
DATAFILE => '/Users/packy/data/filter-npr-news.db',

SOX_BINARY => '/usr/local/bin/sox',
};

# list of times we want – different times on weekends
my @keywords = is_weekday() ? qw( 7AM 8AM 12PM 6PM 7PM )
: qw( 7AM 12PM 7PM );

my $dbh = get_dbh(); # used in a couple places, best to be global

my $rss; # these two vars are only used in the main code block,
my $items; # but can’t be scoped to the foreach loop

# since, for cosmetic reasons, we’re starting the count at 1, we need
# to loop up to MAX_RETRIES + 1; otherwise, we’ll only have the first
# attempt and then (MAX_RETRIES – 1). If I’d called the constant
# MAX_ATTEMPTS then it would make sense to start at zero…
foreach my $retry (1 .. MAX_RETRIES + 1) {

# get the RSS
write_log("Fetching " . URL);
my $content = get(URL);

# parse the RSS using a subclass of XML::RSS
$rss = XML::RSS::NPR->new();
$rss->parse($content);
write_log("Parsed XML");

$items = $rss->_get_items;

# if a new show was published in the feed, we don’t need to wait
# in a loop for a new one
last unless same_show_as_last_time( $items );

# we don’t want the script to wait forever – if no new episode
# appears after a maximum number of retries, give up and generate
# the feed with the episodes we have
if ($retry > MAX_RETRIES) {
write_log("MAX_RETRIES (".MAX_RETRIES.") exceeded");
last;
}

# for debugging purposes, I want to be able to not have the script
# sleep, and the choices were add command line switch processing
# or check an environment variable. This was the simpler option.
if ($ENV{NPR_NOSLEEP}) {
last;
}

# log the fact that we’re sleeping so we can observe what the
# script is doing while it’s running
write_log("Sleeping for ".SLEEP_FOR." seconds…");

# since I usually want to listen to these podcasts when I’m away
# from my desktop computer, copy the log file up to the webserver
# so I can check on it remotely. this way, if it’s spending an
# inordinate amount of time waiting for a new episode, I can see
# that from my phone’s browser…
push_log_to_remotehost();

# actually sleep
sleep SLEEP_FOR;

# and note which number retry this is
write_log("Trying RSS feed again (retry #$retry)");
}

# test to see if the new item matches our inclusion criteria, and then
# fill the item list with items we’ve cached in our database
get_items_from_database( $items );

# make new RSS feed devoid of the original items… ok, ITEM
$rss->clear_items;

foreach my $item ( @$items ) {
$rss->add_item(%$item);
}

re_title($rss);

write_log("Writing RSS XML to " . XMLFILE);
open my $fh, '>', XMLFILE or die "can't write " . XMLFILE . ": $!";
say {$fh} $rss->as_string;
close $fh;
push_xml_to_remotehost();

#################################### subs ####################################

sub get_items_from_database {
my $items = shift;

# build the regex for matching desired episodes from keywords
my $re = join "|", @keywords;
$re = qr/\b(?:$re)\b/i;

my $insert = $dbh->prepare("INSERT INTO shows (pubdate, item) ".
" VALUES (?, ?)");

my $exists_in_db = $dbh->prepare("SELECT COUNT(*) FROM shows ".
" WHERE pubdate = ?");

# I know the feed only has the one item in it, but it SHOULD have
# more, so let’s go through the motions of checking each item

foreach my $item (@$items) {

# pawn off the specifics of how we get the information to a sub
my ($epoch, $title) = item_info($item);

# again, for debugging purposes, I wanted to be able to not
# have the script skip the current item, and the choices were
# add command line switch processing or check an environment
# variable. This was the simpler option.

if ($title !~ /$re/ && ! $ENV{NPR_NOSKIP}) {
write_log("'$title' doesn't match $re; skipping");
next;
}

# check to see if we already have it in the DB
$exists_in_db->execute($epoch);
my ($exists) = $exists_in_db->fetchrow;

if ($exists > 0) {
write_log("'$title' already in database; skipping");
next;
}

# the NPR news podcast is notoriously bad at normalizing the
# volume of its broadcasts; some are easy to hear and some are
# so quiet it’s impossible to hear them when listening on a
# city street, so, let’s normalize them to a maximum volume

normalize_audio($item);

write_log("Adding '$title' to database");

# it’s easier to store the data in the episode cache table as
# a perl representation of the parsed data than it is to
# serialize it back into XML and then re-parse it when we need
# it again.
$insert->execute($epoch, Dumper($item));
}

# go through the database and dump episodes that are older than
# our retention period. Since we’re using epoch time (seconds
# since some date, usually midnight 1970-01-01) as the key to our
# episode cache table, it’s really easy to determine which
# episodes are too old

my $now = DateTime->now();
my $too_old = $now->epoch - (KEEP_DAYS * 24 * 60 * 60);
$dbh->do("DELETE FROM shows WHERE pubdate < $too_old");

# now let’s fetch the episodes from the episode cache table in
# oldest-first order. Again, since we’re keyed on the episode’s
# publish date in epoch time, we can do this with a simple numeric
# sort.
my $query = $dbh->prepare("SELECT * FROM shows ORDER BY pubdate");
$query->execute();

@$items = ();
while ( my($pubdate, $item) = $query->fetchrow ) {

# just blindly evaluating text is a potential security problem,
# but I know all these entries came from me writing dumper-ed code,
# so I feel safe in doing so…
my $evaled = eval $item;

push @$items, $evaled;

# log which episodes we’re putting into the feed
my ($epoch, $title) = item_info($evaled);
write_log("Fetched '$title' from database; adding to feed");
}
}

sub same_show_as_last_time {
my $items = shift;

# so we know when the feed is late in publishing a new item,
# we have a table that stores the publication date of the last
# episode we saw. It also stores the title of the episode so
# we can log which episode it was.

my $get_last_show = $dbh->prepare("SELECT * FROM last_show");

# get the information for the current episode
my ($epoch, $title) = item_info($items->[0]);

# fetch the last episode from the DB
$get_last_show->execute;
my ($last_time, $last_title) = $get_last_show->fetchrow;

# save the episode we just fetched for next time
my $update = $dbh->prepare("UPDATE last_show SET pubdate = ?, title = ? ".
" WHERE pubdate = ?");
$update->execute($epoch, $title, $last_time);

# now compare the current episode with the one we got from the DB
my $is_same = ($last_time == $epoch);

if ($is_same) {
write_log("RSS feed has not updated since '$last_title' was published");
}

return $is_same;
}

#################################### audio ####################################

sub filename_from_uri {
my $uri = shift;

# abstract out the complexities of fetching the filename from a
# URI so the code will read easier; in this case, we’re
# instantiating a new URI class object and calling path_segments()
# to get the segments of the path, and then returning the last
# element, which is going to be the filename.

return( ( URI->new($uri)->path_segments )[-1] );
}

sub normalize_audio {
my $item = shift;
my $uri = item_url($item);
my $file = filename_from_uri($uri);

# perl idiom for "if directory doesn’t exist, make it"
-d IN_DIR or mkdir IN_DIR;
-d OUT_DIR or mkdir OUT_DIR;

# construct full pathnames to the file we’re downloading and
# then normalizing to
my $infile = join '/', IN_DIR, $file;
my $outfile = join '/', OUT_DIR, $file;

# fetch the MP3 file using LWP::Simple
my $code = getstore($uri, $infile);
write_log("Fetched '$uri' to $infile; RESULT $code");
return unless $code == 200;

# if, for some reason, we don’t have the program to normalize audio,
# crash with a message complaining about it being missing
-x SOX_BINARY
or die "no executable at " . SOX_BINARY;

# call SoX to normalize the audio
write_log("Normalizing $infile to $outfile");
system join(q{ }, SOX_BINARY, '--norm', $infile, $outfile);

# the feed doesn’t publish an item length in bytes, but it really
# ought to, so let’s get the size of the MP3 file.
my $size = -s $outfile || 0;

# re-write the bits of the item we’re changing
item_url($item, join '/', MEDIA_URL, $file);
item_length($item, $size);

# send the normalized MP3 file up to the webserver
push_media_to_remotehost($outfile);

# clean up after ourselves
unlink $infile;
unlink $outfile;
}

#################################### db ####################################

sub get_dbh {
my $file = DATAFILE;

# check to see if the datafile exists BEFORE we connect to it
my $exists = -f $file;

my $dbh = DBI->connect(
"dbi:SQLite:dbname=$file",
"",
"",
{ RaiseError => 1}
) or die $DBI::errstr;

# if the datafile didn’t exist before we connected to it, let’s set up
# the schema we’re using
unless ($exists) {
$dbh->do("CREATE TABLE shows (pubdate INTEGER PRIMARY KEY, item TEXT)");
$dbh->do("CREATE INDEX shows_idx ON shows (pubdate);");
$dbh->do("CREATE TABLE last_show (pubdate INTEGER PRIMARY KEY, ".
" title TEXT)");
}

return $dbh;
}

#################################### time ####################################

sub now {
# set the time zone in the DateTime object, so we get non-UTC time
return DateTime->now( time_zone => TZ );
}

sub is_weekday {
# makes our code easier to read
return now()->day_of_week < 6;
}

################################### copying ###################################

sub push_to_remotehost {
my ($from, $to) = @_;

my $connect = join '@', REMOTE_USER, REMOTE_HOST;

state $ssh = Net::OpenSSH->new($connect);

write_log("Copying $from to $connect:$to");

if ( $ssh->scp_put($from, $to) ) {
write_log("Copy success");
}
else {
write_log("COPY ERROR: ". $ssh->error);
}
}

# helper functions to make the code easier to read

sub push_xml_to_remotehost {
push_to_remotehost(XMLFILE, REMOTE_DIR);
}

sub push_log_to_remotehost {
push_to_remotehost(LOGFILE, REMOTE_DIR);
}

sub push_media_to_remotehost {
my $from = shift;
push_to_remotehost($from, REMOTE_DIR . 'npr/');
}

################################### logging ###################################

sub write_log {
# I’m opening and closing the logfile every time I write to it so
# it’s easier for external processes to monitor the progress of
# this script
open my $logfile, '>>', LOGFILE;

my $now = now();
my $ts = $now->ymd . q{ } . $now->hms . q{ };

# I don’t write multiple lines yet, but I might want to!
foreach my $line ( @_ ) {
say {$logfile} $ts . $line;
}

close $logfile;
}

BEGIN {
unlink LOGFILE; # write a new log each time we run
write_log('Started run'); # log that the run has started

# register a DIE handler that will write whatever message I die() with
# to our logfile so I can see it in the logs
$SIG{__DIE__} = sub {
my $err = shift;
write_log('FATAL: '.$err);
# if we die(), after this runs, the END block will be executed!
};
}

END {
# when the program finishes, log that
write_log('Finished run');

# and, so I can see these logs remotely, push them up to the webserver
push_log_to_remotehost();
}

##################################### XML #####################################

sub re_title {
my $rss = shift;

# append some text to the channel’s title so I can differentiate
# this feed from the original feed in my podcast app

my $existing_title = $rss->channel('title');
my $add_len = length(TITLE_ADD);

if (length($existing_title) + $add_len > TITLE_MAX) {
$existing_title = substr($existing_title, 0, TITLE_MAX - $add_len - 1);
}

$rss->channel('title' => $existing_title . TITLE_ADD);
}

sub item_info {
state $mail = DateTime::Format::Mail->new; # only initialized once!

my $item = shift;
my $title = fix_whitespace($item->{title});
my $dt = $mail->parse_datetime($item->{pubDate});
my $epoch = $dt->epoch;
return $epoch, $title;
}

sub fix_whitespace {
my $string = shift;

# multiple whitespace compressed to a single space
$string =~ s{\s+}{ }g;

# remove leading and trailing spaces
$string =~ s{^\s+}{}; $string =~ s{\s+$}{};

return $string;
}

# let’s define some pseudo-accessors (since these are unblessed
# hashes, not objects) that will make our code easier to read

sub enclosure_pseudo_accessor {
my $hash = shift;
my $key = shift;
if (@_) {
$hash->{enclosure}->{$key} = shift;
}
return $hash->{enclosure}->{$key};
}

sub item_url {
my $hash = shift;
enclosure_pseudo_accessor($hash, 'url', @_);
}

sub item_length {
my $hash = shift;
enclosure_pseudo_accessor($hash, 'length', @_);
}

# since XML::RSS doesn’t provide a method to clear out the items in an
# already-parsed feed, I’m creating a subclass to provide that
# functionality rather than just executing code that manipulates the
# internal data structure of the object in my main program

package XML::RSS::NPR;
use base qw( XML::RSS );

sub clear_items {
my $self = shift;
$self->{num_items} = 0;
$self->{items} = [];
}

# since we’re creating a subclass, we can override the default XML
# modules that are used to be the ones we need – no calling
# add_module() from our main program!

sub _get_default_modules {
return {
'http://www.npr.org/rss/' => 'npr',
'http://api.npr.org/nprml' => 'nprml',
'http://www.itunes.com/dtds/podcast-1.0.dtd' => 'itunes',
'http://purl.org/rss/1.0/modules/content/' => 'content',
'http://purl.org/dc/elements/1.1/' => 'dc',
};
}

__END__
[/code]

Read it on GitHub: filter-npr-news

Scratching my itch

It’s been a while since I wrote some code to scratch purely my own itch. Most of my time is spent writing code to scratch my employer’s itches, and occasionally I get to write little programs that scratch small itches I get while writing code for my employer — things like extensions to git-p4 that allow me to pull information from a git repository and use it to generate merge commands for Perforce, so I don’t have to figure out which commits/changes I want to merge.

I know; nothing someone else would be interested in.

But I’ve had an itch for a little while that someone else might be interested in. I listen to the NPR Hourly News Summary via my podcast app on my Nexus 6. The web page might list a bunch of back episodes, but the RSS feed only publishes the most recent summary. But I want to listen to SOME older episodes, just not all of them. What I had been doing was having my podcast app keep every episode and then mark the ones I wasn’t interested in as done, but that was tedious, especially considering I only wanted to listen to four or five of the 24 episodes published each day.

So I thought about what I wanted. I wanted a program that would fetch the RSS feed every hour and check to see if the currently published episode was one of the ones I wanted, and, if it was, store it in a database and then spit out a new RSS feed with the last N episodes I’d stored in the database. I realized I didn’t need to actually fetch the episodes themselves, because NPR doesn’t remove the episodes after they disappear from the RSS feed (as evidenced by the web page with multiple episodes). I also realized I didn’t need to generate this new RSS feed dynamically: NPR’s feed only gets updated once an hour, so I only needed to generate my feed once NPR’s feed was updated, and, since I wasn’t generating the feed every time my podcast app asked for it, I could generate the feed on my desktop computer and then copy the XML file up to my web server (since my desktop has way more computing power than my web server).

And, of course, I wanted to use perl, because that’s my favorite programming language.

One of perl’s strengths is that, whatever you want to do, there’s probably a CPAN module that will do the heavy lifting for you. There’s also a Perl Cookbook for commonly used patterns in perl programming.  I found the recipe for Reading and Writing RSS Files, and there was an example for filtering an RSS feed and generating a new one. The example uses LWP::Simple to fetch the RSS feed, XML::RSSLite to parse the feed, and  XML::RSS to generate a new RSS feed. The cookbook even states “It would be easier, of course, to do it all with XML::RSS”. So I did.

Actually, I didn’t rewrite the RSS too much. Rather than building a completely new RSS feed, I used XML::RSS to parse the feed and extract the one item from it. But even though XML::RSS has a method for adding items to the feed, it doesn’t have a method for removing items from the feed. This left me with no choice but to dig through the source code of XML::RSS and figure out what was necessary to clear out the list of items. Once I cleared out the one item out of the feed, I re-loaded the feed with the items I’d stored.

Wait… I had stored items, right?  Oh, crud, I forgot about that part.  Ok, I need to store the last N items. I could use a text file, but that’s difficult to manage.  I could set up a database in PostgreSQL or MySQL, but that’s a lot of overhead for just storing a bit of data. If only there was a self-contained, serverless, zero-configuration, transactional SQL database engine.  Something like… SQLite!

So I set up a simple schema; one table with two columns: one to hold the timestamp of the episode, and one to hold the block data I needed to shove the episode back into the feed. Since it’s SQLite, I didn’t expect the datafile to exist the first time I ran it, so I put in a test to see if the data file existed before I connected to it, and, if it didn’t, create the schema.

The rest is fairly straightforward; I checked the current episode extracted from the feed to see if it was one of the times I wanted. If it wasn’t, I just skipped ahead to generating the new feed. If it was one of the episodes I wanted, I checked to make sure I didn’t already have it in the database.  If I did, I skipped ahead.  If it wasn’t in the database, I added it to the database and then deleted everything in the database older than 7 days.
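
The two checks at the heart of that logic, “is this one of the times I want?” and “is this older than 7 days?”, boil down to a regex and a bit of arithmetic. Here’s a small standalone sketch; the episode titles are made up for the example and may not match what NPR actually publishes.

```perl
use strict;
use warnings;

my @keywords = qw( 7AM 8AM 12PM 7PM );

# build one case-insensitive regex that matches any keyword as a whole word
my $alternation = join '|', @keywords;
my $re = qr/\b(?:$alternation)\b/i;

# hypothetical episode titles, made up for this sketch
foreach my $title ('NPR News: 7AM ET', 'NPR News: 3PM ET', 'NPR News: 12pm ET') {
    printf "%-18s %s\n", $title, ($title =~ $re ? 'keep' : 'skip');
}

# anything published before this epoch time is past the retention period
my $days_to_keep = 7;
my $too_old = time() - ($days_to_keep * 24 * 60 * 60);
print "delete episodes with pubdate < $too_old\n";
```

The word boundaries matter: without the \b anchors, a title containing “17AM” would match the “7AM” keyword.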

[code language="perl"]
#!/Users/packy/perl5/perlbrew/perls/perl-5.22.0/bin/perl -w

use DBI;
use Data::Dumper::Concise;
use DateTime::Format::Mail;
use LWP::Simple;
use XML::RSS;
use strict;

use feature 'say';

# list of times we want
my @keywords = qw( 7AM 8AM 12PM 7PM );
my $days_to_keep = 7;

# get the RSS
my $URL = 'http://www.npr.org/rss/podcast.php?id=500005';
my $content = get($URL);

# parse the RSS
my $rss = XML::RSS->new();
$rss->parse($content);

my @items = get_items( $rss->_get_items );

# make new RSS feed
$rss->{num_items} = 0;
$rss->{items} = [];

foreach my $item ( @items ) {
$rss->add_item(%$item);
}

say $rss->as_string;

sub get_items {
my $items = shift;

# build the regex from keywords
my $re = join "|", @keywords;
$re = qr/\b(?:$re)\b/i;

my $mail = DateTime::Format::Mail->new;

my $dbh = get_dbh();

my $insert = $dbh->prepare("INSERT INTO shows (pubdate, item) ".
" VALUES (?, ?)");

my $exists_in_db = $dbh->prepare("SELECT COUNT(*) FROM shows ".
" WHERE pubdate = ?");

foreach my $item (@$items) {
my $title = $item->{title};
$title =~ s{\s+}{ }g; $title =~ s{^\s+}{}; $title =~ s{\s+$}{};

if ($title !~ /$re/) {
next;
}

my $dt = $mail->parse_datetime($item->{pubDate});
my $epoch = $dt->epoch;

$exists_in_db->execute($epoch);
my ($exists) = $exists_in_db->fetchrow;
if ($exists > 0) {
next;
}

$insert->execute($epoch, Dumper($item));
}

my $now = DateTime->now();
my $too_old = $now->epoch - ($days_to_keep * 24 * 60 * 60);
$dbh->do("DELETE FROM shows WHERE pubdate < $too_old");

my $query = $dbh->prepare("SELECT * FROM shows ORDER BY pubdate");
$query->execute();

my @list;
while ( my($pubdate, $item) = $query->fetchrow ) {
push @list, eval $item;
}

return @list;
}

sub get_dbh {
my $file = '/Users/packy/data/filter-npr-news.db';
my $exists = -f $file;

my $dbh = DBI->connect(
"dbi:SQLite:dbname=$file",
"",
"",
{ RaiseError => 1}
) or die $DBI::errstr;

unless ($exists) {
# first time – set up database
$dbh->do("CREATE TABLE shows (pubdate INTEGER PRIMARY KEY, item TEXT)");
}
return $dbh;
}
[/code]

And it worked!  I then created a git repository on my desktop for it, and pushed it up to my GitHub account under its own project, filter-npr-news.

And that kept me satisfied for a whole day.

Next time, I’ll write about what started bothering me and the changes I made to fix that.

I need help… at work!

I need help.  There’s a lot of work to do at my day job, and we need another developer.  We’ve got a job posting up on our jobs site, and we’re posting to the appropriate job sites, but I really want to fill this position.  Mostly because I’m lonely.

I used to be a beta geek in an office filled with alpha geeks.  I loved this, because there were always people who understood the ideas I had and had ideas about how to make my ideas better.  I hate being the sole alpha geek in an office because then nobody understands the ideas I have.  But I also hate only having one other alpha geek to bounce ideas off of, because then if we can’t agree, there’s nobody to break the tie.

I’m not going to go into great detail about the job.  It’s a coding job, and it uses either Perl or Java (or both, if you’re so inclined).  If you’re reading this, you know me, and you’ll know that I’m still working for the current incarnation of what I’ve called “the best job I’ve ever had.”  If you’ve got a decade of experience, know either Perl or Java, don’t mind working in jeans and a t-shirt, don’t mind working in New York City and don’t think working with me would be a sign of insanity, let me know and I’ll get you in for an interview.

Eureka!

I think I’ve found the solution to a problem I’ve had at work for ages: Win32::Exe.

I would love to examine the PE version information of a Windows file that’s been uploaded to a Linux server. For a long time, I’ve punted on this problem, and waited until I had the file back on a Windows machine before examining this information, mostly because it’s much easier to get this info using Windows’ API calls than by manually parsing the PE header info.  However, just tonight I stumbled across this perl module mentioned in a Stack Overflow post, and it doesn’t depend on modules that we don’t already use.

Now this problem will stop bugging me, and I can go to sleep!

Update: Unfortunately, the files I need to examine are large (> 200MB), and Win32::Exe (via Parse::Binary) seems to load the entire file into memory.  This causes an out of memory error.  But maybe I can use this code as a launching point for a different solution.
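
For what it’s worth, finding the PE header doesn’t require reading the whole file. The sketch below is my own illustration, not code from Win32::Exe, and pe_header_offset is a made-up name. It relies on the standard PE layout: a 64-byte DOS header starting with “MZ”, a 32-bit little-endian pointer at offset 0x3C, and the signature “PE\0\0” at the location that pointer names. It only locates the header; it doesn’t get as far as the version information.

```perl
use strict;
use warnings;

# Return the file offset of the PE header, or nothing if the file
# doesn't look like a PE executable. Reads 68 bytes at most.
sub pe_header_offset {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "can't open $path: $!";

    # the DOS header is 64 bytes long and starts with the magic "MZ"
    my $dos = '';
    my $got = read $fh, $dos, 64;
    return unless defined $got && $got == 64;
    return unless substr($dos, 0, 2) eq 'MZ';

    # e_lfanew: a 32-bit little-endian offset to the PE header, at 0x3C
    my $offset = unpack 'V', substr($dos, 0x3C, 4);

    # the PE header starts with the four-byte signature "PE\0\0"
    seek $fh, $offset, 0 or return;
    my $sig = '';
    $got = read $fh, $sig, 4;
    return unless defined $got && $got == 4 && $sig eq "PE\0\0";

    return $offset;
}
```

From there, the section table and the resource directory that holds the version information would still need to be walked with more seek and read calls, which this sketch doesn’t attempt.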