Perl Weekly Challenge: I’ll be the smartest bird the world has ever seen!

Perl Weekly Challenge 365’s tasks are “Alphabet Index Digit Sum” and “Valid Token Counter”. I thought about trying to find some song referencing tokens (like subway tokens) or validity, but I kept hearing Caroll Spinney’s voice in my head.

Task 1: Alphabet Index Digit Sum

You are given a string $str consisting of lowercase English letters, and an integer $k.

Write a script to convert a lowercase string into numbers using alphabet positions (a=1 … z=26), concatenate them to form an integer, then compute the sum of its digits repeatedly $k times, returning the final value.

Example 1

Input: $str = "abc", $k = 1
Output: 6

Conversion: a = 1, b = 2, c = 3 -> 123
Digit sum: 1 + 2 + 3 = 6

Example 2

Input: $str = "az", $k = 2
Output: 9

Conversion: a = 1, z = 26 -> 126
1st sum: 1 + 2 + 6 = 9
2nd sum: 9

Example 3

Input: $str = "cat", $k = 1
Output: 6

Conversion: c = 3, a = 1, t = 20 -> 3120
Digit sum: 3 + 1 + 2 + 0 = 6

Example 4

Input: $str = "dog", $k = 2
Output: 8

Conversion: d = 4, o = 15, g = 7 -> 4157
1st sum: 4 + 1 + 5 + 7 = 17
2nd sum: 1 + 7 = 8

Example 5

Input: $str = "perl", $k = 3
Output: 6

Conversion: p = 16, e = 5, r = 18, l = 12 -> 1651812
1st sum: 1 + 6 + 5 + 1 + 8 + 1 + 2 = 24
2nd sum: 2+4 = 6
3rd sum: 6

Approach

The approach is fairly straightforward: split the string into characters, convert each character to its alphabet position, and join those positions back together into a single string of digits. Then, $k times, we split that string into digit characters, sum them, and turn the sum back into a string.
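
To make those steps concrete, here is a minimal Python sketch that records every intermediate value. The name digit_sum_trace is mine, purely for illustration, and it assumes the input contains only lowercase a to z.

```python
def digit_sum_trace(string, k):
  """Return the concatenated digit string followed by each of the k digit sums."""
  # a=1 ... z=26: ord('a') is 97, so subtract 96
  numstr = ''.join(str(ord(c) - 96) for c in string)
  steps = [numstr]
  for _ in range(k):
    # sum the digits, then go back to a string for the next round
    numstr = str(sum(int(d) for d in numstr))
    steps.append(numstr)
  return steps

print(digit_sum_trace("dog", 2))   # ['4157', '17', '8']
print(digit_sum_trace("perl", 3))  # ['1651812', '24', '6', '6']
```

The last element of the list is the answer the task asks for; the earlier ones line up with the "1st sum" and "2nd sum" lines in the examples.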

Raku

The one thing I needed to remember (and didn’t, so I got a Cannot resolve caller postfix:<-->(Int:D) error) was that in Raku, subroutine parameters are read-only. I was able to fix that by adding the parameter trait is copy to the definition of $k.

sub alphaIndexDigitSum($str, $k is copy) {
  my $numstr = $str.comb.map({ ord($_) - 96 }).join;
  $numstr = $numstr.comb.sum.Str while ($k-- > 0);
  return $numstr;
}

View the entire Raku script for this task on GitHub.

$ raku/ch-1.raku
Example 1:
Input: $str = "abc", $k = 1
Output: 6

Example 2:
Input: $str = "az", $k = 2
Output: 9

Example 3:
Input: $str = "cat", $k = 1
Output: 6

Example 4:
Input: $str = "dog", $k = 2
Output: 8

Example 5:
Input: $str = "perl", $k = 3
Output: 6

Example BigBird:
Input: $str = "abcdefghijklmnopqrstuvwxyz", $k = 1
Output: 135

Perl

Translating into Perl, we just need to unthread the chained methods:

  • The method .comb becomes split // at the end
  • The method .map becomes the function map
  • The method .join becomes join '' at the beginning

Of course, because Perl doesn’t have its own sum, we need to import one from a CPAN module, and as usual I’m using List::AllUtils.

use List::AllUtils qw( sum );

sub alphaIndexDigitSum($str, $k) {
  my $numstr = join '', map { ord($_) - 96 } split //, $str;
  $numstr = sum(split //, $numstr) while ($k-- > 0);
  return $numstr;
}

View the entire Perl script for this task on GitHub.

Python

The big thing I needed to remember in Python is that it’s strongly typed, so I can’t just take a list of characters, convert that into a list of integers, and then join those integers, because str.join() only accepts strings. I need to convert those integers into strings before joining them, and when I’m summing the digits, I need to convert each digit into an integer, and then convert the sum back to a string.

def alpha_index_digit_sum(string, k):
  numstr = ''.join([ str(ord(c) - 96) for c in string ])
  while k > 0:
    numstr = str(sum([ int(n) for n in numstr]))
    k -= 1
  return numstr
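
As a quick illustration of why those conversions are needed (a sketch of my own, not part of the linked script): str.join raises a TypeError when handed integers, and sum does the same when handed digit characters.

```python
positions = [ord(c) - 96 for c in "cat"]   # [3, 1, 20], a list of ints

try:
  ''.join(positions)            # str.join only accepts strings
except TypeError as err:
  print("join failed:", err)

numstr = ''.join(str(n) for n in positions)   # "3120"

try:
  sum(numstr)                   # sum starts from the int 0 and can't add a str to it
except TypeError as err:
  print("sum failed:", err)

print(sum(int(d) for d in numstr))   # 6
```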

View the entire Python script for this task on GitHub.

Elixir

One of the things I wanted to do was avoid having to write a recursive function to loop k times over the numeric string. I realized I could feed a Range into Enum.reduce/3, ignore the element from the range being passed in, and just repeatedly operate on the accumulator (in this case, numstr).

import Enum
import String, except: [to_charlist: 1] # keep Kernel.to_charlist/1

def alpha_index_digit_sum(str, k) do
  numstr = str
  |> graphemes
  |> map(fn c -> hd(to_charlist(c)) - 96 end)
  |> join
  # 1..k//1 is an empty range when k is 0; a bare 1..0 would count down instead
  (1..k//1) |> reduce(numstr, fn _, numstr ->
    numstr
    |> graphemes
    |> map(fn n -> to_integer(n) end)
    |> sum
    |> Integer.to_string
  end)
end
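
The same trick carries over to Python. This is a hedged sketch of my own (the name alpha_index_digit_sum_reduce is hypothetical): functools.reduce folds over range(k), the lambda ignores the element from the range, and only the accumulator changes.

```python
from functools import reduce

def alpha_index_digit_sum_reduce(string, k):
  numstr = ''.join(str(ord(c) - 96) for c in string)
  # range(k) only controls how many times the step runs;
  # the element (_) is ignored and the accumulator carries the work
  return reduce(
    lambda acc, _: str(sum(int(d) for d in acc)),
    range(k),
    numstr,
  )

print(alpha_index_digit_sum_reduce("az", 2))   # 9
```

Since range(k) is empty when k is 0, the digit string comes back untouched in that case.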

View the entire Elixir script for this task on GitHub.


Task 2: Valid Token Counter

You are given a sentence.

Write a script to split the given sentence into space-separated tokens and count how many are valid words. A token is valid if it contains no digits, has at most one hyphen surrounded by lowercase letters, and at most one punctuation mark (!, ., ,) appearing only at the end.

Example 1

Input: $str = "cat and dog"
Output: 3

Tokens: "cat", "and", "dog"

Example 2

Input: $str = "a-b c! d,e"
Output: 2

Tokens: "a-b", "c!", "d,e"
"a-b" -> valid (one hyphen between letters)
"c!"  -> valid (punctuation at end)
"d,e" -> invalid (punctuation not at end)

Example 3

Input: $str = "hello-world! this is fun"
Output: 4

Tokens: "hello-world!", "this", "is", "fun"
All satisfy the rules.

Example 4

Input: $str = "ab- cd-ef gh- ij!"
Output: 2

Tokens: "ab-", "cd-ef", "gh-", "ij!"
"ab-"   -> invalid (hyphen not surrounded by letters)
"cd-ef" -> valid
"gh-"   -> invalid
"ij!"   -> valid

Example 5

Input: $str = "wow! a-b-c nice."
Output: 2

Tokens: "wow!", "a-b-c", "nice."
"wow!"  -> valid
"a-b-c" -> invalid (more than one hyphen)
"nice." -> valid

Approach

We’re going to do this with a single regular expression. A valid token has no digits, at most one hyphen surrounded by lowercase letters, and at most one punctuation mark (!, ., ,) appearing only at the end. Digits never need an explicit check, because nothing in the pattern can match one. The pattern looks like:

  • An anchor denoting the beginning of the string
  • An optional group of a sequence of one or more lowercase letters followed by a hyphen
  • A sequence of one or more lowercase letters
  • An optional punctuation mark (exclamation mark, period, or a comma)
  • An anchor denoting the end of the string
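
Those five pieces map directly onto a pattern. As a sketch, and assuming plain ASCII a to z is enough (which lets it run on Python's standard re module; the solutions in this post use the Unicode property \p{Ll} instead), re.VERBOSE lets each piece sit on its own commented line. The names TOKEN and count_valid are made up for this illustration.

```python
import re

# Assumption: [a-z] stands in for "lowercase letter"
TOKEN = re.compile(r"""
  ^                 # beginning of the string
  (?: [a-z]+ - )?   # optional group: lowercase letters followed by a hyphen
  [a-z]+            # one or more lowercase letters
  [!.,]?            # optional punctuation mark
  $                 # end of the string
""", re.VERBOSE)

def count_valid(sentence):
  # split on whitespace, then count the tokens the pattern accepts
  return sum(1 for token in sentence.split() if TOKEN.match(token))

print(count_valid("ab- cd-ef gh- ij!"))   # 2
print(count_valid("wow! a-b-c nice."))    # 2
```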

Raku

I’m still not used to Rakish regular expressions, so I always have to read the documentation to write them, in particular the parts about non-capturing grouping and pre-defined and enumerated character classes. Another thing I noticed is that, like last week, task 2 looks a lot like task 1. We’re taking a string, splitting it (the first task into characters, this one into words), passing it through some list processing function (map in the first task, grep in this one), and then doing some kind of counting.

sub tokenValid($token) {
  $token ~~ /^[<:Ll>+\-]?<:Ll>+<[!.,]>?$/
}

sub validTokenCount($str) {
  $str.split(/\s+/).grep({ tokenValid($_) }).elems
}

View the entire Raku script for this task on GitHub.

$ raku/ch-2.raku
Example 1:
Input: $str = "cat and dog"
Output: 3

Example 2:
Input: $str = "a-b c! d,e"
Output: 2

Example 3:
Input: $str = "hello-world! this is fun"
Output: 4

Example 4:
Input: $str = "ab- cd-ef gh- ij!"
Output: 2

Example 5:
Input: $str = "wow! a-b-c nice."
Output: 2

Perl

For Perl, the work was in re-writing the regex in the Perl syntax most regexes use (thanks to PCRE) and unspooling the chain of methods from Raku into Perl functions.

sub tokenValid($token) {
  $token =~ /^(?:\p{Ll}+\-)?\p{Ll}+[!.,]?$/
}

sub validTokenCount($str) {
  scalar ( grep { tokenValid($_) } split /\s+/, $str )
}

View the entire Perl script for this task on GitHub.

Python

Since we’d like to use Unicode character properties to identify lowercase letters, we can’t use Python’s native re library, but we can use the extension library regex. We use a list comprehension of the form [ x for x in list if conditional ] to filter out the invalid tokens and just count what’s left with len(). One nice thing is we’re able to compile the regex and use it like a function with .match().

import regex

valid_token = regex.compile(r'^(?:\p{Ll}+\-)?\p{Ll}+[!.,]?$')
                            
def valid_token_count(string):
  return len(
    [ w for w in string.split() if valid_token.match(w) ]
  )
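
For the curious, here's a small sketch of what goes wrong with the standard library (my own illustration, not part of the linked script): re rejects \p{Ll} as a bad escape. If installing regex isn't an option, unicodedata.category can answer the same "is this a lowercase letter?" question one character at a time. The helper name is_lowercase_letter is hypothetical.

```python
import re
import unicodedata

try:
  re.compile(r'\p{Ll}+')
except re.error as err:
  print("re can't compile it:", err)

def is_lowercase_letter(ch):
  # 'Ll' is the same Unicode general category that \p{Ll} matches
  return unicodedata.category(ch) == 'Ll'

print(is_lowercase_letter('q'))   # True
print(is_lowercase_letter('é'))   # True
print(is_lowercase_letter('Q'))   # False
print(is_lowercase_letter('7'))   # False
```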

View the entire Python script for this task on GitHub.

Elixir

But Elixir’s default regex library handles Unicode character properties perfectly well, and the piping makes it easy to see how it’s working.

  def valid_token(w), do:
    Regex.match?(~r/^(?:\p{Ll}+\-)?\p{Ll}+[!.,]?$/, w)

  def valid_token_count(str) do
    str
    |> String.split
    |> Enum.filter(fn w -> valid_token(w) end)
    |> length
  end

View the entire Elixir script for this task on GitHub.


Here are all my solutions on GitHub: https://github.com/packy/perlweeklychallenge-club/tree/challenge-365-packy-anderson/challenge-365/packy-anderson
