Oooh! Fun with git and bash!

I stumble across all sorts of things when I browse the web late at night after my wife has gone to sleep. Last Monday night, I ran across Oh My Zsh, a framework for doing all sorts of things with your Z Shell prompt, and, being a bash bigot, I thought "Hey, I ought to be able to do that in bash!" I was especially taken by all the git repository information that could be displayed in the prompt, so I went looking and found https://github.com/magicmonty/bash-git-prompt, which does pretty much the same thing for bash, but without all the fancy angled edges.

Now, this was nice, but you'll note that the pathname for your current directory appears twice: once in your prompt, and once in your titlebar. I've already been putting information in my titlebar for years, and I didn't want to duplicate it. So I started looking at how to customize this bash git prompt.

Fortunately, there's a whole section on configuration in the documentation, and, with just a little tweaking…

# This theme for gitprompt.sh is optimized for the
# "Solarized Dark" and "Solarized Light" color schemes
# tweaked for Ubuntu terminal fonts

override_git_prompt_colors() {
  GIT_PROMPT_THEME=Custom

  GIT_PROMPT_LEADING_SPACE=0
  GIT_PROMPT_PREFIX=""
  GIT_PROMPT_SUFFIX=""

  GIT_PROMPT_THEME_NAME="Solarized"
  GIT_PROMPT_STAGED="${Yellow}●"
  GIT_PROMPT_CHANGED="${BoldBlue}∆" # delta means change!
  GIT_PROMPT_STASHED="${BoldMagenta}⚑ "
  GIT_PROMPT_CLEAN="${Green}✔"
  GIT_PROMPT_BRANCH="${Yellow}"

  GIT_PROMPT_END_COMMON="_LAST_COMMAND_INDICATOR_ ${BoldBlue}${Time12a}${ResetColor}"
  GIT_PROMPT_END_USER="\n${GIT_PROMPT_END_COMMON} $ "
  GIT_PROMPT_END_ROOT="\n${GIT_PROMPT_END_COMMON} # "

  GIT_PROMPT_START="\[\e]0;\u@\h: \w\007\]"
  PROMPT_START="\[\e]0;\u@\h: \w\007\]"
  PROMPT_END="${GIT_PROMPT_END_COMMON} $ "
}

reload_git_prompt_colors "Solarized"

I was able to turn my bash prompt into something a little more useful when I'm coding!
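For anyone wiring this up themselves: the theme file above gets saved as ~/.git-prompt-colors.sh, and ~/.bashrc just needs to source gitprompt.sh. A minimal sketch, assuming you cloned the repository to ~/.bash-git-prompt as its README suggests:

```shell
# ~/.bashrc fragment
GIT_PROMPT_ONLY_IN_REPO=1   # only decorate the prompt inside git repositories
GIT_PROMPT_THEME=Custom     # picks up ~/.git-prompt-colors.sh
source ~/.bash-git-prompt/gitprompt.sh
```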

Oi. I hate finding out I’ve been hacked late in the evening…

Earlier this evening, I got an email from Google saying that they’d added a new administrator to one of the domains I have.

Except I didn’t make anyone an administrator.

It seems that someone had used some of the security holes in WordPress to set up a shadow website inside one of my idle websites, and they’d just told Google they were an administrator by putting a verification HTML file in the web root.
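A couple of quick checks for this kind of shadow setup (WEBROOT is a hypothetical path; point it at your actual web root):

```shell
WEBROOT=${WEBROOT:-/var/www}   # hypothetical; adjust for your server

# search-engine verification files you didn't put there are a red flag
find "$WEBROOT" -maxdepth 3 \( -name 'google*.html' -o -name 'BingSiteAuth.xml' \) -print

# anything modified recently in a site you consider idle deserves a look
find "$WEBROOT" -type f -mtime -7 -print
```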

I’ve removed the file, disabled the idle website, and gone through patching the security holes in my WordPress websites.  I’d rather not be hosting a site that’s providing page-ranks for spammy Chinese and Japanese websites.

Now time for sleep.

Preview of my next post…

My next project in the “whittling wood”?  Figuring out why XML::RSS::LibXML parses these tags without any problem:

<itunes:category text="News &amp; Politics"/>
<itunes:image href="http://media.npr.org/images/podcasts/2013/primary/hourly_news_summary-c464279737c989a5fbf3049bc229152af3c36b9d.png?s=1400"/>

and produces this internal data structure:

      category => bless( {
        _attributes => [
          "text"
        ],
        _content => "",
        text => "News & Politics"
      }, 'XML::RSS::LibXML::MagicElement' ),
      image => bless( {
        _attributes => [
          "href"
        ],
        _content => "",
        href => "http://media.npr.org/images/podcasts/2013/primary/hourly_news_summary-c464279737c989a5fbf3049bc229152af3c36b9d.png?s=1400"
      }, 'XML::RSS::LibXML::MagicElement' ),

But then it doesn't include these tags anywhere in the re-rendered XML when it spits the feed back out again. I know it has to do with the fact that these tags have no content (there's no separate opening and closing tag, just a single self-closing tag ending in a />), but I don't know why XML::RSS isn't rendering them properly when it converts the data structure back to XML.

The easy work-around would be to just do some string matching, recognize these tags in the original XML, copy them and re-insert them into the rendered XML afterwards.
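The easy road, sketched in shell (file names here are hypothetical): grab every self-closing itunes: tag from the original XML and splice them back in just ahead of the closing channel tag.

```shell
# extract every self-closing <itunes:...> tag from the original feed
grep -o '<itunes:[^>]*/>' original.xml > itunes-tags.txt

# splice the saved tags back in just before </channel> in the rendered feed
awk 'NR==FNR { tags = tags $0 "\n"; next }    # first file: collect the tags
     /<\/channel>/ { printf "%s", tags }      # emit them before </channel>
     { print }' itunes-tags.txt rendered.xml > fixed.xml
```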

The more difficult fix is to figure out what’s wrong with XML::RSS and try to fix it myself.

Guess which road I’m taking?

But what if I’ve still got an itch?

Ok, I wanted to write a follow-up post about my little program and the changes I needed to make to it about a day later, but I found myself writing more and more code and not having any time to actually write about writing the code. My wife, Kay, has dubbed it "my whittling wood". So, let's run down the things that started to bother me about my creation…

The first thing that bothered me surfaced as I was walking out of my office the first night. I checked my podcast app, and I didn't have the 7PM news podcast yet, even though it was already 7:15PM. I knew immediately what had happened: NPR had been late updating the feed, and my cron job had run at 7:12 and missed it. So I thought about how to fix that problem (as it happened, NPR was late with the 8PM episode as well, so the 8:12 run was able to pick up the 7PM episode).
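For reference, the hourly cron entry looked something like this (the wrapper script path is hypothetical):

```shell
# m h dom mon dow command -- run 12 minutes past every hour
12 * * * * /Users/packy/bin/filter-npr-news.sh >> /tmp/npr-cron.log 2>&1
```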

I immediately dismissed the idea of running the script multiple times an hour; it didn't feel clean to me. What I decided I needed to do was check whether the episode I was looking at this time was different from the episode I saw the last time the script ran (which meant a new table to track that). If it was the same episode, the script would sleep for a minute or two and try again, retrying until it either got a new episode or had waited long enough (20 attempts seemed like a good cutoff).
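Distilled out of the Perl, the wait-and-retry logic looks like this (check_for_new_episode is a hypothetical stand-in for "fetch the feed and compare the newest pubDate against the one saved last run"):

```shell
MAX_RETRIES=20
SLEEP_FOR=120   # seconds

# hypothetical stand-in: the real check fetches the feed and compares the
# newest item's pubDate against the one recorded on the previous run
check_for_new_episode() {
    true   # the real version returns non-zero while the feed is stale
}

try=1
while ! check_for_new_episode; do
    if [ "$try" -gt "$MAX_RETRIES" ]; then
        echo "giving up after $MAX_RETRIES retries"
        break
    fi
    sleep "$SLEEP_FOR"
    try=$((try + 1))
done
```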

Of course, I also decided I wanted to be able to check up on what was happening, so I needed to write a log file. If I was going to be able to see this log file when I wasn’t home, however, I’d have to copy it up to my web server along with the RSS feed XML.

And this brings me to where I was when I wrote my first post. I already had more code in the script, but I blogged about the first draft, wanting to come back to this second draft with a followup blog post.

And that’s when things got crazy.

We had a big filming day coming up for PacKay Productions that weekend, and I had a lot of work to do, some of which I’d already done and blogged about. After the filming was done, I needed to prep for Halloween.  And even with the changes I’d made to this script, things were going wrong with my setup.

One of the things I did wrong was setting up my wrapper shell script to run the perl program.  I’m not really adept at Bourne shell scripting, and I always leave things out. Then, last Thursday night, I was idly wondering how easy it would be to correct the other major annoyance I have with the NPR Hourly News Summary: the inconsistency of the sound levels.

Sometimes, the news summary is recorded at a good level, and I’m able to hear everything just fine.  Other times, the levels are set so low that even with my player’s volume cranked all the way up and my headphones pressed into my ears, I find it impossible to hear what’s being said over the sounds of the street in New York City.

So, of course, I started looking to see if somebody else had already solved my problem. I ran across this post on the Ask Ubuntu Stack Exchange, which outlined two solutions: Audacity, an open-source visual sound editor I was already intimately familiar with, and SoX, which was billed as "the Swiss Army knife of sound processing programs".

SoX: the Swiss Army knife of sound processing programs


SoX is a command line tool for processing audio files, and the more I read about it, the more I liked it.  Normalizing an audio file used to be a two-step process in SoX: running a command once in an analysis mode to get the maximum volume of the file, and a second time to boost that volume to the maximum possible without distortion. However, with version 14.3 of SoX, its developers made all of that possible in one single command:

sox --norm infile outfile
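Since a fresh MP3 needs normalizing every hour, the one-liner drops straight into a loop; the directory names here are hypothetical, and SOX defaults to where Homebrew put the binary on my machine:

```shell
SOX=${SOX:-/usr/local/bin/sox}   # assumed Homebrew install location

# normalize everything waiting in incoming/ into outgoing/, keeping names
mkdir -p outgoing
for f in incoming/*.mp3; do
    [ -e "$f" ] || continue      # skip the unexpanded glob if dir is empty
    "$SOX" --norm "$f" "outgoing/$(basename "$f")"
done
```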

I briefly pondered cloning SoX’s git repository and building from source, but I realized that chances were slight that I was going to be making changes to SoX; I just wanted it as a command line tool.  So I turned to one of the most wonderful things you can have on your Mac: Homebrew.

Homebrew is a package manager for OS X that's all git and Ruby under the hood, and it has a beer theme! It installs software into a "Cellar". It doesn't have packages, it has "bottles". It even uses the beer emoji: 🍺

Installing new software with Homebrew is painfully easy:

brew install sox

Once I got SoX installed, modifying my code to use it was dead easy.

Finally, I decided to tackle the big thing the program still wasn't doing itself: copying files up to the webserver. At first I looked at Net::SCP, but for some reason I couldn't get it to work (it kept telling me that my remote directory didn't exist). So I switched over to Net::OpenSSH, and I was able to get the copy working.

I also cleaned up the code a lot, and added a ton of comments.  I want this code to be able to document itself, so it’s really obvious what I’m doing and why. Some would say that once a program is working, it’s done.  But when I’m writing code for myself, it’s not done until I’ve commented the heck out of it, because I know myself: a year later, I’m going to come back to this code and think “What was I smoking when I wrote this?”

I doubt I’ll think that when I come back to this code.

#!/Users/packy/perl5/perlbrew/perls/perl-5.22.0/bin/perl -w

use DBI;
use Data::Dumper::Concise;
use DateTime;
use DateTime::Format::Mail;
use LWP::Simple;
use Net::OpenSSH;
use URI;
use XML::RSS;
use strict;

use feature qw( say state );

# define all the things!
use constant {
    URL         => 'http://www.npr.org/rss/podcast.php?id=500005',
    TITLE_ADD   => ' (filtered by packy)',
    TITLE_MAX   => 40, # characters
    SLEEP_FOR   => 120, # seconds (2 minutes)
    MAX_RETRIES => 10,
    KEEP_DAYS   => 7,

    REMOTE_HOST => 'www.dardan.com',
    REMOTE_USER => 'dardanco',
    REMOTE_DIR  => 'www/packy/',

    MEDIA_URL   => 'http://packy.dardan.com/npr',

    TZ          => 'America/New_York',
    LOGFILE     => '/tmp/npr-news.txt',
    XMLFILE     => '/tmp/npr-news.xml',
    IN_DIR      => '/tmp/incoming',
    OUT_DIR     => '/tmp/outgoing',
    DATAFILE    => '/Users/packy/data/filter-npr-news.db',

    SOX_BINARY  => '/usr/local/bin/sox',
};

# list of times we want - different times on weekends
my @keywords = is_weekday() ? qw( 7AM 8AM 12PM 6PM 7PM )
             :                qw( 7AM     12PM     7PM );

my $dbh = get_dbh();  # used in a couple places, best to be global

my $rss;   # these two vars are only used in the main code block,
my $items; # but can't be scoped to the foreach loop

# since, for cosmetic reasons, we're starting the count at 1, we need
# to loop up to MAX_RETRIES + 1; otherwise, we'll only have the first
# attempt and then (MAX_RETRIES - 1).  If I'd called the constant
# MAX_ATTEMPTS then it would make sense to start at zero...
foreach my $retry (1 .. MAX_RETRIES + 1) {

    # get the RSS
    write_log("Fetching " . URL);
    my $content = get(URL);

    # parse the RSS using a subclass of XML::RSS
    $rss = XML::RSS::NPR->new();
    $rss->parse($content);
    write_log("Parsed XML");

    $items = $rss->_get_items;

    # if a new show was published in the feed, we don't need to wait
    # in a loop for a new one
    last unless same_show_as_last_time( $items );

    # we don't want the script to wait forever - if no new episode
    # appears after a maximum number of retries, give up and generate
    # the feed with the episodes we have
    if ($retry > MAX_RETRIES) {
        write_log("MAX_RETRIES (".MAX_RETRIES.") exceeded");
        last;
    }

    # for debugging purposes, I want to be able to not have the script
    # sleep, and the choices were add command line switch processing
    # or check an environment variable. This was the simpler option.
    if ($ENV{NPR_NOSLEEP}) {
        last;
    }

    # log the fact that we're sleeping so we can observe what the
    # script is doing while it's running
    write_log("Sleeping for ".SLEEP_FOR." seconds...");

    # since I usually want to listen to these podcasts when I'm away
    # from my desktop computer, copy the log file up to the webserver
    # so I can check on it remotely.  this way, if it's spending an
    # inordinate amount of time waiting for a new episode, I can see
    # that from my phone's browser...
    push_log_to_remotehost();

    # actually sleep
    sleep SLEEP_FOR;

    # and note which number retry this is
    write_log("Trying RSS feed again (retry #$retry)");
}

# test to see if the new item matches our inclusion criteria, and then
# fill the item list with items we've cached in our database
get_items_from_database( $items );

# make new RSS feed devoid of the original items... ok, ITEM
$rss->clear_items;

foreach my $item ( @$items ) {
    $rss->add_item(%$item);
}

re_title($rss);

write_log("Writing RSS XML to " . XMLFILE);
open my $fh, '>', XMLFILE or die "Can't write " . XMLFILE . ": $!";
say {$fh} $rss->as_string;
close $fh;
push_xml_to_remotehost();

#################################### subs ####################################

sub get_items_from_database {
    my $items = shift;

    # build the regex for matching desired episodes from keywords
    my $re = join "|", @keywords;
    $re = qr/\b(?:$re)\b/i;

    my $insert = $dbh->prepare("INSERT INTO shows (pubdate, item) ".
                               "           VALUES (?, ?)");

    my $exists_in_db = $dbh->prepare("SELECT COUNT(*) FROM shows ".
                                     " WHERE pubdate = ?");

    # I know the feed only has the one item in it, but it SHOULD have
    # more, so let's go through the motions of checking each item

    foreach my $item (@$items) {

        # pawn off the specifics of how we get the information to a sub
        my ($epoch, $title) = item_info($item);

        # again, for debugging purposes, I wanted to be able to not
        # have the script skip the current item, and the choices were
        # add command line switch processing or check an environment
        # variable. This was the simpler option.

        if ($title !~ /$re/ && ! $ENV{NPR_NOSKIP}) {
            write_log("'$title' doesn't match $re; skipping");
            next;
        }

        # check to see if we already have it in the DB
        $exists_in_db->execute($epoch);
        my ($exists) = $exists_in_db->fetchrow;

        if ($exists > 0) {
            write_log("'$title' already in database; skipping");
            next;
        }

        # the NPR news podcast is notoriously bad at normalizing the
        # volume of its broadcasts; some are easy to hear and some are
        # so quiet it's impossible to hear them when listening on a
        # city street, so, let's normalize them to a maximum volume

        normalize_audio($item);

        write_log("Adding '$title' to database");

        # it's easier to store the data in the episode cache table as
        # a perl representation of the parsed data than it is to
        # serialize it back into XML and then re-parse it when we need
        # it again.
        $insert->execute($epoch, Dumper($item));
    }

    # go through the database and dump episodes that are older than
    # our retention period.  Since we're using epoch time (seconds
    # since some date, usually midnight 1970-01-01) as the key to our
    # episode cache table, it's really easy to determine which
    # episodes are too old

    my $now     = DateTime->now();
    my $too_old = $now->epoch - (KEEP_DAYS * 24 * 60 * 60);
    $dbh->do("DELETE FROM shows WHERE pubdate < $too_old");

    # now let's fetch the episodes from the episode cache table in
    # oldest-first order. Again, since we're keyed on the episode's
    # publish date in epoch time, we can do this with a simple numeric
    # sort.
    my $query = $dbh->prepare("SELECT * FROM shows ORDER BY pubdate");
    $query->execute();

    @$items = ();
    while ( my($pubdate, $item) = $query->fetchrow ) {

        # just blindly evaluating text is a potential security problem,
        # but I know all these entries came from me writing dumper-ed code,
        # so I feel safe in doing so...
        my $evaled = eval $item;

        push @$items, $evaled;

        # log which episodes we're putting into the feed
        my ($epoch, $title) = item_info($evaled);
        write_log("Fetched '$title' from database; adding to feed");
    }
}

sub same_show_as_last_time {
    my $items = shift;

    # so we know when the feed is late in publishing a new item,
    # we have a table that stores the publication date of the last
    # episode we saw.  It also stores the title of the episode so
    # we can log which episode it was.

    my $get_last_show = $dbh->prepare("SELECT * FROM last_show");

    # get the information for the current episode
    my ($epoch, $title) = item_info($items->[0]);

    # fetch the last episode from the DB
    $get_last_show->execute;
    my ($last_time, $last_title) = $get_last_show->fetchrow;

    # save the episode we just fetched for next time
    my $update = $dbh->prepare("UPDATE last_show SET pubdate = ?, title = ? ".
                               " WHERE pubdate = ?");
    $update->execute($epoch, $title, $last_time);

    # now compare the current episode with the one we got from the DB
    my $is_same = ($last_time == $epoch);

    if ($is_same) {
        write_log("RSS feed has not updated since '$last_title' was published");
    }

    return $is_same;
}

#################################### audio ####################################

sub filename_from_uri {
    my $uri = shift;

    # abstract out the complexities of fetching the filename from a
    # URI so the code will read easier; in this case, we're
    # instantiating a new URI class object and calling path_segments()
    # to get the segments of the path, and then returning the last
    # element, which is going to be the filename.

    return( ( URI->new($uri)->path_segments )[-1] );
}

sub normalize_audio {
    my $item = shift;
    my $uri  = item_url($item);
    my $file = filename_from_uri($uri);

    # perl idiom for "if directory doesn't exist, make it"
    -d IN_DIR  or mkdir IN_DIR;
    -d OUT_DIR or mkdir OUT_DIR;

    # construct full pathnames for the file we're downloading and
    # the file we're normalizing into
    my $infile  = join '/', IN_DIR,  $file;
    my $outfile = join '/', OUT_DIR, $file;

    # fetch the MP3 file using LWP::Simple
    my $code = getstore($uri, $infile);
    write_log("Fetched '$uri' to $infile; RESULT $code");
    return unless $code == 200;

    # if, for some reason, we don't have the program to normalize audio,
    # crash with a message complaining about it being missing
    -x SOX_BINARY
        or die "no executable at " . SOX_BINARY;

    # call SoX to normalize the audio
    write_log("Normalizing $infile to $outfile");
    system join(q{ }, SOX_BINARY, '--norm', $infile, $outfile);

    # the feed doesn't publish an item length in bytes, but it really
    # ought to, so let's get the size of the MP3 file.
    my $size = -s $outfile || 0;

    # re-write the bits of the item we're changing
    item_url($item, join '/', MEDIA_URL, $file);
    item_length($item, $size);

    # send the normalized MP3 file up to the webserver
    push_media_to_remotehost($outfile);

    # clean up after ourselves
    unlink $infile;
    unlink $outfile;
}

#################################### db ####################################

sub get_dbh {
    my $file = DATAFILE;

    # check to see if the datafile exists BEFORE we connect to it
    my $exists = -f $file;

    my $dbh = DBI->connect(          
        "dbi:SQLite:dbname=$file", 
        "",
        "",
        { RaiseError => 1}
    ) or die $DBI::errstr;

    # if the datafile didn't exist before we connected to it, let's set up
    # the schema we're using
    unless ($exists) {
        $dbh->do("CREATE TABLE shows (pubdate INTEGER PRIMARY KEY, item TEXT)");
        $dbh->do("CREATE INDEX shows_idx ON shows (pubdate);");
        $dbh->do("CREATE TABLE last_show (pubdate INTEGER PRIMARY KEY, ".
                 "                        title   TEXT)");
    }

    return $dbh;
}

#################################### time ####################################

sub now {
    # set the time zone in the DateTime object, so we get non-UTC time
    return DateTime->now( time_zone => TZ );
}

sub is_weekday {
    # makes our code easier to read
    return now()->day_of_week < 6;
}

################################### copying ###################################

sub push_to_remotehost {
    my ($from, $to) = @_;

    my $connect = join '@', REMOTE_USER, REMOTE_HOST;

    state $ssh = Net::OpenSSH->new($connect);

    write_log("Copying $from to $connect:$to");

    if ( $ssh->scp_put($from, $to) ) {
        write_log("Copy success");
    }
    else {
        write_log("COPY ERROR: ". $ssh->error);
    }
}

# helper functions to make the code easier to read

sub push_xml_to_remotehost {
    push_to_remotehost(XMLFILE, REMOTE_DIR);
}

sub push_log_to_remotehost {
    push_to_remotehost(LOGFILE, REMOTE_DIR);
}

sub push_media_to_remotehost {
    my $from = shift;
    push_to_remotehost($from, REMOTE_DIR . 'npr/');
}

################################### logging ###################################

sub write_log {
    # I'm opening and closing the logfile every time I write to it so
    # it's easier for external processes to monitor the progress of
    # this script
    open my $logfile, '>>', LOGFILE;

    my $now = now();
    my $ts  = $now->ymd . q{ } . $now->hms . q{ };

    # I don't write multiple lines yet, but I might want to!
    foreach my $line ( @_ ) {
        say {$logfile} $ts . $line;
    }

    close $logfile;
}

BEGIN {
    unlink LOGFILE; # write a new log each time we run
    write_log('Started run'); # log that the run has started

    # register a DIE handler that will write whatever message I die() with
    # to our logfile so I can see it in the logs
    $SIG{__DIE__} = sub {
        my $err = shift;
        write_log('FATAL: '.$err);
        # if we die(), after this runs, the END block will be executed!
    };
}

END {
    # when the program finishes, log that
    write_log('Finished run');

    # and, so I can see these logs remotely, push them up to the webserver
    push_log_to_remotehost();
}

##################################### XML #####################################

sub re_title {
    my $rss = shift;

    # append some text to the channel's title so I can differentiate
    # this feed from the original feed in my podcast app

    my $existing_title = $rss->channel('title');
    my $add_len        = length(TITLE_ADD);

    if (length($existing_title) + $add_len > TITLE_MAX) {
        $existing_title = substr($existing_title, 0, TITLE_MAX - $add_len - 1);
    }

    $rss->channel('title' => $existing_title . TITLE_ADD);
}

sub item_info {
    state $mail = DateTime::Format::Mail->new; # only initialized once!

    my $item  = shift;
    my $title = fix_whitespace($item->{title});
    my $dt    = $mail->parse_datetime($item->{pubDate});
    my $epoch = $dt->epoch;
    return $epoch, $title;
}

sub fix_whitespace {
    my $string = shift;

    # multiple whitespace compressed to a single space
    $string =~ s{\s+}{ }g;

    # remove leading and trailing spaces
    $string =~ s{^\s+}{}; $string =~ s{\s+$}{};

    return $string;
}

# let's define some pseudo-accessors (since these are unblessed
# hashes, not objects) that will make our code easier to read

sub enclosure_pseudo_accessor {
    my $hash = shift;
    my $key  = shift;
    if (@_) {
        $hash->{enclosure}->{$key} = shift;
    }
    return $hash->{enclosure}->{$key};
}

sub item_url {
    my $hash = shift;
    enclosure_pseudo_accessor($hash, 'url', @_);
}

sub item_length {
    my $hash = shift;
    enclosure_pseudo_accessor($hash, 'length', @_);
}

# since XML::RSS doesn't provide a method to clear out the items in an
# already-parsed feed, I'm creating a subclass to provide that
# functionality rather than just executing code that manipulates the
# internal data structure of the object in my main program

package XML::RSS::NPR;
use base qw( XML::RSS );

sub clear_items {
    my $self = shift;
    $self->{num_items} = 0;
    $self->{items} = [];
}

# since we're creating a subclass, we can override the default XML
# modules that are used to be the ones we need - no calling
# add_module() from our main program!

sub _get_default_modules {
    return {
        'http://www.npr.org/rss/'                    => 'npr',
        'http://api.npr.org/nprml'                   => 'nprml',
        'http://www.itunes.com/dtds/podcast-1.0.dtd' => 'itunes',
        'http://purl.org/rss/1.0/modules/content/'   => 'content',
        'http://purl.org/dc/elements/1.1/'           => 'dc',
    };
}

__END__

Read it on GitHub: filter-npr-news

Blogging for work

Not of real interest to anybody who doesn’t use BMC Database Automation, but I’m doing a little bit of blogging for my day job as well.

Scratching my itch

It’s been a while since I wrote some code to scratch purely my own itch. Most of my time is spent writing code to scratch my employer’s itches, and occasionally I get to write little programs that scratch small itches I get while writing code for my employer — things like extensions to git-p4 that allow me to pull information from a git repository and use it to generate merge commands for Perforce, so I don’t have to figure out which commits/changes I want to merge.

I know; nothing someone else would be interested in.

But I've had an itch for a little while that someone else might be interested in. I listen to the NPR Hourly News Summary via my podcast app on my Nexus 6. The web page may list a bunch of back episodes, but the RSS feed only publishes the most recent summary, and I want to listen to SOME older episodes, just not all of them. What I had been doing was having my podcast app keep every episode and then marking the ones I wasn't interested in as done, but that was tedious, especially considering I only wanted to listen to four or five of the 24 episodes published each day.

So I thought about what I wanted: a program that would fetch the RSS feed every hour, check whether the currently published episode was one of the ones I wanted, and, if it was, store it in a database and then spit out a new RSS feed with the last N episodes I'd stored. I realized I didn't need to fetch the episodes themselves, because NPR doesn't remove them after they disappear from the RSS feed (as evidenced by the web page with multiple episodes). I also realized I didn't need to generate this new RSS feed dynamically: NPR's feed only gets updated once an hour, so I only needed to regenerate my feed when NPR's was updated. And since I wasn't generating the feed every time my podcast app asked for it, I could generate it on my desktop computer (which has way more computing power than my web server) and just copy the XML file up.

And, of course, I wanted to use perl, because that’s my favorite programming language.

One of perl's strengths is that, whatever you want to do, there's probably a CPAN module that will do the heavy lifting for you. There's also a Perl Cookbook for commonly used patterns in perl programming. I found the recipe for Reading and Writing RSS Files, and there was an example for filtering an RSS feed and generating a new one. The example uses LWP::Simple to fetch the RSS feed, XML::RSSLite to parse the feed, and XML::RSS to generate a new RSS feed. The cookbook even states "It would be easier, of course, to do it all with XML::RSS". So I did.

Actually, I didn't rewrite the RSS handling too much. Rather than building a completely new feed, I used XML::RSS to parse the feed and extract the one item from it. But even though XML::RSS has a method for adding items to a feed, it doesn't have one for removing them. This left me no choice but to dig through the source code of XML::RSS and figure out what was necessary to clear out the list of items. Once I'd cleared the one item out of the feed, I re-loaded the feed with the items I'd stored.

Wait… I had stored items, right?  Oh, crud, I forgot about that part.  Ok, I need to store the last N items. I could use a text file, but that’s difficult to manage.  I could set up a database in PostgreSQL or MySQL, but that’s a lot of overhead for just storing a bit of data. If only there was a self-contained, serverless, zero-configuration, transactional SQL database engine.  Something like… SQLite!

So I set up a simple schema: one table with two columns, one to hold the timestamp of the episode, and one to hold the blob of data I needed to shove the episode back into the feed. Since it's SQLite, I didn't expect the datafile to exist the first time I ran the script, so I put in a test to see if the data file existed before connecting to it and, if it didn't, created the schema.
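The same create-on-first-run dance, sketched with the sqlite3 command-line tool (the DB path is shortened here; the script's real path is /Users/packy/data/filter-npr-news.db):

```shell
DB=${DB:-filter-npr-news.db}   # stand-in path for the example

# if the datafile doesn't exist yet, connecting would auto-create an empty
# one, so test for it first and lay down the schema ourselves
if [ ! -f "$DB" ]; then
    sqlite3 "$DB" 'CREATE TABLE shows (pubdate INTEGER PRIMARY KEY, item TEXT);'
fi
```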

The rest is fairly straightforward; I checked the current episode extracted from the feed to see if it was one of the times I wanted. If it wasn’t, I just skipped ahead to generating the new feed. If it was one of the episodes I wanted, I checked to make sure I didn’t already have it in the database.  If I did, I skipped ahead.  If it wasn’t in the database, I added it to the database and then deleted everything in the database older than 7 days.

#!/Users/packy/perl5/perlbrew/perls/perl-5.22.0/bin/perl -w

use DBI;
use Data::Dumper::Concise;
use DateTime;
use DateTime::Format::Mail;
use LWP::Simple;
use XML::RSS;
use strict;

use feature 'say';

# list of times we want
my @keywords = qw( 7AM 8AM 12PM 7PM );
my $days_to_keep = 7;

# get the RSS
my $URL = 'http://www.npr.org/rss/podcast.php?id=500005';
my $content = get($URL);

# parse the RSS
my $rss = XML::RSS->new();
$rss->parse($content);


my @items = get_items( $rss->_get_items );

# make new RSS feed
$rss->{num_items} = 0;
$rss->{items} = [];

foreach my $item ( @items ) {
    $rss->add_item(%$item);
}

say $rss->as_string;


sub get_items {
    my $items = shift;

    # build the regex from keywords
    my $re = join "|", @keywords;
    $re = qr/\b(?:$re)\b/i;

    my $mail = DateTime::Format::Mail->new;

    my $dbh = get_dbh();

    my $insert = $dbh->prepare("INSERT INTO shows (pubdate, item) ".
                               "           VALUES (?, ?)");

    my $exists_in_db = $dbh->prepare("SELECT COUNT(*) FROM shows ".
                                     " WHERE pubdate = ?");

    foreach my $item (@$items) {
        my $title = $item->{title};
        $title =~ s{\s+}{ }g;  $title =~ s{^\s+}{}; $title =~ s{\s+$}{};

        if ($title !~ /$re/) {
            next;
        }

        my $dt = $mail->parse_datetime($item->{pubDate});
        my $epoch = $dt->epoch;

        $exists_in_db->execute($epoch);
        my ($exists) = $exists_in_db->fetchrow;
        if ($exists > 0) {
            next;
        }

        $insert->execute($epoch, Dumper($item));
    }

    my $now = DateTime->now();
    my $too_old = $now->epoch - ($days_to_keep * 24 * 60 * 60);
    $dbh->do("DELETE FROM shows WHERE pubdate < $too_old");

    my $query = $dbh->prepare("SELECT * FROM shows ORDER BY pubdate");
    $query->execute();

    my @list;
    while ( my($pubdate, $item) = $query->fetchrow ) {
        push @list, eval $item;
    }

    return @list;
}

sub get_dbh {
    my $file = '/Users/packy/data/filter-npr-news.db';
    my $exists = -f $file;

    my $dbh = DBI->connect(          
        "dbi:SQLite:dbname=$file", 
        "",
        "",
        { RaiseError => 1}
    ) or die $DBI::errstr;

    unless ($exists) {
        # first time - set up database
        $dbh->do("CREATE TABLE shows (pubdate INTEGER PRIMARY KEY, item TEXT)");
    }
    return $dbh;
}

And it worked!  I then created a git repository on my desktop for it, and pushed it up to my GitHub account under its own project, filter-npr-news.
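To actually serve the filtered feed, the script just needs to run periodically and write its output somewhere a web server can reach. A crontab entry along these lines would do it (the schedule and the script/output paths here are only illustrative):

```
# regenerate the filtered feed at the top of every hour
# (script and output paths are illustrative, not a real setup)
0 * * * * /usr/bin/perl $HOME/bin/filter-npr-news.pl > $HOME/Sites/npr-news.rss
```

Point your podcast client at the resulting file's URL and it sees only the newscasts that matched the keywords.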

And that kept me satisfied for a whole day.

Next time, I’ll write about what started bothering me and the changes I made to fix that.

Learning Puppetry tech…

caroll-spinney

One of the problems I have when I’m puppeteering Rudy Monster is that I’m stuck underneath his huge furry body and it’s really difficult to see the monitors that Kay and Jen are using to watch their own performances. So I started thinking about how to give myself a small, portable monitor that I could use while I was under Rudy.

I’d read about Caroll Spinney’s “electronic bra” that he wears inside of Big Bird, but that has a fairly large CRT video monitor on it, and I was sure that modern technology could provide me with something much, much better.

VideoGlasses

The video glasses that wouldn’t work…

At first, I tried using a pair of Video Glasses.  I figured that I could just wear these on my face and I would be able to puppeteer with the best of them. Unfortunately, I hadn’t counted on two things: the first logistical, the second technical. First, with these glasses on, I couldn’t see anything but what the camera was showing me. No script, no other puppeteers, no nothing. I was trapped inside the glasses until I took them off.

Second, and much more importantly, they only worked for ten minutes at a time, and then they conked out. If I let them rest for a half hour or so, they’d work again, but I couldn’t afford to only get 10-20 minutes of use out of every hour; I needed to be able to use these for long puppeteering sessions. So I scrapped the glasses and went back to puppeteering blind.

But, of course, that didn’t work very well, either. I constantly had to get coaching from my coworkers about Rudy’s eye focus, and I had to feel my way through everything. Not the way I like to puppeteer.

Finally, I decided to build my own rig to approximate what Caroll Spinney has.  Looking at the pictures of Caroll’s getup, I started figuring out how I could get the appropriate parts. Of course, trying to get a CRT video monitor these days is nigh-impossible, but I found links to LCD flat-screen monitors that have the same 16:9 aspect ratio (widescreen) that our camera has. These screens were being sold for retrofitting cars that didn’t have backup cameras with aftermarket cameras.

And then I saw a related link for a wireless transmitter that allowed you to transmit video from your camera at the back of your car to your monitor on your dashboard, without wires. This was what I needed.

So I got the following equipment:

  • A small 16:9 LCD monitor (the kind sold for retrofitting cars with aftermarket backup cameras)
  • A rechargeable battery pack, which came with a Y-splitter power lead
  • A wireless video transmitter and receiver pair
  • A package of DC power connectors
  • An extra power supply

First I tested the monitor and the battery. Both had all the connectors they needed, so I was able to take the Y-splitter lead that came with the battery and plug one end into the monitor’s power cable and the other into the power supply that came with the battery. I plugged in the RCA video out from our camera and… voilà! I was watching what the camera was recording on the tiny monitor. At the very least, my getup would work if it was hardwired.

Next, I wanted to get the wireless transmission working. The pieces of the wireless transmitter were meant to be hardwired into a car, so they had bare wires as power leads. I was prepared for this (remember, I bought the package of connectors), but then I had a sudden thought: this was DC power, and unlike AC power, with DC power, polarity matters. With my misadventures in LED wiring still fresh in my mind, I popped off to Google to make sure I had my polarity correct. Finding the Wiring Color Codes reference in the free electrical engineering textbook “Lessons in Electric Circuits”, I read the following (the last sentence was bolded by me):

US DC power: The US National Electrical Code (for both AC and DC) mandates that the grounded neutral conductor of a power system be white or grey. The protective ground must be bare, green or green-yellow striped. Hot (active) wires may be any other colors except these. However, common practice (per local electrical inspectors) is for the first hot (live or active) wire to be black and the second hot to be red. The recommendations in Table below are by Wiles. [JWi] He makes no recommendation for ungrounded power system colors. Usage of the ungrounded system is discouraged for safety. However, red (+) and black (-) follows the coloring of the grounded systems in the table.

Ok, so these red wires were positive, and the black wires were negative. And… oh, lovely! This wiring diagram for the product even states that! This is what I get for not reading the documentation before diving in…

Documentation? How quaint…

So I hooked up a female DC power connector to the transmitter’s power leads, making sure the red wire was going into the positive terminal (then screwing it down with a #0 Phillips head) and the black wire was going into the negative terminal (screw, screw, screw). Then I plugged in the male power plug from the extra power supply, and the little indicator light on the transmitter lit up.

So far, so good.

I did the same with a male DC power connector on the receiver’s power leads, and then plugged it into the Y-splitter lead that came with the battery.

BatteryCord

I picked the male power connector because the battery pack had a female power port, and the monitor had a female power plug.  This meant that the cable above would let me plug in a male plug and not have to wire up anything special.

And when I turned on the battery, the indicator light on the receiver lit up.  So, holding my breath, I plugged in our camera to the transmitter… and… success!

I was so excited, I put on the harness and took the camera upstairs to my wife.

“What’s that?” she asked, indicating the picture on the diminutive screen.

“That’s the sewing machine. Downstairs.”

I’ll post a picture of this rig in action after we film with it this coming Sunday, but I couldn’t wait to post about assembling this setup.  It felt good.

GNU Terry Pratchett

In Terry Pratchett’s Discworld series, the clacks are a series of semaphore towers loosely based on the concept of the telegraph. Invented by an artificer named Robert Dearheart, the towers could send messages “at the speed of light” using standardized codes. Three of these codes are of particular import:

  • G: send the message on
  • N: do not log the message
  • U: turn the message around at the end of the line and send it back again

When Dearheart’s son John died in an accident while working on a clacks tower, Dearheart inserted John’s name into the overhead of the clacks with a “GNU” in front of it as a way to memorialize his son forever (or at least for as long as the clacks are standing).

“A man is not dead while his name is still spoken.”
Going Postal, Chapter 4 prologue

Keeping the legacy of Sir Terry Pratchett alive forever.
For as long as his name is still passed along the Clacks (nowadays called the Internet),
Death can’t have him.

http://www.gnuterrypratchett.com/
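On today’s clacks, people keep Sir Terry’s name in the overhead by adding an X-Clacks-Overhead HTTP header to their web servers’ responses; gnuterrypratchett.com collects recipes for doing this on just about every platform. Here’s a minimal hand-rolled Perl CGI sketch of the idea (my own example, not one of the site’s recipes):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a minimal CGI response whose headers carry the clacks code
# onward: GNU = pass it on, don't log it, turn it around at the end.
sub clacks_headers {
    return "X-Clacks-Overhead: GNU Terry Pratchett\r\n"
         . "Content-Type: text/plain\r\n\r\n";
}

print clacks_headers(),
      "A man is not dead while his name is still spoken.\n";
```

Every response from a script like this passes his name along to whoever asks.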

Boosting the signal

WorldVentures Marketing, LLC
Phone: (972) 805-5100
Fax: (972) 767-3139
5360 Legacy Dr STE 300 Bldg 1, Plano, TX 75024-3135

WorldVentures describes itself as “the world’s largest direct seller of curated group travel, with more than 120,000 Independent Representatives in over 24 countries and we are still growing.”

But other people describe WorldVentures differently. Some press has been unflattering. Bloggers and commentators openly call it a scam or a scheme. The Better Business Bureau gives it a B- (and its site notes, in big bold letters, “This Business is not BBB Accredited”).

Now they’re suing a blogger who, after encountering a WorldVentures marketeer, had the temerity to write a post in her blog entitled “WorldVentures: This Is NOT The Way To Travel The World”.

Read more about her case here:
Popehat Signal: Help A Blogger Threatened By A Multi-Level Marketer WorldVentures

Will WorldVentures sue me, too, for linking to this story?  We’ll find out.

Once again, we’re trying to fill positions at my day job…

(We don’t have it posted with our corporate careers site yet, so I’m blogging it here.) We’re in NYC. If this looks interesting to you, drop Jennifer a line!