E2 node autolinker in perl

(thing) by eric+ Thu Jun 01 2000 at 23:06:59

After a day here on Everything, I wrote an article called "my first day on everything" in which I lamented being short of XP. Someone, I don't know who, soft linked me to "learn how to integrate" and I am now and forever in his or her debt.

From "learn how to integrate" I learned the importance of taking five minutes to follow the hard links in the writeup to create soft links back to me.

But, since I'm a little bit of a geek boy, I thought to myself, Why spend five minutes doing something simple when I can spend a couple of hours to write a program to do the very same thing?

So I did.

Here's a short Perl script that looks at your E2 writeup, follows every hard link in it, and so automagically creates the soft links for you.

If nothing else, it's a nice example of how to use HTML::Parser and LWP.


#!/usr/bin/perl -w

#
# $Id: e2autonoder.pl,v 1.2 2001/02/26 06:02:17 eric Exp $
#

use strict;

use HTML::Parser;
die "HTML::Parser needs to be version 3.x or higher!" 
    if ( $HTML::Parser::VERSION < 3 );

use LWP::UserAgent;


exit main();

##############################################################
############################################################## main
##############################################################
sub main {
    my $node_id = $ARGV[0] || return _usage();

    my $content = _get_e2_node( $node_id );          # download the node
    if ( !defined($content) ) {
        print STDERR "Error getting node no. $node_id.\n";
        return 1;                                    # non-zero exit status on failure
    }

    my $parser = _e2_parser();                       # parse the node
    $parser->parse( $content );

    my %temp = map { $_ => 1 } @{$parser->{links}};  # make a unique list
    my @unique_uris = keys( %temp );

    _set_e2_soft_links( \@unique_uris );              # set soft links to node

    return 0;
}


##############################################################
############################################################## e2 interface
##############################################################
sub _get_e2_node {
    my $node_id = shift() || return undef;

    my $ua = LWP::UserAgent->new();
    $ua->agent( "e2autonoder" );
    my $req = HTTP::Request->new( "GET", 
        "http://www.everything2.com/index.pl?node_id=$node_id" );
    my $response = $ua->request( $req );

    return undef unless ( $response->is_success() );
    return $response->content();
}

sub _set_e2_soft_links {
    my $uri_ref = shift();

    my $ua = LWP::UserAgent->new();
    $ua->agent( "e2autonoder" );

    foreach my $uri ( @$uri_ref ) {
        my $req = HTTP::Request->new( "GET", $uri );
        my $response = $ua->request( $req );
        if ( $response->is_success() ) {
            if ( $response->content() =~ />Here's the stuff:/ ) {
                print "NAK $uri\n";
            }
            else {
                print "OK $uri\n";
            }
        }
        else {
            print STDERR "ERROR $uri\n";
        }
    }
}

##############################################################
############################################################## e2 node parser
##############################################################
###
### This is the E2 parser.  Its only job is to find and
### store the links in the actual writeup.  The writeup
### is defined as being the second table row after the
### topic (i.e. node name) is displayed.
###
sub _e2_parser {
    my $p = HTML::Parser->new( api_version => 3 );
    $p->handler( default => "" );
    $p->handler( start => \&_start, 'self, tagname, attr' );
    $p->handler( end => \&_end, 'self, tagname' );
    
    $p->{flag_topic} = 0;       # set when we get to the topic
    $p->{flag_writeup} = 0;     # set when $p is looking at the writeup
    $p->{links} = [];           # list of nodes linked to
    $p->{tr} = 0;               # incremented at <tr>, decremented at </tr>
    $p->{tr_rows} = 0;          # # of table rows we've seen

    return $p;
}

sub _end {
    my ($self, $tagname) = @_;

    if ( $tagname eq 'tr' ) { 
        ($self->{tr})--; 
        if ( $self->{tr} == 0 ) {
            $self->{tr_rows}++;
        }
        if ( $self->{flag_writeup} ) { $self->{flag_writeup} = 0; }
    }
}

sub _start {
    my ($self, $tagname, $attr) = @_;

    # default to '' so -w doesn't warn about <h1> tags that have no class attribute
    if ( $tagname eq 'h1' && ( $$attr{class} || '' ) eq 'topic' ) {
        $self->{flag_topic} = 1;
        $self->{tr_rows} = 0;
        $self->{tr} = 0;
    }
    elsif ( $tagname eq 'tr' ) {
        ($self->{tr})++; 
        if ( $self->{flag_topic} && $self->{tr_rows} == 1 ) {
            $self->{flag_writeup} = 1;
        }
    }
    elsif ( $tagname eq 'a' && $self->{flag_writeup}) {
        my $cgi_param_string = $$attr{href} or return;
        $cgi_param_string =~ s/^.*?\?//;
        my $uri = "http://www.everything2.com/index.pl?$cgi_param_string";
        push( @{$self->{links}}, $uri );
    }

}


##############################################################
############################################################## miscellany
##############################################################
sub _usage {
    print <<__HERE__;
usage: e2autonoder.pl <node_id>
       where node_id is the id number of your writeup, *not* the full node.
__HERE__
    return 1;    # non-zero, so "exit main()" reports the failure
}

I'll leave it as an exercise for the reader to edit the script to confirm that the links were made.

For hard links, see the E2 node autolinker.

Update (2001-02-25): if a soft link can't be made because the target node does not exist, "NAK" is printed instead of "OK". Kudos to dmd for the suggestion.
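
The OK/NAK/ERROR decision described in the update can be pulled out into a small function, which makes it easy to try without touching the network. This is only a sketch: classify_softlink_response is a made-up name, the page bodies in the example are invented, and the marker string is simply the one the script above matches on, so it may no longer correspond to what E2 actually sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): classify the
# result of one soft-link request.
#   $is_success - true if the HTTP request itself succeeded
#   $content    - the body of the page that came back
# The marker string is the one the script above looks for.
sub classify_softlink_response {
    my ( $is_success, $content ) = @_;

    return 'ERROR' unless $is_success;
    return 'NAK' if defined $content && $content =~ />Here's the stuff:/;
    return 'OK';
}

# Example with invented page bodies:
print classify_softlink_response( 1, '<h1 class="topic">Perl</h1>' ), "\n";
print classify_softlink_response( 1, "<b>Here's the stuff:</b>" ),    "\n";
print classify_softlink_response( 0, '' ),                            "\n";
```

In the script this would be called as classify_softlink_response( $response->is_success(), $response->content() ) inside the loop.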

(thing) by sleeping wolf Mon Feb 12 2001 at 22:22:56

This script is based on the one above, but has a different purpose: to softlink a list of nodes to a given node. It doesn't need HTML::Parser to figure out the node id, and it automatically chooses the e2node with the given title rather than a user or other object of the same name. I find it useful when following the suggestions in The Perfect Node, and for connecting nodeshells (both mine and those of others) to the rest of the database so that someone might find them.
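
To show why HTML::Parser isn't needed for the node id, here is a minimal sketch of the idea the script below relies on: look for a lastnode_id parameter in the page source and take its digits. The name extract_node_id and the page fragments are made up for illustration, and whether E2 pages still carry such a parameter is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the numeric node id out of a page by looking
# for a "lastnode_id=<digits>" query parameter.  The separator in front of
# it may be "?", "&", or the ";" that ends an "&amp;" entity in HTML.
# Returns undef when no such parameter is present.
sub extract_node_id {
    my ($html) = @_;
    return undef unless defined $html;
    return $html =~ /[?&;]lastnode_id=([0-9]+)/ ? $1 : undef;
}

# Example with an invented page fragment:
my $fragment = '<a href="/index.pl?node=Perl&lastnode_id=456789">Perl</a>';
my $id       = extract_node_id($fragment);
print defined $id ? "node id: $id\n" : "no node id found\n";
```

Requiring at least one digit means a page without the parameter yields undef instead of an empty string, so the caller can stop before sending any soft-link requests.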

#!/usr/bin/perl -w
# e2autolinker.pl - Allows one to automatically softlink a list of writeups to a node.
#
# Portions Copyright (C) 2001 Arthur Shipkowski "sleeping wolf" <Art_Kowolf something like an at-sign yahoo.com>
# Portions Copyright (C) 2000,2001 Will Woods <wwoods@cowofdoom.com>
# Portions Copyright (C) 2000,2001 eric+
# Distributed under the terms of the GNU General Public License,
# included here by reference.
#
# send comments, questions, and stories to the top address above, or just /msg me.
#
# To-do:
# * Perform the functionality of eric+'s original script as an option, but use HTML::LinkExtor
#   instead (this means I can use the current version of ActivePerl on Windows without
#   having to actually take the five minutes to download a newer version of HTML::Parser).
# * Allow one to use a regexp along with an 'ignore exact' search to make it possible to 
#   quickly link a set of related nodes, such as an autonoded text.
#
# history: 
# v1.0.0: (initial release)
# v1.1.0: Major rewrite to use the 'Cow of Doom' routines and to login since anon. softlinking
#         is disallowed.
# v1.1.1: Updated login routine, since the format changed.  Also upgraded to the latest
#         softlinking routine.
# v1.1.2: Switched over to use the robot Useragent and displaytype=null.  This removes
#         the ability to tell whether or not the softlink exists, but it does make the strain on the
#         E2 server go down.
# v1.2.0: Fixed the login method again and removed the RobotUA since robots are mass-banned.



use strict;

use LWP::UserAgent; # these are both part of libwww-perl, available
use HTTP::Cookies;  # at your friendly local CPAN mirror

$0="$0"; # Perl magic to clean the commandline from the process list
my $version = v1.2.0;
my $node_id;
my $node = $ARGV[2] || _usage();
my $hostroot="http://www.everything2.com";
my $baseurl="${hostroot}/index.pl";
my $login = $ARGV[0];
my $pass = $ARGV[1];
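# LWP::UserAgent->new takes named options (the positional form belonged to LWP::RobotUA)
my $ua = LWP::UserAgent->new( agent => "e2autolinker-sw", from => "Art_Kowolf something like an at sign yahoo.com" );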
$ua->env_proxy();
my $cookies = HTTP::Cookies->new();
$ua->cookie_jar($cookies);


&login($login,$pass) or die("Login failed.");
sleep(10);

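my $page = gete2node( $node );          # download the node
if ( !defined($page) ) {
    print STDERR "Error getting node $node.\n";
    exit(1);                            # "return" is not allowed outside a subroutine
}
sleep(10);

if ( $page =~ /&lastnode_id=([0-9]+)/ ) { $node_id = $1; }
die("Could not find the node id for $node.\n") unless defined($node_id);

_set_e2_soft_links( $node_id );          # set soft links to node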




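sub gete2node {
# takes one argument: $node_title
# assumes that $ua is a valid LWP::UserAgent object
# returns the contents of the page in a scalar variable, or undef on failure
# example: $page = gete2node($node_title);
  require URI::Escape;    # from the URI distribution, which libwww-perl depends on
  my $title = URI::Escape::uri_escape($_[0]);
  my $req = HTTP::Request->new('GET', "$baseurl?node=$title&type=e2node");
  my $response = $ua->request($req);
  return undef unless ( $response->is_success() );
  return($response->content());
}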

sub login {
# takes two arguments: $username, $password
# assumes that $ua is a valid LWP::UserAgent object with $cookies as its cookie jar
# returns true on success, false on failure
# example: login($username, $password) or die "failed";
  require URI::Escape;    # escape the credentials so characters like & and = survive
  my ($user, $passwd) = map { URI::Escape::uri_escape($_) } @_;
  my $req = HTTP::Request->new('POST', "$baseurl");
  $req->content_type('application/x-www-form-urlencoded');
  $req->content("op=login&user=$user&passwd=$passwd&displaytype=null");
  $ua->request($req);     # a successful login leaves a cookie in $cookies
  return($cookies->as_string() ne "");
}

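sub _set_e2_soft_links {
    my $last_node = shift();

    require URI::Escape;    # from the URI distribution, which libwww-perl depends on
    while ( defined( my $title = <STDIN> ) ) {
        chomp($title);                   # keep the trailing newline out of the URI
        next if ( $title =~ /^\s*$/ );   # skip blank lines
        my $escaped = URI::Escape::uri_escape($title);
        my $uri = "http://www.everything2.com/index.pl?node=$escaped&lastnode_id=$last_node&displaytype=null";
        my $req = HTTP::Request->new( "GET", $uri );
        my $response = $ua->request( $req );
        if ( $response->is_success() ) {
            print "ACCESSED $uri\n";
        }
        else {
            print STDERR "ERROR $uri\n";
        }
        sleep(10);                       # be gentle with the E2 server
    }
}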


##############################################################
############################################################## miscellany
##############################################################
sub _usage {
        print <<__HERE__;
usage: e2autolinker.pl <username> <password> <node_title>
username is your username.  (Softlinks aren't quite anonymous)
password is your password.
node_title is the title for the node, quoted if need be.

Node titles to softlink will be read from STDIN.
__HERE__

exit(1);
}
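
Since the node titles arrive as raw lines on STDIN, each one has to be cleaned up before it goes into a URL. The following is a minimal sketch of that step using only core Perl; escape_query_value and softlink_uris are hypothetical names, the node id 12345 is invented, and the query parameters are simply the ones the script above sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode a string for use as a query-string value: everything
# except the RFC 3986 "unreserved" characters is escaped.
sub escape_query_value {
    my ($text) = @_;
    my $bytes = encode( 'UTF-8', $text );
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf( '%%%02X', ord($1) )/ge;
    return $bytes;
}

# Hypothetical helper: turn raw input lines into soft-link URLs.
# Line endings are stripped and blank lines are skipped.
sub softlink_uris {
    my ( $base_url, $last_node_id, @lines ) = @_;
    my @uris;
    foreach my $line (@lines) {
        ( my $title = $line ) =~ s/\r?\n\z//;
        next if $title =~ /^\s*$/;
        push @uris,
            "$base_url?node=" . escape_query_value($title)
          . "&lastnode_id=$last_node_id&displaytype=null";
    }
    return @uris;
}

# Example with an invented node id; in the real script the lines would
# come from <STDIN> instead of a hard-coded list.
my @input = ( "Perl\n", "\n", "learn how to integrate\n" );
print "$_\n" for softlink_uris( 'http://www.everything2.com/index.pl', 12345, @input );
```

Keeping the URL building separate from the HTTP requests means the list can be printed and checked by eye before anything is sent to the server.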