All the Perl you need

for slicing and dicing text files

Brennen Bearnes <bbearnes@wikimedia.org>

Source: code.p1k3.com/gitea/brennen/wmf-engprod-offsite-slides
Slides: squiggle.city/~brennen/perl/

Agenda

  1. A capsule history
  2. What is Perl actually like?
  3. Basic syntax and techniques
  4. Examples

A capsule history

What’s Perl actually like?

A general-purpose, multi-paradigm programming language.

That also has a lot in common with the Unix shell.

So: A widely-installed environment where you can combine small tools to work with data.

A sample data file, tab-separated names of some authors:

$ column -t examples/authors.tsv
Robinson  Eden
Waring    Gwendolyn  L.
Brunner   John
Tolkien   John       Ronald  Reuel
Walton    Jo
Toews     Miriam
Cadigan   Pat
Le        Guin       Ursula  K.
Veselka   Vanessa
Wells     Martha
Leckie    Ann

Perl is often described as a superset of grep, sed, and awk.

It combines filtering, sorting, and transforming strings with features that are hard to use (or missing) in Bash.

Like grep:

$ perl -ne 'print if m/^T/;' examples/authors.tsv | column -t
Tolkien  John    Ronald  Reuel
Toews    Miriam
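
What -n is doing there: it wraps the code in a loop that puts each input line in $_. A rough sketch of the same filter written out as a script. The sample lines are stand-ins for examples/authors.tsv, and their tab layout is an assumption.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Sample lines standing in for examples/authors.tsv (tab layout is assumed).
my @lines = (
  "Robinson\tEden\n",
  "Tolkien\tJohn\tRonald\tReuel\n",
  "Toews\tMiriam\n",
);

# -n would read lines from files or STDIN with while (<>) { ... };
# here we loop over an array so the sketch runs without any input file.
my @matched;
for (@lines) {
  push @matched, $_ if m/^T/;   # same test as the one-liner
}

print @matched;
```

With a real file, replacing the for loop with while (<>) { print if m/^T/; } gives the script form of the one-liner.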

Like cut or awk:

$ perl -anE 'say $F[1];' examples/authors.tsv
Eden
Gwendolyn
John
John
Jo
Miriam
Pat
Guin
Vanessa
Martha
Ann
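
-a splits each line on whitespace into @F, which is why the Le Guin line yields "Guin" above. A small sketch of the difference between that default and splitting on tabs only (what -F'\t' would ask for). The sample line is made up for illustration; how the real file lays out its tabs is an assumption.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.0;

# Hypothetical line: a space inside the family name, a tab before the given names.
my $line = "Le Guin\tUrsula K.\n";

# Default -a behaviour: split on runs of whitespace.
my @by_space = split ' ', $line;

# Tab-only split; chomp first so the last field has no trailing newline.
chomp(my $record = $line);
my @by_tab = split /\t/, $record;

say $by_space[1];  # Guin
say $by_tab[1];    # Ursula K.
```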

Like sed or tr:

$ perl -pe 'tr/a-z/A-Z/' examples/authors.tsv | column -t
ROBINSON  EDEN
WARING    GWENDOLYN  L.
BRUNNER   JOHN
TOLKIEN   JOHN       RONALD  REUEL
WALTON    JO
TOEWS     MIRIAM
CADIGAN   PAT
LE        GUIN       URSULA  K.
VESELKA   VANESSA
WELLS     MARTHA
LECKIE    ANN
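
-p behaves like -n but also prints $_ after each pass, which is what makes it feel like sed. A sketch using s/// instead of tr, written out as a script. The sample lines and their tab layout are assumptions, not the contents of the real file.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Sample lines standing in for examples/authors.tsv (tab layout is assumed).
my @lines = ("Brunner\tJohn\n", "Walton\tJo\n");

# -p would do the looping and the print for us; both are spelled out here.
for (@lines) {
  s/^([^\t]+)\t([^\t\n]+)/$2 $1/;  # swap the first two tab-separated fields
  print;
}
```

As a one-liner this would be roughly: perl -pe 's/^([^\t]+)\t([^\t\n]+)/$2 $1/' examples/authors.tsv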

I’m not here to convince you to write large programs in Perl.

I want to gesture at a portion of the language useful for slicing and dicing text files.

Let’s go over some basic syntax and techniques, and then look at a few examples from my toolkit.

Basics

#!/usr/bin/env perl

print "Hello EngProd.\n";

$ ./examples/hello.pl
Hello EngProd.

Basics - boilerplate edition

#!/usr/bin/env perl

use warnings;
use strict;
use 5.10.0;

say greet($ARGV[0]);

sub greet {
  my ($greetee) = @_;
  return "Hello $greetee.";
}

$ ./examples/hello_boilerplate.pl 'EngProd'
Hello EngProd.
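
One thing the boilerplate buys you: run with no argument, the script above would warn about an uninitialized value. A possible variation that falls back to a default using //= (defined-or assignment, available from Perl 5.10). The default 'world' is a made-up choice for illustration.

```perl
#!/usr/bin/env perl

use warnings;
use strict;
use 5.10.0;

say greet($ARGV[0]);

sub greet {
  my ($greetee) = @_;
  $greetee //= 'world';   # hypothetical default when no name is given
  return "Hello $greetee.";
}
```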

A basic filter

#!/usr/bin/env perl

use warnings;
use strict;
use 5.10.0;

# Print "given family" where the given name starts with "Jo" (case-insensitive):
while (<>) {
  say "$2 $1" if m/^(.*)\t(Jo.*?)(\t|$)/i;
}

$ ./examples/filter_authors.pl examples/authors.tsv
John Brunner
John Tolkien
Jo Walton
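
The same filter can be sketched with split instead of one big regex, which may be easier to read once there are more fields to juggle. This is an alternative take, not the script from the repository. The sample lines stand in for examples/authors.tsv, and the assumption is that every name part is separated by a tab.

```perl
#!/usr/bin/env perl

use warnings;
use strict;
use 5.10.0;

# Sample lines standing in for examples/authors.tsv (tab layout is assumed).
my @lines = (
  "Brunner\tJohn\n",
  "Tolkien\tJohn\tRonald\tReuel\n",
  "Toews\tMiriam\n",
  "Walton\tJo\n",
);

my @found;
for my $line (@lines) {
  chomp(my $record = $line);
  my ($family, $given) = split /\t/, $record;   # first two fields only
  next unless defined $given;                   # skip lines with no second field
  push @found, "$given $family" if $given =~ m/^Jo/i;
}

say for @found;
```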