All the Perl you need
for slicing and dicing text files
Brennen Bearnes <bbearnes@wikimedia.org>
Source:
code.p1k3.com/gitea/brennen/wmf-engprod-offsite-slides
Slides:
squiggle.city/~brennen/perl/
A general-purpose, multi-paradigm programming language.
That also has a lot in common with the Unix shell.
So: A widely-installed environment where you can combine small tools to work with data.
A sample data file, tab-separated names of some authors:
$ column -t examples/authors.tsv
Robinson Eden
Waring Gwendolyn L.
Brunner John
Tolkien John Ronald Reuel
Walton Jo
Toews Miriam
Cadigan Pat
Le Guin Ursula K.
Veselka Vanessa
Wells Martha
Leckie Ann
Perl is often described as a superset of grep, sed, and awk.
Combines filtering, sorting, and transforming strings with stuff that’s hard (or missing) in Bash:
Like grep:
$ perl -ne 'print if m/^T/;' examples/authors.tsv | column -t
Tolkien John Ronald Reuel
Toews Miriam
Like cut or awk:
$ perl -anE 'say $F[1];' examples/authors.tsv
Eden
Gwendolyn
John
John
Jo
Miriam
Pat
Guin
Vanessa
Martha
Ann
Like sed or tr:
$ perl -pe 'tr/a-z/A-Z/' examples/authors.tsv | column -t
ROBINSON EDEN
WARING GWENDOLYN L.
BRUNNER JOHN
TOLKIEN JOHN RONALD REUEL
WALTON JO
TOEWS MIRIAM
CADIGAN PAT
LE GUIN URSULA K.
VESELKA VANESSA
WELLS MARTHA
LECKIE ANN
I’m not here to convince you to write large programs in Perl.
I want to gesture at a portion of the language useful for:
Let’s go over some basic syntax and techniques, and then look at a few examples from my toolkit.
grep, sed, vi, PCRE, etc.
#!/usr/bin/env perl
print "Hello EngProd.\n";
$ ./examples/hello.pl
Hello EngProd.
#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.0;
say greet($ARGV[0]);
sub greet {
my ($greetee) = @_;
return "Hello $greetee.";
}
$ ./examples/hello_boilerplate.pl 'EngProd'
Hello EngProd.
#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.0;
# Print "given-name surname" where the given name starts with "Jo":
while (<>) {
say "$2 $1" if m/^(.*)\t(Jo.*?)(\t|$)/i;
}
$ ./examples/filter_authors.pl examples/authors.tsv
John Brunner
John Tolkien
Jo Walton