How do I efficiently parse a CSV file in Perl?
The right way to do this -- and it is better by an order of magnitude -- is to use Text::CSV_XS. It will be much faster and much more robust than anything you're likely to write on your own. If you're determined to use only core functionality, you have a couple of options, depending on how much speed versus robustness you need.
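For reference, a minimal Text::CSV_XS sketch might look something like this (the file name and constructor options are illustrative, not part of the original answer):

use strict;
use warnings;
use Text::CSV_XS;

my $file = 'somefile.csv';   # illustrative file name
my @data;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $row = $csv->getline($fh)) {
    push @data, $row;        # $row is an array ref of fields; quoting and embedded commas are handled for you
}
close $fh;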
About the fastest you'll get in pure Perl is to read the file line by line and then naively split each line:
my $file = 'somefile.csv';
my @data;

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split(/,/, $line);   # naive: no quoting support, and trailing empty fields are dropped
    push @data, \@fields;
}
close $fh;
This will fail if any fields contain embedded commas. A more robust (but slower) approach is to use Text::ParseWords, which ships with core Perl. To do that, add use Text::ParseWords; near the top of the script and replace the split with this:
my @fields = Text::ParseWords::parse_line(',', 0, $line);
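Putting it together, the read loop from the first example would then look something like this (same assumed file name and filehandle as above):

use Text::ParseWords;

while (my $line = <$fh>) {
    chomp $line;
    # second argument 0 = strip the surrounding quotes from quoted fields
    my @fields = Text::ParseWords::parse_line(',', 0, $line);
    push @data, \@fields;
}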
Here is a pure-Perl version that also respects quotes (e.g. foo,bar,"baz,quux",123 -> "foo", "bar", "baz,quux", "123").
sub csvsplit {
    my $line = shift;
    my $sep  = (shift or ',');

    return () unless $line;

    my @cells;
    $line =~ s/\r?\n$//;   # strip the line ending (handles both \n and \r\n)

    # each match is either a quoted field ($1) or an unquoted field ($2)
    my $re = qr/(?:^|$sep)(?:"([^"]*)"|([^$sep]*))/;

    while ($line =~ /$re/g) {
        my $value = defined $1 ? $1 : $2;
        push @cells, (defined $value ? $value : '');
    }

    return @cells;
}
Use it like this:
while (my $line = <$fh>) {
    my @cells = csvsplit($line);   # or csvsplit($line, $my_custom_separator)
}
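As a quick sanity check, running the function over the quoted example from above should produce the pieces shown there:

my @cells = csvsplit('foo,bar,"baz,quux",123');
print join(' | ', @cells), "\n";   # prints: foo | bar | baz,quux | 123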