Perl mysqlimport Equivalent

MySQL has a command line utility mysqlimport which imports CSV files into an MySQL database. In fact, mysqimport is a command line wrapper for LOAD DATA INFILE. Whilst constructing a price comparison website, I came across the problem of uploading merchant product catalogues, which could be 1,000,000 records+, into my MySQL database. The catalogues were provided in CSV format. Great, I thought, I'll use mysqlimport. My joy however was short lived - my host does not allow the use of the LOAD command from the filesystem. No problem, I thought, I'll use PHP. That too was thwarted by Apache timeout limits being breeched. So, I could either write elaborate PHP code to 'chunk' the imports whilst carefully monitoring the system time to avoid timeouts, or think of a better way.

Thankfully, my host does allow the execution of Perl scripts from inside cron jobs, so I figured that would be my best solution. Hence the genesis of insertcsv - my very own mysqlimport equivalent. Actually, using the word 'equivalent' is being a little cavalier - I have only implemented the switches I needed to do my job. If you need any other runtime options, well, there's an exercise for you.

The script relies on three cpan modules - DBI for the database connectivity, Getopt::Long for command line processing, and Text::CSV for reading the csv files.

insertcsv

#!/usr/bin/perl


# script to import large csv files into a mysql database. A perl equivalent of mysqlimport
# (currently without all the functionality)


# Copyright www.badzilla.co.uk
#
Licence GPL. This program may be distributed as per the terms of GPL and all credits
# must be retained
#
# If you find this script useful, please consider a donation to help me fund my web presence
# and encourage me to develop more products to be placed under the terms of GPL
# To donate, go to http://www.badzilla.co.uk and click on the donation button
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


use strict;
use warnings;
use DBI;
use Getopt::Long;
use Text::CSV;


my $delete = "";
my $table = "";
my $dbname = "";
my $user = "";
my $password = "";
my $terminator = ",";
my $ignore = 0;
my $optenclosed = '"';
my $db;
my $prep;
my $fh;

GetOptions('delete' => \$delete,
            'table=s' =>\$table,
            'dbname=s' => \$dbname,
            'user=s' => \$user,
            'password=s' => \$password,
            'fields-terminated-by=s' => \$terminator,
            'ignore-lines=i' => \$ignore,
            'fields-optionally-enclosed-by=s' => \$optenclosed);


# check there is the mandatory options
if ($table eq '' or $dbname eq '' or $user eq '' or $password eq '' or $#ARGV == -1) {
    die("Incorrect options: Need tablename, database, user, password and at least 1 filename\n");
}

# open the database connection
$db = DBI->connect("DBI:mysql:$dbname", $user, $password) or die "Cannot open database\n";

# delete the contents of the table if delete is set
if ($delete) {
     $prep = $db->prepare("DELETE FROM $table") or die "Cannot prepare database " . $db->errstr() . "\n";
     if (!$prep->execute()) { die "Failed to delete rows" . $db->errstr() . "\n"; }
}

# create the csv object
my $csv = Text::CSV->new({binary => 1,
                        quote_char => $optenclosed,
                        sep_char => $terminator});

# main loop for each of the passed files
foreach my $argnum (0..$#ARGV) {
    if (!open($fh, '<', $ARGV[$argnum])) {
        print "Warning: Could not open " . $ARGV[$argnum] . " so file skipped\n";
    } else {
        # if there is an ignore setting, loop around discarding those lines
        for(my $i = 0; $i < $ignore; $i++) {
            my $row = $csv->getline($fh);
        }

        # now perform the database write
        while (my $row = $csv->getline($fh)) {
            $prep = $db->prepare("INSERT INTO $table VALUES (\"" . join('","', $csv->fields()) . "\")")
                    or die "Cannot prepare database " . $db->errstr() . "\n";
            if (!$prep->execute()) { die "Failed to write row " . $db->errstr() . "\n"; }
        }
        $csv->eof or $csv->error_diag();
        print "Successfully written $dbname into $table using " . $ARGV[$argnum]. "\n";
    }
}

To ensure the script can be run, change the file mode bits of the script.
laptop4:/usr/local/compare/scripts> chmod u+x insertcsv
laptop4:/usr/local/compare/scripts>

The following runtime options are available:
--delete                          Delete the contents of the table before inserting new rows
--table                           Mandatory. Name of the table in which to insert the rows
--dbname                          Mandatory. Name of the database
--user                            Mandatory. Name of the MySQL user
--password                        Mandatory. User's password
--fields-terminated-by            The separator between fields
--ignore-lines                    Ignore the first number of lines
--fields-optionally-enclosed-by   Character to enclose fields
file file file ....               Mandatory. At least one csv file.

The following is an example of how to run the script, but the runtime options in your circumstances will be different.
laptop4:/usr/local/compare/scripts> ./insertcsv -ignore-lines=1 --table=affiliates --user=username --password=password --dbname=test --delete --fields-terminated-by="|" ../datafeeds/1708 ../datafeeds/14