Fast forward to today. I've been working on a script that synchronizes data between our Netezza and MySQL. In other words, I have a table in Netezza, and I need to have it replicating to MySQL. Sounds simple, right? The complication is that the MySQL table has an extra field, called last_updated, which defaults to current timestamp. The data that I pull from the Netezza lacks this field. Any data that does not need updating in MySQL should have this field left alone.
My first attempt (and what went into production) was this:
load data local infile '$csv_file' replace into table $mysql_table fields terminated by ',';
This worked just fine, until the Netezza table stopped populating and my script kept reading the same data over and over again and inserting the data (with fresh timestamps) into MySQL. This was not the desired affect. With MySQL, REPLACE INTO deletes rows and does new inserts, and since we weren't specifying the last_updated field, it would auto-populate with the current timestamp. Since the data I was inserting was identical to the data that was there before, the last_updated field should have been left alone.
So today, I was working on a new script that uses an ODBC connection via DBD::ODBC to get the data out of the Netezza and the DBD::MySQL module to connect to the MySQL. I had the code nearly done, and I stopped and looked at it.
I realized that going by the "hit by the bus" theory, if someone else needed to take over my code, they were, to put it bluntly, screwed. And that didn't feel so good.
So, I took a walk.
When I returned, a much more elegant solution was in my brain, ready to go:
1) Select the data out of MySQL into a hash
2) Select the data out of Netezza
3) If the data isn't already in the hash, put it in the hash
4) If the data is in the hash but isn't identical, replace the value in the hash
5) If the data is in the hash and is identical, delete it
6) Iterate over the hash, creating INSERT INTO ... ON UPDATE ...; statements.
Now, I'm not positive I found the perfect solution OR that there isn't a better way, but my current solution is *much* better than what I had written previously.
When you've got a tricky problem to solve, take a walk.