Friday, March 23, 2012

When a Zip is Not a Zip

For a while I administered a department-level "server" (basically, a large and noisy workstation) running some flavor of Windows (I think Win 2K).  Colleagues in my department used it for assorted purposes, including research (web-based surveys), and like all good users they (a) never deleted anything from the server (since, from their perspective, space was free) and (b) were cavalier about maintaining their own copies of their files. When the time came to retire the server, I sent out a couple of warning emails (get your files or wave goodbye to them).  I also created a ZIP archive of all data files on the server before turning it into a doorstop.

Recently (meaning a couple of years or so after the server was recycled) a colleague asked me if I could recover her surveys and survey data.  So I scrounged around and found the ZIP archive ... only to discover that it cannot be decompressed by any program I can find.  I've tried at least seven alternatives, including the compression/decompression method built into Windows 7, several popular third-party programs for the Windows platform, and gunzip on Linux.  They all show the contents of the archive correctly, and they all fail (either with an error message or, in one case, by freezing) when I try to extract anything from the archive.  The error message is that the compression method (described by the ordinal 18 in some cases and the name "IBM/Terse (new)" in others) is unsupported.

Up to now, I've naively assumed that a ZIP file is a ZIP file is a ZIP file.  I know that what's being zipped could be a TAR archive, a bunch of files or whatever, and I know that some ZIP utilities give the user options for trading extra time for smaller file sizes.  I've never had an uncorrupted ZIP file fail to decompress using any compression program, whether the same one that created the archive or not ... until now.

In the future, I guess I will need to be extra careful about how important backups are compressed -- and hope that the compression method used does not go the way of the card reader.

Saturday, March 3, 2012

Diagnosing Unbounded Models

Two approximate calculations give the number of atoms in the observable universe to be close to 1080. (source: Wikipedia)
When teaching, I used the hypothesized finiteness of the amount of matter in the universe as a quick rationale for the statement that all practical optimization models should be bounded.  (All practical optimization models should be feasible, too, unless the purpose of the model is to show that your bosses are being unreasonable in their demands.)  Unbounded models arise from a combination of two factors:
  • the analyst does not bother to determine reasonable finite bounds for each decision variable; and
  • constraints that should bound the problem are either omitted or formulated incorrectly.
So let's say that you submit a model to a solver, and it responds that the problem is unbounded.  How do you diagnose the error?

The fancy approach involves getting the solver to cough up a ray along which the feasible region is unbounded and the objective is strictly improving.  The easy way is to slap finite but ridiculously loose bounds (both upper and lower) on any variable that is not already bounded, and then solve the model again.  Since the feasible region is now bounded, you will get an "optimal" solution.  Since the problem was previously unbounded, at least some variables will attain those ridiculously loose bounds. 

Ask yourself, in the context of the decision problem, why that cannot happen in practice:  why you cannot make 50 trillion automobiles per day, or sell 50 trillion iGadgets per day, ...   Think not in terms of the model, but in terms of reality.  You can't do it because you don't have enough (labor, materials, capacity, capital, demand, ...).  Now focus on the constraints that should reflect those limits.  If they are missing, add them.  If they are present, plug the solution you just got into them and figure out why they failed to prevent it.  Remediate and iterate until you can remove the goofy bounds (preferably replacing them with more reasonable bounds) and get an optimal solution.