Friday, July 30, 2010

Infeasible LPs and Farkas Certificates

Yesterday I wrote about dealing with unbounded LPs in CPLEX. Today we'll look at infeasible LPs. CPLEX APIs provide a variety of mechanisms for dealing with infeasibility, including refineConflict()/getConflict() to isolate irreducible infeasible subsystems (IISes) and feasOpt() to find a “cheapest” relaxation of constraints and bounds that makes the model feasible. They also provide access to what is known as a Farkas certificate.

Thanks to Bo Jensen, who reminded me about Farkas certificates in a comment yesterday. He also provided a link to MOSEK's description of them, which saves me quite a bit of typing. Farkas certificates draw their name from Farkas's lemma, a well-known result in linear programming theory. While it is possible for both a primal linear program and its dual to be infeasible, that condition is rare in practice. The (much) more common case is that an infeasible LP has an unbounded dual. In essence, a Farkas certificate is a recession direction for the dual problem. It's existence proves the infeasibility of the primal.

For infeasible LPs, CPLEX provides the IloCplex.dualFarkas() method to get a Farkas certificate. (As always, I'll stick to the Java API, but all the APIs should provide some version of this.) The method has two arguments used for returning results (and not for providing inputs). One argument returns a list of primal constraints with non-zero dual values; the other vector holds the corresponding dual values.

Consider the following primal LP:\begin{eqnarray*} \mathrm{minimize} & u+v-2w\\ \mathrm{s.t.} & u-2v-w & \ge3\\ & -2u+v-w & \ge2\\ & u,v,w & \ge0\end{eqnarray*}whose dual just happens, by sheer coincidence, to be the LP from yesterday's post. Dual variables $x$ and $y$ correspond to the first and second constraints respectively. The dual problem, shown in Figure 1, is obviously unbounded, so the primal is infeasible. We can easily deduce infeasibility directly from the formulation: add the two constraints together to get $-u-v-2w\ge5$, which cannot be satisfied with nonnegative values of the variables.

Continuing yesterday's Java code (without repeating it), I'll start by constructing the new primal problem:
IloCplex primal = new IloCplex();
IloNumVar u = primal.numVar(0, Double.MAX_VALUE, "u");
IloNumVar v = primal.numVar(0, Double.MAX_VALUE, "v");
IloNumVar w = primal.numVar(0, Double.MAX_VALUE, "w");
IloNumVar[] vars = new IloNumVar[] {u, v, w};
primal.addMinimize(
  primal.scalProd(new double[] {1.0, 1.0, -2.0}, vars));
IloConstraint c1 =
  primal.addGe(
    primal.scalProd(new double[] {1.0, -2.0, -1.0}, vars),
    3.0);
IloConstraint c2 =
  primal.addGe(
    primal.scalProd(new double[] {-2.0, 1.0, -1.0}, vars),
    2.0);
Note that I assign the constraints to program variables as I add them to the model, so that I can refer back to them later.

Since I cannot be sure which constraints dualFarkas will return, nor in what order, it will be handy to construct a map associating each constraint with the corresponding dual variable.
HashMap<IloConstraint, IloNumVar> map =
  new HashMap<IloConstraint, IloNumVar>();
map.put(c1, x);
map.put(c2, y);
The dual variables x and y (instances of IloNumVar) were defined in yesterday's portion of the code.

Now I turn off the presolver, solve the model, and verify that it is infeasible. (As was the case yesterday with an unbounded primal, fail to turn off the presolver will cause CPLEX to throw an exception when I try to access the Farkas certificate.)  I also need to use the dual simplex algorithm; any other algorithm (primal simplex, barrier, ...) will cause CPLEX to throw an exception. There is no explicit statement in the code since CPLEX's default choice for algorithm on this problem is dual simplex, but in general it should probably be specified just in case CPLEX prefers some other algorithm.
primal.setParam(IloCplex.BooleanParam.PreInd, false);
primal.solve();
System.out.println("CPLEX status: "
                   + primal.getCplexStatus());
The output here is CPLEX status: Infeasible.

The next step is to access the Farkas certificate:
IloRange[] farkasConstraints = new IloRange[2];
double[] farkasValues = new double[2];
primal.dualFarkas(farkasConstraints, farkasValues);
System.out.print("Farkas certificate:");
for (int i = 0; i < 2; i++) {
  System.out.print("\t" + map.get(farkasConstraints[i])
                   + " -> " + farkasValues[i]);
}
System.out.println("");
which results in Farkas certificate: x -> 0.6666666666666666 y -> 0.3333333333333333. So the Farkas certificate proves to be the direction vector of the extreme ray anchored at $A$ in Figure 1. Note the use of the previously constructed map to match the constraints returned in farkasConstraints with the corresponding dual values. Although both constraints have nonzero direction components in this example, in general not all constraints need be returned, nor can I be sure that those returned will be in the order I added them to the model.

To find the vertex at which it is anchored, I can access the last feasible dual solution obtained before CPLEX stopped.
double[] q = primal.getDuals(farkasConstraints);
System.out.print("Dual vertex:");
for (int i = 0; i < 2; i++) {
  System.out.print("\t" + map.get(farkasConstraints[i])
                   + " -> " + q[i]);
}
System.out.println("");
The output is as follows: Dual vertex: x -> 1.6666666666666665 y -> 0.33333333333333326. This is vertex $A$ in Figure 1. So, at least in this example, the Farkas certificate produces the same extreme ray that I got yesterday by solving today's dual (yesterday's primal) directly.

I still have one concern. The definition of a Farkas certificate only guarantees a recession direction for the dual; it does not need to be an extreme dual direction to qualify as a Farkas certificate. I suspect, but do not know with certainty, that the mechanism by which solvers select a Farkas certificate will consistently lead to an extreme direction.

Thursday, July 29, 2010

Finding Extreme Rays in CPLEX

It's funny how questions in technical forums sometimes cluster. After going quite some time without seeing any mention of extreme rays on any of the CPLEX-related forums (nor on sci.op-research, nor OR-Exchange), in the past month or so I've encountered at least three threads relating to the topic. So perhaps it's fodder for a blog post.

Some terminology first. If a convex polyhedron is unbounded, it has one or more recession directions, directions of rays that remain within the polyhedron. The set of recession directions is convex, so “one or more” means “one or uncountably many”. Nifty properties include that a recession direction from one point in the polyhedron is a recession direction from every point in the polyhedron, and that the recession directions form a convex cone. An extreme direction is a recession direction that cannot be written as a non-trivial convex combination of other recession directions; the corresponding extreme ray is a boundary of the cone of feasible rays.

Now let's move to the realm of linear programs. If a linear program is unbounded, then the feasible region (a convex polyhedron) must be unbounded in a direction that improves the objective function. The primal simplex method recognizes this when it reaches a vertex at which there is a variable with favorable reduced cost to pivot out of the basis (indicating an edge on which to move) but there is no candidate to enter the basis (because no constraint becomes binding as you move along the edge). The edge in question is an extreme ray anchored at an extreme point of the feasible region.

Consider the problem\begin{eqnarray*} \mathrm{maximize} & 3x+2y\\ \mathrm{s.t.} & x-2y & \le1\\ & -2x+y & \le1\\ & x+y & \ge2\\ & x,y & \ge0\end{eqnarray*}whose feasible region is shown in Figure 1 below.

There are two extreme rays, one anchored at $A=\left(\frac{5}{3},\frac{1}{3}\right)$ with direction $\left(2,1\right)$ and the other anchored at $B=\left(\frac{1}{3},\frac{5}{3}\right)$ with direction $\left(1,2\right)$. I'll code it in CPLEX using the Java API (other APIs will behave similarly), omitting imports, try/catch blocks etc.

IloCplex cplex = new IloCplex();
IloNumVar x = cplex.numVar(0, Double.MAX_VALUE, "x");
IloNumVar y = cplex.numVar(0, Double.MAX_VALUE, "y");
cplex.addMaximize(cplex.sum(cplex.prod(3.0, x),
                  cplex.prod(2.0, y)));
cplex.addLe(cplex.sum(x, cplex.prod(-2.0, y)), 1.0);
cplex.addLe(cplex.sum(y, cplex.prod(-2.0, x)), 1.0);
cplex.addGe(cplex.sum(x, y), 2.0);
// turn off the presolver
cplex.setParam(IloCplex.BooleanParam.PreInd, false);
// select the primal simplex method
cplex.setParam(IloCplex.IntParam.RootAlg,
               IloCplex.Algorithm.Primal);
// solve the model
cplex.solve()
// verify primal is unbounded
System.out.println("CPLEX status: "
                   + cplex.getCplexStatus());
// get the current solution
System.out.println("Last extreme point: x = "
                   + cplex.getValue(x)
                   + ", y = "
                   + cplex.getValue(y));
// get a ray
IloLinearNumExpr ray = cplex.getRay();
System.out.println("getRay returned " + ray.toString());
// parse it
IloLinearNumExprIterator it = ray.linearIterator();
double xcoef = 0, ycoef = 0;
while (it.hasNext()) {
  IloNumVar v = it.nextNumVar();
  if (v.equals(x)) {
    xcoef = it.getValue();
  }
  else if (v.equals(y)) {
    ycoef = it.getValue();
  }
}
System.out.println("Extreme ray direction = ("
                   + xcoef + ", " + ycoef + ")");

The output is as follows:
The output is as follows:
Iteration log . . .

Iteration:     1    Infeasibility =             1.000000

Switched to steepest-edge.

CPLEX status: Unbounded

Last extreme point: x= 1.6666666666666665,
                    y = 0.3333333333333335

getRay returned 0.6666666666666666*x
                + 0.33333333333333337*y

Extreme ray direction = (0.6666666666666666,
                         0.33333333333333337)
Note that the primal simplex method stopped at vertex $A$ and the direction vector is within a scalar multiple of $(2,1)$, so it is the correct direction for the extreme ray there.

Things to note:
  1. We must turn off presolving to find the ray. If we comment out that line, the attempt to access the vertex causes an exception:
  2. CPLEX Error  1217: No solution exists.
    If we also comment out the lines that try to get the vertex values (seeking just the ray and not its anchor point), the call to getRay() throws an exception:
    CPLEX Error  1254: Unbounded solution required.
  3. We must use the primal simplex algorithm. If we comment out that line (and restore the previously removed lines), we get the following output:
  4. Last extreme point: x = 0.0, y = 0.0
    getRay returned 1.0*x + 1.0*y
    Extreme ray direction = (1.0, 1.0)
    
    Note that the anchor point $(0,0)$ is not actually a vertex of the feasible region (and in fact is not in the feasible region), and the direction vector $(1,1)$, while a legitimate recession direction, is not an extreme recession direction.

  5. Our arrival at vertex $A$ rather than vertex $B$ was due to the choice of the objective function. Tilt the objective function so that its normal vector is a bit closer to vertical, and we end up at $B$ instead.
  6. The bad news is that in most APIs we need to fiddle with an iterator to parse the direction vector, since getRay() returns a linear expression rather than a double[]API, CPXgetray() passes the ray coefficients in a double[] argument, including any zeros (no attempt to exploit any sparsity).

Monday, July 12, 2010

Logical Indicators in Mathematical Programs

There's a common paradigm in many procedural programming languages that expressions involving relational operators spit up a true/false value that either is coded numerically (typically 1 for true, 0 for false) or will automatically be translated by the compiler/interpret that way. So, for example, the expression "a*(b>c)" produces the value a if b>c and 0 otherwise. Unfortunately, newcomers to mathematical programming are sometimes inclined to attempt that in the context of a mathematical programming model. Unless they are using an unusually clever modeling language -- more clever than any I've used -- it will not work (and even with a very clever modeling language it will not work for strict inequalities). There are ways to do what is wanted, but they involve introducing binary variables (and additional constraints), and you may not always get exactly what you wanted.

As a simple example, let's say that I have a variable $x$ that represents the delivery time for an order, and a parameter $d$ which is the due date for the order. I have a model in which I minimize some cost function $f(\cdots)$, and I want to assess a fixed penalty $p$ (another parameter) if, and only if, the order is late ($x>d$). In this example, the penalty is a constant, not dependent on how late the order is. Writing a computer program, I might be tempted to try something like $f+=p*(x>d)$, but that will not come close to flying in an optimization model. So my first step is to introduce a new binary variable $z\in\{0,1\}$ which will (hopefully) take the value 1 if the order is late and 0 if not. The objective then becomes $\textrm{maximize }f(\cdots)+p*z$.

Now I need to constrain $z$ so that it takes the correct values. In this particular example, I get some indirect assistance from the objective function. Assuming $p>0$ (I'm not rewarding tardiness), the solver will always choose $z=0$ over $z=1$ given the rest of the solution (meaning once $x$ is known). So it does not matter if the formulation allows both $z=1$ and $z=0$ when $x\le d$; the solver will choose $z=0$ in that case to reduce the objective value. Therefore I only need to incorporate a constraint that says $x>d\Longrightarrow z=1$; I don't need to worry about what happens when $x\le d$. The strict inequality might pose a problem, but fortunately the contrapositive $z=0\Longrightarrow x\le d$ contains a weak inequality.

Now I'm on the home stretch. I need to come up with an upper bound $M>0$ on the possible tardiness of the order. I add the constraint \[x\le d + M*z.\] If it turns out the order is late ($x>d$), this will force $z=1$ (so I'm charged the penalty) and the constraint will become $x\le d+M$, which is nonbinding as long as I chose $M$ sufficiently large. If it happens that the order is on time ($x\le d<d+M$), then $z$ can take either value, and I rely on the objective function to cause the solver to choose $z=0$.

What if I need $z=1$ if and only if $x>d$, and I cannot rely on the objective function (because, let's say, there is no cost penalty for $z=1$)? With one exception, it cannot be done in a mathematical program ... but I can come close. I need another constant $L>0$ such that $d-L<x$ for any feasible value of $x$. (In our example, $L$ bounds the earliness of the order.)  Now I have two choices. The first is to add a pair of constraints:\begin{eqnarray*}x\le d + M*z\\ x\ge d - L*(1-z).\end{eqnarray*} Now $x>d\Longrightarrow z=1$ and $x<d\Longrightarrow z=0$. If $x=d$, though, $z$ can take either value.

The alternate approach involves yet another constant, this time a small one ($\epsilon>0$). Think of $\epsilon$ as representing the minimum discernible difference between $x$ and $d$. The defining constraints for $z$ are now \begin{eqnarray*}x\le d + M*z\\ x\ge d + \epsilon - (L+\epsilon)*(1-z).\end{eqnarray*} What we get is $x\ge d+\epsilon \Longrightarrow z=1$ and $x\le d\Longrightarrow z=0$. What we sacrifice is that $d<x<d+\epsilon$ is now infeasible. In our example, it is illegal to be very slightly late; if the order cannot be on time, it will have to be at least $\epsilon$ time units late. If you're thinking "cool, I can pick $\epsilon=10^{-50}$", keep in mind that $\epsilon$ has to be large enough that it does not get lost in rounding error in the solver (otherwise you're back to the first case, where delivery close to the due date allows $z$ to go either way).

I mentioned one exception above. If $x$ (and $d$) are integers, life is easy, because $x>d$ is equivalent to $x\ge d+1$. So in this case we can get exactly what we want, using the second approach with $\epsilon$ slightly smaller than 1.

Monday, July 5, 2010

Inclusive Or in Mathematical Programs

This is a FAQ in various optimization/mathematical programming forums, so it's worth putting the answer someplace to which I can point. How do you model a constraint of the form "either A or B must hold"?

The first thing to observe is that the conditions A and B must, like any other constraints, be expressed as equalities or weak inequalities; strong inequalities are to be avoided because they make the feasible region an open set. In what follows we'll assume that A and B are expressed respectively as $a \le b$ and $c \le d$, where $a,\ldots,d$ are expressions involving model variables. We'll also assume that what is intended is an inclusive or, meaning it is acceptable that both conditions are satisfied simultaneously.

To model the disjunction, we will need two positive constants $M_1$ and $M_2$ such that we are absolutely certain that $a \le b + M_1$ and $c \le d + M_2$ in the optimal solution (or in at least one optimal solution). Choosing $M_1$ or $M_2$ too small can make the optimal solution infeasible, which would be a serious error. Choosing $M_1$ or $M_2$ unnecessarily large can slow down the progress of the solver (by making bounds looser than is necessary); that can be annoying, but it's not fatal. So one typically errs on side of making $M_1$ and $M_2$ if anything too large, but it pays not to go overboard with them.


We will also need two new binary variables, $z_1$ and $z_2$. We add the following three constraints:\[ \begin{array}{cccc} a & \le & b + M_1*\left(1 - z_1\right) & (1) \\ c & \le & d + M_1*\left(1 - z_2\right) & (2)\\ z_1 + z_2 & \ge & 1 & (3) \end{array}. \] Constraint (1) enforces $z_1 = 1 \Longrightarrow a > b $, and similarly constraint (2) enforces $z_2 = 1 \Longrightarrow c > d$. Finally, constraint (3) forces at least one of $z_1$ and $z_2$ to be 1, which in turn forces at least one of the original inequalities to hold.

Imposing an exclusive or (requiring exactly one of the inequalities to hold) is considerably trickier. The easy part is converting (3) to an equation: $z_1 + z_2 = 1$. The hard part is converting (1) and (2) so that $z_1 = 1$ if and only if $a \le b$, and similarly for $z_2$. This requires that $z_1 = 0 \Longrightarrow a > b$; but enforcing a strict inequality is not possible. The one exception is that if $a$ and $b$ are integer valued, then $a>b$ equates to $a\geb+1$, which can be handled. For continuous expressions, though, the best workaround may be to introduce a small positive constant $\epsilon>0$ and enforce $z_1=0 \Longrightarrow a \ge b + \epsilon$, which has the unfortunate effect of cutting off any (possibly optimal) solution where $b<a<b+\epsilon$.

Friday, July 2, 2010

Updating TeX Live on Mint

Linux Mint is built on Canonical's Ubuntu distribution, and Canonical is, shall we say, a tad conservative about updating packages in their repositories.  Thus Ubuntu 9.10 (Karmic Koala), which I believe was released in the fall of 2009, still had the 2007 version of the TeX Live distribution of LaTeX. (The recently released Ubuntu 10.04 Lucid Lynx distribution apparently comes with TeX Live 2009; meanwhile, TeX Live 2010 is out and about.)

For various reasons, it became incumbent upon me to update my laptop (Mint Helena, built on Karmic Koala) to a later version of TeX Live.  So I zapped the old installation and followed the quick install instructions, including the comment that they recommend installing as a normal user rather than as root.  This made sense, since when TeX Live is installed as root you have to sudo to do some maintenance operations, such as updating the file name database.

Yeah, except ... if TeX Live is installed as root, it can write anywhere, so it gets itself on the system command path effortlessly and automatically.  Not so if it's installed in user mode.  I had to point it to a different folder than where it normally installed (I picked a folder under /opt -- which I then had to sudo to create) and got it installed, but it was not on the system command path.  So now I had to add the location of the new TeX Live bin directory to the system command path.

Rarely do I pay Windows a compliment, but it's pretty darned easy to add something to the Windows path (maybe too easy, from a security perspective), and it's equally easy to distinguish between the overall path (that every user gets) and the supplemental path that only the user making the changes gets.  Not so with Linux.  As best I can tell, there are no helper tools to make path changes in Linux.  So I added the path to my ~/.bashrc file (where I've made other path changes in the past), and that seemed to work ... until I tried to use my favorite document writing program, LyX.  Started from a terminal (where I was logged in as myself), it required a reconfiguration (which came as no huge shock), after which it ran fine. Started from the Mint program menu, though, it insisted that all document classes had gone missing, which is a sure sign that it can't find the LaTeX distribution.

Mind you, if I started LyX from a terminal with sudo (so that it was running under the root account), I could get it to work. But no way would it run from the menu. So I guess the ~/.bashrc file is not read by programs run from the menu. Fine, be that way.  I eventually solved the problem by appending the path export to /etc/profile, the system-wide profile. It works -- now -- but if some system upgrade down the road rewrites that file, I hope I remember to look here for how to get LyX working again. :-(

Why is My Model Infeasible?

A frequently asked question on various optimization mailing lists and web forums runs something like this: The solver says my <linear/integer/nonlinear/integer linear/integer nonlinear> optimization model is infeasible -- which constraint is causing it? The easy but unhelpful answer is that no one constraint causes a model to be infeasible.

Infeasibility is the result of a conflict among two or more constraints. It is important to note that "constraint" encompasses three types of restrictions: functional constraints (equations and inequalities involving two or more variables); lower and upper bounds on individual variables (including nonnegativity restrictions); and integrality requirements for individual variables. In all cases of infeasibility, the objective function is irrelevant; only constraints (as just defined) matter.

For linear programs, the acronym to know is IIS, which is sometimes expanded as Irreducible Infeasible Subset and sometimes as Irreducible Inconsistent Subsystem (as, for instance, in the Mathematical Programming Glossary). Either way, it means the same thing: a subset of the constraints and variable bounds that cannot be simultaneously satisfied (even if all other constraints/bounds are ignored), but such that removal of any one member of the IIS leaves you able to satisfy the remaining constraints/bounds in that IIS. That last qualification is important: there can be more than one IIS in a model, so removing (or sufficiently loosening) one of the members of a particular IIS may not be enough to make the model feasible.

There has been considerable research on efficient ways to identify IISes (but apparently no research on the correct way to write the plural of IIS), and on how to identify a minimal set of constraints/bounds whose removal will break all IISes and make the problem feasible. I believe the latter problem has been shown to be NP-hard. Having identified an IIS, it is not typically necessary to remove a constraint/bound entirely; relaxing one (or more) constraints/bounds sufficiently should do the trick. Contemporary LP solvers usually provide mechanisms for identifying at least one IIS, so all else failing you can repeatedly solve the model, identifying and correcting an IIS each time, until the model becomes feasible. (This may not be the route to feasibility that involves minimal amounts of relaxation, however.)

Which constraint(s) in an IIS to relax, and how much to relax them, is context specific. One possible approach is to insert a nonnegative variable into those constraints in the IIS (including bounds) that can possibly be relaxed. The new variables represent the amount of relaxation of the corresponding constraints. Add a penalty term to the objective for each of the added variables, assigning costs for the relaxations in a manner that makes sense in context. Then optimize the expanded model to find both an optimal solution and the "cheapest" combination of relaxations. Something similar should work for nonlinear programs, although I don't know if NLP solvers implement IIS finding routines.

When there are integrality restrictions, the picture gets more complicated. The first step is to relax the integrality restrictions and test whether the relaxed problem is feasible. If not, identify an IIS (or multiple IISes) and proceed as above. If the continuous relaxation if feasible, then the issue is that the feasible region of the relaxation does not contain any integer points. The only way I know to fix the model is to introduce the relaxation variables (and costs) I described above.

Thursday, July 1, 2010

When Incumbents Need Polishing

This is about formulating and solving mixed integer programs. It has nothing to do with political office holders, although it's about as long-winded as one of their speeches to a camera and an empty legislative chamber. The topic is one of those things that slips easily under the radar, can pop up to bite you in the butt, and is blindingly obvious in hindsight.

It's well known that, in general, getting good incumbents early is helpful if not essential in solving mixed integer linear programs (MILPs). (MILPs being the perverse beasts they are, virtually nothing is always true of them, so assume everything I say carries an implicit "in general" qualifier.) Good incumbents let the solver prune larger chunks of the search tree earlier.

It's also common practice, in some situations, to model an "if and only if" constraint as an "if", relying on the objective function to enforce the "only if", perhaps through a penalty mechanism. Let's start with a general MILP, of the form\[ \begin{array}{lc} \textrm{minimize} & f(x)\\ \textrm{subject to:} & x\in\mathcal{X}\end{array}\] where $f()$ is a linear objective function, $x$ is a mixture of discrete and continuous variables, and $\mathcal{X}$ is a feasible region defined by linear equality or (weak) inequality constraints and integrality conditions. Now let $g_{1}()$, $g_{2}()$ and $g_{3}()$ be linear functions; let $d^{+}\ge0$, $d^{-}\ge0$ and $y\ge0$ be continuous nonnegative variables, and let $z\in\{0,1\}$ be a binary variable; and let $\alpha^{+}$, $\alpha^{-}$, $\beta$, $\gamma$ and $M$ be strictly positive constants. To our general MILP we add goals $g_{1}(x)=0$, $g_{2}(x)\le0$ and $g_{3}(x)\le0$, and assess penalties proportional to deviations in the first two and a fixed penalty for deviations in the third. A typical way to deal with these converts the original MILP to the following:\[ \begin{array}{lcr} \textrm{minimize} & f(x)+\alpha^{+}d^{+}+\alpha^{-}d^{-}\\ & +\beta y+\gamma z & (1)\\ \textrm{subject to:} & x\in\mathcal{X} & (2)\\ & g_{1}(x)=d^{+}-d^{-} & (3)\\ & g_{2}(x)\le y & (4)\\ & g_{3}(x)\le Mz & (5)\end{array}\] with $M$ (the infamous "big M") chosen sufficiently large that we can be sure the optimal solution $x^{*}$ satisfies $g_{3}(x^{*})\le M$.

Constraints like (3) are common in goal programming. Penalizing overtime or tardiness in production and scheduling problems usually leads to constraints of the form (4). Problems with fixed costs, such as the bin packing problem or the lock box problem, often result in constraints of the form (5). The first critical observation is that, while it is not feasible to violate one of those constraints and "underpay" in the objective, it is feasible to "overpay". Suppose that $(\hat{x},\hat{d}^{+},\hat{d}^{-},\hat{y},\hat{z})$ is an integer-feasible solution to the problem with objective value $\hat{f}$; then we can add any positive amount $\epsilon$ to both $\hat{d}^{+}$ and $\hat{d}^{-}$, or to $\hat{y}$, and obtain another feasible solution with objective value $\hat{f}+(\alpha^{+}+\alpha^{-})\epsilon$ or $\hat{f}+\beta\epsilon$ respectively. If $\hat{z}=0$, then $(\hat{x},\hat{d}^{+},\hat{d}^{-},\hat{y},1)$ is also feasible with cost $\hat{f}+\gamma$. In formulating the model, we are ordinarily not concerned with this since minimization of the objective will ensure that \emph{the optimal solution} does not overpay. Moreover, adding constraints that would ensure that $d^{+}$ and $d^{-}$ are not simultaneously positive, or that $g_{2}(x)\le0\Longrightarrow y=0$, or that $g_{3}(x)\le0\Longrightarrow z=0$, may involve the addition of integer variables (which raises computational expense) and may poke holes in the feasible region. As an example of the latter, consider that the contrapositive of $g_{3}(x)\le0\Longrightarrow z=0$ is $z=1\Longrightarrow g_{3}(x)>0$. Since mathematical programs admit only weak inequalities (to keep the feasible region closed), this winds up being $z=1\Longrightarrow g_{3}(x)\ge\epsilon$ for some small (but nonzero) positive $\epsilon$, and we have arbitrarily excluded any solution that might have $0\ltg_{3}(x)\lt\epsilon$.

So common practice is to accept the fact that formulation (1)-(5) allows overpayment in theory, and rely on the objective to prevent overpayment in practice. What of intermediate (suboptimal) solutions, though? In classical branch-and-bound, we have reason to be optimistic: new incumbents are found exclusively as integer-feasible solutions to the linear relaxation of the subproblem at some node (meaning that some of the integer variables are fixed, and the remaining integer variables and all continuous variables are determined by solving a linear program). Since all the continuous variables are determined by solving a linear program, we do not have to worry about "padding" in $d^{+}$, $d^{-}$ or $y$. (In fact, the simplex method itself precludes $d^{+}$ and $d^{-}$ from simultaneously being positive: since their columns are negatives of each other, they cannot both be part of the basis, as that would make the basis matrix singular.) As to $z$, there is the possibility that $z$ is locked at 1 (due to prior branching decisions) while $g_{3}(x)\le0$; so the incumbent found at this node might possibly have a padded objective value. Contemporary solvers, however, frequently include heuristics that generate potential incumbents even when the solution to the node LP is not integer-feasible. Those incumbents can again result in padding of the form $g_{3}(x)\le0$, $z=1$; but, depending on the heuristic, they may also allow $g_{2}(x)\le0$ and $y>0$, or possibly even $d^{+}$and $d^{-}$ simultaneously positive.

Should we care about this, given that the optimal solution is guaranteed not to be padded? Yes, for two reasons. First, in practice we frequently have to settle for good but not optimal solutions, either because finding the optimal solution would take prohibitively long or because \emph{proving} the optimal solution is actually optimal would take prohibitively long. If we are going to settle for an intermediate incumbent, it behooves us to check it for gratuitous overpayment and, if necessary, polish it (in other words, accept the value of $x$ and compute $d^{+}$, $d^{-}$, $y$ and $z$ for ourselves). That's not a big deal; we're really accepting the solution itself as-is and just recomputing the objective value. The other reason for concern, though, is not as easily dismissed. If the MILP solver generates a new incumbent with an inflated objective value, we do not get the full value of that incumbent in pruning nodes. The solver only prunes nodes whose objective bound (typically the objective value of the LP relaxation at that node) is worse than the inflated objective value of the new incumbent; we lose the opportunity to prune nodes whose LP bounds are better than the inflated value but worse than the actual (polished) objective value of that solution. In fact, if the objective value is sufficiently inflated, the solver may not consider the solution a new incumbent at all, and we never see it, even thought its true objective value would make it useful.

I think this problem tends to fly below the radar, in part because we generally have no indication in the output that inflation is taking place, and in part because we subconsciously think that the simplex algorithm (or whatever is solving the node LPs) will protect us from it. I recently tripped over this in a very real way, though. The only reason I knew it was going on was that I was comparing outright solution of a MILP model (using the Java API to CPLEX) to the performance of a blended metaheuristic on the same problem. The code I used to solve it with CPLEX was a branch of the metaheuristic code, and used the metaheuristic's method of computing the objective value. Watching my code's output with one eye and CPLEX's node log with the other, I was unpleasantly surprised to see CPLEX announce an incumbent with objective value $f^{*}$ and my code announce a considerably lower objective value (in the worst cases lower by factors of two to five). I eventually tracked the discrepancy to problems with constraints of the form (5).

What can we do about it? We can, at least in some cases, add constraints that preclude the padding; but those constraints inflate the size of the problem and may slow the solver. We can use an incumbent callback to detect the problem and a cut callback to install the necessary additional constraints locally, so that they apply only to the subtree descending from the node where we detected the inflation. Compared to installing all the anti-inflation constraints up front, this makes for smaller node problems, but it may require rediscovering some of those constraints repeatedly. Also, because it relies on an incumbent callback to detect that a proposed incumbent has an inflated objective value, it does not help with cases where the inflation is enough to prevent the solver from recognizing the solution as a new incumbent.

In my case, I was already using an incumbent callback to capture and record incumbents as they were found. I added code to the incumbent callback that checked for padding and, if it found any, queued the incumbent for "polishing". Then I added a heuristic callback that does nothing unless a solution is queued, in which case it polishes the solution and injects it (same $x$ but $d^{+}$, $d^{-}$, $y$ and/or $z$ adjusted as necessary) as a new incumbent, which CPLEX recognizes with the correct objective value. Like the second proposed solution above, it will not help if the inflation makes a new solution look worse than the current incumbent, but it is fairly streamlined and does seem to help the solution process.