OR in an OB World: 2012

Tuesday, December 25, 2012

Resetting Speaker Volume in Mint/Ubuntu

Okay, you have to promise not to laugh or make "old guy" cracks here. I keep the master volume on my Linux Mint PC cranked fairly low (between 40% and 45% of maximum) because I have powered speakers (and reasonably acute hearing). When I watch YouTube videos, I usually have to crank the volume up a bit. When the video is on the long side (think slide show or lecture rather than cats mugging for the camera), I often (usually?) forget to crank the volume back down again. Mint remembers the volume, so the next time I log in I get blasted by the log-in chime. If I'm not sitting at the PC when the desktop comes up -- and frequently I'm off getting a cup of coffee or doing some chore while it boots -- I'll forget to reset the volume, and eventually get blasted by some other sound.

So here's a partial fix, a script that resets the volume on log-in.

In a terminal, run pactl list sinks to get a list of output devices controlled by the PulseAudio driver. Note the device number for the device that handles your speakers. (In my case, the only entry is device #1.)
Using your favorite text editor, create a script file (mine is named resetVolume.sh) in a directory of your choice (~/Scripts for me). Putting the script somewhere in your home directory should keep it safe from being lost during system upgrades. Put the following two lines of code in the script file:
```
#!/bin/sh
pactl set-sink-volume 1 45%
```
Change 1 to whatever device number you obtained in the first step and 45% to whatever volume setting you want. Note that the pactl command seems to suffer some rounding errors when it does volume changes; this script actually sets my volume to 44%, according to the panel audio control applet.
In a terminal, run chmod +x ~/Scripts/resetVolume.sh (changing the path and file name as appropriate) to make the script executable.
Test the script: use the panel audio applet (or whatever mechanism you normally use to control volume) to crank the master volume up or down, then run the script in a terminal and verify the volume resets correctly.
Find Startup Applications in the system menu (the easiest way is to search for it by name) and run it. Click the Add button and create an entry for the script.
Test once more by screwing with the volume setting and then logging out and back in.

The one drawback I've found to this is that the volume reset takes place after the startup chime sound has begun playing. So the initial auditory assault is not entirely avoided, but at least I've averted any future ones for that session.

UPDATE: I apparently declared victory prematurely. The script seems to run fairly reliably when I log out and log back in, but if I shutdown and restart, or reboot, it does not work. I switched from the pactl command to the pacmd command, but that did not help. I added a line 'sudo alsactl store' to reset the stored volume (and added the script to the sudoers file), but that did not help. I linked the script from /etc/rc0.d and from /etc/rc6.d, so that it would run when shutting down or rebooting, and confirmed that the script did indeed run; it just did not reset the stored volume. (I named it both K99resetVolume, so that it would run late in the shutdown sequence, and K00resetVolume, so that it would run early, but no joy either way.) My suspicion (and it's just a suspicion) is that there's a timing issue, with the script perhaps failing because alsa and/or pulse-audio is not running at the time the script executes. In any event, I'm at a loss how to get it to run properly.

UPDATE #2: Another day, another failure. This time I symlinked the script in /etc/X11/Xsession.d, so that the script would run when the X system started after login. I gave it a name starting with "99z", which I think would make it the last script to run during the X start. Once again, the script ran but failed to affect the audio volume.

[SOLVED] UPDATE #3: I fixed this a while back and apparently forgot to update this post. The script that works for me is as follows:

#!/bin/sh
#
# There seems to be some inconsistency about whether the sound card
# is sink 0 or sink 1, so hit them both just to be safe.
#
pacmd set-sink-volume 0 27500
pacmd set-sink-volume 1 27500
sudo alsactl store

It will generate a harmless warning message because one of the two sinks (0, 1) will not exist when the script runs.

Thursday, December 20, 2012

A GEdit Headache

What should have been a simple task just ate an hour of my life (and, trust me, the marginal value of an hour of your life is an increasing function of age). I'm trying to compare two parts of a text file, and I wanted to use a split view in gedit. That's not a feature of gedit, so I sought out and found a plug-in, GeditSplitView, that should do the trick. I downloaded it and installed it to ~/.local/share/gedit/plugins (after creating ~/.local/share/gedit and the child plugins folder). That I had to create the folders was a bit surprising, as I was sure I'd previously created them for a different plug-in (now gone missing). I keep my home folder tree on a separate disk partition from everything else, so upgrades (such as my recent installation of Mint 14 Nadia over Mint 11 Katya) should not disturb anything in the home folder. Well, whatever.

After installing the plug-in, I restarted gedit - and discovered it was not seeing the plug-in. Hmm. As an experiment, I installed the gedit-plugins package (a set of optional plug-ins) from the Ubuntu repositories, using Synaptic. Gedit didn't see those, either, which sent me off on a fruitless expedition of web searching.

It turns out the problem is simple, if not intuitive. There was a change in plug-in formatting (and naming conventions) between gedit 2.x and gedit 3.x. The GeditSplitView plugin requires gedit 3.x. That brings me to the daffy (to me) part. Mint, as it installs itself, uses the Ubuntu 12.10 "Quantal Quetzal" repositories. (Mint is a fork of Ubuntu.) It lists version 3.6.0-0ubuntu1 as the current version of gedit-plugins (which is what I installed unsuccessfully) ... but it comes with gedit 2.30.4-ubuntu1 (and the corresponding version of the gedit-common package) preinstalled, and lists those as the current versions. So it's giving you incompatible versions of gedit and gedit-plugins as defaults (and lists no other options).

Once I sorted that out, a quick search turned up instructions on how to uninstall gedit 2.30 and install gedit 3.6.1 in its place. After that, I was able to install GeditSplitView easily, and gedit had no trouble finding it.

Sunday, December 16, 2012

How Not to Debate Gun Control

In the wake of Friday's shooting rampage at a Connecticut elementary school that left 26 dead (not counting the shooter), 20 of them between the ages of six and seven (casualty list here; keep a full box of tissues handy if you decide to read it), there are once again calls for a debate on greater gun control in the U.S., and protests by those against it. I have my own opinions on the issue, which I will keep to myself because they are just that: opinions. Both sides of the issue are repeating a pattern of "debate" that contains several fundamental flaws.

Emotional Decisions

First, both sides are confronting a very difficult issue while emotions are running high. It is hardly surprising that a considerably body of research has shown that emotions impact decision-making in a variety of ways. While there may be some benefit to a heightened emotional state in this case -- it pushes us to take up a contentious issue when we might otherwise be tempted to "kick the can down the road" -- there is also the danger that we let those emotions trump reason. In particular, listening to one's "gut" is considerably easier than dealing with a complex, multidimensional analysis.

Reliance on Anecdotes

There is a rational analysis of the issue on Randy Cassingham's blog, along with a considerable discussion in the comments section. It illustrates the flaws I'm discussing here, including in particular the reliance on anecdotes as opposed to statistics and decision models. Some parties in favor of tighter control over guns and ammunition will argue that, had those tighter controls been in effect, this particular incident would have/might have been averted, or at least produced a lower body count. Some parties opposed to tighter controls (or opposed to tighter controls merely in reaction to this incident) will argue that other crimes of a similar nature were conducted without the use of firearms, citing in particular the 1927 Bath Township school bombings. (It happens that I live approximate four miles from Bath Township.) Both sides are relying on historical anecdotes.

Mr. Cassingham mentions closures of mental hospitals, and some commenters echo the theme that we need to address mental illness, rather than gun control. It's not clear what prompted those comments, other than what I suspect is a common assumption that you have to be nuts to murder children, but it is possible that some people are recalling previous incidents in which they believe a shooter was mentally deranged (in the clinical sense) and either was denied treatment or should have been (but was not) involuntarily committed to treatment. For what it's worth, the shooter in the Virginia Tech massacre had been diagnosed with an anxiety disorder (which, to the best of my knowledge, is not a know precursor to violence) but had been receiving treatment. Eric Harris, one of the two Columbine shooters, also suffered some emotional issues but was receiving treatment. The gunman in the 2006 Amish school shooting seems to have been (in unscientific terms) a whack-job, but an undiagnosed one.

In any case, making decisions based on anecdotal evidence is unsound. Anyone in operations research (or statistics) knows that a sample of size 1 is not a useful sample. Reliance on anecdotes also makes us more susceptible to confirmation bias, since (a) we better remember the anecdotes that support our beliefs and (b) we may unconsciously warp those memories if they would otherwise not provide confirmation.

There's a parallel here to the climate control debate. Elements in favor of climate control legislation will argue that a particular weather event (the North American drought in 2012, "Superstorm" Sandy) was the direct result of global warming, even when climatologists are scrupulous in pointing out that there is no direct causal link to a single event. Global warming naysayers will focus on specific events (recent drops in recorded temperatures, record floods from a century or more ago) as evidence that global warming is not occurring, is not a recent phenomenon, or is not exacerbated by man-made emissions.

Optimizing vs. Satisficing

Not that I really believe "satisficing" is a word, but I'll bite the bullet and use it here. Even when a problem has an optimal solution, it is sometimes the case that the time and effort to find it are not adequately rewarded when an alternative solution provides an adequate degree of satisfaction in a more timely or economical manner. Besides the anecdotal aspect, Mr. Cassingham's emphasis on the Bath bombings and the wave of school stabbings and bludgeonings in China (echoed by some of the commenters) implicitly uses the line of argument that if we cannot prevent every mass shooting by enhanced gun control (optimality), it is not worth pursuing. Many (including, I'm willing to bet, Mr. Cassingham) would consider a reduction in mass shootings, or even a reduction in the body counts, as a significant improvement (satisficing). Gun control advocates are not immune from this focus on optimality; they sometimes appear to adopt the position that any level of regulation that would fail to prevent a particular incident is insufficient.

Multi-Criteria Decision Analysis

Okay, you knew this post eventually had to tie back to operations research in some meaningful way (didn't you?). The issue of how to mitigate violence at schools seems to me to be a rather messy example of multi-criteria decision analysis. This is by far not my area of expertise, so I'll keep things rather general here. As I understand MCDA, the discussion we should be having should be framed approximately as follows:

What are our criteria for success? This includes a variety of objectives: keeping children safe; preserving individual rights; making schools conducive places for learning; maintaining a populace capable of defending itself in times of war (I personally do not subscribe to the theory that gun owners necessarily make more effective soldiers, but it should be considered); and likely other objectives I'm not seeing at the moment.
How do we quantify/measure attainment of those objectives? For instance, it's fine to say we want no more children to die at school (or perhaps no more deaths specifically from gun violence), but does that mean that a 99% reduction in such deaths has no value? What about a 1% reduction?
What are our options? We cannot have a meaningful argument about choices without knowing what those choices are.
What are the costs and consequences of each alternative? These will need to be considered in a probabilistic manner. To take a specific, if rather vague, example, suppose that one of the options bans the sale of high-capacity ammunition clips. That would not stop a shooter from wreaking havoc with one or more standard clips, nor could we be positive that no shooter ever would manage to gain access to a high-capacity clip. For that matter, we have no idea when another school shooting might take place (although, sadly, the "if" does not seem to be in doubt -- see the Wikipedia compilation of school-related attacks for some severely depressing historical evidence.). Someone will need to attempt a quantitative assessment of the frequency and severity of violent incidents under each alternative being considered. "Minority Report" being a work of fiction, we cannot claim that a particular decision will prevent or mitigate a specific future attack; we can only talk about expected benefits (and costs).
If we consider multiple alternatives in combination, how do they interact? There is no reason to assume that, in our quest to make life safer for our children, we are limited to a single course of action. It would, however, be a mistake to evaluate each course of action independently and then assume that the results of a combination of them would be additive.
How will we reconcile trade-offs? As one can see in the Wikipedia article on MCDA, there are quite a few methods for solving a multi-criteria decision problem. They differ in large part in how they address the issue of trade-offs. For instance, charitably assuming we can even measure these things, how much freedom are you willing to give up, and/or how much more are you willing to pay in taxes, to eliminate one "expected" future fatality or injury?

This is all very complex stuff, and to do it justice requires a great deal of research (by unbiased parties, if we can find them), a careful and open discussion of what we might gain and what we might lose with each possible action, and an attempt to reach consensus on what trade-offs are or are not acceptable. If the process ever reaches a conclusion, it will also take one heck of a sales job to convince the public that the result really is a good (not perfect, but better than status quo) result.

Update

Shivaram Subramanian wrote a really interesting and well-researched "Collection of Notes on Gun Control", which I commend to anyone interested in the debate. (You won't often see Louis L'Amour, Isaac Asimov and King Asoka of India name-checked in the same blog post.) He cites a Washington Post column from July, in the aftermath of the Aurora (Colorado) shooting, titled "Six facts about guns, violence and gun control" which is also very much worth reading (or, sadly, rereading).

Tuesday, December 11, 2012

Minor CPLEX Java API Backward Compatibility Issue

I just moved a Java application (that compiled and ran fine) from CPLEX 12.4 to CPLEX 12.5 and saw it suddenly sprout a couple of syntax errors (more precisely, a couple of instances of the same syntax error). This may be a case of "be careful what you wish for".

In the CPLEX 12.4 Java API, IloCplex.addLazyConstraint required an argument of type IloConstraint. In the 12.5 API, it requires its argument to be IloRange (which is descended from IloConstraint). I was looking forward to this change. Lazy constraints in CPLEX must be linear constraints, but all sorts of things (disjunctions, implications, SOS constraints) qualify as IloConstraint. The 12.4 API would let you add one of these not-so-linear constraints as a lazy constraint at compile time; at run time, CPLEX would pitch a fit. With the 12.5 API, you'll know immediately (at least if using an IDE) should you attempt to add something untoward as a lazy constraint.

That's the good news. The bad news is that adding a constraint of the form linear expression >= variable or linear expression >= linear expression (and similarly for == or <=) got trickier. There are a gaggle of overloads of the IloModeler.ge, IloModeler.le and IloModeler.eq methods for generating constraints. If one argument is an expression and the other is a number, the resulting constraint is an instance of IloRange, but if both arguments are expressions, the constraint is an instance of IloConstraint -- and looks like an illegal argument to addLazyConstraint in the 12.5 API. So the following code snippet works in 12.4 but won't compile in 12.5:

IloCplex cplex = new IloCplex();
IloLinearNumExpr expr = cplex.linearNumExpr();
IloLinearNumVar x = cplex.numVar(0, 10, "x");
// ... do something to build up the expression ...
cplex.addLazyConstraint(cplex.ge(x, expr));  // compile error here

Yes, I tell CPLEX that expr is a linear expression (and actually build it to be linear), but that doesn't mean CPLEX believes me. There is no overload of IloModeler.ge that recognizes linear expressions as such (they are recognized as expressions, IloNumExpr rather than IloLinearNumExpr).

Fortunately, the solution is simple: just explicitly cast the constraint as IloRange. For my little sample, the fix is to change the last line to

cplex.addLazyConstraint((IloRange) cplex.ge(x, expr));

Additional overloads of the constraint-building methods that output IloRange when both sides are linear (constant, IloNumVar or IloLinearNumExpr) would be nice, but I'm pretty sure the developers' to-do list contains some higher priority items.

Friday, December 7, 2012

Modeling Challenges in the Real World

One of the ongoing issues with operations research and analytics (OR/A) as a profession is that we really do not have a formal system of apprenticeships, the way some trades do. Throw in a mix of university professors who are in love with the mathematics behind OR/A but often short on experience applying OR/A in practice (I freely confess to having been one such), and we end up with a system that produces graduates long on "book learning" but destined for some shocks when they enter the (non-academic) workforce.

Recently, blogs and other online resources have begun to fill the gap in what I call "tactical knowledge", things that seldom make it into textbooks or college lectures. To that end, I'd like to highlight, and expand a bit upon, a recent blog post by Jean-François Puget titled "Analytic Challenges". He lists a number of challenges and then discusses one in detail, how to make analytics "socially and organizationally acceptable". What follows are a few observations of my own.

Choices

Puget discusses impediments to getting the front line troops to accept and implement analytical solutions. This is a very important consideration. Another, slightly different one, is that managers like choices. They often do not care to be given a single solution, even if it is "optimal". (I put optimal in quotes because optimality is always with respect to a particular model, and no model is a perfect representation of the real world.) Among the various reasons for this, some managers realize that they are being paid the "big bucks" for making decisions, not rubber-stamping something an analyst (let alone a rather opaque computer model) said. If you find this notion quaint, ask yourself how you would feel surrendering control to an automated driving system while your car is zipping down a crowded highway during rush hour.

Long ago, before "open-source software" was a recognized phrase, a student in one of my courses became so enthralled by linear and integer programming that he coded his own solver ... on a Commodore 64. (Clearly he was not lacking in fortitude.) He actually used it, in his day job (for the state Department of Transportation), to help make decisions about project portfolios. In order to get his bosses to use his results, he had to generate multiple solutions for them to browse, even when the model had a unique optimum. So he would hand them one optimal solution and several diverse "not too far from optimal" solutions. I think they often picked the optimal solution, but they key is that they picked.

Data Quality

When I taught statistics, I liked to share with my classes Stamp's Law:

The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the chowky dar (village watchman in India), who just puts down what he damn pleases.

Analytics relies on the existence and accessibility of relevant data, and quite often that data is compiled from unreliable sources, or recorded by workers whose attention to detail is less than stellar. I once had a simple application of Dijkstra's shortest path algorithm go horribly awry because the arc lengths (distances), extracted from a corporate database, contained a large number of zeros. (For instance, the distance from Cincinnati to Salt Lake City was zero.) The zeros were apparently the result of employees leaving the distance field blank when recording shipments (as opposed to, say, spontaneous appearance and disappearance of wormholes). Puget's list mentions "uncertain or incomplete data", to which we can add incorrect data. I've read estimates (for instance, here) that anywhere from 60 to 80 percent of the work in a data-based analytics project can be devoted to data cleaning ... a topic that I think has not yet gained sufficient traction in academe.

Implicit Constraints/Criteria

By "implicit" I mean unstated (and possibly unrecognized). Puget mentions this as a cause of resistance to implementation of model solutions, in the context of front-line troops finding faults in the solution based on aspects not captured in the model. Managers may also find these types of faults.

An example I used in my modeling classes was a standard production planning model (linear program), in which the analyst selects the amounts of various products to manufacture so to maximize profit (the single criterion). In the textbook examples we used, it was frequently the case that some products were not produced at all in the optimal solution, because they were insufficiently profitable within the planning horizon. I would then ask the class what happens if you implement the solution and, down the road, those product become profitable (and perhaps quite attractive)? By discontinuing the products, have you lost some of the expertise/institutional knowledge necessary to produce them efficiently and with high quality? Have you sacrificed market share to competitors that may be difficult to recover? Did you just kill a product developed by the boss's nephew? My point was that perhaps there should be constraints requiring some (nonzero) minimum level of output of each product, so as to maintain a presence in the market. Otherwise, you in essence have a tactical or operational model (the boundary is a bit fuzzy to me) making a strategic decision.

Another example I used of implicit criteria also began with a production planning model (in this case more of a scheduling application). You solve the model and find an optimal schedule that meets demand requirements at minimum cost. It also involves major changes from the current schedule. What if there were a slightly suboptimal schedule that required only minor deviations from the current schedule. To a mathematician, "slightly suboptimal" is still suboptimal. To a production foreman having to make those changes, the trade-off might seem well justified.

Mission Creep

Particularly when a model is a first foray into analytics, the users commissioning the model may have a limited scope in mind, either because they are narrowly focused on solving a particular problem or because they think anything larger might be unmanageable. Once the decision makers see the fruits of a working model, they may get the urge to widen the scope and/or scale of the model. Should that happen, modeling choices made early on may need to be revisited, since the original model may not scale well. (Coders know this process as refactoring, not to be confused with refactoring a basis matrix.)

Mission creep can be the undoing of a project. I vaguely remember stories of the Pentagon tweaking the design of a new surface combatant, while the prototype was under construction, to the point where its superstructure raised the center of gravity so high that it was allegedly unstable in a rough sea. I also vaguely remember stories of a new carrier-based patrol aircraft (antisubmarine patrol bomber?) being redesigned on the fly (so to speak) until the prototype rolled off the assembly line too large to land on a carrier. Sadly, "vaguely remember" translates to being unable to recall the specific projects. If anyone recalls the details, please let me know via comment.

Tuesday, December 4, 2012

Cinnamon Spices

Yesterday I upgraded my home PC from Linux Mint 11 (Katya) to Mint 14 (Nadia), picking the Cinnamon version. (Well, I did most of the upgrading -- I'm still tweaking things, installing applications that were not from the main repositories, etc.) A few things about the upgrade are worth mentioning:

Mint ships with LibreOffice 3.6.2 (I think); the current version, as of this writing, is 3.6.3. Either would be more recent than the version I had with Katya. I don't use most of LibreOffice, but I use Calc (the spreadsheet component) rather extensively. So I was dismayed to discover that it crashed more than half the time trying to open a particular spreadsheet file (the first one I needed to access). If I managed to get into the file, Calc crashed consistently when I tried to save it. It also crashed consistently if I tried to copy the contents (thinking that perhaps if I pasted it into a new file I might work around the problem). I tried opening and saving a few other spreadsheets, and they all were handled correctly. There's nothing special about the contents of the problematic one (a column of dates, a column of formulas, a column of numbers, some headings), nor is it a large file. A web search turned up one or two reports of crashes with Writer (the word processing component), typically on a different Mint variant (KDE desktops, I think). One person reported that the problem disappeared after an upgrade. So I downloaded and installed the latest LibreOffice, and so far the problem has not resurfaced.
Cinnamon seems to be based on JavaScript and JSON. A few of the features I used with the GNOME desktop have been removed, have not yet been replicated or are left to the reader as an exercise. Fortunately, third-party substitutes are available in some cases. A number of possibly handy applets (that plug effortlessly into the panel) are available from the Mint Spices page. One I found particularly useful is My Launcher. It provides an easily configured quick-launch list for applications I use frequently. One click expands the list, a second click launches the application. With GNOME, I was able to add the Accessories menu directly to the panel; My Launcher accomplishes the same thing, but lets me add applications that are not on the Accessories menu and remove ones I do not use often.

Wednesday, November 28, 2012

Personal Finance Software for Linux

Upfront disclaimer #1: Anything that looks like a trademark probably is a trademark. That should spare me having to insert trademark symbols every line or so.

Upfront disclaimer #2: This post is about personal accounting software. If you find accounting terminally boring, feel free to stop reading now (and, trust me, I understand).

I keep tabs of my personal finances using a copy of Quicken 2004 running on Windows. At this point, it's pretty much the only reason I ever have to boot into Windows, and it's a bit of a PITA to have to exit Linux Mint and reboot into Windows to do bills, then reverse the process to get back to work (or play). Unfortunately, most versions of Quicken seem not to work very well under Wine: ratings are heavily silver (mostly works), bronze (some features work, some don't) or the self-explanatory "garbage" with a smaller number of platinum (problem-free) and gold instances. For what it's worth, Quicken 2004 Deluxe R3 is rated "bronze", but on my system the installer crashes.

So I've been tentatively planning for a while to migrate to a Linux personal finance package. Features and stability matter, but the #1 concern for me was the ability to migrate my Quicken data to the new application, and that's the focus of this post. The best (only?) way to get data out of Quicken is by exporting to a QIF (Quicken Interchange Format) file. QIF seems to be one of those industry "standards" that isn't quite standard, so you can't depend on an application that has a QIF import function to correctly import any old QIF file. I think my QIF file contained between 11,000 and 12,000 transactions. The number is probably not important, other than to explain why I had absolutely no desire to reenter data from scratch with a new application.

Without further ado, here are the Linux apps I tested and my results of trying them. I'll list them in alphabetic order to avoid offending anyone more than necessary.

GnuCash (open source, cross-platform): This is generally the most widely cited and best rated of the Linux personal accounting programs. It has a QIF import option. Unfortunately, when it imported my Quicken QIF file, several (most?) of the account balances were off. More critically, they were off in ways I could not repair. There were old and unreconciled "transactions" that were incorrect. (I put "transactions" in quotes because in at least one case there was a split in a transaction that was processed incorrectly.) GnuCash does not allow you to mark just any transaction as reconciled; you can only mark it by actually reconciling the account. There is seldom a good reason to retroactively mark a transaction as reconciled, but correct a software error is one of them. The only way I could see to mark the problem transactions as reconciled (after fixing them) was to reconcile as of the transaction's date; but that was a dismal failure, because GnuCash takes the balance as of the most recent reconciliation as the opening balance and will not let you specify a different opening balance. Since I couldn't find a way to fix the errors, I gave up on GnuCash.
Grisbi (open source, cross-platform): Grisbi also seems to be well regarded among reviewers. Sadly, it repeatedly hung up (with no error messages) trying to import my QIF file, and had to be killed from a terminal. (Killing it left an idle process that had to be tracked down and killed separately.) So Grisbi was a non-starter for me.
Homebank (open source, cross-platform): Homebank seems to be an attractive program, perhaps with fewer features (and an easier learning curve) than GnuCash has. When I imported my QIF file, however, quite a few of the account balances were off, in some cases way off. (Trust me: my credit union would not allow me to overdraw my draft account by more than half a million US dollars.)
jGnash (open source, Java-based): jGnash offers one feature not found in any of the other programs (that I'm aware of) - the ability to use multiple currencies. It's not a feature I need, but it certainly didn't turn me off. The (Swing-based?) interface is not the most aesthetically pleasing of the lot, but it is certainly good enough. Overall, I was quite impressed with jGnash, except for the all-important QIF import feature. When I imported my QIF file, some bank transactions were missed. Unlike GnuCash, you can retroactively add a missing transaction and mark it reconciled, so I seriously considered using jGnash and just adding the missing transactions ... until I discovered that none of my securities accounts had been imported. A major reason for my using Quicken was to track security purchases (including dividend reinvestments), since Uncle Sugar has funky rules for capital gains and such come tax time. So jGnash bit the dust.
Moneydance (commercial, Java-based): Moneydance may be the most Quicken-like of the programs I tested (give or take GnuCash). It has all the features I would want and one or two (such as grouping transactions using ad hoc "tags") that I didn't know I wanted. Most importantly, it imported my QIF file nearly correctly. All but one of my bank accounts had the correct balance. In the one account that was off, there were two or three phantom transactions scattered through its history, possibly the result of incorrectly read splits. They were easy to identify, as they were the only old transactions that were not reconciled, and deleting them fixed that account. All my securities accounts seem to have been imported correctly. Moneydance is a bit on the pricey side -- it actually costs more than the basic versions of Quicken, and you can probably find a "deluxe" version of Quicken at a lower price -- but it gets the job done.
Money Manager Ex (open source, cross-platform): Money Manager Ex looked like an attractive option, but the QIF import function seems to be limited to one account at a time (and the account must exist in MMEX before you import the QIF file). That's fine for grabbing data from a financial institution that offers QIF downloads, but I had no desire to do a separate QIF export of each account from Quicken, and I'm not sure what that would have done to transactions that moved money from one account to another. So after discovering that import of the all-in-one QIF file failed, I punted.
PocketMoney desktop (commercial, cross-platform): PocketMoney is actually a mobile application (Android and iWhatever), but Catamount Software does sell a desktop version. There is a QIF import function that, from what I read on the Web, works for the mobile versions but seems not to be implemented in the Linux desktop version. I did find a menu entry to import a QIF file. It popped up a file chooser, but the only file type it allowed was "myQIFType", and would not list my QIF file even if I changed the file extension to .myQIFType. So PocketMoney was a non-starter.
Skrooge (open source, cross-platform): Skrooge is based on KDE but runs well with Gnome. It has an attractive interface and likely would have been my first choice had it imported my QIF file properly. Sadly, it refused to import the QIF file at all. Unlike Grisbi, it did not hang: it immediately gave me an error message, and the GUI continued to work properly after the message. The message, however, gave me no idea as to what in the QIF file had offended it.

So I ended up purchasing Moneydance. As a postscript, to test whether the Quicken QIF file might contain something funky, I exported my accounts from Moneydance to a QIF file and tried importing that into the other programs. In all cases, the results were identical (Grisbi hung, Skrooge failed gracefully, jGnash skipped the securities, GnuCash and Homebank muffed a bunch of balances).

Saturday, November 17, 2012

NP-dopey

I have to make an up-front disclaimer: I have a hard time stifling yawns when discussions of which problems are in NP, NP-complete, NP-hard or NP-anything come up. In fact, I have a tempered enthusiasm for computational complexity in general. The mathematics is perfectly solid, it's of theoretical interest to some people, but I find it hard to connect much of it to the practice of mathematical optimization.

A couple of things prompt this post. First, Jean Francois Puget wrote a very nice blog post about what it means for a problem to be in class NP versus class P (and what the distinction is between an optimization problem and decision problem, as it relates to complexity classification). More importantly, from my perspective, he tied the discussion to customer concerns ("Why does it take so long to solve my problem?" "Can I solve larger instances in reasonable time?"). Part of what I'm going to say here was originally contained in a comment I posted to his blog, but apparently commenting on that blog is NP-difficult; my comment shows up blank. So I'll express myself here, where I control what actually gets approved.

The second trigger was an incident at the recent INFORMS 2012 conference. During one of the coffee breaks, I had a discussion with a few colleagues about publishing optimization papers, during which at least a couple of us agreed that authors have an unfortunate tendency to say "Our problem is NP-something, so we have to use a heuristic [and here's the wonderful heuristic we concocted]." Well, the Traveling Salesman Problem is NP-hard. Heck, I think it was voted the poster boy for NP-hard problems. Nonetheless, give me a network with three nodes, and I will give you an exact solution to it. Catch me after a cup of coffee and before too many beers, and I can even do a five node problem without resorting to a heuristic. Sure enough, not long after the coffee break I was in a session where a presenter said that his problem was NP-hard, and therefore he and his coauthor were forced to use a heuristic. (The presenter was a student, so it's not too late to redeem his soul.)

The following are my personal opinions and, based on conversations I've had, not widely endorsed within the academic community:

Small instances of NP-hard problems can be solved exactly. Large instances are intractable. The same is true for polynomial problems (class P). The definition of "small" is "I can solve this easily". The definition of "large" is "this bugger takes too long to solve".
There was excitement in the academic community when it was discovered that linear programs (or at least those satisfying some conditions I never bothered to memorize) can be solved in polynomial time. As George Dantzig pointed out (and if anyone has the reference for this, please post it in a comment!), a polynomial can be a big number. If I tell you that a particular problem class has an algorithm that is polynomial with complexity $O(n^{2593})$, where $n$ is, say, the number of nodes in a network, does that make it sound easy to you? It sure doesn't to me.
Complexity results are asymptotic. To me, saying that problem A is in class P and problem B is in class NP means that as instances of the two problem grow larger and larger, I'll probably hit the barrier for what I can solve (with a contemporary computing platform, in acceptable time) sooner with problem B than with problem A. The thing is, I don't usually have problems where the instances grow and grow without bound. I have small instances that I use for testing and debugging, and I have the actual instances that need to be solved. If I can solve the instances I need to solve, I really don't care if the algorithm is polynomial time or not.
Implicit in my previous statements were some things that I think get lost sometimes when people are throwing NP-ness around. A problem class belongs to P or NP. An individual instance of that problem does not. Also, computational complexity needs to be viewed at the algorithm level. Problems in class NP are those for which nobody knows a polynomial-time algorithm. Problems in class P are those for which we do know polynomial-time algorithms. That doesn't mean you're necessarily using one. The simplex algorithm has nonpolynomial (worst case) complexity, but it is applied to linear programs -- which, as mentioned above, are class P -- quite commonly.
Computational complexity of an algorithm is a worst-case, asymptotic bound. I've already taken exception with the asymptotic part, since I don't typically worry about how computation time changes as my problems morph from huge to disgustingly huge. As far as the worst-case aspect, I don't typically expect to be solving the worst case. (Some days it seems like I am, but I'm probably just being grumpy.) I'd be more interested in knowing what complexity is on "average" problems, but even that might be deceptive - there might be something specific to my versions of the problem that would favor an algorithm that doesn't have the best "average" run times.
In complexity measurement, there are constants that typically get overlooked. Suppose we have a class of problems where we can represent the size of an instance by a single dimension $n$. Suppose further that we have two algorithms, one (A) with complexity $O(n^3)$ and the other (B) with complexity $O(n)$. It's pretty clear that want to use the linear time algorithm (B) rather than cubic time algorithm (A), correct? Well, what the complexity stuff means is that (worst case) the time/work for A is bounded by $M_A n^3$ while the time/work for B is bounded by $M_B n$, where $M_A$ and $M_B$ are constants (that we typically do not know). Regardless of the values of those constants, \[\lim_{n\rightarrow \infty} \frac{M_B n}{M_A n^3}=0,\]so for large enough $n$, B is the winner. If $M_A \ll M_B$, though, then for small $n$, A is the faster algorithm. This isn't just a hypothetical. Take sorting a list. Tree sort ($O(n\log(n))$) is asymptotically faster than bubble sort ($O(n^2)$), but tree sorts involve a bunch of overhead, so bubble sorts really are faster on short lists. If you are going to sort a large number of short lists, and you implement tree sort (because, hey, it's got better complexity), you'll pay for it.
The question of whether P = NP remains undecided, as far as I know. (There have been a few allegations of solutions in the last few years, but I don't think any have survived close scrutiny.) Like pretty much everyone, I believe they're not equal ... but if it turns out that P really does equal NP, won't all the people wringing their hands over this or that being NP-hard feel silly?
For authors of academic papers, it's fine to include a statement (and, if it's not obvious, a proof) that your problem (class) is NP-whatever. That's another tidbit added to the body of human knowledge. Just don't think it's license to launch into building some funky new heuristic without first testing standard or obvious solution approaches (assuming such approaches exist).

So, am I out to thoroughly diss computational complexity? No. First, it gives us an indication of whether a problem class is likely to be difficult to solve. I don't know a better way to classify problems into scary-looking versus not scary-looking. I just wish people would not read more into the classification than it warrants.

Second, implementing algorithms usually involves more algorithms. Solving a MIP by branch and bound? How are you going to solve the node problems (meaning what algorithm will you use)? Computing shortest paths through a network as a step in some larger algorithm? Will you use Dijkstra, Floyd-Warshall or something else? How will you sort lists of things? What data structures (which contain algorithms for insertion and retrieval) will you use? When I'm coding an algorithm, I don't have time to test all possible combinations of all possible component algorithms; so I frequently grab the option that I think has best complexity because, frankly, I don't have anything better to go on.

So, bottom line, many instances of problems in class NP can be solved, and large enough instances of class P problems cannot be solved. To paraphrase a former boss (discussing hard work), I'm not afraid of NP-hard problems. I can lie down right next to them and sleep peacefully.

Other voices (in no particular order):

re: The Humbling Power of P v NP (Managing organized complexity)
NP Or Not NP? That Is The Question (IT Best Kept Secret is Optimization)
No, The TSP Isn't NP Complete (IT Best Kept Secret is Optimization)
Ekthetikophobia: The Fear of the Exponential (Operations Research and Analytics)

Tuesday, November 6, 2012

An OM Blog Worth Reading

Since many of the three readers of my blog come from the operations research community, perhaps I should start with a quick bit of terminology. Experts in the various areas will no doubt disagree. Fortunately, none of them read this blog.

"Operations management" (OM, also known in some quarters as either "production management" or "service management") generally refers to planning and managing the process of producing goods and services. "Logistics" generally refers to the process of transporting and distributing products (and services), either on the inbound side (from suppliers) or the outbound side (to customers). "Sourcing", "purchasing" and "procurement" are synonyms for the process of obtaining materials, components and services to support production and distribution. (Never trust an academic discipline that changes its name every few years.) Integrate these three and you have "supply chain management" (SCM).

I've had a somewhat uneasy coexistence with SCM in its various incarnations and manifestations over my academic career. On the one hand, I've taught a few courses in or related to SCM, and I've coauthored (with actual SCM faculty) a few papers in the area. On the other hand, the SCM program at Michigan State University basically crowded management science out of the curriculum, and one of the professional societies to which I belong has migrated from a focus on the decision sciences to dominance by the SCM crowd (with a commensurate shift in its annual conference). Fundamentally, my reaction when I see anything SCM-oriented is almost indistinguishable from my reaction to campaign ads.

Given that, when I recommend reading an operations management blog, you can insert the tag line "man bites dog". Barry Render and Jay Heizer have written a bunch of operations management textbooks that have been well received by the OM/SCM community. They also have a blog (the subject of this post) that ties business news to OM concepts, frequently with teaching tips to help integrate the news with content from their latest book. The blog features occasional guest posts related to teaching; but even if you have no interest in teaching OM (and, trust me, I have no interest in teaching OM ever again), the blog is worth reading for insights it provides into issues of business and global competitiveness. The writing is concise and crystal clear.

As with all blogs (movies, novels, ice cream flavors ...) individual tastes will vary. If you have any interest in manufacturing, logistics or business in general, I think their blog is at least worth a look.

Friday, November 2, 2012

Producing UML Diagrams in Netbeans 7

I need to revise some Java code I wrote several years ago, and I thought it might be helpful to have some visual aids as a reminder of how it works. Sadly, when you get older, memory is the second thing to go. (The first thing is ... er ... um ...) I use the Netbeans 7.2 IDE, and I knew that Netbeans had a plugin to produce UML diagrams for Java code, so I figured it was finally time to bite the bullet and learn UML.

Good news first: The book "UML 2.0 in a Nutshell" is available as a ~~free~~ e-book. The authors assume that you know the key concepts of object-oriented programming (abstraction, classes, fields, methods ...) but not much else. I've had good luck with O'Reilly's "nutshell" series before, and this book is proving no exception. (Caveat: I'm only in chapter 2. There's still time for my head to explode.) [Update: I just found out, via the comments, that the copy I downloaded is probably pirated. So I need to reevaluate and either buy the book or switch to a truly free resource. Mea culpa: I should have checked the source more carefully. Usually I can spot pirate sites.] There is also a rather nice online resource by Scott W. Ambler [that really is free].

Now the bad news (you knew this was coming, right?): Apparently the UML plugin for Netbeans was discontinued when Netbeans moved up to version 7. What to do, what to do?

Back to good news: I found this blog post by Matthew Johnson (grad student at UNC-Charlotte), in which he identifies an alternative solution that works in Netbeans 7.x. It uses the yWorks UML Doclet, which is available as in both a free "community edition" and a paid "professional edition". (I'm using the free edition, at least until I figure out whether I'll do enough UML diagrams to warrant investigating the paid version.) The yWorks doclet embeds UML diagrams in documentation generated by Javadoc for existing Java code (when you generate the javadocs), but does not work in the other direction (producing code stubs from UML diagrams). That's fine with me; I never put enough forethought into a program to produce any sort of diagram in advance of the onset of coding.

Installing the doclet is trivial; just download it, unpack the zip archive, and park the directory somewhere you can find. Using it proved a bit problematic at first, but a combination of a tip by commenter Kai Zander on Johnson's blog and a little trial-and-error by me got it working. Here are the key steps (once the software is installed). For any Netbeans Java project:

Per Kai's tip, go to the resources folder of the UML Doclet installation directory, edit ydoc.cfg, find the tag <group name="tiling"> and change the enabled property from true to false. Otherwise, you tend to get UML diagrams with an empty swath down the middle.
Clean and build the project. (The user guide page for UML Doclet says that you need to add paths to any external libraries used by the project to the docletpath string, but I found that if you build the project first -- which copies any external libraries into the dist/lib folder -- UML Doclet will find them without needing to add them to docletpath.)
Right-click the project name, click Properties, then drill down to Build>Documenting. Add the following to the Additional Javadocs Options entry: -docletpath "<ypath>/lib/ydoc.jar" -doclet ydoc.doclets.YStandard -resourcepath "<ypath>/resources" -umlautogen, where <ypath> is the absolute path to the UML Doclet installation folder. The last switch (-umlautogen) tells UML Doclet to generate overview, package and class diagrams for all documented files. There are alternative switches to generate some but not all three types of diagrams, and there are switches that will restrict diagrams to documented files containing an @y.uml tag. If you omit all the -uml... switches, or use -umlgen but don't have @y.uml tags in any of your files, you won't get any UML diagrams because the doclet will think it has no work to do.
Now right-click the project, click Generate Javadoc, and discover that your javadocs have embedded diagrams.

Here are samples of package and class diagrams (from my command line utility):


Package Diagram

Class Diagram

Monday, October 22, 2012

Thanks a Bunch, Twitter!

At the INFORMS 2012 panel session on social media, I suggested that people interested in dipping a toe into the waters use Google Reader to watch RSS feeds of various things, including Twitter timelines. Ironically, a few days before the session (October 11 by my estimation), Twitter made API changes that broke existing RSS feeds. The feeds I was watching in Google Reader all stopped updating at the same time (last visible entries dated October 10).

Twitter unveiled their new API in September and, among other things, dropped support for RSS. RSS feeds are still available on an interim basis, but apparently they'll be gone for good as of March 5, 2013. Hopefully Google will come up with a workaround for Google Reader by then.

In the meantime, if you want to read tweets in Google Reader, you need to reload your Twitter RSS feeds. After a lengthy web search, I found a tip on this thread that, combined with something I picked up on another site (which one I no longer remember), seems to work.

I'm assuming you already have a Google account and use Google Reader. Open a new browser tab and paste the following verbatim in the URL bar:

http://www.google.com/reader/view/feed/http://api.twitter.com/1/statuses/user_timeline.rss%3Fscreen_name%3D

Don't hit Enter yet; we're not quite done. Append the screen name of the Twitter account you want to follow in Reader. For instance, to follow me (and, trust me, I do not recommend this), add parubin after %3D. Now hit Enter. If you're lucky, you'll be told that you Reader found the feed and be given an opportunity to subscribe to it.

Here's where it gets tricky. The screen name is case-sensitive; but sometimes the correct case does not work, and you have to experiment with intentional errors. (No, I'm not making that up.) I just tried to subscribe to my own feed (against my own advice above) using

http://www.google.com/reader/view/feed/http://api.twitter.com/1/statuses/user_timeline.rss%3Fscreen_name%3Dparubin

and was told my feed does not exist (news to me!). So I repeated it with

http://www.google.com/reader/view/feed/http://api.twitter.com/1/statuses/user_timeline.rss%3Fscreen_name%3DParubin

(note the incorrect capitalization of the "P") and was rewarded with the option to subscribe. In another case, I tried to repair my subscription to the feed for @ThisIsTrue. Note the camel case. Appending ThisIsTrue after %3D failed, but appending ThisisTrue (incorrect lower case on the second "i") succeeded. I'm skipping over a few unsuccessful attempts. Finding the correct single letter to muff is a trial-and-error process, happily not needed in all cases, sadly needed occasionally.

Tuesday, October 9, 2012

Detecting Multiple Optima in an LP

This is a frequently asked question: given an optimal solution to a linear program, how can you tell if there are other optimal solutions? Theory tells us that if there are two optima, there are an infinite number (any convex combination of optima is optimal). What in the output tips us off this is going on? The answer is sensitivity analysis. Any LP software worth its price will provide sensitivity output in some form. I'll illustrate using CPLEX, but what I'll say should apply, with a little translation, to any other solver.

An Example

Let's consider a simple LP with multiple optimal solutions:\begin{eqnarray*}\textrm{maximize } x\\ x & \le & 1\\ x + y & \le & 3 \\ x, y & \ge & 0.\end{eqnarray*} The feasible region and two optima (blue dots) are depicted in the following graph:

The optima occur at $(1,2)$ and $(1,0)$.

Ordinarily, the hyperplane (isoquant) corresponding to the objective function at its optimal value (the set of all points where the optimal objective value is reached) is tangent to the feasible region of an LP at a single point. Tilt it a little and it is still tangent at the same point. When multiple optima occur, the isoquant is tangent along a face of dimension greater than zero, as in this picture (where it is tangent along the vertical segment between the blue dots). Now think about what would happen in the illustration if we perturbed the slope of the objective function slightly. If we tilted it at all upward (say, maximize $x + \epsilon y$ for small positive $\epsilon$), $(1,2)$ would be the unique winner ($1 + 2\epsilon \gt 1$).

If we tilted it at all downward (say, maximize $x - \epsilon y$), $(1,0)$ would be the unique winner ($1 \gt 1-2\epsilon$).

So the tipoff that you may have multiple optima is that there is some objective coefficient which, if perturbed at all in the wrong direction, causes the current optimal solution to cease to be optimal.

Sensitivity Output for the Objective Function

I coded this model in CPLEX, entering the constraint $x\le 1$ as an upper bound on $x$ rather than as a functional constraint (but the end result would be the same either way). The Java API for CPLEX contains a method, IloCplex.getObjSA(), that takes as arguments two double precision vectors and a vector of variables, and returns the lower and upper limits for the constraint coefficients in the first and second vectors respectively. As long as any one objective coefficient changes but stays between its upper and lower limits, and the others stay constant, the current optimal solution is known to remain optimal. Here's the results of a call to getObjSA (along with the current objective coefficients):

	Objective Coefficient
Variable	Low	Current	High
$x$	0	1	$+\infty$
$y$	$-\infty$	0	0

CPLEX does not literally return values of $\pm \infty$; I'm interpreting anything with magnitude $10^{20}$ or larger as $\infty$. Note that the upper limit of the objective coefficient for $y$ matches its current value ($0$). Any increase in that coefficient, however small, jumps to a new solution. That is the signal of a possible alternate optimum.

"Possible" Alternate Optimum?

There's one small catch with this. Having an objective coefficient at its upper or lower limit is a necessary condition for an alternate optimum to exist, but not quite a sufficient one. Suppose we change our objective to maximizing $x+2y$ and add one more constraint to our example: $x+2y \le 0$. The feasible region collapses down to just the origin (which must therefore be the optimal solution), and the origin is a degenerate corner (more than the requisite number of constraints, two in this case since we are operating in $\mathbb{R}^2$, are binding).

Here is the objective sensitivity table:

	Objective Coefficient
Variable	Low	Current	High
$x$	1	1	$+\infty$
$y$	$-\infty$	2	2

This time we have two signals of a possible alternate optimum: the objective coefficient of $x$ is at its lower limit, and the objective coefficient of $y$ is at its upper limit. From the graph, though, it is clear that no other optimum can exist (since no other feasible point exists).

As with the objective function, we can get sensitivity information for the right-hand sides of constraints (IloCplex.getRHSSA()) and variable bounds (IloCplex.getBoundSA()). If any one right-hand side or bound changes but stays within the indicated range, with all other right-hand sides/bounds remaining constant, the current basis remains a valid basis (although the values of the variables will change). Violate a range and the basis may change. Degeneracy is signaled by a right-hand side or bound that has no room to move in one particular direction.

For our modified example, the sensitivity tables are as follows:

	Lower Bound			Upper Bound
Variable	Low	Current	High	Low	Current	High
$x$	$-\infty$	0	0	0	1	$+\infty$
$y$	-0.5	0	0	0	$+\infty$	$+\infty$

Constraint	Low	Current	High
$x+y\le 3$	0	3	$+\infty$
$x+2y\le 0$	0	0	1

We have three signals of degeneracy.

The Bottom Line

The first thing to look at, when trying to detect alternate optima, is the objective ranging table. A coefficient at one of its limits is a necessary condition. The next step is a bit less obvious. You can check bound and constraint sensitivity. If there are no signals of degeneracy, the necessary condition becomes sufficient: you have other optima. If degeneracy is present, however, that does not rule out alternate optima; it just means the objective ranging condition is not sufficient to be sure.

One way to resolve the uncertainty (and also find another optimum, if one exists) is to select one of the objective coefficients which is at a limit, and nudge it just a little past that limit. Then solve the modified LP, verify that the new solution is not the old solution (i.e., that you nudged the coefficient enough to jump to a new vertex of the feasible region). Finally, evaluate the new solution using the original coefficient and verify that the objective value matches the original optimal value (i.e., that you did not nudge the objective coefficient too far).

Friday, October 5, 2012

Updated Java Command Line Utility

A couple of years ago, I posted source code for a (massively over-engineered?) Java utility I wrote to parse command line arguments. Recently, I added a new capability. The utility will let you specify options using "key value" or "key=value", where keys can be case-sensitive or case-insensitive at your discretion, and the "=" separator can be changed to any character. (The utility also understands toggles and positional arguments, and can cope with excess arguments on the command line.) The new capability allows you to use the same key more than once (if you declare it to be reusable), with the values found returned in a list.

If you're wondering why I would want that feature, the specific motivation was to let me put multiple CPLEX parameters on the command line without having to declare a zillion keys. (If you don't know what CPLEX is, don't worry about it.) So a command line in one of my programs might look like

... -p TiLim=30 -p PreInd=false -p MIPEmphasis=3 ...

with "-p" translated as "here comes another bleeping parameter to set".

The updated utility (including a test program to demonstrate its capabilities) is now ~~posted here~~. As before, I'm releasing it under the Apache 2.0 license.

Update: The utility, which also includes methods to parse file specifications with wildcards and find matching files, is now available from Sourceforge under the name JCLILIB, still under the Apache 2.0 license.

Tuesday, October 2, 2012

Setting CPLEX Parameters

UPDATE: I have new versions of the code, and a new post (superseding this one) documenting how to get and use the code.

When I write Java programs that link to CPLEX (typically for research purposes), I often want to specify some CPLEX parameters, such as time limit for runs, on the command line. Unfortunately, CPLEX has about as many parameters as the US tax code has loopholes. When I know I'll only be tweaking a few parameters, I write code the looks specifically for those parameters, which is mildly tedious. For a current project, though, I want to open a wider range of parameters to experimentation, and writing code to check for each one in the command line would be brutal.

So I wrote a Java class that lets the user set any CPLEX parameter by specifying the parameter name (case-sensitive) and value in strings, the way they would come from the command line. Suppose, for example, that I have a Java program in a JAR file named myprog.jar, and I want to set a time limit of 37.5 seconds and turn off the presolver. My command line would look like

java -Djava.library.path=<path to CPLEX> -jar myprog.jar TiLim 37.5 PreInd false

The code would look like the following:

try {
IloCplex cplex = new IloCplex(); // create a CPLEX critter
// assume the command line stuff is in String args[]
CplexParamSetter.set(cplex, args[0], args[1]); // set TiLim
CplexParamSetter.set(cplex, args[2], args[3]); // set PreInd
// do stuff ...
} catch (...) {
// panic and run for the hills
}

with class CplexParamSetter properly imported and with a whole slew of exceptions it might throw correctly caught and dealt with. (The exceptions are documented via Javadoc in the code.)

I've run a few unit tests on the code, and it seems to work properly, but I make no guarantees. Feel free to download the source code for CplexParamSetter and use it under the Eclipse Public License.

Update: I just modified the code to work with CPLEX 12.5.1 (which made some changes to the Java API that affected parameter recognition). Hopefully it is backward compatible, but I've only tested against 12.5.1. Also, I moved the code from Google Drive to Bitbucket (and updated the link above).

Update: It's no longer on Bitbucket. Please use the link at the top of the post (in the red box).

Sunday, September 23, 2012

Separable Benders Decomposition

A reader (I have readers??) asked me a question about Benders decomposition of a mixed-integer program (MIP) when the linear program (LP) subproblem is decomposable: how do we glue together a mix of optimality and feasibility cuts from the subproblems? The short answer is that we don't.

Rather than repeat a lot of definitions, I'll just refer you to a previous post for some background (if you need it; chances are you don't, because if you're not familiar with Benders, you've already tuned out). I will repeat the formulation of a generic decomposable MIP, just to establish notation:\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \]The MIP decomposes into a master MIP (containing $x$ and a real variable $z$ that serves as a surrogate for $c_{2}'y$) and a subproblem (LP) containing $y$.

Now suppose that $B$ and $E$ are both block-diagonal, with the same block structure and with $K$ blocks. We can decompose the subproblem into $K$ smaller LPs. In the master problem, we replace $z$ with $z_1+\dots+z_K$, with $z_k$ the surrogate for the objective contribution $c_{2k}'y_k $ from the $k$-th subproblem.

When we obtain an incumbent solution $(\hat{x},\hat{z}_1,\dots,\hat{z}_K)$ in the master, we pass $\hat{x}$ to each subproblem and solve. Some subproblems may be infeasible (generating feasibility cuts); others may have optimal solutions with $c_{2k}'y_k\gt\hat{z}_k $ (in which case we generate an optimality cut); and some may have optimal solutions with $c_{2k}'y_k=\hat{z}_k $, requiring no action. If any subproblem is infeasible, then $\hat{x}$ is infeasible in the original problem regardless of what goes on in other subproblems, and the feasibility cut from the subproblem is a valid feasibility cut in the master. Similarly, if $\hat{z}_k$ underestimates the objective value in subproblem $k$, then the optimality cut generated there is valid in the master regardless of what happens in other subproblems. So each feasibility/optimality cut generated in a subproblem can be added to the master problem without modification. There is no need to combine cuts (and, in fact, combining cuts might weaken them).

I'll close with two comments. First, although Benders is usually taught as "solve master, solve subproblem, add one cut, repeat", adding multiple cuts during one iteration is certainly legitimate and possibly beneficial (assuming, as in this case, that you discover multiple valid cuts). The cuts need not all be a single type (optimality or feasibility); a mix is perfectly fine. Second, while it is fine to solve every subproblem at every iteration, you may also opt to solve the subproblems in some order (perhaps cyclical?), abort the cycle the first time a subproblem generates a cut, and add just that one cut. This speeds up each cycle but delays finding cuts. I really do not know which way is better, but I suspect that optimality cuts generated from a value of $\hat{x}$ that is found to be infeasible in another subproblem may not be all that useful, and multiple feasibility cuts lopping off the same $\hat{x}$ might be a bit redundant. So I would be inclined to abort the cycle the first time I got a feasibility cut, and add just that cut, but add all the optimality cuts I found if no feasibility cut was generated.

Wednesday, September 12, 2012

Boot Bug: Anniversary Edition

A year ago July, I ran into a problem in which my AMD-64 Mint Katya system started to hang on boots. As noted in that post, the problem stemmed from messing with the video settings for the boot loader. Reinstalling the nVidia display driver might or might not have helped eliminate the problem, but eventually it sorted itself out.

A month later, I ran into a second problem linked to the nVidia driver, in which the X server decided to spontaneously reboot on occasion. I got rid of that problem by using the sgfxi script to update to the latest version of the nVidia driver.

Do gremlins celebrate anniversaries? Slightly over one year after fixing the second problem, the first one came back. It started when I tried to run a program that crashed because it could not find a way to do XGL hardware graphics acceleration. That led me to the Additional Drivers dialog in the Control Center, where I discovered that the proprietary nVidia driver was not selected. As it turns out, that's probably because the Control Center sees the one the operating system installed but not the newer one that the sgfxi script installed. I'm not sure that either version was running, though, since both should provide XGL.

Anyway, I decided to turn on the proprietary driver in the Additional Drivers dialog. When I rebooted, the machine hung at the battery test stage. (Note to self: if this happens again, do not turn on the proprietary driver. Get out of X and run the sgfxi script.) I could boot into safe mode, but no way could I boot regularly (despite repeated attempts). Safe mode worked, but it used too low a resolution, with the desktop off-center (left edge cut off, blank screen on the right), and I could not change the resolution. In Control Center > Monitors I discovered that the system could not detect what kind of monitor I had, so it gave me a wimpy default choice that I could not edit.

Eventually I booted into safe mode, went back to Additional Drivers and discovered that I now had a new option: an "experimental" driver. This was the nouveau driver, which I selected. With the nouveau driver, I could boot normally and get to the desktop, but the resolution was still wrong, and the desktop was still off-center. So I switched back to the nVidia driver, and the next boot predictably hung.

I booted into safe mode again, but this time I selected the option to repair packages, thinking perhaps one of the X packages was damaged. That was apparently not the case, but it did remind me that I had seven updates I had not installed. These were updates to the Linux core, including the X system. Normally I don't install those updates because I like to upgrade to a new version of the entire system at once. This was not a normal time, though (particularly as it was getting on 1:30 in the morning), so I let the system install the updates and rebooted again ... and everything worked. Mint automatically detected my monitor correctly, the resolution was set correctly, everything was back the way it had been. As an added bonus, the nVidia driver was being used, and the program that triggered this whole cluster worked correctly. Woo-hoo!

I subsequently ran the sgfxi script, and sure enough I was on a one-iteration-old nVidia driver. I let the script upgrade me to the latest version, and everything still seems to work properly.

There won't always be a core update waiting to bail me out, so I guess the solution process next time (a year from now?) will be as follows:

Try the sgfxi script.
If that fails, see if there's an update to any of the X packages that can be installed. If not, try reinstalling the X system.
If that also fails, swear mightily and see if there's a new version of Mint I can download.

Streaking - Automotive Edition

If you came here expecting something about driving around nekkid, sorry to disappoint you: that's not the theme of this post. (Trust me when I say that nobody wants to see a guy my age tooling around in the buff.) This post is about cleaning car windows.

Among the many things I'm not good at (a very long list), washing car windows is a prominent item. I've tried glass cleaners, bleach-based cleaners, soap-and-water, plain water, even windshield washer solvent, and inevitably the windows end up with streaks. A bit of research turned up the fact that there are special purpose sprays, allegedly used by people who "detail" cars. I'm not that anal (and too cheap) to buy them. Fortunately, I've had a bit of luck recently, which I'm writing down here so that I won't forget it.

As a starting point, if the windows are really grungy, it's probably a good idea to wash off the bulk of the dirt with plain water, blotting stuff up with paper towels so that you don't leave mud caked on the windows. (Between endless road construction and my driving with the sun roof open, this is a nontrivial consideration, both exterior and interior.)

Next comes the secret elixir: vinegar and water. I read someplace that white vinegar works best, but I've been getting good results with apple cider vinegar, which is what I happen to have in the kitchen. The exact proportions of the mixture are a mystery to me, but I think it's safe to say more than half should be water. (Note to self: less vinegar next time. We're not making salad dressing.)

The other key tool is newspaper. For whatever reason, newpaper works better than even paper towels at both absorbing the dirt and not leaving streaks. I worried at first that the ink would bleed onto the windows, but that does not seem to be the problem. After a few brief experiments, it appears that (as with other uses) the New York Times does better than my local paper, but either will work. I'm not sure if really conservative papers will work as well; they may insist that you outsource the job. In any event, you'll want to read the paper before washing the windows with it.

One or two web sites talked about devoting a spray bottle to applying the mix, but that strikes me as overkill. Wad up one sheet of newsprint, dip one end in a bowl of the mix, and apply it. Wipe, and use the dry end to blot up any excess. Repeat as needed.

The results are not perfect, but they're better than anything I've achieved in the past.