Stupid mistakes part 94

How one simple endless loop can gum up your entire website.

For the Neopets part of this site, I have a script that fetches a remote page and parses it for information. Now, sometimes that site is down or broken, so if the script doesn’t find the information it needs, it tries to fetch the page again, which is fine as long as the problem is a temporary hiccup at the remote site.

If the problem is that the remote page has changed format, though, the parsing is always going to fail. So the script keeps on running and keeps on complaining that it can’t parse the page (spewing out the fetched page as a warning each time). The script is run by cron (the Unix scheduler) four times an hour, so copies of the script start piling up until the machine grinds to a halt.

So I killed all of those copies of the script, and fixed it so it wouldn’t repeat more than 5 times, ever. The load on the machine went down to manageable levels.
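
For illustration, here is a minimal Python sketch of that kind of cap. The original script isn't shown here, so everything in it is an assumption: `fetch_and_parse`, `fetch`, `parse` and `delay_seconds` are hypothetical names, and `parse` is assumed to return `None` when it can't find what it needs.

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 5  # the cap mentioned above


def fetch_and_parse(
    fetch: Callable[[], str],
    parse: Callable[[str], Optional[dict]],
    max_attempts: int = MAX_ATTEMPTS,
    delay_seconds: float = 0.0,
) -> Optional[dict]:
    """Try fetch + parse at most max_attempts times, then give up.

    fetch and parse are hypothetical stand-ins for the real script's
    download and parsing steps; parse returns None on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            page = fetch()
        except OSError:
            page = None  # remote site down or unreachable

        if page is not None:
            result = parse(page)
            if result is not None:
                return result  # found what we needed

        # Pause between attempts, but not after the last one.
        if attempt < max_attempts and delay_seconds > 0:
            time.sleep(delay_seconds)

    # Every attempt failed: stop instead of looping forever.
    return None
```

The point is only that the loop is bounded by a counter rather than by "until it works", so a permanent format change costs five attempts instead of an endless run.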

Then I checked my mail. All output from scripts run by cron is sent to the user’s email address, and some of those scripts had been running for a long time, printing out long HTML dumps all the while.

The resulting mail was 3 MB.