From jan Tue Jan 11 09:41:01 1994
To: adam, finley, hph, jat, john-t, marcel, robert, rustan
Subject: FYI

>From nntp-server.caltech.edu!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!uunet!news2.uunet.ca!wildcan!sq!msb Tue Jan 11 09:34:16 PST 1994
Article: 7970 of comp.programming
Newsgroups: comp.programming
Path: nntp-server.caltech.edu!elroy.jpl.nasa.gov!swrinde!cs.utexas.edu!uunet!news2.uunet.ca!wildcan!sq!msb
From: msb@sq.sq.com (Mark Brader)
Subject: Re: Disastrous Syntax
Message-ID: <1994Jan11.084246.18559@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
References: <2gs6nd$sf5@motmot.doc.ic.ac.uk>
Date: Tue, 11 Jan 94 08:42:46 GMT
Lines: 164

Manor Askenazi (ma6@doc.ic.ac.uk) writes:
> "Were you ever involved in a situation where a LOT of money/time/health
> was lost over a simple syntax error in a program (the kind that cannot
> be detected by a compiler - more like a misunderstanding than an error) ?

Manor, it would be clearer if you avoided the term "syntax error" here,
since this normally refers to exactly the sort of error that you say you
*aren't* interested in.

No, I haven't been involved in such a situation, but I've certainly read
about them.  I suggest that you explore the archives of the newsgroup
comp.risks, also known as the Risks Digest.

The archives are available for anonymous FTP on crvax.sri.com
(128.18.30.65) in the directory RISKS: (note the colon in the name; this
is apparently a VMS system).  Unfortunately, everything there is in one
directory and identified only by issue numbers, making it kind of tricky
to browse; if your system has enough megabytes free, it may be easiest to
retrieve *everything* first.  The *.00 articles are indexes to the others,
but refer to them by Subject lines, whose meaning may not be apparent at
first.

A subset of comp.risks articles appears in hardcopy form in the ACM SIGSOFT
publication Software Engineering Notes, but of course you can't grep in
that!  There is also a book publication upcoming.

> An example of this would be the loss of a satellite (Mariner???) over a
> "-" or a "," which was apparently discussed in this newsgroup 1-2 years
> ago. ... Any pointers to the Mariner incident would also be appreciated...

Here's my canned article on the subject.  Issue numbers refer to comp.risks.

----------------------------------------------------------------------

The space probe that the DO-loop story has been wrongly attached to is
Mariner I (or 1), which was intended for Venus (not Mars).  Several
incorrect or partially correct versions of what really happened were
posted in comp.risks; the best of these cited a NASA publication called
"Far Travelers" by Oran W. Nicks, but still did not have the whole story.

Then in issue 8.75 we found out what really happened...

| Date: Sat, 27 May 1989 15:34:33 PDT
| From: Peter Neumann
| Subject: Mariner I -- no holds BARred
|
| Paul Ceruzzi has written a truly outstanding book for the new show
| that opened two weeks ago at the Smithsonian National Air and Space
| Museum. The exhibit and the book are both entitled "Beyond the Limits
| -- Flight Enters the Computer Age". Both are superb. Go for it (them).
|
| Paul has dug into several cases treated previously in RISKS and in
| issues of the ACM Software Engineering Notes, and has been able to
| resolve several mysteries. In particular he considers the case of
| Mariner I, about which various inaccurate stories have been told.
| Intended to be the first US spacecraft to visit another planet, it was
| destroyed by a range officer on 22 July 1962 when it behaved
| erratically four minutes after launch. The alleged missing `hyphen'
| was really a missing `bar'. I quote from Paul's book, pp. 202-203:
|
| # During the launch the Atlas booster rocket was guided with the help
| # of two radar systems. One, the Rate System, measured the velocity of
| # the rocket as it ascended through the atmosphere.
| # The other, the
| # Track System, measured its distance and angle from a tracking
| # antenna near the launch site. At the Cape a guidance computer
| # processed these signals and sent control signals back to the
| # tracking system, which in turn sent signals to the rocket. Its
| # primary function was to ensure a proper separation from the Atlas
| # booster and ignition of the Agena upper stage, which was to carry
| # the Mariner Spacecraft to Venus.
| #
| # Timing for the two radar systems was separated by a difference of
| # forty-three milliseconds. To compensate, the computer was instructed
| # to add forty-three milliseconds to the data from the Rate System
| # during the launch. This action, which set both systems to the same
| # sampling time base, required smoothed, or averaged, track data,
| # obtained by an earlier computation, not the raw velocity data
| # relayed directly from the track radar. The symbol for this smoothed
| # data was ... `R dot bar n' [R overstruck `.' and `_' and subscript n],
| # where R stands for the radius, the dot for the first derivative
| # (i.e., the velocity), the bar for smoothed data, and n for the
| # increment.
| #
| # The bar was left out of the hand-written guidance equations. [A
| # footnote cites interviews with John Norton and General Jack Albert.]
| # Then during launch the on-board Rate System hardware failed. That in
| # itself should not have jeopardized the mission, as the Track System
| # radar was working and could have handled the ascent. But because of
| # the missing bar in the guidance equations, the computer was
| # processing the track data incorrectly. [Paul's EndNote amplifies:
| # The Mariner I failure was thus a {\it combination} of a hardware
| # failure and the software bug. The same flawed program had been used
| # in several earlier Ranger launches with no ill effects.]
| # The result
| # was erroneous information that velocity was fluctuating in an
| # erratic and unpredictable manner, for which the computer tried to
| # compensate by sending correction signals back to the rocket. In fact
| # the rocket was ascending smoothly and needed no such correction. The
| # result was {\it genuine} instead of phantom erratic behavior, which
| # led the range safety officer to destroy the missile, and with it the
| # Mariner spacecraft. Mariner I, its systems functioning normally,
| # plunged into the Atlantic.

The DO-loop incident did happen at NASA, and at about the same time.
As told by Fred Webb in alt.folklore.computers in 1990:

| I worked at Nasa during the summer of 1963. The group I was working
| in was doing preliminary work on the Mission Control Center computer
| systems and programs. My office mate had the job of testing out an
| orbit computation program which had been used during the Mercury
| flights. Running some test data with known answers through it, he was
| getting answers that were close, but not accurate enough. So, he
| started looking for numerical problems in the algorithm, checking to
| make sure his test data was really correct, etc.
|
| After a couple of weeks with no results, he came across a DO
| statement, in the form:
|      DO 10 I=1.10
| This statement was interpreted by the compiler (correctly) as:
|      DO10I = 1.10
| The programmer had clearly intended:
|      DO 10 I = 1, 10
|
| After changing the `.' to a `,' the program results were correct to
| the desired accuracy. Apparently, the program's answers had been
| "good enough" for the sub-orbital Mercury flights, so no one suspected
| a bug until they tried to get greater accuracy, in anticipation of
| later orbital and moon flights. As far as I know, this particular bug
| was never blamed for any actual failure of a space flight, but the
| other details here seem close enough that I'm sure this incident is the
| source of the DO story.

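The parse that Fred Webb describes follows from blanks not being
significant in FORTRAN of that era: once the blanks are squeezed out,
only the comma separates a DO statement from an assignment to a variable
that happens to be named DO10I.  Here is a rough sketch of that decision
in Python.  It is a toy model, not how any real compiler was written;
the names classify, DO_LOOP and ASSIGNMENT are made up for illustration,
and it ignores parentheses, so a comma inside an array reference or
function call would fool it.

```python
import re

# Toy model only: hypothetical names, not taken from any real compiler.
# Simplifying assumption: expressions contain no parentheses or nested commas.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),([^,]+)(?:,([^,]+))?$")
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement: str):
    """Classify one statement the way a blank-insensitive parser might."""
    # Blanks carry no meaning, so remove them before looking at the text.
    squeezed = statement.replace(" ", "").upper()

    # A DO statement needs a label, a loop variable, and at least
    # two comma-separated bounds after the equals sign.
    m = DO_LOOP.match(squeezed)
    if m:
        return ("do-loop", m.group(1), m.group(2), m.group(3), m.group(4))

    # Without the comma, the same characters read as name = expression.
    m = ASSIGNMENT.match(squeezed)
    if m:
        return ("assignment", m.group(1), m.group(2))

    return ("unknown", squeezed)


if __name__ == "__main__":
    print(classify("DO 10 I=1.10"))     # the statement as typed
    print(classify("DO 10 I = 1, 10"))  # the statement as intended
```

With the period, the first call reports an assignment of 1.10 to DO10I;
with the comma, the second reports a loop over I from 1 to 10 ending at
label 10.  Since the assignment is a perfectly legal statement, the
compiler had no reason to complain.
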
Project Mercury's sub-orbital flights were in 1961, and its orbital
flights began in 1962.

I forwarded the above to comp.risks, slightly abridged, and it appeared
there in issue 9.54.

The erroneous claim that the DO-loop bug was the bug that killed Mariner I
apparently originated with, and certainly was propagated by, the book
"Software Reliability: Principles and Practices" by G(lenford) J. Myers
(John Wiley & Sons, 1976).  I haven't read it myself; I've seen the page
numbers 7 and 275 attributed to the assertion.  I expect both are right.
This book also describes the bug as a "billion-dollar error", which is
too large by a factor of about 50.

In some earlier postings it was suggested that Myers be located and asked
about his sources (the book gives none), but nobody successfully did this;
his employer at the time of publication didn't have his current address.
My guess is that he simply made an error or more likely accepted someone
else's wrong recollection, and didn't feel it necessary to go to original
sources to verify what was only an illustrative point anyway.

Quoted items in this article have been reformatted but not abridged.
Original text is in the public domain.

--
Mark Brader     "... i will have hideous nightmares involving huge
Toronto         monsters in academic robes carrying long bloody
utzoo!sq!msb    butcher knives labelled Excerpt, Selection,
msb@sq.com      Passage and Abridged."  -- Helene Hanff