coping chapter 9

9 Genetic Algorithms

...even as late as 1957, "algorithm" did not appear in Webster's New World Dictionary. The closest word to appear in dictionaries of that time was "algorism," which means the process of doing arithmetic using Arabic numerals. The word algorithm appears as the result of confusing the word arithmetic with the name of the Persian mathematician Abu Jafar Mohammed ibn Musa al-Khowarizmi (c. 825)....before it came to have its present meaning of a well-defined procedure for computing something.

George Markowsky (1)

A genetic algorithm is a computer program that is designed to work in much the same way that biological evolution does. The results are not attempts to arrive at optimal answers, but fittest solutions. A recent article on the front page of the Wall Street Journal (2) describes well the nature and uses of genetic algorithms as Aids to Learning:

The cold, digital domain of silicon-based technology is drawing inspiration and new ideas from an unlikely source: the living, breathing realm of nature. Companies and scientists are turning to a wide variety of natural models-from the way salmon migrate to how the human body fights viruses to evolution-for new approaches to problem-solving. Spurring this unusual alliance is the realization that many problems, similar to the ones humans want to solve, have already been cracked by Mother Nature. Nature offers a "huge library of design metaphors," says John Hiles, president of Thinking Tools Inc., a Monterey, Calif., company that develops software based on natural analogies. "It's opening up a wide range of possibilities." That's why scientists have studied the human immune system for tips on how to protect against viruses of another kind-those that invade computers. Typical virus-detection programs check software against a database of known viruses. That can let unknown viruses-for which there are no database matches-easily sneak through.

Alien Molecules

To find a new approach, Stephanie Forrest, a computer science professor at the University of New Mexico, has teamed up with Alan S. Perelson, a theoretical immunologist at Los Alamos Laboratory. Together they have created software that attacks unrecognized computer viruses by imitating a neat trick by which immune systems identify alien molecules. The human immune system uses a class of protein-attacking cells known as T-cells. Early in life, the system destroys those T-cells that would attack the body's own proteins, while keeping all other T-cells. This unusual process, known as "negative selection," lets the immune system identify intruders it has never encountered before. When T-cells run into a nonbody protein on the surface of an invading virus or a foreign cell, they sound the alarm for other parts of the immune system to arrive and do battle. Professor Forrest and researchers at Interval Research Corp., a small research-and-development company in Palo Alto, Calif., have built their own "T-cells" -from strings of computer code. As with the immune system, their approach evolves the strings so as to keep only those that are extremely sensitive to "foreign" computer code.

Virus Killers

To test them, researchers unleash both known and unknown computer viruses into a network. The digital T-cells roam around seeking out foreign code and setting up an alarm: A window pops open upon the screen, warning, "A change has been detected." The software then displays the file where the virus lies. "I really believe that our computer systems are so complicated, we can't use them effectively till we make them look more like biological systems," Prof. Forrest says. Interval Research hopes to release a commercial version of the antivirus program in a year or two.

So what, exactly, makes nature such a wizard problem-solver? Her "reckless and random" ways, says David Liddle, a co-founder of Interval and chairman of the board of trustees of the Santa Fe Institute, which supports nature-inspired research. An interesting term for complexity. Because humans rely on logical processes, they consider a fairly narrow range of solutions, he argues; nature, on the other hand, takes a sprawling trial and error approach that tests many more potential solutions. "Our view of computer science is rational, mechanistic. But nature winds up doing things in a way we'd never think of," Mr. Liddle says.

Arms Race

Other scientists have caught the bug. Mr. Hiles of Thinking Tools is borrowing from bacteria to create a computer simulation to teach managers how to handle complex projects, such as building an airport. Traditional software rarely does a good job of imitating the unpredictability of events in the real world. To make his simulator more challenging, Mr. Hiles programmed it to behave like a nasty bacterium, so that the simulator engages in an escalating "arms race" against its host, the user.

Just as the tuberculosis bacteria, because it constantly mutates, ends up developing resistant strains against antibiotics, the simulator keeps changing its responses to a user, manipulating the information and throwing out an unexpected difficulty that the trainee must then tackle. "If a user persists in making a certain kind of mistake, the computer can ruin the operation. So he's pressured to learn," says Mr. Hiles. He hopes to make the simulator commercially available sometime in 1996.

Elsewhere in the world of software, scientists at AT&T Bell Laboratories are on an ambitious quest: to create software that can write itself and solve a problem. But this work isn't under the direction of a software programmer. It is guided by a physicist, Andrew Pargellis. In this pursuit, Mr. Pargellis has built the digital equivalent of the "primordial soup," the mishmash of chemicals from which all life is said to have originated. Dubbed Amoeba, this artificial world consists of 1,000 rectangles that flit about on a computer screen. Each rectangle represents a piece of randomly generated software code that contains certain mathematical instructions. Since the original primordial soup spontaneously gave rise to life, Mr. Pargellis reasons, why can't an artificial soup spontaneously generate the answer to the problem? So Mr. Pargellis sets a test problem for each rectangle: Copy thyself.

Evolving Rectangles

At first, the rectangles flit about on the screen chaotically. No one rectangle carries all the instructions to execute the command. But some contain bits of code that, when combined with that of others, would let them pull it off. Sooner or later, one rectangle-a red one, say-gathers up the right instructions to let it copy itself, albeit clumsily, taking five or six separate steps. Still, red rectangles begin to proliferate, dominating the screen. Then random mutations-an alteration in the code in certain inhabitants-begin to help; although most mutations are harmful, eventually a mutation will be beneficial, letting a blue rectangle, say, copy itself in fewer steps. This means the blue rectangles are more "evolved," and they eventually take over the screen. Other mutations could let still another colored rectangle copy itself even faster.

The point is that the software itself, long after Mr. Pargellis gave it vague instructions, keeps seeking out better ways to carry out the directive. "The implication is that one could define a problem and let software evolve to solve that problem," Mr. Pargellis says. So far, Amoeba has only proved itself in solving elementary arithmetic problems. But someday, this kind of self-writing software may handle the burden of creating millions of lines of base code, letting human developers focus on the hard stuff.

Ones and Zeros

One of the first approaches to borrow ideas from evolution came in the 1950s-"the genetic algorithm," invented by John Holland at the University of Michigan. Dozens of companies have applied this technique to a range of programming problems. Prof. Holland's main insight: The way ones and zeros are strung along a piece of computer code is similar to the way genes are strung along a chromosome.

In evolution, the best solutions-the traits that let a species survive-win out after long periods in which organisms occasionally pass on mutant genes to offspring. These offspring with beneficial mutations tend to thrive, and those with insufficient traits tend to die out. Prof. Holland figured that, in computers, if you combine and recombine strings of ones and zeros in a similar way, they would yield ever-better solutions. When he proffered this notion about 40 years ago, "it was greeted with resounding indifference," he recalls. Critics would joke, "You don't have enough time to imitate evolution."

But the genetic algorithm has become a hit. Moody's Investment Service uses it to farm out hundreds of computer-service jobs. Organizers of the 1992 Paralympic Games used it to schedule events, LBS Capital Management of Clearwater, Fla., uses the algorithm to help pick stocks for a pension fund it manages. "Three billion years of evolution can't be wrong," says David Goldberg, an engineer and genetic-algorithm pioneer at the University of Illinois at Urbana-Champaign. "It's the most powerful algorithm there is."

Making a Face

One of the more unusual applications is the FacePrints project, which uses a genetic algorithm to help witnesses describe and identify criminal suspects. Witnesses who are interviewed often can't conjure up a suspect's individual features; they are much better at recognizing entire faces. The FacePrints project runs through random illustrations of faces on a computer screen, combining and recombining features dozens of times until the best solution emerges. "A particular face has on one billion possibilities. This lets us search an enormous 'face space' very quickly," says the FacePrints inventor, Victor S. Johnson, a psychology professor at the New Mexico University at Las Cruces.

His technology got a real-world test after one of his students, Craig Caldwell, and two companions were robbed of $22 outside a restaurant. They took turns at a computer. It generated 30 random pictures of faces, which were rated in order of likeness to the assailant, then threw up 30 more, which were rated again. The process was repeated dozens of times "until I was satisfied the picture resembled the criminal," Mr. Caldwell says. "The final three images that the three witnesses separately created were strikingly similar."

How did FacePrints do it? Using a genetic algorithm, the software "evolved" the picture. FacePrints consists of hundreds of individual features-a hooked nose, a bushy brow-and each is written in a digital string of computer code. When Mr. Caldwell rated the first 30 random faces, slapped together from 30 arbitrary combinations, he was performing the FacePrints equivalent of natural selection. The picture with the highest rating was then "bred" with another picture to produce a new choice; to make room for it, the software "killed off" a few less-likely pictures-and so on, so that each "new" population of 30 faces was slightly "fitter" than the last. In the Caldwell case, the resulting picture was printed in the local paper. Although the bandit remains at large, the police were impressed. "Their picture had more detailed features than our pictures, " says Kay Hernandez, a Las Cruces police detective. Prof. Johnson is setting up a company to commercialize his invention.

Directing Traffic

Sometimes the models offered by nature are fuzzier. At Texas Instruments Inc., researchers in collaboration with Thinking Tools are building a computer system to help shipping companies efficiently dispatch goods to far-flung areas. Their inspiration? The navigation skills of salmon. Just as a salmon finds the way to a spawning ground, so might thousands of packages each 'seek' their own best routes to particular destinations. At big shipping companies, traffic is typically controlled by a central computer. When volume soars, the system can fail. TI researchers figure each crate could have a small screen that tells baggage handlers where it is headed. A built-in sensor would wirelessly pull up relevant data from the shipper's computer. Warned of an accident on a particular route, the crate would independently search out the next best path and display new instructions on its screen.

Just as nature can offer new approaches, so can experts in one field, when they cross-pollinate their skills in another discipline. At AT&T Corp.'s computer division, Kenneth L. Reed works on developing sales-forecasting models for retailers. Yet his specialty is ecology. As a Yale professor, he and others modeled forest growth by measuring the overall sunlight falling on a given area-and got better results than when they tried extrapolating from the amount of light hitting an individual leaf. [This is a good example of coarse-graining in order to get a "crude look at the whole."] Mr. Reed now is on a loftier quest. Using a technique called "evolutionary programming, " he hopes to create software that can produce the perfect product mix for each store in a big chain. "We haven't tried it yet," Mr. Reed says, "but we think it will work." He laughs. "We're looking for guinea pigs."

[Reproduced by permission of The Wall Street Journal via the Copyright Clearance Center.]

The actual workings of a genetic algorithm are described in the following example by LtCol Steven M. Rinaldi. The subject is the selection of target sets for an air strike involving some mix of electrical and petroleum networks. As you can see, the process follows closely the nonlinear biological process of evolution. (4)

In the natural world, the fitness function of an organism is a measure of its ability to survive in a given environment. Reproduction, exchange of genetic material, mutations, and natural selection change the genetic code of successive generations of the organism, either improving their positions on the fitness landscape or not. A genetic algorithm (GA) uses the same basic processes to evolve optimal solutions to problems inside a computer. Like its organic counterparts, the GA creates "generations" of solutions that progressively move toward the global maximum of the fitness function. In solving the problem, the GA mimics naturally occurring biological processes....

A principle element of a GA is the gene string or genotype. The simplest and most general prototype occurs in the binary combinatorial optimization problem. Here, there are n discrete elements or variables, such as n potential targets. Let ai represent the ith element. Since the elements are binary, they can take only one of two values, 0 or 1 (on or off, attacked or not attacked, etc.). Concatenating the elements in a string yields a binary variable a1, a2, a3......an...correspond(ing) to a point in the configuration space. For example. 0111001011 and 1101010011 are two points in the configuration space of an n=10 binary problem. In an analogy with the biological case, the string...represents a chromosome (genotype), each bit position of the genotype corresponds to a gene, and each gene represents the state of a particular discrete element. Consequently, the genotype represents all of the possible system configurations.

A second key element of the GA is the fitness function...the embodiment of the problem at hand... In the targeting problem, the value of the fitness function denotes how well a given targeting solution...meets the commander's requirements ...Clearly, the fitness function will change for every targeting scenario. In all scenarios, the GA will attempt to "evolve" high fitness targeting solutions.

The operation of a GA parallels the biological processes of selection, reproduction, and genetics...The algorithm begins by creating an initial population of individuals. That is, the routine generates m values of a1, a2, a3......an...Each individual is a trial solution to the optimization problem...Once the program has created the population, it is ready to pass to the reproduction step. As the name implies, the reproduction step creates the next generation of individuals. First, the routine evaluates the fitness function for each individual. The fitnesses determine whether an individual survives to the next generation or dies out...The average fitness of the successive generation is generally higher than that of the previous one.

Following reproduction and selection, the algorithm performs two crucial operations. The first is crossover, in which two individuals swap blocks of genetic material...In the second operation, a mutation operator selects an individual at large and then randomly flips one of its bits....Crossover and mutation are important steps that maintain the diversity of the population as well as allowing the algorithm to sample large regions of the configuration space....

The above description sketches a highly simplified picture of the main steps of a GA. Researchers have modified this simple routine in many ways, adapting it to a variety of problems.

The problem centers on the notional electrical and POL [petroleum, oil and lubricants] networks of some hypothetical country. Following the output-based targeting philosophy, the friendly commander has decided that certain sectors of the electrical grid and POL networks must be destroyed. Their elimination will hamper enemy efforts: integrated air defenses and communications networks will suffer from power outages, electrified rail transportation for mobilization will shut down, motorized transportation will be hindered from the loss of POL resources, and so forth. The adversary can use backup electrical power generation and stockpiles of POL to overcome some of the immediate losses of economic resources. However, we are also interested in the synergistic effects that arise from the couplings between the networks.

In more concrete terms...the commander has decided that the electricity and POL pipelines must be shut down in the eastern half of the adversary nation. To facilitate reconstruction efforts after the conflict, those elements targeted for destruction must be repairable within six months. This restriction eliminates certain potential targets, such as generators and their step-up transformers. Furthermore, we assume that the ROEs constrain the attack sorties to the eastern third of the nation. The problem thus poses objectives as well as several constraints. The fitness function must include all of these considerations... The program employs a genetic algorithm to evolve targeting solutions that meet the commander's requirements.

The Genotype

Our notional targeting problem is an example of binary combinatorial optimization. The electrical grid and POL network consist of n components (lines, buses, transformers, generators, pipeline segments, compressor and pump stations, etc.), where n is some large integer. Each component can be in one of two states: targeted (and assumed destroyed during an attack) or untargeted. If the variable ai is the state of the ith component, then it takes one of two values, 0 for untargeted (undamaged) and 1 for targeted (destroyed). The state of the entire electrical grid and POL network is then represented by the genotype a1a2a3...an. The genotype takes the particularly simple form of a binary variable.

The genotype will be long if the number of components n is large. However, if the number of targets that can be attacked is limited (by available aircraft, munitions, etc.) and is much less than n, then the genotype will be sparse. Compression of the genotype information will reduce the storage requirements, especially if the population size m is large. For example, in a very sparse genotype, it is only necessary to store a set of pointers that indicate which components are targeted, rather than storing information about each component.

The Fitness Function

The fitness function f is arguably the most important part of the routine. It is the embodiment of the targeting problem, and as such must incorporate the commander's objectives and all constraints and restraints...considerable care must go into its development.

Each electrical grid and POL network component has an associated set of rewards and penalties. In keeping with the commander's desires, every electrical grid and POL network component in the eastern half of the country that shuts down as a result of the attack accrues a positive reward. Likewise, every eastern transmission line or pipeline that is still operational after the attack incurs a negative penalty. Any targeted facilities in the western two-thirds of the nation will also incur negative penalty. Note that there is no penalty for components still running in the western half of the nation. Some components may be weighted more heavily than others...For example, if the commander determines that destroying the electrical grid is more important than shutting off the POL flow, the electrical grid rewards would be correspondingly higher than those for the POL network. Note that the values of the penalties and rewards may require tuning to improve the convergence of the GA...[The table on the next page] lists the rewards and penalties for our particular problem.

Each component, then, has an associated set of weights. The weights form a vector...ri, si, ti, ui, vi, wi. Using the weights from the table, a destroyed electrical grid component on the eastern border of the country with a repair time of two years (such as a step-up transformer) would have (100, 0, 0, 0, -25, 0) as its vector. A destroyed pump station with a four-month repair time located in a restricted flight zone but in the eastern half of the country would be characterized by (0, 50, 0, 0, 0, -100). If the same pump station is unattacked but nevertheless shut down, its vector becomes (0, 50, 0, 0, 0, 0). Similarly, every component in the data base has a weight factor.

The fitness function is given by the sum of the rewards and penalties over all grid elements. The maximum value fmax is simply the sum of the rewards (ri+si) which occurs when all components in the eastern half of the country are down, and no constraints have been violated. With this particular fitness function, the program must attempt to find a target set that maximizes f. (3)

Program Logic

In general terms, the algorithm is composed of a nodal analysis section and an optimization routine. The nodal analysis section performs load-flow and hydraulic analyses, and incorporates the interconnections between the two systems. This section draws heavily upon the database. The optimization routine computes the fitness function values and generates new target sets for evaluation. Figure 9.1 illustrates the flow of the algorithm.

The nodal analysis begins after some initial set of targets are generated. The initial target set can be randomly generated, determined by some algorithm, or input by the planner. (Each individual in the population pool is, in essence, an attack plan. The value of the genotype indicates which components are attacked or bypassed.)

The manner in which the electrical grid-POL system linkages are incorporated merits further discussion. The program uses an iterative technique to determine the synergistic results of an attack. First, the routine simulates the attack by "removing" any targeted components from the database. The result is a "post-attack" database used in the ensuing nodal analyses. This database (or genotypes) reflects the state of the electrical grid and POL distribution system immediately after the attack.

Second, the routine performs separate nodal analyses of the two elements. In each step, the program analyzes each element in isolation from the other. In essence, the program calculates the effects of the damage on each element without regard to synergistic couplings. The analyses determine the components that shut down due to the attacks. Any such electrical grid or POL pipeline component is removed...(leaving) only those components still functioning in the isolated economic elements.

Third, the routine reconciles the effects of the couplings between the two elements. For example, if electricity is lost to a substation that feeds a pipeline pump, the pump ceases to function. Although the pump was not directly attacked, the loss of electricity causes the pump failure. The routine then removes the pump from the post-attack data base. Similarly, if the natural gas line feeding a gas-fired electrical generator shuts down, the electrical generator drops off line...At this point, the program has removed any components that either were destroyed in the attack, "failed" during the isolated nodal analyses, or shut down due to synergistic couplings.

Fourth, the program repeats the nodal analysis-reconciliation steps...In essence, the program calculates the cascading failures within and between the two elements of the model... Eventually, the routine will converge to a post attack data base that undergoes no further changes. This database represents the final operating state of the model after the attacks. It includes the results of the synergistic couplings and cascading failures. Therefore, the last nodal analysis yields the final state to which the coupled economic elements deteriorate. The optimization routine commences by determining the fitness f of the target sets. If f is sufficiently high or if the results of the attack achieve the commander's requirements, the routine terminates. Otherwise, the routine generates a new set of targets and iterates.

Next - Chapter 10