5.0 Developmental Simulator
The purpose of this project is to build a system that uses the principles of developmental biology to grow neural networks from digital chromosomes. Populations of these chromosomes can then be bred using evolutionary techniques to create neural nets that produce desired behaviors. Each chromosome in the population will go through the development process to produce a neural network in which some neurons are attached to motor outputs, some are attached to sensory inputs, and the rest are attached to each other. The connection strengths of all these neurons will be regulated by the genes in their cells to allow things like learning to occur. Currently, only the nervous system of the organism will be grown from the chromosome. At some future date the body plan may be grown simultaneously. The simulated insect robot will then be placed in a virtual environment that reproduces physical effects such as friction, gravity, and torque. The behavior of the robot will be completely determined by this grown neural network. When motor neurons fire they will move appendages, and when sensory neurons fire they will be detecting something in the virtual environment such as light, pressure, or temperature. Certain goals will then be given to the robot. At first these will be very simple: can it move its legs? Then the goals will get more and more complex. For example, can it move its legs together in a coordinated fashion? Can it stand up? Can it walk forward? The neural network developed by each chromosome in the population will be tested to see how well it can perform these tasks, and a quantitative fitness value will be assigned to it based on its performance. Those chromosomes that produce neural nets that perform better are more fit and will have a greater chance of being reproduced into the next generation. Over several generations the neural networks that are grown will become better and better adapted to producing the desired behavior.
Once a network performs well enough at one goal, such as moving a leg, the fitness function will be changed to target the next, more challenging stage, such as moving all of the legs in a coordinated manner. Slowly, more complicated behavior will be evolved until the final desired behavior emerges, which in this case is an insect robot that can walk on its own.
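The staged breeding loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the simulator's actual code: the function name evolve_in_stages, the list-of-floats chromosome, the placeholder develop step, and the two toy "leg" goals are all hypothetical stand-ins, and fitness-proportional selection is just one of several ways to give fitter chromosomes a greater chance of reproducing.

```python
import random


def evolve_in_stages(population, stages, develop, mutate, rng, max_generations=500):
    """Evolve a population through a list of (fitness_fn, target) stages.

    Returns the final population and the number of stages whose target was met.
    Assumes every fitness value is positive, as roulette-wheel selection needs.
    """
    stages_passed = 0
    for fitness_fn, target in stages:
        for _ in range(max_generations):
            # Develop each chromosome into a network and score its behavior.
            scores = [fitness_fn(develop(c)) for c in population]
            if max(scores) >= target:
                stages_passed += 1
                break  # good enough: move on to the next, harder goal
            # Fitter chromosomes get a proportionally greater chance to reproduce.
            parents = rng.choices(population, weights=scores, k=len(population))
            population = [mutate(p, rng) for p in parents]
        else:
            break  # this goal was never reached, so later goals are skipped
    return population, stages_passed


def develop(chromosome):
    # Placeholder: the real system would grow a neural network here.
    return chromosome


def mutate(chromosome, rng):
    # Hypothetical mutation: small Gaussian noise on every gene.
    return [gene + rng.gauss(0.0, 0.1) for gene in chromosome]


def move_one_leg(net):
    # Toy goal 1: the first value should be close to 1.0.
    return 1.0 / (1.0 + abs(net[0] - 1.0))


def move_both_legs(net):
    # Toy goal 2: both values should be close to 1.0 at the same time.
    return 1.0 / (1.0 + abs(net[0] - 1.0) + abs(net[1] - 1.0))


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(30)]
stages = [(move_one_leg, 0.9), (move_both_legs, 0.8)]
population, stages_passed = evolve_in_stages(population, stages, develop, mutate, rng)
print("stages passed:", stages_passed, "of", len(stages))
```

Because each stage starts from the population that solved the previous one, the harder goal does not have to be discovered from scratch.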
2. Developmental Biology
How does nature start with one tiny cell and use that to produce an organic machine like the human body, with trillions of cells all working together to keep us alive? How does the DNA in our cells actually grow our bodies? These are some pretty deep questions that scientists have been struggling with for a very long time. Luckily, in the last few decades we have made incredible progress in beginning to understand how these processes take place. As you can imagine, this is a very complex topic, and therefore I am not going to go into any real level of detail here. If you are truly interested in how organisms develop then you need to read some of the references I list, or better yet take a class. But I do want to touch briefly on some of the basic principles and describe which of those principles are used by this simulator, and why some were used and some were left out.
3. DNA. Is it a blueprint?
When you build a house you start with a set of blueprints that describe the layout of the house. They tell the construction crew where everything will go. Wall A starts at position X and is Y feet long. There will be an electrical socket on wall A exactly at position X. The entire structure of the house is specified within the blueprints. This has the advantage of making everything crystal clear. The builders know exactly where everything goes and there are, usually, no mistakes. However, what happens if the prospective owners come back and say they want to triple the number of rooms in the house? After the architects finish cursing them, they will have to go back to the blueprints and fill in all of the details for these new rooms down to the last centimeter. By tripling the number of rooms you have tripled the information in the specifications. Blueprints for the house that might have been 5 pages before now suddenly take up 15 pages. Now just think about what would happen if you increased the number of rooms a millionfold. You would be killing a lot of trees to describe the plans for your gigantic mansion. But this is in essence the problem that organisms have in development. Each animal is made up of billions of different cells. If DNA were literally a blueprint that told the exact type and location for every cell in the body, then the amount of information that would need to be encoded in it would be truly astronomical. Instead, our DNA is more like a recipe. It is a set of instructions for how to create a body. This is referred to as epigenetic development. When you bake a cake you start with a set of initial ingredients and then follow a set of rules that transforms those ingredients into a cake. Development is similar. The egg gets some initial patterns laid down in it from its mother, and it gets the basic machinery of life. These are the initial ingredients.
The embryo then follows a set of rules, governed by the genes, that control things like cell division, death, differentiation, shape, movement, and so on. Each step builds on the results of the previous steps. So instead of having to specify the location and type of billions of cells, the DNA just has to come up with general rules like 'divide 1000 times', 'if you do not get factor x then kill yourself', 'begin moving up this concentration gradient till you contact chemical A'. The exact number of cells, and the fate of any individual cell, is unimportant. It is the overall results that count.
Another important aspect of doing things this way is that a small change through a mutation can sometimes have a major impact on the outcome. For instance, what if one of the first steps was to 'divide 1000 times', but instead something went wrong and it was changed to 'divide 10,000 times'? This may cause the developing organism to simply die, but it may survive and prosper as a much larger individual. Since the rest of the developmental program simply uses the results of the previous steps, it may be able to take the 10,000 cells and still grow something that is similar to the original organism, but bigger. However, if we were using a system where every cell was specified exactly, this would be a disaster. We would have tons of leftover cells and no idea what to do with them. The whole blueprint would definitely be ruined. So the epigenetic method of development is much more adaptable to mutations and change than the blueprint method.
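To make the recipe idea concrete, here is a minimal sketch. It is not taken from the simulator (which, as described later, does not model cell division at all); the grow function, the recipe keys divisions and head_percent, and the cell type names are hypothetical, and as a simplification N divisions are treated as yielding N cells, following the wording of the example above.

```python
def grow(recipe):
    """Follow a two-step developmental recipe and return a list of cell types."""
    # Step 1: 'divide N times'. Simplification: N divisions yield N cells.
    cells = ["unspecified"] * recipe["divisions"]
    # Step 2 works on whatever step 1 produced. The head region is given as a
    # percentage of the body, not as a list of absolute cell addresses.
    head_end = len(cells) * recipe["head_percent"] // 100
    return ["head" if i < head_end else "body" for i in range(len(cells))]


recipe = {"divisions": 1000, "head_percent": 20}
normal = grow(recipe)
mutant = grow({**recipe, "divisions": 10_000})  # a single-value mutation

for name, body in (("normal", normal), ("mutant", mutant)):
    print(name, len(body), "cells,", body.count("head"), "head cells")
# normal 1000 cells, 200 head cells
# mutant 10000 cells, 2000 head cells
```

The mutant is ten times larger but keeps the same proportions, because the second step only uses the result of the first. A literal blueprint would need one entry per cell, 1,000 for the normal body and 10,000 for the mutant, and changing the cell count would invalidate every entry after the change.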
4. Types of Differentiation
There are two basic ways that the different parts of the body are formed. The first is through cellular lineage and the second is through induction. An easy way to remember this is to think of it as the European vs. American models. In the European model the fate of a cell is determined by who its parents are. When a cell divides, some of the proteins will go with one daughter cell and other proteins will go with the other. This causes the two cells to be different and to take different paths during development. This is asymmetrical cell division. In the American model the fate of a cell is determined by the influence of its neighbors. So one mass of cells can influence or force another set of cells to become something else. This is called induction. Of course, in real organisms this distinction is rarely clear-cut. Most organisms use a combination of the two methods. Still, as you look at organisms of increasing complexity you can see that the inductive model is used more and more, while the lineage model is more pervasive in simpler organisms. During the design of the developmental simulator it was decided to rely solely on inductive events to grow the neural networks, for several reasons. First, if you use the lineage model it implies that you have to have the creation and destruction of cells. Simulating these acts would be very costly because now you need to have a dynamic, three-dimensional structure where the shape of the organism must be retained. Each time a cell divides you need to allocate more memory, and then deallocate it when a cell dies. When the cells divide or die you would need to rearrange the shape of the organism, and this would be very expensive. And if you are really going to go that far then you would also need some way to model how the cells can contort their shapes to produce folding and invaginations in sheets of cells.
Yes, it is true that there are ways to cheat here and only allow divisions or deaths on the boundary, but you would still be required to keep track of which cells are where in three dimensions. Second, coming up with an easy way to divide the proteins unevenly between the two dividing cells is no simple task. This also would cost precious ticks of the clock during the simulation. Third, I believe that the inductive events alone are capable of generating neural networks with sufficient complexity and flexibility to meet my goals. By using induction alone we are able to create a group of cells when the simulation is first initialized and then simply reset the values of those cells for each new chromosome that is being evaluated. This eliminates the need to do frequent memory allocations and deallocations. Allocating memory inside the main loop of a simulation is slow and should be avoided whenever possible. We then do some initial setup of the array of cells, just like the maternal genes do during oogenesis. Gradients of transcription factors and ligands are positioned along the various axes to control the development. This then leads to a partitioning of the cells and to differential gene expression. As the cells begin to differentiate, one set of cells induces nearby cells to change their fate. These choices build on one another until you end up with a highly structured set of neurons that are linked together based on the past partitioning.
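The allocate-once, reset-per-chromosome idea, together with a maternal-style gradient and a single inductive step, might look something like the sketch below. Everything here is an assumption made for illustration: the CellGrid and Cell classes, the single linear gradient, the fate names, and the use of one threshold per chromosome are hypothetical and much simpler than real gene regulation.

```python
class Cell:
    """One slot in the fixed grid; its fields are overwritten, never reallocated."""
    __slots__ = ("gradient", "fate")

    def __init__(self):
        self.gradient = 0.0
        self.fate = "undetermined"


class CellGrid:
    def __init__(self, rows, cols):
        # cols must be at least 2 so the gradient below is well defined.
        self.rows, self.cols = rows, cols
        # Allocate every cell once, when the simulation is initialized.
        self.cells = [[Cell() for _ in range(cols)] for _ in range(rows)]

    def reset(self):
        """Prepare the same cell objects for the next chromosome."""
        for row in self.cells:
            for c, cell in enumerate(row):
                # Maternal-style setup: a linear gradient along one axis,
                # 1.0 in the first column falling to 0.0 in the last.
                cell.gradient = 1.0 - c / (self.cols - 1)
                cell.fate = "undetermined"

    def differentiate(self, threshold):
        """Cells at or above the threshold take the 'organizer' fate."""
        for row in self.cells:
            for cell in row:
                if cell.gradient >= threshold:
                    cell.fate = "organizer"

    def induce(self):
        """Organizers change the fate of undetermined neighbors in their row."""
        for row in self.cells:
            # Decide first, then apply, so induction spreads one cell per call.
            targets = [
                c for c, cell in enumerate(row)
                if cell.fate == "undetermined"
                and any(0 <= n < self.cols and row[n].fate == "organizer"
                        for n in (c - 1, c + 1))
            ]
            for c in targets:
                row[c].fate = "induced"


grid = CellGrid(rows=2, cols=5)
for threshold in (0.75, 0.5):  # stand-ins for two different chromosomes
    grid.reset()
    grid.differentiate(threshold)
    grid.induce()
    print([cell.fate for cell in grid.cells[0]])
# ['organizer', 'organizer', 'induced', 'undetermined', 'undetermined']
# ['organizer', 'organizer', 'organizer', 'induced', 'undetermined']
```

The same ten Cell objects are reused for both evaluations. Only their field values change between chromosomes, which is the point of avoiding division and death in the model.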
5. What is being Simulated?
This system does not simulate cell division, death, morphology, or migration. There is a fixed number of cells laid out in a rectangular pattern in the simulation, and that same number of cells in that same pattern remains throughout the run. Currently, only cellular differentiation is really simulated. This first step allows the developing organism to form into different segments. Each cell in a segment will have a slightly different pattern of gene expression from cells in the other segments. The next step is to add the ability to simulate growth cone formation and axonal guidance. After that the simulation of synapse formation will be added. Then the genes of each new neuron, along with its firing activity, will regulate the connection strengths between the neurons to fine-tune the network and allow the new brain to learn. These are the basic steps that will occur in the developmental process, and they are also the different phases of code development that need to be done on this system. The first phase has been completed and is documented in the following pages. This gives the system the ability to form different patterns of gene expression. The following links will lead you through a detailed discussion of how the simulator system works.
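As a rough picture of what the completed first phase produces, the sketch below forms segments on a fixed rectangular grid purely through differential gene expression. It is a toy under loud assumptions, not the simulator's method: the chromosome format (each hypothetical gene is switched on inside a band of a single head-to-tail gradient), the gene names, and the express, differentiate, and segments helpers are all invented for illustration.

```python
ROWS, COLS = 3, 8  # fixed for the whole run; no division, death, or migration

# Hypothetical chromosome: each gene is on where low <= gradient value < high.
chromosome = {
    "geneA": (0.0, 0.5),
    "geneB": (0.25, 0.75),
    "geneC": (0.75, 1.0),
}


def express(gradient_value, chromosome):
    """Return the set of genes switched on at this gradient value."""
    return frozenset(
        name for name, (low, high) in chromosome.items()
        if low <= gradient_value < high
    )


def differentiate(rows, cols, chromosome):
    """Give every cell in the fixed grid an expression pattern."""
    # The gradient value is taken at each cell's center along the column axis.
    return [
        [express((c + 0.5) / cols, chromosome) for c in range(cols)]
        for _ in range(rows)
    ]


def segments(row):
    """Group consecutive cells with identical expression into segments."""
    result = []
    for expressed in row:
        if result and result[-1][0] == expressed:
            result[-1][1] += 1
        else:
            result.append([expressed, 1])
    return [(sorted(genes), count) for genes, count in result]


grid = differentiate(ROWS, COLS, chromosome)
for genes, count in segments(grid[0]):
    print(count, "cells expressing", genes)
# 2 cells expressing ['geneA']
# 2 cells expressing ['geneA', 'geneB']
# 2 cells expressing ['geneB']
# 2 cells expressing ['geneC']
```

Three overlapping bands yield four distinct segments, and mutating a single band boundary in the chromosome would shift where one segment ends and the next begins without touching the number or layout of cells.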