Iterative Learning for Rigid-Rule Actors

I’ve gone through the first iteration of my computational intelligence simulation system, and what I got was a bunch of dumb actors moving around a giant grid and licking each other.

Let me explain.

Inspiration

I’ve had this idea for a while for a very simple learning and cognition simulation experiment that involves a set of very simple actors navigating an environment space. These actors would come into contact with specific elements of the environment–some that impede their movement, some that they can eat, some that try to kill them–and would then develop internal rules for governing their actions that can be taught to other agents.

This idea spawned a plethora of other ideas, and ignited a flame that is now the early stages of the Multi-Agent Systems that Simulate Intelligence and Cognition (MASSIC) Project.

I was thinking about a chess board when I created a 25x25 grid (the “environment”) in which seven Simple Ruleset-Governed Agents (SRGAs) were spawned at random points and set in motion. In this first iteration of my simulation, I have programmed them to be extremely naive; they have a simple set of rules that they follow based on the interactions they have with their environment, and based on those interactions, they “learn” how to interact with their environment in the future.

After watching little graphics jump around a tilemap, I decided that the best way to simulate this is going to be in real time, so I scrapped the rigid grid idea and opted for the PhaserIO JavaScript game library and an 800x800 canvas. If a petri dish is a controlled experimental environment, then the canvas presented by the Phaser library can be considered the petri dish for this simulation experiment.

Methods & Experimental Design

Iteration #1: Everyone’s licking each other

The first iteration of Massic (my pre-massic period) was nothing more than a series of numbers and letters across a two-dimensional array. I had characters representing actions, and numbers representing environmental variables. The experiment started out like this:

environment = [
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,'B',0],
    [0,0,0,1,0,0,0,0,0,2],
    ['A',0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,2,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,'A',0,1,0,0,0,2,0,0],
    [0,0,0,1,0,0,'B',0,0,0],
    [0,0,2,1,0,0,0,0,0,'B']
]

What you’re looking at is essentially a petri dish, where:

  • A is an organism of species A
  • B is an organism of species B
  • 0 is a blank, occupiable tile
  • 1 is an impassable “rock” tile
  • 2 is a “food” tile (consumable, goal-defined)
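The "check adjacent tiles" step used throughout the cycle rules below can be sketched as a small helper over an array like the one above. The function name and return shape here are my own assumptions, not the project's actual code:

```javascript
// Hypothetical helper: return the four orthogonally adjacent tiles
// for a grid position, skipping positions outside the dish boundary.
function adjacentTiles(environment, row, col) {
  const offsets = [[-1, 0], [1, 0], [0, -1], [0, 1]];
  const tiles = [];
  for (const [dr, dc] of offsets) {
    const r = row + dr;
    const c = col + dc;
    if (r >= 0 && r < environment.length &&
        c >= 0 && c < environment[r].length) {
      tiles.push({ row: r, col: c, value: environment[r][c] });
    }
  }
  return tiles;
}
```

An organism in a corner sees only two neighbors; one in the open sees four.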

The simulation has cycles (paralleling CPU cycles) in which each organism does the following:

  1. Check adjacent tiles. If anything other than a 0 is found, lick it to see if it’s consumable (e.g., a 2). If it is, consume it, then sleep. If it’s not consumable, remember what was licked (e.g., the character or number that classifies the tile), and move to #2.
  2. Check adjacent tiles for a 0. If found, move there. Sleep. If not found, go to #1.
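Put together, one organism's turn under these two rules might look like the following sketch. The `cycle` function, the `memory` set, and the return labels are all assumptions of mine, not the project's actual code:

```javascript
const EMPTY = 0, FOOD = 2;

// One simulation cycle for a single organism under the two rules above.
// `organism` holds a position, a species tag, and a `memory` set that
// records tile types it has already licked.
function cycle(organism, env) {
  const offsets = [[-1, 0], [1, 0], [0, -1], [0, 1]];
  const neighbors = [];
  for (const [dr, dc] of offsets) {
    const r = organism.row + dr, c = organism.col + dc;
    if (env[r] !== undefined && env[r][c] !== undefined) {
      neighbors.push({ r, c, tile: env[r][c] });
    }
  }

  // Rule #1: lick anything that is not a blank tile.
  for (const n of neighbors) {
    if (n.tile !== EMPTY) {
      if (n.tile === FOOD) {        // consumable: eat it, then sleep
        env[n.r][n.c] = EMPTY;
        return 'ate';
      }
      organism.memory.add(n.tile);  // inedible: remember what was licked
    }
  }

  // Rule #2: move to an adjacent blank tile if one exists.
  const open = neighbors.find(n => n.tile === EMPTY);
  if (open) {
    env[organism.row][organism.col] = EMPTY;
    env[open.r][open.c] = organism.species;
    organism.row = open.r;
    organism.col = open.c;
    return 'moved';
  }
  return 'slept';                   // boxed in: nothing to do but sleep
}
```

Nothing here discourages an organism from licking the same neighbor every cycle, which is exactly the failure mode described below.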

As you can see, the rules governing this simulation are rigid and fairly short, similar to John Conway’s (1970) Game of Life. This is meant to serve as a foundational component to replicating, to a small extent, the inspiration behind Conway’s simulation. Conway sought to replicate a cellular automaton problem from the 1940s, and ended up modelling what looked like organization and life out of a chaotic system governed by a small set of rules. If we consider that math and physics could be the “rules” governing our own existence, perhaps a simulation like the Game of Life models our own reality–if we suppose our own reality is a simulation.

The two rules above yielded what I can only describe as a cellular orgy: all the organisms ended up bundled together and constantly licking one another, since there was nothing preventing them from doing so–and they really weren’t programmed to do anything else.

Here’s what one of the final cycles looked like:

environment = [
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,0,0],
    ['A',0,0,1,0,0,0,0,0,0],
    ['A',0,0,1,0,0,0,0,0,0],
    [0,0,0,1,0,0,0,0,'B',0],
    [0,0,0,1,0,0,0,'B','B',0]
]

Originally I set out to determine the learned differences between SRGAs at different future cycles, so without a programmatic way for the organisms to communicate with one another, the simulation sort of fails.

What I was missing was a way to 1) replicate the organisms via reproduction¹, 2) kill off organisms that could not find food, 3) regulate the population in general, and 4) dynamically build internal rules based on how the organisms communicate.

So, restated: the purpose of this experiment was to determine the learned differences between SRGAs at different future cycles.

For the next iteration, each organism is modeled as a node in a linked list that will be traversed by an independent petri dish update function. The function will run each node through a series of checks, ultimately resulting in the node falling asleep (sleep, in this case, represents the end of a node’s current “turn”). The following cycle will be considered the “lifecycle” of the program (illustrated in pseudocode):

for each node in the petri dish:
    if the surrounding area has food:
        go there, eat, sleep
    if the surrounding area has an organism:
        if age < 50, or already reproduced, or the organisms are not suitable reproductive partners:
            if the organism is the same species: share knowledge, continue
            else: eat the organism, sleep
        else:
            if the organism is the same species: reproduce, sleep
            else: eat the organism, sleep
    if the surrounding area has empty space: move there, sleep
    else: sleep
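Expanded into explicit branches, a single node's turn might look like the sketch below. Every predicate and method on the `dish` controller (`foodNear`, `suitablePartners`, `consume`, and so on) is an assumed hook, not an existing API:

```javascript
// One node's turn in the petri-dish update loop. Every turn ends in
// sleep; "continue" in the pseudocode means falling through to the
// next check.
function lifecycleStep(node, dish) {
  if (dish.foodNear(node)) {
    dish.moveToFood(node);           // go there, eat, sleep
    node.eat();
    return 'sleep';
  }
  const other = dish.organismNear(node);
  if (other) {
    const eligible = node.age >= 50 && !node.hasReproduced &&
                     dish.suitablePartners(node, other);
    if (node.species !== other.species) {
      dish.consume(node, other);     // different species: eat it, sleep
      return 'sleep';
    }
    if (eligible) {
      dish.reproduce(node, other);   // same species, of age: reproduce
      return 'sleep';
    }
    node.shareKnowledge(other);      // same species, not eligible: share
    // ...and continue to the movement check below
  }
  if (dish.emptyNear(node)) {
    dish.moveToEmpty(node);          // move there, sleep
  }
  return 'sleep';
}
```

Keeping the dish responsible for all spatial queries means the node itself only carries state (age, species, knowledge), which should make the later NN work easier to slot in.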

A few things to consider, here:

  • An organism that is adjacent to another organism of the same species will share knowledge with that organism. In other words, the set of training data that each organism has generated will now be consumed by the other, and vice versa. So if one organism had learned that rocks are impassable and inedible, and the other had learned that a specific plant type makes you sick, each organism will now, theoretically, have the same neural networks, since they shared the full set of each other’s training data.

  • An organism above 50 cycles in age will attempt to reproduce before sharing knowledge. This rule will result in a very opinionated simulation, but I think it’s important to consider how a simulated coming-of-age can affect the way organisms interact in an environment.

  • There is the potential for many configuration parameters available to someone running this simulation, and I can see an apparent need for a set of dynamic configurations used to populate global environment variables before the simulation starts. At present, I can see: the health and age required for reproduction, and the health required to consume another organism (assuming an organism of species A cannot eat an organism of species B that has more health than it does).
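As a sketch, those dynamic configurations could be collected into a single object read before the simulation starts. Every field name and default below is an assumption, not a final API:

```javascript
// Hypothetical global configuration for a single simulation run.
const simulationConfig = {
  reproduction: {
    minAge: 50,         // cycles before an organism may reproduce
    minHealth: 10,      // health required to spend energy reproducing
  },
  predation: {
    // an organism cannot eat another organism with more health than it
    requireGreaterHealth: true,
  },
  environment: {
    width: 800,         // canvas size, matching the Phaser petri dish
    height: 800,
    rockCoverage: 0.33, // fraction of surface area covered by rocks
  },
};
```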

Iteration #2: Dynamic learning

This next iteration will incorporate fully-autonomous agents governed by a turn-based lifecycle controlled by our petri dish controller. Each agent will seek food to grow and, upon reaching age 50 (that is, 50 CPU cycles), will look for an organism of the same species and opposite sex to reproduce with. For the sake of simplicity, it will be assumed that two organisms of opposite sex, each with an age > 50, are suitable reproductive partners.
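That eligibility check is small enough to state directly. A minimal sketch, assuming `species`, `sex`, and `age` fields on each organism:

```javascript
// Two organisms are suitable reproductive partners if they are the
// same species, of opposite sex, and both over 50 cycles old.
function suitablePartners(a, b) {
  return a.species === b.species &&
         a.sex !== b.sex &&
         a.age > 50 &&
         b.age > 50;
}
```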

The following research questions drive the experiment:

  1. After some time, is there a significant difference between the rulesets created by each agent given the same environment but not necessarily the same interactions with other agents?

  2. How many actors are alive after 50 cycles? 100 cycles?

Within this simulation, these actors can only do one of the following during any given “cycle” (which we can think of as a ‘turn’ in a turn-based game):

  • Move to an adjacent unoccupied square
  • Interact with an adjacent occupied square

If the actor interacts with a nearby square, it will be faced with one of three possibilities:

  • A ROCK, which is immovable and worthless.
  • A PLANT, which is edible and can be consumed for food.
  • Another actor, which (for now) is treated the same as a rock.

In the first iteration of the system, I wanted 33% of the surface area to be random ROCK tiles, but now I’m thinking we will run a set of experiments with different numbers of rocks to see how the agents develop differently.
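Varying rock density across experiments could be as simple as a parameterized generator. The function and its signature below are illustrative only:

```javascript
// Build a rows x cols dish where each tile is a rock (1) with
// probability `rockRatio`, and empty (0) otherwise. Passing a custom
// `rng` makes runs reproducible for comparison across experiments.
function makeEnvironment(rows, cols, rockRatio, rng = Math.random) {
  const env = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let c = 0; c < cols; c++) {
      row.push(rng() < rockRatio ? 1 : 0);
    }
    env.push(row);
  }
  return env;
}
```

Running the same agents against `rockRatio` values of, say, 0.1, 0.33, and 0.5 would give a simple axis along which to compare how the learned rulesets diverge.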

Discussion of Results

Aug-15: Experiments are underway. Stay tuned!

Scope issues with development

I wanted to talk about the issue of “scope” before anything else. A project like this will have an enormous amount of scope creep if a clear boundary isn’t set for each iteration–and by boundary I mean specific rules for the system that are resolute (and that can then inspire dynamically created rules on a per-organism basis). This means that DRY programming (or “Don’t Repeat Yourself”) may not be possible, as I can see several ways in which I will have to rewrite core code to account for the future dynamism that Massic will no doubt employ.

Future Direction

This iteration of Massic is super naive, and doesn’t include anything like a Belief-Desire-Intention (BDI) model. While this iteration does serve as an impetus for question-asking that inspires future research, its lack of any true decision-making computations (outside of a rigid ruleset) only highlights how even modelling simple systems is enormously complex.

Sharing Data

When two organisms came into contact with each other, if there was no food around, they would share everything they knew with each other by way of each organism training its internal NN with the other organism’s inputs. In the future, we may look at only sharing a subset of training input data, perhaps based on some condition. For example, if an organism is hungry, the data shared with it might be restricted to input data relating to food in the environment.
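A minimal sketch of that exchange, treating each organism's accumulated knowledge as a set of training examples; the `trainingData` field and the commented-out `trainOn` hook are assumptions about the eventual NN interface:

```javascript
// On contact, each organism absorbs the other's full training set;
// afterwards both would retrain their internal networks on the union.
function shareKnowledge(a, b) {
  const fromA = [...a.trainingData];
  const fromB = [...b.trainingData];
  for (const example of fromB) a.trainingData.add(example);
  for (const example of fromA) b.trainingData.add(example);
  // a.trainOn(a.trainingData);  // retraining hooks, once NNs exist
  // b.trainOn(b.trainingData);
}
```

Conditional sharing (e.g., only food-related examples when the recipient is hungry) would just filter `fromA`/`fromB` before the merge.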

Implement a quorum “cortex” for multiple neural network classifiers

My original notes had a design for multiple neural networks that each governed the simulated delivery or suppression of a different neurochemical. For example, an oxytocin NN and a cortisol NN would each operate independently given the same sensory inputs provided to the system, each reporting to a centralized quorum NN that would make the final decision. This way, weighted calculations from each NN would come into play–and possibly even have simulated physiological implications–but the ultimate decision would lie in the quorum (or “cortex”) NN, itself charged with carrying out some “meaning of life” goals.
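The quorum idea can be sketched as a weighted vote over the outputs of independent scorers standing in for the neurochemical networks. All names and weights here are illustrative, with plain functions in place of real NNs:

```javascript
// Each "network" scores the same sensory input; the cortex sums the
// weighted scores per action and picks the highest total.
function quorumDecision(networks, input) {
  const totals = {};
  for (const { score, weight } of networks) {
    const votes = score(input);          // e.g. { eat: 0.7, flee: 0.3 }
    for (const [action, value] of Object.entries(votes)) {
      totals[action] = (totals[action] || 0) + value * weight;
    }
  }
  return Object.entries(totals)
    .sort((x, y) => y[1] - x[1])[0][0];  // action with highest total
}
```

Swapping the scoring functions for trained networks would leave the cortex logic unchanged, which is the appeal of the design.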

  1. I did not want the organisms to be able to reproduce at will without consuming food, since the idea here is that an organism needs to make the choice between spending energy to reproduce and spending energy to find food.