Saturday, January 31, 2015

Re: Callahan on Axtell: Artifact Origins and Increase

Last month, Gene Callahan brought to my attention an error in Robert Axtell’s paper “Why Agents?” Before I respond to his post, I encourage readers to familiarize themselves not only with this paper, but with Axtell’s broader body of work and other research representative of the growing agent-based paradigm. While I don’t expect that equilibrium models that employ systems of linear equations will disappear any time soon, agent-based and complexity economics are here to stay. The paradigm will only grow more influential over the next decade or two.

The error itself exists in what appears to be a tangential discussion within Axtell’s paper. He identifies “bugs” that arise in the program as “artifacts”. These artifacts typically produce results that are not robust to small parameter changes. Axtell writes,

This architecture, in which very little source code effectively controls a much larger amount of execution code, is the basis for the highly scalable nature of agent-based models. The number of agents or commodities, for instance, are conveniently specified as user-defined constants in the source code or read in from the command line, and thus the scale of the model can be specified at compile or run-time. Typically, no significant rewriting of the code is needed in order to change the scale of the model.

It is also the case that the “small source, large execution code” character of agent computing is partially responsible for the production of artifacts, an important class of systematic problems that can arise in agent models, as alluded to above. When a small amount of code — say a single commodity exchange rule, for example — controls so much other code, then it will sometimes be the case that an idiosyncrasy in the rule code will produce output that one might erroneously take to be a significant result of the model. A common route to phenomena of this type occurs when the agent interaction methods impose some spurious correlation structure on the overall population — for instance, agents are interacting with their neighbors more than with the population overall in what is supposed to be a “soup” or "mean field" interaction model — then an ostensibly systematic result — large price variance, say — is clearly artifactual. There is no real solution to this problem, aside from careful programming. One can, however, look for the existence of such artifacts by making many distinct realizations of an agent model, perturbing parameters and rules. When small perturbations in the code produce large changes in the model output, then artifacts may be present. Sometimes, large changes in output are realistic and not signatures of artifacts. For example, imagine that a small change to a threshold parameter makes an agent die earlier than it otherwise would, and therefore induces at first a small change in agent exchange histories (i.e., who trades with who), that over time is magnified into a wholesale change in the networks of agent exchange. Perhaps this is not unrealistic. But when such large scale changes have origins that are unrealistic empirically, then one should be instantly on guard for undiscovered flaws in the source code.
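
Axtell’s perturbation check is easy to mechanize. Below is a minimal sketch in Python of what such a screen might look like; `run_model` is a hypothetical stand-in for a full ABM run, and the parameter names are invented for illustration, not taken from Axtell’s code.

```python
import random

def run_model(params, seed):
    # Hypothetical stand-in for a full agent-based model run; returns one
    # summary statistic (price variance, say). Here it is just a noisy
    # function of the parameters, purely for illustration.
    rng = random.Random(seed)
    return params["threshold"] * 2.0 + rng.gauss(0, 0.01)

def perturbation_screen(base_params, key, epsilon=1e-3, n_runs=30):
    # Run many realizations at a base parameter value and at a slightly
    # perturbed value. An output shift wildly out of proportion to epsilon
    # is the signature Axtell warns about: artifacts may be present.
    perturbed = dict(base_params)
    perturbed[key] = base_params[key] + epsilon
    base_out = [run_model(base_params, s) for s in range(n_runs)]
    pert_out = [run_model(perturbed, s) for s in range(n_runs)]
    base_mean = sum(base_out) / n_runs
    pert_mean = sum(pert_out) / n_runs
    return abs(pert_mean - base_mean) / epsilon  # sensitivity ratio

print(perturbation_screen({"threshold": 0.5}, "threshold"))  # ~2.0 here; a huge value would be suspect
```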

Gene Callahan identified a causal error in his post,

Rob Axtell, in his 2000 paper "Why agents? On the Varied Motivations for Agent Computing in the Social Sciences," attributes the existence of what he calls "artifacts" (program behavior that is not a part of the model being created, but a byproduct of a coding decision which was intended only to implement the model, but actually did something else as well) "partially" to the fact that, in agent models, a small amount of source code controls a large amount of "execution" code. As an example, he offers a system where millions of agents may be created and which might occupy up to a gigabyte of memory, even though the source code for the program is only hundreds of lines long.

But this explanation cannot be right, because the causal factor he is talking about does not exist. In any reasonable programming language, only the data for each object will be copied as you create multiple instances of a class. The functions in the agent-object are not copied around again and again: they sit in one place where each agent "knows" how to get to them. What causes the huge expansion in memory usage from the program as it sits on disk to the program running in RAM is the large amount of data involved with these millions of agents: each one has to maintain its own state: its goal, its resources, its age: whatever is relevant to the model being executed.

So what we really have is a small amount of code controlling a large amount of data. But that situation exists in all sorts of conventional data-processing applications: A program to email a special promotional offer to everyone in a customer database who has purchased over four items in the last year may control gigabytes of data while consisting of only a few lines of source code. So this fact cannot be the source of any additional frequency of artifacts in agent-based models.

The code itself is not copied. Rather, the same code is used to instantiate some number of agents, each of which has objects of its own. These objects, as opposed to the code itself, occupy memory. Gene has mostly identified the error here, but it can be investigated further.
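
Gene’s point about instance data versus shared method code is easy to verify directly. Here is a minimal Python sketch (the class and attribute names are hypothetical): every agent resolves its methods to the same function object on the class, while state is stored per instance.

```python
class Agent:
    def __init__(self, wealth):
        self.wealth = wealth  # per-instance state: one copy per agent

    def trade(self, other, amount):
        # Method code lives on the class; it is never duplicated per agent.
        self.wealth -= amount
        other.wealth += amount

agents = [Agent(wealth=100) for _ in range(100_000)]

# All 100,000 agents share a single underlying function object...
assert agents[0].trade.__func__ is agents[99_999].trade.__func__

# ...but each maintains its own data.
agents[0].trade(agents[1], 10)
print(agents[0].wealth, agents[1].wealth, agents[2].wealth)  # 90 110 100
```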

What we have is an increase in the number of interactions that occurs as a result of having numerous agents rather than a single program. It just so happens that the objects instantiated along with each agent take up space. Even if they took up no space, however, the problem of "artifacts" / bugs would still exist. The more agents there are, and the more they interact, the more opportunities there are for bugs to arise. The problem is a combinatorial one. All else equal, increases in the number of agents and their objects grow the behavior space of an ABM program at an increasing rate.

To demonstrate, consider a simple combinatorial problem where we want to find the number of possible permutations (ordering matters). How many ways can a line of 5 people be arranged? The answer: they can be arranged in 5! possible states, 5 * 4 * 3 * 2 * 1 = 120. The problem is simple to think through. If agent 1 stands at the front of the line, the other four agents can be arranged in 4! (= 24) different states. Repeat this process with agents 2 through 5. I could continue to break the problem down iteratively: start with agent 1 in front and agent 2 second in line. The others could reorganize in 3! (= 6) different states. Replace agent 2 with agents 3, 4, and 5 and now we have 4! different states with agent 1 in front. Replace agent 1 with agents 2, 3, 4, and 5 and, voila, we’ve arrived back at my earliest formulation of the answer. Increase the number of agents to 6, and the number of possible permutations grows from 120 to 720. Increase to 7 and there are 5,040 possible permutations.
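
A few lines of Python confirm these counts by brute-force enumeration:

```python
import math
from itertools import permutations

for n in range(5, 8):
    enumerated = sum(1 for _ in permutations(range(n)))  # count every ordering explicitly
    print(n, math.factorial(n), enumerated)
# 5 120 120
# 6 720 720
# 7 5040 5040
```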

Now imagine growing the number of agents to something in the range of 100,000 or 1,000,000. Say that each of these agents owns 10 different objects. If all of these objects can potentially interact with one another as part of pairwise agent interactions, then the model represents a vast behavior space. Within that behavior space likely lurks a large number of “artifacts” that cannot be detected until runtime.
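
A back-of-the-envelope count gives a feel for the scale. Under the assumptions just stated (pairwise agent interactions, 10 objects per agent, any object pair within an interacting pair of agents potentially in play), a rough count of the possible object pairings looks like this:

```python
import math

def pairwise_object_interactions(n_agents, objects_per_agent=10):
    # Agent pairs times the object pairings within each pair. A crude
    # count of the interaction structure, ignoring ordering and the
    # compounding of repeated encounters over time.
    return math.comb(n_agents, 2) * objects_per_agent ** 2

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} agents: {pairwise_object_interactions(n):.3e} object pairings")
# 1,000 agents -> ~5.0e+07; 100,000 -> ~5.0e+11; 1,000,000 -> ~5.0e+13
```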

Axtell is correct that “artifacts” might arise due to emergent patterns not accounted for by the programmer. He mentions one example:
A common route to phenomena of this type occurs when the agent interaction methods impose some spurious correlation structure on the overall population — for instance, agents are interacting with their neighbors more than with the population overall in what is supposed to be a “soup” or "mean field" interaction model — then an ostensibly systematic result — large price variance, say — is clearly artifactual.

Note, too, that in the first quote Axtell recognizes that the agent interaction methods can “impose some spurious correlation structure on the overall population.” He intuitively, even if not explicitly, sees that the problem is combinatorial.

Axtell is pointing to an effect of a bug in the code. This effect (in this case, a higher than expected rate of local interactions) is what skews the data and may make a spurious result appear to be significant. However, this does not capture the full extent of the problem. The combinatorial growth of interacting agents and their associated objects implies that the behavior space is growing at an increasing rate: with every increase in memory, the rate of growth of the behavior space itself increases, as the quick calculation below illustrates. But in this analysis, memory is only a stand-in for the objects that occupy it. What Axtell appears to be calling execution code is really the set of objects instantiated by that code. The amount of memory these objects occupy has no bearing on the combinatorial problem; it is the number of objects that is of interest. These objects, whose contents are transformed by the processes associated with them, drive massive growth of the behavior space as the population increases. It is the job of the scientist to identify and categorize different types of states and progressions. One of those types is the “artifactual.”
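
Reusing the line-ordering example from above, the calculation shows what “an increasing rate” means here: each additional agent multiplies the count of possible orderings by the new population size, so the growth factor itself rises with the population.

```python
import math

sizes = [math.factorial(n) for n in range(5, 11)]
growth_factors = [b // a for a, b in zip(sizes, sizes[1:])]
print(growth_factors)  # [6, 7, 8, 9, 10]: the multiplier grows with the population
```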
