### Representational framework

The internal representation framework used in STEM is a graph [4]. This is a powerful mathematical abstraction that has been applied to many different problem domains. STEM uses the graph to represent geographic locations and their relationships. In a graph a *node* represents a geographic location. Depending on the resolution, nodes in a particular graph may represent a country, state, county or city. It is even possible to combine graphs with different resolutions so some regions may be modelled with higher resolution than others. An *edge* in a graph represents a relationship between two geographic locations (Figure 1). There are many kinds of relationships that could exist. These include "physical adjacency", "linked by state highway", or "exchange population via air travel". Each such relationship is characterized by a metric value that represents the "weight" of the relationship, for instance, the average number of people who travel between two locations (nodes) in a day. This weight value can be incorporated into the disease model computations to account for the level of contact between populations in different geographic locations.

This framework negates the need for typical epidemiological modelling assumptions of a "well mixed" uniformly distributed population. This allows for more realistic models to be represented. The framework also allows for geographic representations at different levels of resolution so that models can focus on key areas without excluding the importance of neighbouring regions. The ability to incorporate abstract relationships in the framework between geographic locations further enhances STEM's ability to model complicated scenarios.
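As an illustration of the framework described above, the following is a minimal Python sketch of a location graph with typed, weighted edges. It is not STEM's own code or API: the names `Edge`, `LocationGraph`, `add_edge` and `neighbours`, the example locations, and the weight values are all hypothetical, and edges are assumed to be undirected for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A relationship between two locations (hypothetical structure, not STEM's API)."""
    source: str
    target: str
    kind: str      # e.g. "physical adjacency" or "air travel"
    weight: float  # e.g. average number of travellers per day


@dataclass
class LocationGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, kind, weight):
        # Adding an edge also registers both end points as nodes.
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, kind, weight))

    def neighbours(self, node, kind=None):
        """Yield (other_node, weight) for every edge touching `node`.

        Edges are treated as undirected; `kind` optionally filters by
        relationship type.
        """
        for e in self.edges:
            if kind is not None and e.kind != kind:
                continue
            if e.source == node:
                yield e.target, e.weight
            elif e.target == node:
                yield e.source, e.weight


# Nodes of different resolution (a county and a city) can share one graph.
graph = LocationGraph()
graph.add_edge("County A", "County B", "physical adjacency", 1.0)
graph.add_edge("County A", "City C", "air travel", 250.0)
```

A disease model could then iterate over `graph.neighbours("County A", kind="air travel")` to weight the contribution of each connected population.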

### Disease model state representation

STEM represents the state of a disease affecting a population at a geographic location as a *label* on the corresponding node in the representational graph. Typically, the label segments the population into discrete disease states and records the number of population members in each state. An SIR model [5] has the states *susceptible*, *infectious* and *recovered* (Figure 2); an SEIR model adds an *exposed* state, and an SI model omits the recovered state. More than one label may be attached to a node, so the states of several diseases and several populations can be maintained simultaneously at each geographic location.

In the SIR state model, for example, a population member is born susceptible and so enters state S. From there they can stay in S, become infectious and move to state I, or die. In state I, a population member can stay in I, recover and move to state R, or die. Similarly, in state R they can stay in R, lose their immunity and return to state S, or die. The transition from R to S is not present in all models, because immunity is not lost for every disease.
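The labelling scheme described in this section can be pictured with a small Python sketch. The class name `SIRLabel`, its fields, and the example diseases, populations and counts are assumptions made for illustration only; they are not taken from STEM.

```python
from dataclasses import dataclass


@dataclass
class SIRLabel:
    """Disease state of one population at one node (hypothetical structure)."""
    disease: str
    population: str
    s: float  # susceptible
    i: float  # infectious
    r: float  # recovered

    @property
    def total(self):
        # The label partitions the population, so the states sum to its size.
        return self.s + self.i + self.r


# Several labels on one node: one per (disease, population) pair.
labels_by_node = {
    "County A": [
        SIRLabel("influenza", "human", s=9900, i=100, r=0),
        SIRLabel("avian influenza", "poultry", s=50000, i=0, r=0),
    ],
}
```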

### Computations

The disease model computations implemented in the base version of STEM are rate equations that specify the number of population members that enter and leave each of the representational disease states (e.g., S, I or R) for a particular interval of time.
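To make the idea of a rate equation concrete, here is a sketch of one deterministic SIR step for a single node. It covers the transitions described for the SIR state model: birth into S, infection, recovery, loss of immunity, and death from every state. The source does not give STEM's actual equations, so the frequency-dependent transmission term `beta * s * i / n`, the equal birth and death rate `mu`, and all parameter names are textbook assumptions rather than STEM's implementation.

```python
def sir_deltas(s, i, r, beta, gamma, alpha, mu, dt=1.0):
    """Return (ds, di, dr), the net change in each state over one time step.

    beta  : transmission rate (assumed frequency-dependent mixing)
    gamma : recovery rate (I -> R)
    alpha : immunity-loss rate (R -> S); 0 for diseases with lasting immunity
    mu    : per-capita birth rate, also used as the death rate (assumption)
    """
    n = s + i + r
    if n == 0:
        return 0.0, 0.0, 0.0

    births = mu * n              # everyone is born susceptible
    infections = beta * s * i / n
    recoveries = gamma * i
    immunity_loss = alpha * r

    ds = (births - infections + immunity_loss - mu * s) * dt
    di = (infections - recoveries - mu * i) * dt
    dr = (recoveries - immunity_loss - mu * r) * dt
    return ds, di, dr


# Example: 1000 people, 10 infectious, no births, deaths or immunity loss.
ds, di, dr = sir_deltas(990.0, 10.0, 0.0, beta=0.5, gamma=0.1, alpha=0.0, mu=0.0)
```

Because births and deaths use the same rate in this sketch, the three deltas sum to zero and the population size is conserved.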

STEM comes with four "built-in" disease models, corresponding to the SIR and SEIR disease models combined with stochastic or deterministic computations (2 × 2). It also includes a unique disease model for multi-serotype diseases such as dengue fever, which maintains a combinatorial disease state that tracks the sequential serotype infection and recovery state for all possible sequences of infections [6].

### Simulation

The simulation of the progression of a disease is implemented in STEM as an iterative process that visits each node in the graph and computes a new value for each disease state label for each population at that node. The new values are computed solely from the current state of the simulation, which consists of the current simulated time and the current values of the labels on the nodes in the graph. The new values are saved as they are computed and only when all nodes have been so processed at the end of the cycle do the new labels replace the previous labels. Simulation time is then advanced by the time delta of the simulation (typically a day, but this is configurable). The process repeats until stopped by the user. As each cycle is processed, history data is collected for the disease states so that the progress of the disease can be reviewed or analyzed later.
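The update cycle described above is a synchronous (double-buffered) iteration: every node's new label is computed from the previous labels, and the swap happens only once all nodes have been visited. The sketch below illustrates that pattern. The per-node update `next_label`, its `coupling` parameter, the tuple-based labels, and the fixed cycle count (standing in for "until stopped by the user") are simplifying assumptions, not STEM's implementation.

```python
from datetime import datetime, timedelta


def next_label(node, labels, neighbours, beta=0.3, gamma=0.1, coupling=0.05):
    """Hypothetical SIR update for one node; rates are per simulation step.

    Reads only the current `labels`, never the values being computed
    for the next cycle.
    """
    s, i, r = labels[node]
    n = s + i + r
    pressure = i / n if n else 0.0
    # Neighbouring infectious fractions contribute in proportion to edge weight.
    for other, weight in neighbours.get(node, []):
        other_n = sum(labels[other])
        if other_n:
            pressure += coupling * weight * labels[other][1] / other_n
    new_infections = min(s, beta * s * pressure)
    new_recoveries = gamma * i
    return (s - new_infections, i + new_infections - new_recoveries, r + new_recoveries)


def simulate(labels, neighbours, start, cycles, dt=timedelta(days=1)):
    now = start
    history = [(now, dict(labels))]
    for _ in range(cycles):
        # Compute every new label first; the old labels stay untouched.
        pending = {node: next_label(node, labels, neighbours) for node in labels}
        labels = pending            # replace all labels at the end of the cycle
        now += dt                   # advance simulated time
        history.append((now, dict(labels)))  # keep history for later analysis
    return history


neighbours = {"A": [("B", 1.0)], "B": [("A", 1.0)]}
initial = {"A": (990.0, 10.0, 0.0), "B": (1000.0, 0.0, 0.0)}
history = simulate(initial, neighbours, datetime(2006, 1, 1), cycles=3)
```

Updating labels in place instead would make the result depend on the order in which nodes are visited, since a node processed later would see its neighbours' already-updated values.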

### Implementation and extensibility

STEM is written in the Java™ programming language and runs on most platforms that support Java™. The architecture of the system is designed to be extended by the addition of new disease models, which are written as software modules in the Java™ programming language. The current base distribution of STEM comes complete with example source code for all its disease models. These models can be augmented by user-supplied implementations of additional disease models. A direct approach is to copy the sample code and modify a renamed version. This allows simple changes to the base calculations to be developed quickly and incorporated for general use. More extensive changes require a deeper understanding of how to manipulate the representational framework (documented in the distribution), but allowing such changes was one of the design requirements for the system, so this type of extension is deliberately exposed to other researchers.

In a future version of STEM, the code base will be retargeted to the Eclipse open source tool framework [7]. This framework provides a formal "plug-in" software architecture that STEM will use to provide a more standardized approach to its extension. This will allow the extension not just of disease models, but also of features such as new types of graphic displays and other interface items, as well as the modelling of additional kinds of relationships between locations.