# An Exploration of Modeling Techniques for the Study of the Dynamics of E-Mail Viruses

1-2011

Thesis

## Degree Name

Master of Science (MS)

## College/School

College of Science and Mathematics

## Department/Program

Mathematical Sciences

Lora Billings

John Stevens

Eric Forgoston

## Abstract

We analyze real data sets from two e-mail viruses, Magistr.b and Sircam.a, to explore how mathematical models can be used to predict the behavior described by the data. The data are analyzed primarily with programs written in MATLAB. We focus mainly on two continuous models commonly used in the study of biological diseases: the SIS and SIR models. We also explore a discrete modeling approach using agent-based simulations. It shows potential for developing a compartmentalized model that incorporates both SIS and SIR behavior. We describe the theory behind the continuous models and the programming method for the simulations.

The factors that drive the spread of biological infections, such as exposure and recovery rates, have counterparts in cyberspace. The parameters that govern how these computer viruses move through the susceptible population of computers on the internet are identified as the contact rate, β, and the recovery rate, γ. We use the real data to estimate values for these parameters. We then use these values in our models to find the model that best matches the behavior described by the data.
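For reference, the standard textbook forms of the two continuous models are given below. They treat S, I and R as fractions of the computer population. This normalization is an assumption, because the abstract does not state which form the thesis uses.

$$
\text{SIS:}\quad \frac{dS}{dt} = -\beta S I + \gamma I, \qquad \frac{dI}{dt} = \beta S I - \gamma I
$$

$$
\text{SIR:}\quad \frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I
$$

In both models, $\frac{dI}{dt} = (\beta S - \gamma) I$. An outbreak therefore grows initially (with $S \approx 1$) only when $R_0 = \beta/\gamma > 1$.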

We approximate β using a standard method and find that β must be very small to account for the almost linear growth of the infection early on. The recovery rate, γ, is found by taking the reciprocal of the average duration of infection. Biological diseases run their course in a host over a set period of time. These e-mail viruses, by contrast, show durations of infection that vary widely. Using the mean duration of infection calculated from the data, the SIS model reaches a non-trivial endemic state. However, the data do not support such an endemic state.
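The two quantities in the paragraph above can be sketched in a few lines of Python. The recovery rate is γ = 1/(mean duration). In the normalized SIS model, setting $\frac{dI}{dt} = I(\beta(1 - I) - \gamma) = 0$ gives the endemic fraction $I^* = 1 - \gamma/\beta$, which exists only when β > γ. The durations and the β value below are hypothetical placeholders, not the Magistr.b or Sircam.a data. The function names are likewise illustrative.

```python
from statistics import mean


def recovery_rate(durations_days):
    """Estimate gamma as the reciprocal of the mean duration of infection."""
    return 1.0 / mean(durations_days)


def sis_endemic_fraction(beta, gamma):
    """Non-trivial SIS equilibrium I* = 1 - gamma/beta (fraction infected).

    It exists only when R0 = beta/gamma > 1; otherwise the disease-free
    state I* = 0 is the only equilibrium in [0, 1].
    """
    if beta <= gamma:
        return 0.0
    return 1.0 - gamma / beta


# Hypothetical durations in days (NOT the thesis data). A single long
# infection (40 days) pulls the mean well above the typical duration.
durations = [1, 2, 2, 5, 10, 40]
gamma = recovery_rate(durations)
print(round(gamma, 4))                         # → 0.1
print(round(sis_endemic_fraction(0.3, gamma), 4))  # → 0.6667
```

Because widely varying durations are collapsed into one mean, a constant γ can predict a persistent endemic level even when the data do not show one.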

Closer analysis of both data sets reveals that the durations of infection actually decreased over time. By applying a time-dependent γ(t), we are able to modify the behavior of the SIS model and approximate the shape of the latter half of the time series. In a similar fashion, we apply a range of linear functions for γ(t) to the SIR model. Using a very small β, we can approximate the shape of the first half of the time series.
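The effect of a rising γ(t) on the SIS endemic state can be illustrated numerically. Below is a minimal pure-Python sketch that integrates the normalized SIS equation with a classic fourth-order Runge-Kutta step. Several choices here are hypothetical: the linear form 0.1 + 0.002t, the value β = 0.3, time measured in days, and the function names. The thesis's own MATLAB code and fitted values are not reproduced here.

```python
def simulate_sis(beta, gamma_fn, i0, t_end, dt=0.1):
    """Integrate the SIS model dI/dt = beta*I*(1 - I) - gamma(t)*I with RK4.

    I is the infected fraction and S = 1 - I, so one equation suffices.
    Returns a list of (t, I) pairs.
    """
    def rhs(t, i):
        return beta * i * (1.0 - i) - gamma_fn(t) * i

    t, i = 0.0, i0
    trajectory = [(t, i)]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, i)
        k2 = rhs(t + dt / 2, i + dt * k1 / 2)
        k3 = rhs(t + dt / 2, i + dt * k2 / 2)
        k4 = rhs(t + dt, i + dt * k3)
        i += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        trajectory.append((t, i))
    return trajectory


beta = 0.3  # hypothetical contact rate


def constant_gamma(t):
    return 0.1  # fixed recovery rate, i.e. a 10-day mean duration (assumed)


def rising_gamma(t):
    return 0.1 + 0.002 * t  # hypothetical linear gamma(t): durations shrink


constant = simulate_sis(beta, constant_gamma, i0=0.01, t_end=200)
rising = simulate_sis(beta, rising_gamma, i0=0.01, t_end=200)
print(f"constant gamma: I(200) = {constant[-1][1]:.3f}")  # near 1 - 0.1/0.3
print(f"rising gamma:   I(200) = {rising[-1][1]:.2e}")    # decays toward 0
```

With constant γ the solution settles at the endemic level 1 − γ/β. Once γ(t) exceeds β (here at t = 100), the infected fraction decays toward the disease-free state instead.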

We find that introducing a variable γ modifies the behavior of both models in such a way that it remains unclear which model best reflects the behavior of our viruses. For the Magistr.b virus, only qualitative fits were achieved with both the SIS and SIR models. A precise match to either continuous model may not have been possible because the dynamics of these viruses involve both SIS and SIR behavior. Some infected computers become completely disabled by the infection and thereby enter the Removed class of an SIR model. Others are repaired and re-enter the Susceptible class of an SIS model. Computers with longer durations of infection and significant time lags between detections suggest the possibility of reinfection, which is consistent with the SIS model. Developing a compartmentalized model using discrete agent-based simulations may provide a better fit to the data. We describe this as a future direction for the work presented here.
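One way the compartmentalized model described above might be prototyped is a discrete-time agent-based simulation. In this version, each cleaned computer goes either to Removed (SIR-like) or back to Susceptible (SIS-like). The sketch below is hypothetical throughout and is not the thesis's implementation. It assumes random mixing, a fixed number of e-mail recipients per infected machine per step, and per-step probabilities `p_infect`, `p_recover` and `p_removed`. All of these names and default values are illustrative.

```python
import random

SUSCEPTIBLE, INFECTED, REMOVED = "S", "I", "R"


def simulate_hybrid(n=1000, i0=10, contacts=3, p_infect=0.05,
                    p_recover=0.1, p_removed=0.3, steps=300, seed=1):
    """Discrete-time agent-based sketch mixing SIS and SIR behavior.

    Each step, every infected computer e-mails `contacts` random machines,
    and a susceptible recipient becomes infected with probability
    `p_infect`. An infected computer is cleaned with probability
    `p_recover`. A cleaned machine is permanently disabled (Removed,
    SIR-like) with probability `p_removed`; otherwise it returns to
    Susceptible (SIS-like). Returns the per-step counts (S, I, R).
    """
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n
    for idx in rng.sample(range(n), i0):
        state[idx] = INFECTED
    history = []
    for _ in range(steps):
        infected = [k for k, s in enumerate(state) if s == INFECTED]
        newly_infected = set()
        for _sender in infected:
            for target in rng.sample(range(n), contacts):
                if state[target] == SUSCEPTIBLE and rng.random() < p_infect:
                    newly_infected.add(target)
        # Recoveries apply only to machines infected at the start of the step.
        for k in infected:
            if rng.random() < p_recover:
                state[k] = REMOVED if rng.random() < p_removed else SUSCEPTIBLE
        for k in newly_infected:
            state[k] = INFECTED
        history.append((state.count(SUSCEPTIBLE), state.count(INFECTED),
                        state.count(REMOVED)))
    return history


history = simulate_hybrid()
print(history[-1])  # final (S, I, R) counts for this seed
```

Setting `p_removed=0.0` recovers pure SIS behavior, and `p_removed=1.0` recovers pure SIR behavior. A fit to data would then amount to estimating the mixing fraction alongside the infection and recovery probabilities.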

The results of this project demonstrate that, even without a precise match to a model, we can reveal the existence of a time-dependent γ(t). We show that by decreasing the recovery time for infected computers, i.e., by increasing γ(t), we can drastically reduce both the magnitude of an outbreak and the time it takes for the population to reach a disease-free equilibrium.
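The qualitative claim above can be checked with a small SIR experiment. We sweep the slope of a linear γ(t) = γ₀ + kt and record two quantities: the peak infected fraction, and the first time after the peak at which I falls below a small threshold. The latter is a stand-in for reaching the disease-free equilibrium. The values β = 0.3 and γ₀ = 0.1, the slopes, the threshold 1e-4 and the function names are all hypothetical choices for illustration. They are not parameters fitted in the thesis.

```python
def simulate_sir(beta, gamma_fn, i0, t_end, dt=0.05):
    """RK4 integration of the SIR model with a time-dependent recovery rate.

    S and I are population fractions; R = 1 - S - I is implied, so only
    S and I are integrated. Returns lists of times and infected fractions.
    """
    def rhs(t, s, i):
        new_infections = beta * s * i
        return -new_infections, new_infections - gamma_fn(t) * i

    t, s, i = 0.0, 1.0 - i0, i0
    ts, infected = [t], [i]
    for _ in range(int(round(t_end / dt))):
        a1, b1 = rhs(t, s, i)
        a2, b2 = rhs(t + dt / 2, s + dt * a1 / 2, i + dt * b1 / 2)
        a3, b3 = rhs(t + dt / 2, s + dt * a2 / 2, i + dt * b2 / 2)
        a4, b4 = rhs(t + dt, s + dt * a3, i + dt * b3)
        s += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        i += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
        t += dt
        ts.append(t)
        infected.append(i)
    return ts, infected


def outbreak_summary(ts, infected, threshold=1e-4):
    """Return the peak of I and the first time after the peak that I < threshold."""
    peak_idx = max(range(len(infected)), key=infected.__getitem__)
    end_time = next((t for t, i in zip(ts[peak_idx:], infected[peak_idx:])
                     if i < threshold), None)
    return infected[peak_idx], end_time


beta, gamma0 = 0.3, 0.1  # hypothetical values, not fitted to the data
for slope in (0.0, 0.005, 0.02):
    ts, inf = simulate_sir(beta, lambda t, k=slope: gamma0 + k * t,
                           i0=0.001, t_end=400)
    peak, end = outbreak_summary(ts, inf)
    print(f"slope={slope}: peak I={peak:.3f}, I<1e-4 at t={end:.1f}")
```

With slope 0 this reduces to the constant-γ SIR model, whose peak is about 0.30 for R₀ = 3. Steeper slopes, which correspond to shrinking durations of infection, lower the peak and bring I below the threshold sooner.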
