SOCSIM Documentation


Introduction

This section is intended to serve as a technical reference for those who wish to modify SOCSIM's source code, or for those who wish to understand SOCSIM's technical idiosyncrasies in order to better criticize it. We recommend that non-programmers read this section because it is possible to winnow the logic from what to non-programmers may seem only chaff, and closer understanding of that logic will help non-programmers understand what may be possible to accomplish, so that they can give more explicit, or at least technically sensible, directions to programmers.

As SOCSIM's purpose is far too general to be accomplished by a neat off-the-shelf user-friendly program, SOCSIM is written to be modified. The code itself contains abundant comments alerting the programmer to various subtleties and idiosyncrasies, and to the extent possible, every distinct function is carried out in a distinct subroutine. Input information is based on keywords--new kinds of rates and variables can be defined in a few lines (simply by mimicry, in many cases) and added to the larger program. SOCSIM is written in C; it is hoped that this choice will both facilitate modification (because C is a relatively standardized language to program in) and guard against errors (because of C's extensive type checking features). The current version runs under the UNIX operating system on SUN workstations.

The first part of this section, Algorithms, deals with the basic algorithms. Reference will be made to specific subroutines and variables, but only to identify which algorithmic functions are carried out in which subroutines. The actual operation of the subroutines will be handled in Organization. Data Structures will describe the data structures used in SOCSIM. Consequently, although some understanding of Organization and Data Structures may be useful, Algorithms is intended to be all that most users will ever need.

Conventions

The various structures referred to in this section will be identified to the extent possible by the font in which they appear. Names of functions and procedures in SOCSIM will appear in bold face as in function_name. Mathematical constants, e.g., 0, in SOCSIM will also appear in bold face. Names of variables and fields internal to more complex structures will appear in keyboard font, as in ego->last_marriage, as will any lines of code that actually appear in SOCSIM. Array subscripts will be identified by square brackets [i]; when a particular subscript is obvious from context or unimportant, array_name[.] will be used to improve readability. Variable names used for examples but which are not actually part of SOCSIM will appear in the keyboard font, too. To refer to fields of records which are of type pointer, we will use the notation that C uses, that is, if spouse is a variable defined to be of type struct person *spouse, spouse->father denotes the field of the person record which points to the person structure corresponding to spouse's father.

Structures

C is rich in possible data structures. In large measure it is this richness that motivated our selection of C for this generation of SOCSIM. Among the most useful of C's data structures is the "pointer." This type is the fundamental building block of all linked lists and it enables C programs to accomplish dynamic memory allocation. Both of these features are widely used in SOCSIM, so a basic understanding of how C pointers work is indispensable in deciphering what follows.

Each element in a linked list is composed of a record which contains both data and the location of the next element. For example, the population list in SOCSIM is composed of elements (hereafter, person structures) each of which refers to an individual. In earlier versions of SOCSIM, a person record was analogous to row i of a large matrix. Each column j of the matrix held some particular piece of information, and entry [i,j] was the information about the i'th individual. In the current version, no such population matrix exists. Instead, when an individual is born, a person structure is created and inserted in several linked lists. Each person record includes several data fields which contain such information as date of birth and sex as well as several fields which point to other person structures. These fields are C pointers; they are what linked lists are composed of.

Pointers to Structures

C pointers can be difficult to understand because they "point" to locations that contain data rather than contain data themselves. A variable, ego, that's declared as type

struct person *ego

actually contains the absolute address in memory of a particular individual's person data structure rather than the information about the individual. For this reason, it is meaningless to inspect the contents of such variables. When ego (a variable of type struct person *) "points" to person number 456, the data referring to person number 456 is accessible and changeable via ego. Person number 456's date of birth can be accessed as ego->birthdate. But suppose we also set the variable alter_ego to point to person number 456. The statement

alter_ego = ego

will do it. Now if we were to change some of person number 456's data fields via alter_ego, e.g.,

alter_ego->birthdate = 999,

the changes will be visible when we inspect ego. How are these addresses obtained? If a data object is declared as a structure in its own right, then the "referencing operation" (the & operator) will get its address. Now suppose super_ego is a particular person, declared as a structure:

struct person super_ego

Its relevant information is stored in a known-size block of contiguous memory, and &super_ego is the address of that block's starting location. (These details are only necessary when programming in C.)

Individual fields are accessed differently in the structure and pointer-to-structure cases. If ego is declared using struct person ego, the individual components are noted using the . character; ego.last_marriage is ego's last marriage. If the declaration is struct person *ego, the component is "pointed to" using an arrow, ->, e.g., ego->last_marriage.
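The aliasing behaviour described above can be sketched in a few lines of C. This is only an illustration: the two-field struct person below is a hypothetical, heavily reduced stand-in for SOCSIM's real person structure, and set_birthdate and same_person are names invented for this example.

```c
/* Hypothetical, heavily reduced person record; the real SOCSIM
   structure has many more fields. */
struct person {
    int birthdate;      /* month of birth */
    int last_marriage;  /* illustrative placeholder field */
};

/* Changes the birth date through a pointer, so the structure the
   pointer refers to is modified (not a copy of it). */
void set_birthdate(struct person *ego, int month)
{
    ego->birthdate = month;   /* arrow: field access via a pointer */
}

/* Returns 1 when both pointers hold the same address, i.e. when
   they "point" to the same person structure. */
int same_person(const struct person *a, const struct person *b)
{
    return a == b;
}
```

For instance, given struct person super_ego, setting ego = &super_ego and alter_ego = ego and then calling set_birthdate(alter_ego, 999) leaves ego->birthdate and super_ego.birthdate (dot: field access on the structure itself) both equal to 999.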

Links and Linked Lists

SOCSIM is full of linked lists. The calendar of future scheduled events, the list of people searching for spouses, and all of the kinship relations among individuals are represented as linked lists. Linked lists are more flexible and use less memory (and require no adjustment on the part of the user) than arrays of predetermined size, but they are a little more complicated to change and maintain (but that's all handled by SOCSIM). In a linked list at least one field of the basic data element is of type pointer. For example, each struct person variable has a field called "father" which points to the individual's father's person data structure. Similarly, ego->father->father points to ego's grandfather's person structure. Note that ego is the name of a variable and that father is the name of a defined field in the person data structure. When ego has a child, the father field of the child's person data structure will point to ego's person data structure. And right after the birth of this child, the lborn field in ego's person data structure will point to this child. This is the simplest example of the linking used in SOCSIM.
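As a minimal sketch of the father and lborn links just described, assuming a person record reduced to those two fields (link_birth and paternal_grandfather are hypothetical helper names, not SOCSIM routines):

```c
#include <stddef.h>

/* Reduced, hypothetical person record holding only the two links
   discussed in the text. */
struct person {
    struct person *father;  /* NULL when the father is not recorded */
    struct person *lborn;   /* most recently born child, or NULL */
};

/* Record a birth: the child's father field points to the father, and
   the father's lborn field points to the new child. */
void link_birth(struct person *father, struct person *child)
{
    child->father = father;
    child->lborn = NULL;
    father->lborn = child;
}

/* Follow the father link twice (ego->father->father), returning NULL
   if either link is missing. */
struct person *paternal_grandfather(const struct person *ego)
{
    if (ego == NULL || ego->father == NULL)
        return NULL;
    return ego->father->father;
}
```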

Most of SOCSIM's linked lists are ordered (generally by age) and circular. Keeping a list ordered and circular makes insertion and deletion more difficult, but it makes locating and grouping individuals much easier.

Algorithms

Event Competition

At the start of every month in the simulation, every living person has exactly one event scheduled for some future date. In the course of this month, events occur to individuals, their spouses, and marriages, and new individuals are born. A new next event is necessary for everyone still living after an event and affected by it; it's selected by an "event competition." This event competition is also held once for each living person at the beginning of each simulation segment (that is, every time the demographic rates or societal constants change).

Each event for which the individual is at risk (e.g., men rarely give birth) can be modeled as a piecewise exponential distribution. A random number is used to generate a waiting time until this event occurs (which is bounded by the individual's maximum possible age at death). The individual's next event is the one with the shortest waiting time. Since there is a maximum lifetime, everyone can have death as their default next event.
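The selection rule in this paragraph (shortest waiting time wins, with death as the default) might be sketched as follows. The event codes, the array layout, and the name next_event are assumptions made for this example; SOCSIM's date_and_event is organized differently.

```c
/* Hypothetical event codes; SOCSIM's own codes may differ. */
enum { EV_DEATH = 0, EV_BIRTH, EV_MARRIAGE, EV_DIVORCE, N_EVENTS };

/* Choose the event with the shortest waiting time among those for
   which the person is at risk.
   waits[e]   : potential waiting time, in months, for event e
   at_risk[e] : nonzero if the person is currently at risk of event e
   Death is always a candidate, so it serves as the default.
   The chosen waiting time is stored in *months; the event code is
   returned. Ties go to the lower event code. */
int next_event(const int waits[N_EVENTS], const int at_risk[N_EVENTS],
               int *months)
{
    int best = EV_DEATH;
    int e;

    for (e = 0; e < N_EVENTS; e++) {
        if (e != EV_DEATH && !at_risk[e])
            continue;                 /* not at risk: no competition */
        if (waits[e] < waits[best])
            best = e;
    }
    *months = waits[best];
    return best;
}
```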

The procedure date_and_event manages the event competition by repeatedly calling the function datev to generate a potential waiting time for each event for which the individual is currently at risk. The rates tables used can be based on the individual's age--for age-specific rates--or the amount of time since another particular event (duration-specific rates).

Before passing the individual's "age" and the vector of rates for each event to datev, date_and_event calls the function modify_rates. This function is designed to give the programmer/user an opportunity to change the rates in systematic ways before each potential event waiting time is calculated. Using modify_rates, the user can implement features in SOCSIM which are analogous to those described below. The values to use in the modification can be single values that vary by individual (e.g., heterogeneous fertility multipliers), or single or age-specific values that affect all individual rates for a given event (as when birth rates are modified to achieve a target number of births in the course of the simulation interval).

Although the waiting time distributions for all events are specified as piecewise exponential, each type of event has its own idiosyncrasies which complicate waiting time generation. The modifications to the basic waiting-time generation algorithm for each type of event will be described below.

Generating Potential Waiting Times

The waiting time algorithm is conceptually equivalent to drawing a random number u, from a uniform (0,1) distribution, calling u the probability that the event will not yet have occurred, then finding the first month by which the probability of non-occurrence is less than or equal to u. The probability that an event will not have occurred by a particular month T is given by the expression

$$\Pr(\text{no event through month } T) = \prod_{t=0}^{T} (1 - P_t)$$

where P_t is the probability of the event's occurrence in period t conditioned on it not having occurred at any time before t. Since (1 - P_t) is always between 0 and 1, the expression given above is nonincreasing in T. Consequently, beginning with t = 0 we can successively multiply the (1 - P_t) terms together until the value of the product falls to or below u. What datev does is mathematically equivalent to this procedure; however, datev takes advantage of the fact that the probabilities can be the same over months or years and works with powers of (1 - P_t).
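The conceptual month-by-month procedure can be written directly. The sketch below is not datev, only the naive equivalent that the text describes; the function name and the flat per-month probability array are assumptions for illustration.

```c
/* Naive form of the waiting time search.
   p[t]    : conditional probability of the event in month t
             (t = 0 is the current month)
   nmonths : number of months covered by p[]
   u       : a uniform (0,1) draw
   Returns the first month t at which the running product of the
   (1 - p[t]) terms is less than or equal to u, or nmonths if the
   product never falls that far. */
int first_month_at_or_below(const double p[], int nmonths, double u)
{
    double not_yet = 1.0;   /* probability the event has not occurred */
    int t;

    for (t = 0; t < nmonths; t++) {
        not_yet *= 1.0 - p[t];
        if (not_yet <= u)
            return t;
    }
    return nmonths;
}
```

With a constant monthly probability of 0.5 and u = 0.3, the product is 0.5 after month 0 and 0.25 after month 1, so the function returns 1.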

The first substantive operation in datev is drawing a uniform (0,1) random number, u, accomplished by a call to rrand. This operation is repeated until a number that is not identically 0 is obtained (as 0 is in the range of the generator used). The natural logarithm of this value is taken and assigned to the variable logu. The variable waiting_time is then set to 0, and each age category is successively examined in order to find the lowest age category during which the probability of non-occurrence is less than or equal to u.

This is done, starting with the lowest age category, by dividing logu by ln(1-P) where P is the monthly probability of occurrence of the event in question. If this quotient exceeds the number of months in the age category, then logu is reduced by

ln(1-P)*(the number of months in the age category)

and waiting_time is incremented by the number of months in the age category. The next older age category is then examined in the same manner until one is found for which

logu/ln(1-P) < (the number of months in the age category)

When such an age category is found, waiting_time, which must be an integer, is incremented by (int) (logu/ln(1-P)). (The parenthetical "int" truncates a real value to the greatest integer not greater than the number.) Note that logu here is the reduced value; it equals ln(u) only in the first age category examined.

The justification for this procedure is as follows. For a given period, during which the probability of the event occurring in each month is P, the probability that a particular event will not occur in k months is (1-P)^k. Setting this probability equal to u, taking the log of both sides and dividing by ln(1-P) gives

$$k = \frac{\ln(u)}{\ln(1-P)}$$

thus k is the number of months until the probability of the event's non-occurrence reaches u.

If no age category has probability 1 of event occurrence, then it is possible that datev will not find an age category for which

logu/ln(1-P) < (the number of months in the age category).

This is likely to be the case with such events as marriage and divorce. In this situation, datev returns the number of months which would make the person 100 years old (the maximum age attainable). This will, in most cases, ensure that the particular event will not be the one chosen as the individual's next event.

Waiting time until Death

Death is the simplest of SOCSIM's events. Everyone is always at risk, so whenever an event competition is held, a waiting time until death is always generated. In the current version, there is no built-in heterogeneity of death rates. To include such heterogeneity would be a simple matter, following the algorithm described below for imposing heterogeneity of fertility.

Waiting Time until Birth

Generating waiting times until a woman gives birth is complicated by the heterogeneous fertility option, the birth spacing option, and by parity-specific fertility rates. These complications are all handled in the function birth_datev, which is called by date_and_event, and which in turn calls datev. The rates can also be affected by mechanisms which attempt to achieve a target number of births in a segment. (These methods are a bit more successful at reducing the number, as arbitrary increase is moderated by the birth interval requirement).

Parity Specific Fertility

The simplest of the above complications is SOCSIM's handling of multiple, parity-specific fertility rates. If parity-specific fertility rates are available, the user may specify them in the rate file (as described in the section on birth rate format). Parity-specific rates are not, however, required to run SOCSIM. But SOCSIM will keep track of parity in all cases--this is useful when alternating segments have rates with and without parity specificity. The system adjusts automatically to the number of levels available. When specified, the parities must begin at 0 and increase. For example, if 4 sets of rates are given it would imply that the probability of giving birth -- for a woman of a particular age and marital status -- had 4 distinct levels: one associated with each of parities 0, 1, and 2, and a final level, parity 3, to be used for parities 3 and above. The rates may be specified in any order, and there is a fairly high maximum, 10, for the number of different supported levels.

In practice the parity is used as an index to an array containing pointers to the proper rate set. This index is the lesser of the maximum supported parity and the individual's actual parity, but the distinctions need not matter (in practice, all pointers may even point to a single parity-0 rate set that will be used for all parities). In short, the calculation is handled transparently. The order of default used for rates left unspecified means that any group whose rates default to a lower group with multiple levels will in turn have multiple levels (default to rates for a lower group has higher precedence than default to rates for a lower parity, and the group rates won't all default to parity 0 for that group and then to the lesser group parity 0). For example, suppose a simulation has 3 groups and rates for married women from group 1, parities 0, 1, and 2 and group 3, parity 0 are specified. Parity 0 individuals from group 3 will use the parity 0 rates from group 3, but parity 3 in group 3 defaults to parity 3 for group 2, which defaults to parity 3 for group 1, which defaults to the specified parity 2 for group 1.
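The index calculation amounts to taking a minimum. A minimal sketch, assuming the number of available parity levels is known (parity_index is a hypothetical name):

```c
/* Index into the array of rate-set pointers: the lesser of the
   individual's actual parity and the highest parity level for which
   rates are available (nlevels - 1). */
int parity_index(int parity, int nlevels)
{
    int top = nlevels - 1;
    return parity < top ? parity : top;
}
```

With 4 rate sets, parities 0, 1, and 2 map to themselves and every parity of 3 or more maps to index 3.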

Heterogeneous Fertility

The heterogeneous fertility option is equally simple conceptually though slightly more complicated to implement. Its purpose is to increase the variance in sibling set sizes without affecting the aggregate fertility of the population. When the heterogeneous fertility option is activated, a function is associated with each female in the population. When birth_datev is called, it checks the global variable hetfert (reset each segment); if its value is TRUE (i.e., 1), then all the hazard rates of giving birth in each month are modified by a "fertility multiplier function" before being passed to datev. In practice, this simply involves multiplying the rates by the female's ego->fmult value, though more complicated approaches are possible.

In the current implementation, individual fertility multipliers, fmult's, are pseudorandomly distributed as a cubic approximation to the beta distribution with mean 1.0, variance 0.416, and a range of 0 to 2.4. The application of fmult to the vector of age-specific hazard rates is carried out by the more general procedure rate_multiplier. Currently, the user may elect to have SOCSIM generate new fertility multipliers or use those which are stored in the starting population file. The function which interprets fmult (and actually produces the value that is passed to rate_multiplier) can also be modified by the user for different parts of the population. New fmult values are assigned to daughters born during the simulation. It's also possible to specify an inherited component using the alpha and beta optional variables.

Using the beta distribution, fmult will have a mean value of 1.0. Consequently, E(fmult*h)=h where h is the original hazard rate or probability of giving birth in a particular month. In this way, each woman's fertility can be significantly altered by application of fmult, yet the overall fertility level of the population is not altered.

It should be noted also, that fmult is generally assigned once only to each woman, and its effect stays with her throughout her life. A woman with a high fertility multiplier will have a proportionally higher risk of giving birth at all ages and marital statuses than a similar woman with a lower fertility multiplier.
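In its simplest form, applying a fertility multiplier is an element-wise scaling of the rate vector. The sketch below uses a hypothetical function scale_rates as a stand-in; it is not SOCSIM's rate_multiplier, whose interface is not given here.

```c
/* Multiply each age-specific monthly birth rate by one woman's
   fertility multiplier. base[] is left untouched so the same input
   rates can be reused for the next woman.
   base[] : rates as read from the rate file
   out[]  : rates after scaling, one per age category
   n      : number of age categories
   fmult  : the woman's fertility multiplier (population mean 1.0) */
void scale_rates(const double base[], double out[], int n, double fmult)
{
    int i;

    for (i = 0; i < n; i++)
        out[i] = base[i] * fmult;
}
```

Because the multipliers average 1.0 across women, the scaled rates average to the original rates, which is the property E(fmult*h)=h noted above.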

Birth Spacing

SOCSIM allows the user to specify both the fertility rate (average number of births per woman surviving an arbitrarily defined age category) and the minimum duration between successive births. If no attention were paid to interbirth waiting times, then SOCSIM's waiting time algorithm would assign births too close together in time, resulting in unrealistic age distributions among siblings.

In order to impose a constraint on interbirth waiting times, while still achieving the overall fertility levels specified by the user, it's necessary to do some additional modification to the waiting time algorithm. SOCSIM approaches this problem by viewing births as a single server queue with loss. A real-world analogy to such a process would be a machine, such as a Geiger counter, that counts particles which arrive as a Poisson process. When a particle arrives, the machine registers it and then goes into a "busy" state for a fixed (or possibly random with finite mean) period of time. Particles that arrive while the machine is busy are not seen. Such a machine will systematically undercount arriving particles.

To continue the analogy, a birth in SOCSIM can be thought of as an arriving particle which is counted only if the duration since the mother's previous birth exceeds the minimum birth interval; otherwise, it is ignored. Without the modification outlined below, it is clear that counted births will be lower than the user specified fertility rates imply, as some births are simply ignored. In order to bring counted births into agreement with the specified rates, the rates which SOCSIM actually uses to generate the waiting times must be increased.

The input rates must be modified by an amount determined by the Renewal-Reward Theorem (Wolfe, 1988). The Renewal-Reward Theorem states that for an alternating renewal process, such as this one, the long-run fraction of time that the machine is "not busy" is equal to the expected interval between events divided by the expected duration of one complete cycle. A cycle here is defined as the period of time between counted arrivals. In our case, a cycle begins (and ends) with the first birth after the completion of the minimum birth interval, so its expected duration is the minimum birth interval plus the expected interval between events. More concisely

$$F = \frac{E(X)}{bint + E(X)}$$

where

F = fraction of time the system is not "busy"

E(X) = the expected interval between events

and

bint = the specified minimum birth interval

By further assuming that the system is in long-run equilibrium, we can use the resulting fact that Poisson arrivals are uniformly distributed over any finite interval, together with F calculated above, to adjust fertility rates. The long-run equilibrium assumption is appropriate where the beginning of the process is extremely remote in time. Clearly, this is not the case in SOCSIM; however, by imposing a half of a normal birth interval starting at the time of marriage, the process appears to behave in the same way.

add some equations here

Because births (both observed and unobserved) are assumed to arrive with equal likelihood at any time, the expected number of births that count (are observed) is equal to

F * E(number of births (observed and unobserved))

occurring during the time period.

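Letting beta equal the user specified fertility rate (that is, the average number of births per woman surviving the particular age category) and lambda equal the fertility rate that SOCSIM will actually use (both are Poisson arrival rates defined over the same time period), the observed rate is F times the underlying rate. With Poisson arrivals at rate lambda the expected interval between events is 1/lambda, so F = 1/(1 + lambda*bint), with bint measured in the same time units as the rates, and we have the following:

$$\beta = F\lambda = \frac{\lambda}{1 + \lambda \cdot bint}, \qquad \text{so that} \qquad \lambda = \frac{\beta}{1 - \beta \cdot bint}$$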

Using this formula, SOCSIM converts the rates beta which are read directly from the rate file into rates lambda which SOCSIM uses to generate waiting times. This operation is performed each time a waiting time until the next birth is generated in an event competition for an individual. It is necessary to do this repeatedly, because when the heterogeneous fertility option is activated, each woman effectively has a different fertility rate vector.

Rather than actually discarding waiting times which do not exceed the specified minimum birth interval, SOCSIM simply adds the remaining portion of the minimum birth interval to each birth waiting time that it generates. This is distributionally equivalent to generating and discarding waiting times until one that exceeds the minimum is drawn.
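The two steps of the birth spacing adjustment might be sketched as follows. The rate conversion assumes the relation beta = lambda/(1 + lambda*bint), which is what the renewal argument above yields for Poisson arrivals; it is an assumption about the formula SOCSIM uses, not a transcription of its code. The function names and the infeasibility check are likewise invented for the example.

```c
/* Convert the user-specified rate beta into the larger rate lambda
   used to draw waiting times, assuming
       beta = lambda / (1 + lambda * bint)
   so that lambda = beta / (1 - beta * bint).
   beta and bint must use the same time unit (e.g. births per month
   and months). Returns -1.0 when beta * bint >= 1, i.e. when the
   requested rate cannot be reached with that minimum interval. */
double inflate_rate(double beta, double bint)
{
    double denom = 1.0 - beta * bint;

    if (denom <= 0.0)
        return -1.0;
    return beta / denom;
}

/* Add whatever is left of the minimum birth interval to a drawn
   waiting time, instead of drawing and discarding short waits. */
int spaced_wait(int drawn_wait, int months_since_last_birth, int bint)
{
    int remaining = bint - months_since_last_birth;

    if (remaining < 0)
        remaining = 0;
    return drawn_wait + remaining;
}
```

As a check on the conversion, beta = 0.25 with bint = 2 gives lambda = 0.5, and 0.5/(1 + 0.5*2) returns the original 0.25.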

Target Numbers of Births

Birth rates almost always need minor adjustment. When historical rates are used it is actually known how many births occurred during the time interval corresponding to the simulation interval. In other cases, a synthetic population can be run under particular constraints. SOCSIM can make an instantaneous approximation to the number of births expected during the interval by looking at the age distribution and expected number of births and doing a tally, then multiplying by the length of the simulation. (This approximation is most accurate for short intervals, since the age distribution changes as the simulation proceeds.) All rates can then be scaled by the single value that corresponds to the ratio of target to expected births, though, in practice, the results work best when the effect is to reduce the number of births.

Waiting Time Until Marriage

Marriage is a simple event to schedule, but a complicated event to execute. Because of the inescapable need for two spouses, it is not possible to guarantee that actual realized nuptiality rates will closely parallel the user-specified rates. All that SOCSIM can do is schedule the beginning of a marriage search, which, depending on the supply of eligible spouses, can be either quick or slow. What happens to people once their marriage search commences is described below.

Waiting Time Until Divorce

Divorce is an event which can properly be thought of as occurring to marriages rather than to individuals or as a duration-specific event based on the time since marriage. In generating waiting times for divorce, SOCSIM calculates the age of the marriage and passes this value to datev in a manner analogous to passing a person's age for age-specific events; datev uses it in place of the age of the individual. Although it will accept gender specific divorce rates, SOCSIM expects to find identical divorce rates for both sexes in the rate file. When cohabitation is possible, SOCSIM expects dissolution rates (also called divorce rates, here) for each sex, too. In a standard case when rates that refer to marriages are all that are available, the user divides the rates in half and supplies the same set to SOCSIM for both males and females. Thus, divorce may be triggered by either spouse drawing a divorce from the event competition. The thing to remember when simulating divorce events is that the age of the spouses is of no consequence; the rate file uses the age of the marriage.

In the case where there is no divorce, SOCSIM will accept rates that are identically 0 for marriages of all durations.

Divorce rates must be specified, even if identically 0, as SOCSIM will look for a set of divorce rates for each sex in the process of the completeness check.

Maintaining the Calendar of Future Events

At every point in time, every living individual has exactly one next event scheduled for up to 100 years from the current month. By limiting lifespans to 100 years or less, SOCSIM only needs to keep track of 100 years, or 1200 months, of future events. The algorithms and data structures which SOCSIM uses to maintain the complete list of future events are described in this section. Because the number of events scheduled for a particular month is unknown, and can even grow while the month's events are being executed (as the event competition held right after the execution of an event can return a waiting-time of 0 for the next scheduled event), SOCSIM uses an array of linked lists to store the information.

The array which contains the heads of each month's linked list of events can best be thought of as a ring with 1200 locations, each representing the head of a linked list of events scheduled for a particular month. Since life spans are limited to 100 years from the start of the current month, no event can ever be scheduled for more than 1199 months in the future; thus after each month's events are executed, the location on the 1200 month "ring" is recycled and used to hold the head of the linked list of events scheduled for 100 years forward. This is accomplished via the C operation % 1200, or "mod 1200", which maps any non-negative integer onto [0,1199] and so gives the correct subscript of the 1200-location array ("ring"). As its name suggests, % 1200 divides its argument (e.g. the current month) by 1200 and returns the remainder. The current month corresponds to entry 0 and waiting time 0.

The 1200-month ring is called event_queue. Thus, event_queue[3456 % 1200] is a structure which includes an integer (the number of individuals with events scheduled for month 3456) and a pointer to the first person structure of a person with an event scheduled for month 3456. If no one has an event scheduled for month 3456, then the pointer field in the record in event_queue[3456 % 1200] will point to NULL.
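The subscript arithmetic for the ring can be illustrated with a small helper. The name ring_slot and the idea of passing the current month and the waiting time separately are assumptions for this example; they are not taken from SOCSIM's source.

```c
#define RING_MONTHS 1200   /* 100 years of monthly slots */

/* Subscript of the ring location holding events scheduled `wait`
   months after `current_month`. Both arguments are assumed to be
   non-negative, so the remainder lies in 0..1199. */
int ring_slot(long current_month, int wait)
{
    return (int) ((current_month + wait) % RING_MONTHS);
}
```

For example, month 3456 falls in location 1056, and an event 1199 months later falls in location 1055, the last location before the ring wraps back around to the current month.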

The person structure contains a variant record which will point to either the next person with an event scheduled for month 3456 or, if the only or last person with an event in that month, the actual queue element. If event_queue[3456 % 1200] points to person record ego then ego->NEXT_PERSON is a macro that expands to ego->u_event_queue.next_person which points to the second person with an event scheduled for month 3456. The last person on the list will point to ego->MONTH, a macro that expands to ego->u_event_queue.month. A three-valued field elsewhere in the person structure keeps track of the kind of pointer to use--it is either a null pointer (PTR_NULL--i.e., not yet set), a pointer to the event_queue element (PTR_Q), or the pointer to another individual (PTR_N). Note that if there were no way to get back to the month entry from the queue elements themselves (e.g., if the event_queue array entries just pointed to a ring of person pointers and did not eventually come around to the event_queue element itself) there wouldn't be any way, save for exhaustive comparison to all first elements on all lists, if the lists are ordered, to reset the particular entry's pointer to NULL just before the last element is removed. (As some dequeueing is possible at any time, it's not just an issue that affects the entry for the current month.) In this system, the list need not be ordered, but order is used in operations on the marriage_queue and the same insertion routine, install_in_order, is used for both.

Of course, when month 3456 begins, the events for the list of people with events scheduled for that month will be executed in random order, so position on this list is unimportant, as far as the order of execution is concerned.

Organization

Scheduling and Unscheduling Events

The procedures install_in_order and queue_delete perform opposite and expected functions for both event and marriage queues. Waiting times to the new event are returned by date_and_event (the kind of event is stored with the individual) and the person waits on the event_queue:

 
m = date_and_event(p);
m %= 1200;
install_in_order(p, event_queue + m, EVENT_QUEUE);

Removal from the queue occurs when the individual is selected at random from the event calendar just prior to execution of the event.

It is also necessary to unschedule events for reasons other than their immediate execution. For example, when a person's spouse dies, their marital status immediately changes, and since SOCSIM's demographic rates are marital-status specific, the next event for which a newly widowed individual is scheduled is no longer valid. Suppose, for example, that all fertility in a simulation is marital fertility--widows and single women do not give birth. Now, when a woman's husband dies, if her next scheduled event was a birth, then obviously something will need to be done. What is done is that queue_delete is called to delete the woman's next event from the event queue and a new event competition is held for her using rates specific for widows.

One advantage of the circular event_queue is that we can unschedule a person's next event without knowing what month that event is scheduled for. In order to delete an element from a linked list, one must be able to locate the element on the list that points to the element that one wishes to delete. After the deletion, that element must point to the element which the deleted element pointed to. The way SOCSIM finds the element which points to the element to be deleted is by maintaining the list in order, and making it circular in a manner that includes the event_queue entry. If another person, or the event_queue pointer itself, points to the element to be deleted, all that is necessary is some readjustment so the gap is closed. If the element to be deleted is the only person (or the only person remaining) scheduled for that month, the event_queue pointer must be set to NULL. But in this case the pointer field of the element to be deleted points to the right event_queue entry. This system would work even if the elements were not stored in person_id order.
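The following sketch shows how a list whose last element points back to its month entry allows deletion without knowing the month. It is a simplified model, not SOCSIM's queue_delete or install_in_order: the structure layouts, the unordered front insertion, and the function names are assumptions for illustration, although the three pointer kinds follow the text.

```c
#include <stddef.h>

struct person;

struct queue_slot {            /* one entry of the ring of months */
    int count;                 /* people scheduled in this month */
    struct person *first;      /* NULL when nobody is scheduled */
};

enum ptr_kind { PTR_NULL, PTR_Q, PTR_N };

struct person {
    enum ptr_kind kind;                /* which union member is valid */
    union {
        struct person *next_person;    /* valid when kind == PTR_N */
        struct queue_slot *month;      /* valid when kind == PTR_Q */
    } u;
};

/* Unordered insertion at the front of a month's list. */
void enqueue_front(struct queue_slot *slot, struct person *p)
{
    if (slot->first == NULL) {
        p->kind = PTR_Q;               /* only element: point at slot */
        p->u.month = slot;
    } else {
        p->kind = PTR_N;
        p->u.next_person = slot->first;
    }
    slot->first = p;
    slot->count++;
}

/* Follow the links forward until the month entry is reached. */
struct queue_slot *slot_of(struct person *p)
{
    if (p->kind == PTR_NULL)
        return NULL;                   /* not scheduled anywhere */
    while (p->kind == PTR_N)
        p = p->u.next_person;
    return p->u.month;
}

/* Remove p from whatever month it is scheduled in.
   Returns 1 on success, 0 if p was not scheduled. */
int dequeue(struct person *p)
{
    struct queue_slot *slot = slot_of(p);
    struct person *prev = NULL, *cur;

    if (slot == NULL)
        return 0;
    for (cur = slot->first; cur != p; cur = cur->u.next_person)
        prev = cur;                    /* find p's predecessor */
    if (prev == NULL)                  /* the slot itself pointed at p */
        slot->first = (p->kind == PTR_N) ? p->u.next_person : NULL;
    else {                             /* predecessor inherits p's link */
        prev->kind = p->kind;
        prev->u = p->u;
    }
    p->kind = PTR_NULL;
    slot->count--;
    return 1;
}
```

The caller of dequeue never supplies a month: the walk from p to the month entry and then from the month entry back to p's predecessor is what the circular arrangement makes possible.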

Maintaining the Marriage Queue

Because two prospective spouses may not arrive at the altar in the same month, SOCSIM can only schedule the beginning of a "marriage search" rather than a marriage per se. This is an inescapable consequence of the need for two spouses and of SOCSIM's being a closed simulation. When a marriage event is to be executed, a search in random order of all prospective spouses is conducted. If a suitable spouse is found, the marriage is completed; if not, the unsuccessful suitor joins the appropriate list of prospective spouses and waits there to be selected by a suitor of the opposite gender.

Thus, rather than finding, by some criteria, the optimal groom for each prospective bride (or vice versa), SOCSIM chooses a spouse using a preference function. This function is defined on an individual basis and stored (as a "pointer to a function that returns a pointer to a person") in the


struct person *(*pref)()
field of the person data structure. The spouse is chosen from those who are already available and meet certain minimum standards, which are controlled by the user. Typically, these standards will preclude incest, remarriage of a previously divorced couple, extreme age differences between the spouses, and perhaps intermarriage across group lines. The preference function may be as simple as choosing a spouse at random from those that meet the minimal criteria, or as complicated as finding the maximum of a scoring function (which is associated with the individual, but stored elsewhere in the person record) using a system which incorporates age differences, number of previous marriages ("marity"), and ages of children.

Each time a marriage event is executed -- that is, each time a marriage search is begun -- a list of eligible spouses is constructed from the list of people of the opposite gender who are waiting in the marriage queue (if a non-trivial scoring system is used, only those candidates that achieve or exceed the previous high score are included at each step). The marriage queue consists of all of the people who have had a marriage event come up, but were unable to find a suitable mate from those available at that time. The marriage queue is organized so as to make the eligibility list, which must be constructed for each suitor, as small as possible and also to reuse as much of the event queue's queue maintenance code as possible.

The marriage queue consists of a linked list ordered by age for each sex separately. The purpose of keeping the lists separate by gender is fairly obvious. The individual's scoring function is sufficient to make further distinctions (e.g., by marital status) or enforce endogamy. Candidates that meet minimal criteria with non-decreasing score are linked into a "working marriage queue," where the actual spouse is chosen at random from those with the highest scores (candidates that fail the incest check are rejected and another spouse is chosen at random from those remaining in the highest scoring group). The queue is constructed in order of increasing score (in effect the criteria become stricter--as there is an increasing score to match) so it's easy to tally and isolate the highest scores. There is an incest check every time the score increases, to avoid a case where a unique high score just happens to be a brother (and other candidates weren't put on the working marriage queue because their score wasn't as high). But the incest checks still must be performed when the spouse is chosen at random--all that the intermediate check guarantees is that there is at least one non-sibling in the group with the highest score.

The male and female linked lists are stored in a data structure rather similar to the event_queue--a two-element marriage_queue array. Each element of the array contains an integer value with the number of individuals of the appropriate sex on the queue (this can be 0) and a pointer to the first, i.e., oldest, person structure of a person of the appropriate sex who is on the queue. If no one of a particular sex is on the queue, then the pointer field in the corresponding marriage_queue element will be NULL. The linkages that form the queue are via pointer fields stored in the person data structure. As in the event_queue case, the person structure contains a variant record which will point to either the next person on the same marriage queue or, if the only or last person on the marriage queue, the actual queue element. If marriage_queue[FEMALE] points to person record ego, then ego->NEXT_ON_MQUEUE is a macro that expands to ego->u_marriage_queue.next_on_mqueue, which points to the second woman waiting to find a spouse. For the last (youngest) woman on the list, the link is instead ego->MQUEUE, a macro that expands to ego->u_marriage_queue.mqueue, which points back to the marriage_queue element. A three-valued field elsewhere in the person structure keeps track of the kind of pointer to use--it is either a null pointer (PTR_NULL--i.e., not yet set), a pointer to the marriage_queue element (PTR_Q), or the pointer to another individual (PTR_N).

Actual maintenance is a bit simpler than in the event_queue case because, given an individual, there is only one possible marriage_queue that individual can wait on. But it's still convenient to reuse the event_queue code (the actual deletion also involves a great deal of pointer restructuring and other kinds of updating) so deleting from the queue, using queue_delete with MARRIAGE_QUEUE as the q_type argument, includes cycling through all elements to find the head. Dequeueing occurs quite often (i.e., whenever a marriage occurs, but also at death). Individuals who cannot find a suitable partner in the course of the marriage procedure are placed on the appropriate queue of those waiting, using the appropriate queue as the slot and MARRIAGE_QUEUE as the q_type, i.e.,


install_in_order(p, marriage_queue + p->sex, MARRIAGE_QUEUE)
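As a rough illustration of the layout just described, the sketch below keeps a two-element queue array and inserts a person in age order, oldest first. It is not SOCSIM's install_in_order: sk_person, sk_queue_element, sk_marriage_queue, and sk_install_by_age are hypothetical names, and using a single pointer_type field (rather than one per queue) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

enum { MALE = 0, FEMALE = 1 };
enum { PTR_NULL, PTR_Q, PTR_N };

struct sk_person;

struct sk_queue_element {
    int num;                            /* number of people waiting */
    struct sk_person *first;            /* oldest person waiting, or NULL */
};

struct sk_person {
    int person_id;
    int sex;
    int birthdate;                      /* month of birth; smaller means older */
    int pointer_type;
    union {
        struct sk_person *next_on_mqueue;
        struct sk_queue_element *mqueue;
    } u_marriage_queue;
};

#define NEXT_ON_MQUEUE u_marriage_queue.next_on_mqueue
#define MQUEUE         u_marriage_queue.mqueue

struct sk_queue_element sk_marriage_queue[2];   /* indexed by sex */

void sk_install_by_age(struct sk_person *p, struct sk_queue_element *q)
{
    struct sk_person *prev = NULL;
    struct sk_person *cur = q->first;

    assert(p->pointer_type == PTR_NULL);
    /* Skip everyone at least as old as p. */
    while (cur != NULL && cur->birthdate <= p->birthdate) {
        prev = cur;
        cur = (cur->pointer_type == PTR_N) ? cur->NEXT_ON_MQUEUE : NULL;
    }
    if (cur != NULL) {                  /* p goes in front of cur */
        p->pointer_type = PTR_N;
        p->NEXT_ON_MQUEUE = cur;
    } else {                            /* p is youngest: link back to the queue */
        p->pointer_type = PTR_Q;
        p->MQUEUE = q;
    }
    if (prev != NULL) {
        prev->pointer_type = PTR_N;
        prev->NEXT_ON_MQUEUE = p;
    } else {
        q->first = p;
    }
    q->num++;
}
```

A call such as sk_install_by_age(p, sk_marriage_queue + p->sex) mirrors the shape of the install_in_order call shown above, with the queue type argument dropped because this sketch handles only the marriage queue.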

Constructing the List of Eligible Spouses

When a marriage search is begun, the procedure marriage is called from the main simulator loop, process_month. Function marriage applies some minimal screening, then the individual's own scoring function, to all opposite-sex individuals currently waiting on a marriage queue. Those individuals whose scores match the current highest are accumulated in a "working" marriage queue in the procedure create_working_mqueue. Unlike the standard marriage queue, this structure is composed of units consisting of a person pointer, an integer (to keep track of the score), and a pointer to the next element on the queue. These elements can be manipulated without affecting the underlying structure of the marriage queues. The highest-scoring individual (or one randomly chosen from all those with the highest score, or all those remaining, as this might not work out the first time) is checked for "compatibility"--i.e., no incest, or other difficult-to-check excluding details. If the prospective spouse passes this check, new_marriage is called to create a marriage structure and do the associated bookkeeping chores. The no-longer-needed working queue is dismantled using procedure destroy_working_mqueue. The working marriage queue isn't constructed until at least one "compatible" and suitable spouse is found.
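The accumulation step can be sketched as follows. This is an illustration under stated assumptions, not the code of create_working_mqueue: the sk_ names are hypothetical, the waiting candidates are passed as a plain array instead of being read from the marriage queue, the age-difference score is invented for the example, and the intermediate incest check is omitted. Each new element is pushed at the head, so the highest scorers end up at the front of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sk_person {
    int person_id;
    int birthdate;                      /* month of birth */
};

struct sk_wq_element {                  /* one unit of the working queue */
    struct sk_person *candidate;
    int score;
    struct sk_wq_element *next;
};

typedef int (*sk_score_fn)(const struct sk_person *suitor,
                           const struct sk_person *candidate);

/* Hypothetical score: closer ages score higher; -1 means unacceptable. */
int sk_age_score(const struct sk_person *suitor,
                 const struct sk_person *candidate)
{
    int diff = abs(suitor->birthdate - candidate->birthdate);
    return diff > 120 ? -1 : 120 - diff;
}

/* Keep only candidates whose score is acceptable and not below the best so far. */
struct sk_wq_element *sk_create_working_mqueue(const struct sk_person *suitor,
                                               struct sk_person **waiting,
                                               int n, sk_score_fn score)
{
    struct sk_wq_element *head = NULL;
    int best = 0;
    int i;

    for (i = 0; i < n; i++) {
        int s = score(suitor, waiting[i]);
        struct sk_wq_element *e;

        if (s < 0 || s < best)
            continue;
        e = malloc(sizeof *e);
        assert(e != NULL);
        e->candidate = waiting[i];
        e->score = s;
        e->next = head;                 /* newest (highest or tied) goes first */
        head = e;
        best = s;
    }
    return head;
}

/* Number of elements tied for the highest score. */
int sk_count_top(const struct sk_wq_element *head)
{
    const struct sk_wq_element *e;
    int n = 0;

    for (e = head; e != NULL && e->score == head->score; e = e->next)
        n++;
    return n;
}

void sk_destroy_working_mqueue(struct sk_wq_element *head)
{
    while (head != NULL) {
        struct sk_wq_element *next = head->next;
        free(head);
        head = next;
    }
}
```

Because the working elements only point at person records, building and freeing the list leaves the underlying marriage queue untouched.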

Cohabitation

When cohabitation is possible, the union created after the partner is found can take the form of a cohabitation instead of a marriage. (Different spouses may be affected by different cohabitation probabilities, so this distinction can't take effect until both partners are found.) The option is implemented using an age-based probability table giving the fraction of unions for a particular group and sex that are cohabitations. Both the person looking and, later, the spouse are checked for non-NULL cohabitation probability tables, cohab_probs. If a table exists, function lookup_cohab_prob will retrieve the probability, which can be 0. At the point of creation of the marriage structure a single random number is generated--if it is not greater than the appropriate cohabitation probability, the union is recast as a cohabitation. The participants in a cohabitation are then eligible for two new events--converting the cohabitation into a marriage or breaking the cohabitation. The union doesn't count toward the individual's "marity" in the current implementation. Both new events are duration-specific events, so the probabilities for each of these events must be divided between the two partners when performing the event competitions at the end of the marriage procedure.
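The table lookup and the single random draw might look like the sketch below. The table layout (rows of upper age and probability), the sk_ names, and the choice to pass the random draw in as an argument are all assumptions made for illustration; the text does not say how the two partners' probabilities are combined into the "appropriate" one, so that step is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

struct sk_cohab_block {
    int upper_age;                      /* last age (months) covered by this row */
    double prob;                        /* fraction of unions that are cohabitations */
};

/* Return the probability for the given age; 0 when there is no table. */
double sk_lookup_cohab_prob(const struct sk_cohab_block *table, int n, int age)
{
    int i;

    if (table == NULL)
        return 0.0;
    for (i = 0; i < n; i++)
        if (age <= table[i].upper_age)
            return table[i].prob;
    return 0.0;                         /* age beyond the last row */
}

/* The union is a cohabitation when the draw is not greater than prob. */
int sk_is_cohabitation(double draw, double prob)
{
    assert(draw >= 0.0 && draw <= 1.0);
    return draw <= prob;
}
```

Passing the draw in keeps the decision rule deterministic and easy to test; a simulation would supply a fresh uniform random number at the point where the marriage structure is created.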

When entering the marriage procedure (which, as seen here, can have two roles--finding mates for the single and legalizing cohabitations), a cohabitor doesn't need to find a partner--he or she can go right to the part of the procedure that implements record changes. A new record is created for the new marriage (its predecessor, the cohabitation, "ends" for the reason MARAFTERCOHAB). This second record will result in increased marity, and if a divorce occurs the marital status of the participants will be DIVORCED, whereas if the breakup had occurred during the cohabitation the participants would both have reverted to their original marital status.

The Working Marriage Queue and the Scoring Function

Generally, only one working marriage queue is created and processed for each individual's initial marriage search, even though different rules may apply to candidates of different groups. It's possible to adjust the scoring function accordingly. For example, under conditions of preferential exogamy, the user can implement an order of preference for candidates by scaling the function so that the lowest acceptable spouse from the preferred group has a higher score than the maximum score possible from a less-favored group: it's simply a matter of adding some points to scores inside some conditional statements (that is, adding points to individuals who would have gotten a non-negative score to begin with).
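The scaling trick for preferential exogamy can be shown with a small sketch. The base score, the constant SK_MAX_BASE, and the function names are hypothetical; the only point carried over from the text is that acceptable candidates from the preferred group receive enough extra points to outrank every candidate from the less-favored group.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAX_BASE 120                 /* largest value sk_base_score can return */

struct sk_person {
    int group;
    int birthdate;                      /* month of birth */
};

/* Hypothetical base score: closer ages score higher; -1 means unacceptable. */
int sk_base_score(const struct sk_person *suitor,
                  const struct sk_person *candidate)
{
    int diff = abs(suitor->birthdate - candidate->birthdate);
    return diff > SK_MAX_BASE ? -1 : SK_MAX_BASE - diff;
}

/* Prefer candidates from another group by lifting their whole score range. */
int sk_exogamy_score(const struct sk_person *suitor,
                     const struct sk_person *candidate)
{
    int s = sk_base_score(suitor, candidate);

    if (s < 0)
        return s;                       /* unacceptable stays unacceptable */
    if (candidate->group != suitor->group)
        s += SK_MAX_BASE + 1;           /* worst preferred beats best non-preferred */
    return s;
}
```

With this scaling, the lowest acceptable candidate from another group scores 121, one more than the best possible score within the suitor's own group, so a single working marriage queue still sorts the two groups correctly.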

To permit these distinctions the scoring function is actually part of the individual record (though an alternative implementation could be some conditional statements to find the function to use). It's defined in the person data structure:


int (*score)()
(in the parlance of C, it's a "pointer to a function that returns an integer"). Scoring and preference are related but distinct operations, hence two parts to the process. It's assumed that if a scoring function that has a wide range is used the preference will not be for an individual that has a sub-maximal score (though a random choice from anyone of the opposite sex can work, too).

Selecting a Spouse from the Working Marriage Queue

Once a working marriage queue is constructed using the individual's scoring function, the marriage procedure applies the individual's preference function to examine the highest scorers on the working marriage queue. If the working queue is non-NULL there is at least one minimally acceptable candidate. A preference function can be as simple as picking these at random until one that doesn't violate cultural taboos is chosen--the scoring procedure guarantees at least one. The scores of unsuccessful candidates are modified, to -1, so that they'll be skipped over the next time the nth candidate is chosen at random. (Candidates that would otherwise score -1 wouldn't get to the working marriage queue, so this value has a particular meaning in this context.) If a suitable spouse is found, marriage deletes the spouse from the marriage queue and the event queue, then calls new_marriage with both spouses as arguments to create a new marriage record for the pair. A similar procedure is used to change cohabitations into marriages--here, only event-queue deletion is necessary for the known spouse, and both are passed to new_marriage.

The cultural taboos are enforced using the function check. This function's purpose is to prevent marriages which violate cultural rules. In the current version, it excludes candidates with common parents and grandparents. Obviously, check could be made to enforce much more complex rules than these. If check returns true, then the non-NULL spouse is passed back to the marriage procedure, which cleans up the working marriage queue and calls the new_marriage procedure.
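A check of this kind can be sketched directly from the mother and father pointers. The version below is an illustration, not SOCSIM's check: the sk_ names are hypothetical, and it reads "common parents and grandparents" as "shares at least one parent or at least one grandparent", which is an assumption about the intended rule.

```c
#include <assert.h>
#include <stddef.h>

struct sk_person {
    struct sk_person *mother;
    struct sk_person *father;
};

/* True only when both pointers are known and identical. */
static int sk_same(const struct sk_person *a, const struct sk_person *b)
{
    return a != NULL && a == b;
}

/* Fill out[0..3] with p's four grandparents (NULL where unknown). */
static void sk_grandparents(const struct sk_person *p,
                            const struct sk_person *out[4])
{
    out[0] = p->mother ? p->mother->mother : NULL;
    out[1] = p->mother ? p->mother->father : NULL;
    out[2] = p->father ? p->father->mother : NULL;
    out[3] = p->father ? p->father->father : NULL;
}

/* Return 1 when the pair may marry, 0 when a shared ancestor rules it out. */
int sk_check(const struct sk_person *a, const struct sk_person *b)
{
    const struct sk_person *ga[4];
    const struct sk_person *gb[4];
    int i, j;

    assert(a != NULL && b != NULL);
    if (sk_same(a->mother, b->mother) || sk_same(a->father, b->father))
        return 0;
    sk_grandparents(a, ga);
    sk_grandparents(b, gb);
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            if (sk_same(ga[i], gb[j]))
                return 0;
    return 1;
}
```

Unknown ancestors (NULL pointers) never count as shared, so the notional founders of a starting population are treated as unrelated.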

If check returns false, then the working mqueue element is marked to be skipped and the preference function is applied again, to what remains of the working marriage queue (which now contains one fewer usable element). Ultimately, either a suitable spouse is found, or the working marriage queue is exhausted (though this isn't really possible as the working queue isn't constructed until there is at least one candidate that passes the check). If the marriage queue is exhausted, the unsuccessful suitor is installed in the proper part of the proper marriage queue (there to participate in another individual's marriage search), gets a new next event, and is installed on the event queue.

If new_marriage is called outside the cohabitation context, it will determine if the union being created can be a cohabitation. Cohabitations differ in the marital status assumed by the participants and the effect on their marity; otherwise the bookkeeping is parallel. The likelihood of cohabitation cannot be calculated until both spouses are known. In both contexts the new_marriage procedure also constructs the new marriage record and inserts it in the linked list of marriages, updates the marities (if appropriate), generates new events for both partners with rates that reflect the changed marital status, and places them on the event queue.

Data Structures

SOCSIM is full of data structures. C permits very intricate and specialized data structures--including structures which contain functions--and SOCSIM takes full advantage. Almost all of the data structures used by SOCSIM are defined globally in the shared file defs.h. The most important ones are described in this section.

Population List Records (struct person)

Data on individuals are stored in linked lists of population list records (known elsewhere as struct person). These person records contain information on kinship, marital status and pointers to marital history, dates of birth and death, sex, next scheduled event, pointers to allow interaction with the event queue and marriage queue, pointers to the individual's marriage market scoring and preference functions, and a pointer to the structure with user-defined variables (that might be included in a particular simulation). Since some of this information is ad hoc and not part of a "standard" simulation, the field

 struct extra * egos_extra;
contains an extensible set of information peculiar to the simulation which is read and/or written separately.

The primary record of type struct person is declared as follows:


struct person {
    int person_id;
    int sex;
    int group;
    int birthdate;
    int deathdate;
    int next_event;
    struct person * mother;
    struct person * father;
    struct person * e_sib_mom;
    struct person * e_sib_dad;
    struct person * lborn;
    struct marriage * last_marriage;
    int mstatus;
    double fmult;
    struct person *down;
    struct person *(*pref)();
    int (*score)();
    int pointer_type[2];
    union {
        struct person *next_on_mqueue;
        struct queue_element *mqueue;
    }u_marriage_queue;
    union {
        struct person *next_person;
        struct queue_element *month;
    }u_event_queue;
    struct extra * egos_extra;
    };

The person structure contains all of the information necessary to reconstruct the individual's place in the kinship network. Note that long after an individual's death, their person structure must be available in order to reconcile relationships among those still living. Other pointers serve other roles and only need a functional restoration. The struct person *down pointer is necessary to be able to traverse the entire population list, but it need not be reconstructed exactly to maintain that use.

Kinship Links and the person Structure

After 1200 months--a few generations--of simulation, everybody in a SOCSIM population is related to everybody else. Simulations often begin with a population of notional individuals about whom nothing but sex and age are known. These individuals have no antecedents and are therefore unrelated; these individuals each form the root of a family tree. In a short time, however, the branches of the trees become so intertwined that one can think of the population as a single kinship network. The relationship between any two individuals in the population can be deduced by tracing the path between them through linked kinship structures. In simulation, as distinct from the real world, no kinship information is ever lost. As the population grows, so does the complexity of the kinship network. One of the advantages of the current data structure, however, is that the cost of adding a person to the network is not dependent on how complex the network is.

The basic building block of the stored kinship network is the person structure. This object contains the pointers by which linked lists of fathers, mothers, (half)siblings, offspring, and spouses are maintained. The diagram below illustrates how linked lists are used to store kinship information for an individual ego. The person structure also contains a field which points to the marriage list structure (marriage) which corresponds to the individual's most recent marriage. The mechanism for storing marital histories is described below.

[Figure to be included from the other version of this document.]

The large object in the center of the figure represents ego's person data structure. The five small boxes marked "mother", "father", "e_sib_mom", "e_sib_dad", and "lborn" represent pointer fields of person. Pointers to mother and father point quite naturally to the person structures of ego's mother and father. The e_sib_mom pointer points to the person structure of ego's next eldest sibling with a common mother. The lborn pointer points to ego's youngest child. The figure is drawn to indicate that ego's next eldest sibling is a full sibling because both the e_sib_mom and e_sib_dad pointers point to the same person structure. If only one of these pointers pointed to the person shown, it would indicate that ego had a half sibling. An example of how half-sibling relationships are stored can be found by tracing the linked list from ego->lborn. The person structure pointed to directly by ego->lborn is ego's most recently born child by her current husband; this is indicated by the child's "father" pointer pointing to the person structure of ego's husband (i.e., the spouse found via ego's most recent marriage). But note that the child's e_sib_mom and e_sib_dad pointers point to different person structures. In the case pictured, "step_child" is ego's husband's child by a previous marriage. Many arrows have been omitted from the figure to improve readability. Also, to make the labels understandable, every person structure is labeled relative to ego. Of course, in SOCSIM there is no such point of view.
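The sibling chain described above lends itself to a short traversal sketch. The struct below copies only the kinship pointers named in the text; the sk_ function names are hypothetical. Starting from a mother's lborn pointer and following e_sib_mom visits her children from youngest to eldest.

```c
#include <assert.h>
#include <stddef.h>

struct sk_person {
    int person_id;
    struct sk_person *mother;
    struct sk_person *father;
    struct sk_person *e_sib_mom;        /* next eldest sibling, same mother */
    struct sk_person *e_sib_dad;        /* next eldest sibling, same father */
    struct sk_person *lborn;            /* youngest child */
};

/* Count a mother's children by walking lborn, then the e_sib_mom chain. */
int sk_count_children_of_mother(const struct sk_person *mom)
{
    const struct sk_person *child;
    int n = 0;

    assert(mom != NULL);
    for (child = mom->lborn; child != NULL; child = child->e_sib_mom)
        n++;
    return n;
}

/* The next eldest sibling is a full sibling when both chains agree. */
int sk_next_sibling_is_full(const struct sk_person *p)
{
    return p->e_sib_mom != NULL && p->e_sib_mom == p->e_sib_dad;
}
```

The same walk applied to a father would follow e_sib_dad instead. Adding a newborn only requires pointing the child's sibling links at the parents' current lborn and then updating lborn, which is why the cost of adding a person does not grow with the size of the network.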

Marriage Records

All information pertaining to past and present marriages is stored in a linked-list network of marriage structures. This scheme is similar to but much less complicated than the way in which the kinship network is stored in the person structures.

Each time a marriage is executed, space in memory is allocated for a marriage structure. This record contains all of the data relevant to the marriage as well as various pointers which link it to the population kinship network. The marriage structure is declared as follows:


struct marriage {
    int marr_id;
    struct person *wife;
    struct person *husband;
    int date_start;
    int date_end;
    int reason_end;
    struct marriage *husbands_prior;
    struct marriage *wifes_prior;
    struct marriage *down;
};

Here, marr_id is a unique marriage identifier assigned sequentially as the marriage structures are constructed. Because the person structures of the husband and the wife each point to the marriage structure, marr_id is not actually used by the program except for reading and writing the marriage list file and by post-processing programs. (The marr_id replaces the pointer when the records are written out, and pointers are adjusted when the marriage list is read).

The pointers to person data structures point to the structures for the wife and husband. The other characteristics of the union are stored here as well--start and end dates, by month in simulation. The value of the end date is set to 0 in marriages that haven't ended. The int reason_end field is similarly set to 0. The pointers to marriage structures are NULL if this is a first marriage. A person's entire marital history can be reconstructed by traversing marriage structures via linked prior marriages until there is a NULL link (or a 0 value in the output file, in post-processing).
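Reconstructing a marital history amounts to following the appropriate prior-marriage link until it is NULL. The sketch below uses trimmed copies of the two structures and a hypothetical helper, sk_count_unions; it picks husbands_prior or wifes_prior depending on which role the person has in each record.

```c
#include <assert.h>
#include <stddef.h>

struct sk_marriage;

struct sk_person {
    int person_id;
    struct sk_marriage *last_marriage;  /* most recent union, or NULL */
};

struct sk_marriage {
    int marr_id;
    struct sk_person *wife;
    struct sk_person *husband;
    int date_start;
    int date_end;                       /* 0 while the union is ongoing */
    int reason_end;                     /* 0 while the union is ongoing */
    struct sk_marriage *husbands_prior;
    struct sk_marriage *wifes_prior;
};

/* Count the union records in p's history, most recent first. */
int sk_count_unions(const struct sk_person *p)
{
    const struct sk_marriage *m;
    int n = 0;

    assert(p != NULL);
    for (m = p->last_marriage; m != NULL;
         m = (m->husband == p) ? m->husbands_prior : m->wifes_prior)
        n++;
    return n;
}
```

Note that this counts records, so a cohabitation that later became a marriage contributes two entries, consistent with the two being stored as separate records.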

The only way to tell a current "cohabitation" union and a "marriage" apart is by the marital status of the participants. Historical unions can be distinguished by the code used to indicate the reason_end--cohabitations have distinct codes for the analogues to death and divorce, as well as a distinct code for cohabitations that turn into marriages (these marriages are stored as separate records).

Running Simulations

To run a simulation, the user specifies a "supervisory" (or "sup") file, generally suffixed .sup, which contains a number of flags and options which SOCSIM needs in order to execute the user's simulation. The file can contain the rates themselves, or an include instruction which gives the name of another file containing rates. (It's even possible to nest include statements.) A multi-stage simulation governed by several sets of rates and flags can be set up with a single .sup file, as simulations can be broken into "segments," and each can be specified sequentially. Each segment can correspond to different rates and/or flag settings. Some settings will apply to the whole simulation, and are specified before the settings for the first segment; other settings change at the level of the individual segment. Complete rate sets must be specified for each segment--even if the only change that characterizes the segment is a changed flag--so the include command is particularly convenient. The current implementation of SOCSIM allows the values to be specified in any order.

Demographic Rate Files

A set of demographic rates must be available for each simulation segment. The same rate set may be used for more than one segment; however, each simulation requires at least one rate set. The current version of SOCSIM relies on key words within the rate file to determine which numbers represent which rates and which flags. After a key word is found and deciphered, SOCSIM expects to find rate data in a particular order. The keyword conventions are explained below.

The information contained in the key word includes the demographic event for which the rates are about to be read, the gender and marital status of persons to whom the about to be read rates will apply, and if the rates to be read are birth rates, the keyword also includes the parity of births to which the rates apply. All rates resemble age-specific rates even if they are event-specific, and rates up to "age" 1200 months must be specified in all cases. As each line of a rate set is recognized, the procedure add_rate_block is invoked to process the next block of rates for the particular demographic event.

The processing of rate data is complicated by SOCSIM's ability to use arbitrary age ranges. That is, rates do not need to be constant over 5 year intervals as in previous versions. The current version accepts rates for any interval which the user cares to specify. To handle this complication, SOCSIM converts all rates to their equivalent single month rate and stores the result in a linked list of age_block structures which includes the upper age bound and the duration of the age interval over which the rate is effective.


struct age_block {
    int upper_age;
    int width;
    double lambda;
    double mu;
    double modified_lambda;
    struct age_block *previous;
    struct age_block *next;
    double (*mult)();
};

An age_block structure is allocated for each age interval over which a demographic rate is defined. An additional age_block is created each time an upper age of less than 1200 months is specified for a particular rate. In this way, each unique combination of event, gender, marital status, and possibly parity gives rise to a doubly-linked list which holds all of the rates that SOCSIM needs to generate a waiting time. The fields of the age_block are as follows:

upper_age
is the oldest age (in months) over which the rate is in effect. The youngest age is assumed to be either 0 or previous->upper_age + 1
width
is the number of months over which the rate is effective
lambda
is the rate (in monthly terms) which was actually read from the rate file
mu
allows a second parallel rate set to share the structure--this is done with the age-specific parts of Lee-Carter rates. These sets must be perfectly aligned at age boundaries.
modified_lambda
is the rate which is used by datev to generate a waiting time. It is generally equal to lambda, but not always. When heterogeneous fertility is invoked, and when there is a nonzero minimum birth interval (the segment variable bint), modified_lambda will differ from lambda and the modification will be made just prior to calling datev. It is necessary to have both fields when, as above, the modification algorithm depends on other quickly changing simulation variables.
previous_block
is a pointer to the age_block structure which holds the rates relevant to a person of the next younger age group--i.e., the previous line (or NULL) in the rate specification.
next_block
is a pointer to the age_block structure which holds the rate relevant to a person of the next older age group--i.e., the next line (or NULL) in the rate specification.

The heads of these various linked lists are held together in two matrices: rate_set and birth_rate_set. A reference to rate_set[3, E_DEATH, FEMALE, SINGLE] is a reference to the first age_block structure for mortality of single females in group 3. To get the rate for older single females, you would follow the next_block link. The linked structure can be very simple, too--if there is no divorce, these rates can be specified with a single line showing 0 probability for age 0 to 1200.
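Finding the rate that applies at a given age is then a walk along the list. In the sketch below the struct is a trimmed copy of age_block using the field names from the declaration (previous, next), and sk_find_block is a hypothetical helper. It treats upper_age as inclusive, following the field description, where the next block starts at previous->upper_age + 1.

```c
#include <assert.h>
#include <stddef.h>

struct sk_age_block {
    int upper_age;                      /* oldest age (months) covered, inclusive */
    int width;                          /* number of months covered */
    double lambda;                      /* monthly rate */
    struct sk_age_block *previous;
    struct sk_age_block *next;
};

/* Return the block covering age, or NULL when age is past the last block. */
const struct sk_age_block *sk_find_block(const struct sk_age_block *head,
                                         int age)
{
    assert(age >= 0);
    while (head != NULL && age > head->upper_age)
        head = head->next;
    return head;
}
```

For a rate specified in a single line covering ages 0 to 1200, the loop returns the head immediately, which is the "very simple" case mentioned above.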

Because there are so many sex/marital status/group/event permutations, SOCSIM does not expect separate rate sets for each and every one. If, for example, mortality rates are independent of group affiliation, then you need only specify the rates for group 1. In general, SOCSIM fills in gaps in the table by first looking for data relevant to similar people from the next lowest group. The procedure fill_rate_gaps is invoked after the entire rate file has been read. It applies a simple set of rules to determine how blank elements in rate_set and birth_rate_set should be filled. Since the elements of rate_set and birth_rate_set are pointers, what actually happens is that the pointers inside these matrices which point to NULL are made to point to non-NULL heads of linked lists which are already pointed to by other elements of the particular matrix. The figure below presents a schematic diagram of the data structure used to store demographic rates.
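The pointer sharing can be illustrated with a reduced table indexed only by group and event. This sketch covers just the "next lowest group" rule stated above; the other rules applied by fill_rate_gaps are not described here and are not modeled. The table dimensions, the event constants, and the name sk_fill_group_gaps are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NGROUPS 4                    /* index 0 unused; groups are 1..3 */
#define SK_NEVENTS 2

enum { SK_E_DEATH = 0, SK_E_DIVORCE = 1 };

struct sk_age_block {
    int upper_age;
    double lambda;
    struct sk_age_block *next;
};

/* Point each empty cell at the list already used by the next lowest group. */
void sk_fill_group_gaps(struct sk_age_block *table[SK_NGROUPS][SK_NEVENTS])
{
    int g, e;

    assert(table != NULL);
    for (g = 2; g < SK_NGROUPS; g++)
        for (e = 0; e < SK_NEVENTS; e++)
            if (table[g][e] == NULL)
                table[g][e] = table[g - 1][e];
}
```

No rate data is copied: after the call, several cells simply share one linked list, so a rate specified once for group 1 serves every group that did not override it.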

see figure

Reading and Writing Files

SOCSIM reads and writes several types of files. The preceding section covers rate files; the present section covers population (.opop), marriage (.omar), and extra nonstandard variable (.opox) files. The structures of these files, in particular the .opop and .omar files, are standardized so that the output of one simulation can serve as the input to another. The pointers necessary for the reading and writing of all files are set during the initialization, prior to the start of the first segment.

Population files

The population list is stored in and read from files with the suffix .opop. The file is simply a two dimensional array where each row represents a person and each column a characteristic. Many of the characteristics are actually person ids of the members of the person's kinship network. The sequence of the numbers determines their meaning, so columns need not conform to any fixed format.

Structure

The population list or .opop file is ASCII text; each record contains 13 integers and a final double precision value--under some circumstances an input file with only the initial 13 integers is allowed. The last value is a fertility multiplier, for females, and it can be generated at the time the file is first read. The order in which the records in the .opop file (or the .omar file) are arranged is unimportant. Numerical order is just as good as any. But different orders will result in "different" simulations. The initial events are generated for individuals in the order they were read in, so different random numbers will be used for each individual. Also, different fertility multipliers may be assigned (if they are necessary in the simulation but are otherwise unspecified).

Processing: Reading and Writing

Processing population files is complicated by the conversion between what is basically a two-dimensional sequential data structure (the .opop file) and the multiply-linked list structure described above. SOCSIM accomplishes this task by making two passes through the population list (though only one file read is necessary). In the first pass, the population pointers are created and stored in an extensible look-up table along with all the integer person_id's that will have to be translated into pointers to the person structure with that index. The table can be used to look up individuals by person_id to obtain their person structures. The entries of individuals include the person_id's of all relatives, so a second pass, taking each structure in turn, can look up the person_id's of all the linked relatives and convert these id's to pointers (the last marriage is translated from id to pointer, too). All pointers to all person structures will be available by the start of the second pass. (This won't work in one pass, as some pointers are reciprocal, and one of the records has to be read before the other.) The same process applies to the current marriage pointer in the person structure, so the marriage list has to be processed before the second population list pass. Order constraints on the marriage file allow that work to be done in one pass--all pointers point to marriages with lower marr_id.

The look-up table is a "hash table." If the table wasn't made large enough for efficient use for the population that was actually read in, performance will suffer but the system, and users, won't have to deal with the obscure errors associated with hard-coded array boundaries. The hash table here can hold both marriage and population records. These are tagged so that the entries for marr_id = 703 and person_id = 703 can be distinguished.

The procedure read_initial_pop reads the input .opop file inside a loop which processes one line (with at most 110 characters) at a time. If fertility multipliers haven't been added (either as an absent field, or as a dummy field containing 0) a new value is drawn for females from a beta distribution (males get or keep 0). Fields must be separated by at least one space, but there is no other limit aside from the one created by the line length constraint. The first reading sets up much of each person structure and initializes pointers set in the course of the simulation. These structures are also linked in order of creation, so it is possible to start at the top and do something to each person in some order. (Fortunately, actual order isn't relevant here.)

All pointer fields are read as integer id's. Lines containing person structures are installed in the hash table with the set of integer values that need to be translated. To accomplish the second pass, procedure fix_pop_pointers starts at the head of the linked list of all person structures and retrieves the relevant set of integer id's from the hash table entry for that person. It looks up each id in turn, returning pointers to the corresponding person or marriage structure. It then follows the ->down link to the next record, in order of creation.
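The second pass can be sketched with a single kinship field. To keep the example short, a linear search along the ->down list stands in for the hash table, the raw id is stored in the record itself rather than in a table entry, and only the mother link is fixed; all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sk_person {
    int person_id;
    int mother_id;                      /* id as read from the file; 0 if none */
    struct sk_person *mother;           /* filled in by the second pass */
    struct sk_person *down;             /* next record in order of creation */
};

/* Stand-in for the hash table lookup: find a person by id. */
struct sk_person *sk_lookup(struct sk_person *head, int id)
{
    struct sk_person *p;

    for (p = head; p != NULL; p = p->down)
        if (p->person_id == id)
            return p;
    return NULL;
}

/* Second pass: translate stored ids into pointers, record by record. */
void sk_fix_pop_pointers(struct sk_person *head)
{
    struct sk_person *p;

    for (p = head; p != NULL; p = p->down)
        p->mother = (p->mother_id != 0) ? sk_lookup(head, p->mother_id) : NULL;
}
```

Because every record already exists when the second pass starts, a child listed before its mother in the file is resolved just as easily as one listed after.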

If a marriage file is not going to be read, then the only relevant information in the starting population list is the age and sex of the members. Even in this situation, two passes over the population file are performed. (It's useful to allow for a more general case where there may be parental information without marriages. And if not, lots of pointers are set to NULL).

Each record of the .opop file stores information in the following order: (1) person_id; (2) sex: 0 if male, 1 if female; (3) group; (4) next scheduled event, or death if already dead; (5) birthdate; (6) mother's person_id, or 0; (7) father's person_id, or 0; (8) person_id of eldest sibling via mother, or 0; (9) person_id of eldest sibling via father, or 0; (10) last-born child's person_id, or 0; (11) marr_id of last union, or 0; (12) marital status; (13) deathdate; (14) fertility multiplier, or 0 if male. Some input files may not yet have fertility multipliers--they can be in a 12-column format, or have a column 14 of 0's. All output files have a multiplier--either newly assigned, or the one that was read in.

Writing the .opop file is quite straightforward relative to reading it. SOCSIM writes out the population it read in the same order--living or dead, differing by the effects of the events in the course of the simulation--because once again it starts at the head of the list of individuals. Individuals born in the course of the simulation are attached to the end of this linked list and they are written out as they are reached, in order of birth. Note that the last event in the simulation may be something that occurred after the birth date of the last child--deaths, marriages, or divorces could have happened later on.

The procedure write_population performs the necessary chores for writing the .opop file at the completion of a simulation segment; it traverses the linked list of records via the ->down pointer and writes each in turn, until the end of the list, by calling write_person. Information pertaining to the still living population is all contained in the linked network of person structures. Procedure write_person replaces NULL pointers with 0 and replaces the others with person_id or marr_id, as appropriate.
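The traversal and the pointer-to-id conversion might look roughly like this. It is a sketch only: the person structure is reduced to three fields plus the ->down link, and id_or_zero, format_person and write_population_sketch are hypothetical names, not SOCSIM's write_population and write_person.

```c
#include <stdio.h>

/* Cut-down person structure for the sketch only. */
struct person {
    int person_id;
    struct person *mother;
    struct person *father;
    struct person *down;     /* next person, in order of creation */
};

/* A NULL pointer is written as 0; any other pointer as the id it points to. */
int id_or_zero(const struct person *p)
{
    return p != NULL ? p->person_id : 0;
}

/* Format one record as space-separated integers; returns snprintf's result. */
int format_person(char *buf, size_t size, const struct person *p)
{
    return snprintf(buf, size, "%d %d %d",
                    p->person_id, id_or_zero(p->mother), id_or_zero(p->father));
}

/* Walk the list from the head and write one line per person.
   Returns the number of records written. */
int write_population_sketch(FILE *out, const struct person *head)
{
    const struct person *p;
    char line[64];
    int count = 0;

    for (p = head; p != NULL; p = p->down) {
        format_person(line, sizeof line, p);
        fprintf(out, "%s\n", line);
        count++;
    }
    return count;
}
```

Since the list is in order of creation, the output keeps the input order and appends people born during the simulation at the end.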

Marriage Files

Marriage files are in most respects simpler than population records. The fact that marriages always contain two people adds only a small measure of complexity. The marriage list (which must exist, even if empty) is stored in a file with the suffix .omar. Each record in this file pertains to a single marriage. In polygynous simulations, and in simulations which admit remarriage after divorce or widowhood, an individual's marital history may be represented by more than one marriage record. Unlike population files, marriage files must be in numerical order, so that the pointers to previous marriages (which have lower id's and are therefore read and reconstructed earlier) can be assigned in one pass. That usually isn't difficult to arrange, as marriage files are typically in-order SOCSIM output anyway.
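The reason numerical order permits a single pass can be sketched as follows. The sketch assumes small positive marriage id's and uses a fixed array (by_id) in place of the hash table purely to stay short; the cut-down marriage structure and the names resolve_prior and install_marriage are hypothetical.

```c
#include <stddef.h>

#define MAX_MARR 1024   /* arbitrary bound, for this sketch only */

/* Cut-down marriage structure for the sketch only. */
struct marriage {
    int marr_id;
    struct marriage *wife_prior;      /* wife's most recent prior marriage */
    struct marriage *husband_prior;   /* husband's most recent prior marriage */
};

static struct marriage *by_id[MAX_MARR];

/* Resolve one prior-marriage id; 0 means "no prior marriage".
   Fails if the prior marriage has not been read yet. */
static int resolve_prior(int prior_id, int current_id, struct marriage **out)
{
    *out = NULL;
    if (prior_id == 0)
        return 0;
    if (prior_id < 0 || prior_id >= current_id || by_id[prior_id] == NULL)
        return -1;
    *out = by_id[prior_id];
    return 0;
}

/* Install one marriage as it is read. Returns 0 on success, -1 if the id
   is out of range or a prior marriage is missing (file out of order). */
int install_marriage(struct marriage *m, int wife_prior_id, int husband_prior_id)
{
    if (m->marr_id <= 0 || m->marr_id >= MAX_MARR)
        return -1;
    if (resolve_prior(wife_prior_id, m->marr_id, &m->wife_prior) != 0 ||
        resolve_prior(husband_prior_id, m->marr_id, &m->husband_prior) != 0)
        return -1;
    by_id[m->marr_id] = m;
    return 0;
}
```

If the records arrive out of order, a prior marriage is not yet available when it is needed, which is why an unsorted file would require a second pass like the one used for the population.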

Structure

As with the population table, SOCSIM stores the marriage information in a linked network of marriage structures, but converts that information into a rectangular array of integers for external storage. The marriage structure is described above. The .omar file is just a rectangular array of integers, where each row corresponds to a marriage. Fields are separated in text files by blank spaces. The .omar files written by SOCSIM use only one blank space as field separator, but SOCSIM will read files with any number of blanks, subject to a total line width constraint.

Each record of the .omar file stores information in the following order: (1) marriage id; (2) person_id of wife; (3) person_id of husband; (4) date marriage began; (5) date marriage ended; (6) event which caused marriage to end, or "the NULL event" if marriage has not yet terminated; (7) marriage id of wife's most recent prior marriage; (8) marriage id of husband's most recent prior marriage.
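Reading one such record might be sketched as below. The structure omar_record, its field names, and the function parse_omar_line are hypothetical; the only things taken from the text are the order of the eight fields, the fact that they are all integers, and the tolerance for any number of blanks between fields (which sscanf's whitespace handling provides).

```c
#include <stdio.h>

/* Hypothetical holder for the eight integer fields, in file order. */
struct omar_record {
    int marr_id;
    int wife_id;
    int husband_id;
    int date_start;
    int date_end;
    int reason_end;           /* event that ended the marriage */
    int wife_prior_marr;      /* marr_id of wife's prior marriage */
    int husband_prior_marr;   /* marr_id of husband's prior marriage */
};

/* Parse one line; returns 1 if all eight fields were read, else 0. */
int parse_omar_line(const char *line, struct omar_record *r)
{
    int n = sscanf(line, "%d %d %d %d %d %d %d %d",
                   &r->marr_id, &r->wife_id, &r->husband_id,
                   &r->date_start, &r->date_end, &r->reason_end,
                   &r->wife_prior_marr, &r->husband_prior_marr);
    return n == 8;
}
```

Checking the conversion count catches short or malformed lines before any pointers are built from them.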

Extra Files

Extra files (files with suffix .opox) contain a vector of egos_extra characteristics associated with each person that aren't standard across all simulations. The kinds of "characteristic" defined in an egos_extra structure can be anything that can be stored in a C record. Dates of other events can be stored as integer values. Real numbers, such as a "death multiplier" or a value for wealth, can go into double-precision fields. An individual's wealth-transfer function can go into a pointer to a function that returns an integer (to signal that the transfer succeeded). The file is read in at the start of the simulation. The relevant person_id must be part of the line, to allow connection to the proper person structure. (The .opox file is read after the first pass through the population list.) Access to the structure is via the ego->egos_extra field.

The egos_extra structure is also used as a place to park individual-specific values that are relatively hard to compute. For example, it's possible to count the number of children (parity) before calculating the waiting time until the next birth (as many rate sets have several levels, for different parities), but since parity changes only within the simulation, it's also a reasonable tradeoff to save the value in a defined field in the egos_extra structure and update it (ego->egos_extra->parity++).
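The tradeoff can be sketched by putting the two approaches side by side. The cut-down person structure, the older_sibling link, and the functions count_children and record_birth are hypothetical and exist only for this illustration; the extra structure follows the standard one.

```c
#include <stddef.h>

struct extra {
    int parity;
    int marity;
    int migration_date;
    int prev_marital_status;
};

/* Cut-down person structure for the sketch only. */
struct person {
    struct person *last_child;      /* last-born child, or NULL */
    struct person *older_sibling;   /* hypothetical link to mother's previous child */
    struct extra *egos_extra;
};

/* Compute parity the slow way, by walking the chain of children. */
int count_children(const struct person *mother)
{
    const struct person *c;
    int n = 0;

    for (c = mother->last_child; c != NULL; c = c->older_sibling)
        n++;
    return n;
}

/* Record a birth and keep the cached parity in step with the links. */
void record_birth(struct person *mother, struct person *child)
{
    child->older_sibling = mother->last_child;
    mother->last_child = child;
    mother->egos_extra->parity++;
}
```

As long as every birth goes through one routine, the cached value stays equal to the count and the walk over the children can be skipped.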

Structure

The extra file structure is defined in defs.h. Like the marriage file, the .opox file need not be read, and not all need be written. The standard version includes the values that are easier to update than compute, plus the migration_date. For other formats, the structure and the I/O format statements in read_xtra and write_xtra will have to be adjusted, too. Not all characteristics can be read directly from a file. For example, if there are several possible transfer functions, the only way to inform the program is via a temporary value assigned to a code, which can be converted from code to the appropriate function as the .opox file is read.
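The code-to-function conversion might be sketched as follows. Everything here is hypothetical: the type transfer_fn, the two example transfer functions, the numbering of the codes, and transfer_from_code are invented for illustration and do not appear in SOCSIM.

```c
#include <stddef.h>

/* A transfer function returns an integer to signal success (1) or failure (0). */
typedef int (*transfer_fn)(double *from, double *to);

/* Example transfer 0: move nothing. */
static int transfer_none(double *from, double *to)
{
    (void)from;
    (void)to;
    return 1;
}

/* Example transfer 1: move everything from one holder to the other. */
static int transfer_all(double *from, double *to)
{
    *to += *from;
    *from = 0.0;
    return 1;
}

/* Convert the integer code read from the file into a function pointer.
   Returns NULL for a code that has no function assigned. */
transfer_fn transfer_from_code(int code)
{
    static const transfer_fn by_code[] = { transfer_none, transfer_all };
    int n = (int)(sizeof by_code / sizeof by_code[0]);

    if (code < 0 || code >= n)
        return NULL;
    return by_code[code];
}
```

The file then only ever holds a small integer per person, and the pointer is assigned once, at read time.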

The standard egos_extra structure:


struct extra {
    int parity;
    int marity;
    int migration_date;
    int prev_marital_status;
};