The Supervisory (.sup) File

This document describes the supervisory file format and its features.

Overview of Top-Level Supervisory Structure

Supervisory files are easiest to understand as nested structures. The outermost part gives information that characterizes the entire simulation (the rate and population files used and the parameter settings) together with the control commands. It is a good idea to set all parameters in the top-level file and to record source information (as comments) for all lower-level files (which typically contain rate sets), so that the simulation can be reproduced.

Any number of simulation segments can be nested within this outermost structure. A "simulation segment" is a period of simulated time during which a specific rate regime holds sway (hence the close association between segments and distinct rate files). For example, one might wish to model a population that experienced 100 years under a high fertility/high mortality regime and then underwent a demographic transition to low fertility/low mortality rates. Such a simulation would require two simulation segments. Modeling the kinship structure of a country with a census (and a new rate set) every 10 years could be done with as many 10-year segments as necessary, each with a distinct set of rates.

In addition to changing the demographic rates, each segment may also be governed by a distinct set of options and variables that control the behavior of the simulated population during that period. For example, the average inter-birth interval, the ratio of male/female births, or the level of heterogeneity of fertility may change from one simulation segment to the next. These should be set in the highest-level file, as this localizes required changes and makes it easier to keep them consistent.

The example .sup file below makes heavy use of commands of the form

include filename

Each such command splices the named file into the input stream at that point. This is most useful for rate files, which tend to be long, full of numbers, and relatively static once they are created. (The top-level supervisory file could instead contain the contents of all included files directly, but it would then be much longer and much harder to read and use.)

The example below sets up a two-segment simulation: segment 1 is 480 months long and is governed by rates stored in file "RATES/rates.1840.1879" and by the parameters bint and hetfert set in the top-level file. Segment 2 is 360 months long and is governed by rates stored in "RATES/rates.1880.1909" and by the parameters bint and hetfert set in the top-level file.
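
The two durations match the calendar spans in the rate file names when both end years are counted in full. A minimal Python sketch of that arithmetic follows; the helper name segment_months is hypothetical and is not part of SOCSIM.

```python
def segment_months(first_year: int, last_year: int) -> int:
    """Months in a segment running from the start of first_year
    through the end of last_year (both years included)."""
    if last_year < first_year:
        raise ValueError("last_year must not precede first_year")
    return (last_year - first_year + 1) * 12


print(segment_months(1840, 1879))  # 480
print(segment_months(1880, 1909))  # 360
```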

Example of a .sup File


segments 2
input_file test
output_file test.out
duration 480
bint 12
hetfert 0
include RATES/rates.1840.1879
run
duration 360
bint 24
hetfert 0
include RATES/rates.1880.1909
run
The character * indicates a comment if it is the first non-white-space character on a line. Below is an annotated version of the file shown above. Apart from the output_file prefix, which here points into /tmp, it would be processed identically. The rate-set provenance comments would be appropriate even in an otherwise uncommented file:
*number of segments in the simulation
segments 2
*input file prefix. Read files test.opop, test.omar,
*in the current directory
input_file test
*output file prefix. Write files test.out.opop, test.out.omar
*in directory /tmp.
output_file /tmp/test.out
*set up segment 1
*duration of segment 1
*model to approximate 1840-end of 1879
duration 480
*birth interval setting for segment 1
bint 12
*heterogeneous fertility setting for segment 1
hetfert 0
*file containing the birth, death, marriage and divorce rates
*for segment 1
*is rates.1840.1879 in the directory RATES, which is one below this one
include RATES/rates.1840.1879
*run the first segment of the simulation and continue reading this file
*when done
run
*return to this file and set up segment 2
*duration of segment 2
*model to approximate 1880-end of 1909
duration 360
*birth interval setting for segment 2
bint 24
*heterogeneous fertility setting for segment 2
hetfert 0
*file containing the birth, death, marriage and divorce rates
*for segment 2
*is rates.1880.1909 in the directory RATES, which is one below this one
include RATES/rates.1880.1909 
*run the second segment of the simulation. Segment 2 is the last segment
*so there is no need to continue reading.
run
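
The comment rule can be illustrated with a short Python sketch that reduces an annotated file to its bare commands. This is not SOCSIM's own parser. The function names are hypothetical, and skipping blank lines is an assumption made here for convenience.

```python
def is_comment(line: str) -> bool:
    """True if the first non-white-space character on the line is '*'."""
    return line.lstrip().startswith("*")


def strip_comments(text: str) -> list[str]:
    """Return the command lines only: comment lines are dropped, and
    (an assumption of this sketch) blank lines are dropped as well."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not is_comment(ln)]


annotated = """\
*number of segments in the simulation
segments 2
   *an indented comment is still a comment
input_file test
"""
print(strip_comments(annotated))  # ['segments 2', 'input_file test']
```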

Included Rate Files

It is easier to separate sets of rates by segment and keep all the commands in a single "command file" (even though everything could be joined together in a single large file). The rates can then be spread across arbitrarily many other files. This is made possible by the include command. The command takes a single argument, the name of the file, and treats the contents of the named file as though they were spliced into the current file at that point. The command can nest: an included file can include others in turn. For example, every individual segment rate file can include one shared file that contains a single set of Lee-Carter parameters.

For example, this can be the file named on the command line when the simulation is run:

segments 3
input_file test
output_file test.out
*segment 1
duration 120
bint 12
hetfert 1
include RATES/rates.1950
run
*segment 2
duration 120
bint 12
hetfert 1
include RATES/rates.1960
run
*segment 3
duration 120
bint 12
hetfert 1
include RATES/rates.1970
run
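
Because the three segments differ only in the rate file they include, a file like this can be generated rather than typed by hand. The Python sketch below reproduces the listing above. The function name decade_sup and its arguments are hypothetical, and the fixed 120-month duration is taken from this example only.

```python
def decade_sup(start_years, prefix="test", bint=12, hetfert=1):
    """Build the text of a .sup file with one 120-month segment per start year."""
    lines = [f"segments {len(start_years)}",
             f"input_file {prefix}",
             f"output_file {prefix}.out"]
    for n, year in enumerate(start_years, start=1):
        lines += [f"*segment {n}",
                  "duration 120",
                  f"bint {bint}",
                  f"hetfert {hetfert}",
                  f"include RATES/rates.{year}",
                  "run"]
    return "\n".join(lines) + "\n"


print(decade_sup([1950, 1960, 1970]))
```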

In addition, each of the rate files can contain lines to include other files. One reason there is a specified start year for the Lee-Carter mortality model is so that the parameter file doesn't need to be modified when it is used for several different segments--the same file can be read unchanged using the include command but the point at which the kt values become relevant can be specified. Accordingly, in file RATES/rates.1950 expect to find lines like:


lc_init 1 F 1950
lc_init 1 M  1950
include lee_carter_us
and in RATES/rates.1960:

lc_init 1 F 1960
lc_init 1 M  1960
include lee_carter_us
and finally, in RATES/rates.1970

lc_init 1 F 1970
lc_init 1 M  1970
include lee_carter_us

The same initialization can be done at an even higher level, in the top-level .sup file:

lc_init 1 F 1950
lc_init 1 M  1950
include RATES/rates.1950
run
...more lines 
lc_init 1 F 1960
lc_init 1 M  1960
include RATES/rates.1960
run
...more lines 
lc_init 1 F 1970
lc_init 1 M  1970
include RATES/rates.1970
run
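
The splicing behavior of nested include commands can be pictured with the following Python sketch. It is only an illustration of the idea and not SOCSIM's implementation. Files are held in a dictionary rather than read from disk, so no claim is made about how paths are resolved, and the circular-include check is an addition of this sketch.

```python
def expand_includes(name, files, _stack=()):
    """Return the lines of `name` with each 'include X' line replaced,
    recursively, by the lines of X. `files` maps file names to text."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular include: {chain}")
    out = []
    for line in files[name].splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "include":
            # Splice the included file in at this point.
            out.extend(expand_includes(words[1], files, _stack + (name,)))
        else:
            out.append(line)
    return out


# Hypothetical file contents, modeled on the examples in this section.
files = {
    "top.sup": "duration 120\ninclude RATES/rates.1950\nrun\n",
    "RATES/rates.1950": "lc_init 1 F 1950\ninclude lee_carter_us\n",
    "lee_carter_us": "*Lee-Carter parameters would go here\n",
}
for line in expand_includes("top.sup", files):
    print(line)
```

The expanded stream reads duration, lc_init, the Lee-Carter placeholder line, and then run, which is the order in which the commands would be seen if everything had been written in one file.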

Lee-Carter Rate Files

Lee-Carter kt values can be read in from a file (with eventual linear continuation, as necessary) or specified using a formula. The program checks whether a complete set, in either form, has been specified. In all cases the lc_init command must be used to set up Lee-Carter mortality structures for each group and sex that is modeled with a separate rate set, analogous to what is shown in the previous section. The year to use as the index into the kt array must be specified here:

lc_init 1 F 1990
lc_init 1 M  1990
SOCSIM must find one of the following sets before the next run command (it can find one form for some of the rates, and the other for the others). The ax values have to appear before the corresponding bx. It's not an error if the start year is after the last specified kt value--it will be within the range of the linear continuation.

Case 1: The kt values are given by a formula

lc_ax 1 F
.. the values
lc_bx 1 F
.. the values
lc_k_val 1 F  -12.17
lc_k_mean 1 F -.496
lc_k_std_dev 1 F 0.651
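
This document does not spell out the formula itself. The Python sketch below assumes the conventional Lee-Carter treatment of kt as a random walk with drift, reading lc_k_val as the starting value, lc_k_mean as the yearly drift, and lc_k_std_dev as the standard deviation of a yearly normal shock. That interpretation, the yearly step, and the function name simulate_kt are all assumptions of this sketch and should be checked against the SOCSIM source before being relied on.

```python
import random


def simulate_kt(k_val, k_mean, k_std_dev, n_years, seed=None):
    """Return n_years values of kt, starting at k_val and following
    k[t] = k[t-1] + k_mean + Normal(0, k_std_dev).
    ASSUMPTION: random walk with drift; not confirmed by this document."""
    if n_years < 1:
        raise ValueError("n_years must be at least 1")
    rng = random.Random(seed)
    path = [k_val]
    for _ in range(n_years - 1):
        path.append(path[-1] + k_mean + rng.gauss(0.0, k_std_dev))
    return path


# Parameter values copied from the Case 1 listing above.
for k in simulate_kt(-12.17, -0.496, 0.651, n_years=5, seed=1):
    print(round(k, 3))
```

With k_std_dev set to 0 the path is a straight line with slope k_mean, which is a convenient way to check the drift term in isolation.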

Case 2: The kt values are read via the rate file

lc_ax 1 F
.. the values
lc_bx 1 F
.. the values
lc_k_start_list 1 M
1900    18.3796
1901    17.8665
1902    17.0566
1903    17.0231
...many years
1988    -10.1178
1989    -10.7226
everything before 1990 will be ignored
1990    -11.10
1991    -11.47
1992    -11.85
..many more years
2061    -37.71
2062    -38.08
2063    -38.46
2064    -38.83
2065    -39.21
done
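
How a kt value might be looked up for a given year, including a year beyond the end of the list, is sketched below in Python. The text above says only that the continuation is linear. Extending the straight line through the last two listed values is an assumption of this sketch, as is the function name kt_for_year. The sketch also assumes that every year within the listed range is present.

```python
def kt_for_year(kt_list, year):
    """Look up kt for `year` in a {year: kt} table. Past the last listed
    year, continue the line through the last two entries (ASSUMPTION)."""
    years = sorted(kt_list)
    if len(years) < 2:
        raise ValueError("need at least two listed years")
    first, prev, last = years[0], years[-2], years[-1]
    if year < first:
        raise ValueError(f"{year} precedes the first listed year {first}")
    if year <= last:
        return kt_list[year]
    slope = (kt_list[last] - kt_list[prev]) / (last - prev)
    return kt_list[last] + slope * (year - last)


# The final entries of the Case 2 listing above.
tail = {2061: -37.71, 2062: -38.08, 2063: -38.46, 2064: -38.83, 2065: -39.21}
print(kt_for_year(tail, 2063))            # -38.46
print(round(kt_for_year(tail, 2066), 2))  # -39.59
```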