README
Aaron Gullickson
5/11/2001
UPDATED: 01/14/2002
(Minor edits by Hammel in May 2008)

This readme is intended as an overview of all the files employed in family reconstitution of the Croatian dataset.  The perl and Splus scripts each have further instructions and directions contained within them. 

THE LINKING PROGRAMS

The perl programs access the raw data files sortedbirths, sortedmars,
and sorteddeaths, which contain the parish records for these events,
extracted from the relevant *.w files and sorted by date of event.
Since these sorted* files are redundant with the *.w files, they are
not posted on the web page.

The perl programs use this raw data to perform linkages, constructing
family histories.  Six perl programs are used to do this: five linking
programs and createdatasets.pl.  The names of the linking programs
indicate their function.  For example, b2m links births to marriages,
m2d links marriages to deaths, and so on.  In general, they follow a
similar pattern.  Hashes are created of the relevant raw data sets
using names as the keys.  These are then matched across two kinds of
records, e.g. birth and marriage, and scored by various factors (same
parish, age match, etc.).  In addition, some matches are rejected
completely because they would conflict with previous matches (a woman
dying before she gives birth, for example).  Three of the five linking
programs (m2b, b2m, and b2d) follow this format.  The remaining two
(m2m and m2d) simply impute the links from Marcia Feitel's previous
work on remarriage links and marriage to death links.  I list each of
the files below in the order they should be run and describe them
briefly.  I do not describe the explicit scoring routines or values
here as we may adjust them later.  You should examine the actual perl
scripts for that information.
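
The general pattern described above (hash on names, match across two
kinds of records, score, reject conflicts) can be sketched as follows.
This is an illustrative Python sketch, not the perl code itself.  The
record layout, the score function, its weights, and the age range are
all assumptions made up for the example; the real rules are in the
perl scripts.

```python
from collections import defaultdict

# Hypothetical record layout: (record_id, name, parish, year).
births = [(1, "name_a", "A", 1800), (2, "name_a", "B", 1790)]
marriages = [(10, "name_a", "A", 1822)]

def index_by_name(records):
    """Build a hash (dict) from name to the records carrying that name."""
    index = defaultdict(list)
    for rec in records:
        index[rec[1]].append(rec)
    return index

def score(birth, marriage):
    """Toy score; the factors and weights are placeholders."""
    points = 0
    if birth[2] == marriage[2]:      # same parish
        points += 1
    age = marriage[3] - birth[3]
    if 18 <= age <= 30:              # assumed plausible age at marriage
        points += 1
    return points

def candidate_links(births, marriages):
    """Return (birth_id, marriage_id, score) for every name match."""
    by_name = index_by_name(births)
    links = []
    for mar in marriages:
        for birth in by_name.get(mar[1], []):
            if birth[3] >= mar[3]:   # reject outright: born after the marriage
                continue
            links.append((birth[0], mar[0], score(birth, mar)))
    return links

print(candidate_links(births, marriages))
```

Keying the hash on names means each marriage is only compared against
births with the same name, rather than against every birth record.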

m2b.pl - This script looks at birth records and for each birth
attempts to find the marriage from which it came.  It does this
primarily by matching on three of the four names of the spouses in the
marriage and the parents of the birth.  It is highly reliable since
name combinations of three or four names are seldom repeated.  That is
why it is run first.  m2b.pl also links together children whose
parents cannot be found in the marriage records, but who likely share the same parent.
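
The "three of the four names" test can be illustrated with a short
Python sketch.  It is only an illustration: it assumes each record
supplies its four names as a tuple in a fixed order, and the function
names are hypothetical.  How m2b.pl actually compares and normalizes
names is in the script itself.

```python
def names_in_common(marriage_names, parent_names):
    """Count the positions at which two four-name tuples agree."""
    return sum(1 for m, p in zip(marriage_names, parent_names) if m == p)

def is_candidate(marriage_names, parent_names, minimum=3):
    """True if at least `minimum` of the four names agree."""
    return names_in_common(marriage_names, parent_names) >= minimum

# Placeholder names; the fourth name differs, so three of four agree.
marriage = ("h_first", "h_last", "w_first", "w_last")
parents = ("h_first", "h_last", "w_first", "other")
print(is_candidate(marriage, parents))
```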

b2m.pl - This script attempts to match births to their subsequent
marriage.  It matches on names and scores individuals on a variety of
factors.  In addition, it rejects matches if the marriage would occur
before or after certain ages (see file for exact ages as we may adjust
them) and rejects them if they contradict the matches from m2b.pl (we
do allow some "shotgun" weddings, where a birth occurs within a
reasonable span before the marriage).

m2m.pl - This script simply goes through Marcia's links for remarriage
in combodat2 and assigns them if they do not contradict our earlier
matches (i.e. if they do not cut off childbearing from the previous
marriage too early).

b2d.pl - This script attempts to match births to their own deaths.  It
follows the classic matching and scoring routine and rejects matches
that lead to a death before a marriage or the last linked birth for the person (or
9 months before the last linked birth for men). 
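
The rejection rule for b2d can be sketched in Python as below.  This
is an illustration under stated assumptions, not the code in b2d.pl:
the function name, the "M"/"F" sex codes, and the use of 270 days as
"9 months" are all choices made for the example.

```python
from datetime import date, timedelta

NINE_MONTHS = timedelta(days=270)  # assumed approximation of 9 months

def death_is_consistent(death, sex, marriage=None, last_birth=None):
    """Return False if the death date contradicts earlier links."""
    if marriage is not None and death < marriage:
        return False                 # died before own marriage
    if last_birth is not None:
        earliest = last_birth
        if sex == "M":               # a father may die before the birth
            earliest = last_birth - NINE_MONTHS
        if death < earliest:
            return False             # died too early for the last linked birth
    return True

# Death two months before the last linked birth: allowed for a man,
# rejected for a woman.
print(death_is_consistent(date(1850, 1, 1), "M", last_birth=date(1850, 3, 1)))
print(death_is_consistent(date(1850, 1, 1), "F", last_birth=date(1850, 3, 1)))
```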

createdatasets.pl - This program looks at the output from the previous
programs and constructs life histories for each person and puts these
in a file called croatdata.txt.

m2d.pl - This script cycles through croatdata.txt and if there is a
missing death record which can be filled in by Marcia's marriage to
death links, we assign it here.  The resulting datafile is called
croatdata2.txt.

There is also a seventh perl program called lastevent.pl which will
assign the last event for each person.  It creates a new data file
called croatdata3.txt.

croatdata4.txt is created by an eighth program called addgptoloe.pl, which uses the
last recorded service of an individual as a godparent or marriage witness as the last event
if that service occurs later than the last observed event assigned by lastevent.pl.

Each of these programs accesses a subroutine file called generalsubs.pm,
which contains some general subroutines.  In addition, there is a
matchrecs subroutine at the bottom of m2b, b2m, and b2d.


The entire matching routine can be run with one command from the
directory containing all the relevant code and data files:

./runmatches

This will write over any previous output.

OTHER FILES

The other files are designed to check the validity
of the matching and compare it to Marcia's findings.

OUTPUT FILES

Each of the five principal programs produces a file called *.diag.txt (where * is
b2m, b2d, etc).  This file contains diagnostics from the matching
routine.  All except m2d also produce files containing the final
matches called *.matches.txt (b2m actually produces two:
b2m.mmatches.txt referenced by the marriage, and b2m.bmatches.txt
referenced by the birth).  The format of each of these is below:

------------------------------------------------------------------------
b2d.matches.txt

birth id   death id   score           age at death            deathdate

------------------------------------------------------------------------
m2b.matches.txt

marriage id*    last birthdate    # of kids  birthid1  birthid2 ...

*marriage id above 30000 indicates a kinset with missing parents

------------------------------------------------------------------------
b2m.mmatches.txt

mar. id   husbands bid   wifes bid   husbands age   wifes age    score

------------------------------------------------------------------------
b2m.bmatches.txt
bid   marriage id      score        age   sex

------------------------------------------------------------------------
m2m.matches.txt

marriage id      spouse type      remarriage id   ?
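
As an example of reading one of these layouts, the m2b.matches.txt
format above could be parsed roughly as follows.  This Python sketch
assumes the fields are separated by whitespace and that the birthdate
is a single token; neither is stated above, so check the actual file
first.  The function name and the sample line are hypothetical.

```python
KINSET_ID_THRESHOLD = 30000  # ids above this mark a kinset with missing parents

def parse_m2b_line(line):
    """Split one m2b.matches.txt line into named fields (assumed layout)."""
    fields = line.split()              # assumption: whitespace-delimited
    marriage_id = int(fields[0])
    n_kids = int(fields[2])
    return {
        "marriage_id": marriage_id,
        "last_birthdate": fields[1],   # kept as text; date format not assumed
        "n_kids": n_kids,
        "birth_ids": fields[3:3 + n_kids],
        "is_kinset": marriage_id > KINSET_ID_THRESHOLD,
    }

# Made-up line: "DATE" stands in for whatever the file really holds.
record = parse_m2b_line("30012 DATE 2 451 452")
print(record["is_kinset"], record["birth_ids"])
```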



In addition, for m2b, b2m, and b2d there are additional files.
*.match.prelim.txt contains the scores for all potential matches.
These are then sorted and the highest-scoring candidate for each
record is selected.  *.ties.txt contains any ties on scoring that occurred.
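
The step from preliminary scores to final matches can be sketched in
Python as below.  This is an illustration only: the tuple layout and
the function name are assumptions, and setting tied records aside
rather than resolving them is a choice made for the example, not
necessarily what the perl scripts do with ties.

```python
from collections import defaultdict

def pick_best(prelim):
    """prelim: iterable of (source_id, target_id, score) tuples."""
    by_source = defaultdict(list)
    for source_id, target_id, score in prelim:
        by_source[source_id].append((score, target_id))
    matches, ties = {}, {}
    for source_id, candidates in by_source.items():
        candidates.sort(reverse=True)          # highest score first
        top = candidates[0][0]
        tied = sorted(t for s, t in candidates if s == top)
        if len(tied) > 1:
            ties[source_id] = tied             # several candidates share the top score
        else:
            matches[source_id] = (tied[0], top)
    return matches, ties

prelim = [(1, 10, 5), (1, 11, 3), (2, 10, 4), (2, 12, 4)]
matches, ties = pick_best(prelim)
print(matches, ties)
```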

The final output files are croatdata.txt, croatdata2.txt, croatdata3.txt,
and croatdata4.txt.

croatdata.txt - before m2d links are added
croatdata2.txt - m2d links added but missing last observed event
croatdata3.txt - m2d links and last observed event added
croatdata4.txt - service as godparent or marriage witness added as possible last observed event

OTHER INPUT/OUTPUT FILES

The same output directory also contains the initial input datasets.
These are sortedbirths, sortedmars, sorteddeaths, and combodat2.txt.
combodat2.txt contains Marcia's links and is the same as
combodat1.txt except that empty cells have been replaced by an
explicit "NA".

The other files all relate to the Splus programs used to verify and
check the links.  I will not discuss them here.
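
For reference, the empty-cell replacement that distinguishes
combodat2.txt from combodat1.txt could be reproduced along these
lines.  This Python sketch assumes the file is tab-delimited, which is
not stated above, and the function name is hypothetical.

```python
def fill_empty_cells(line, sep="\t"):
    """Replace empty cells in one delimited line with an explicit NA."""
    cells = line.rstrip("\n").split(sep)
    return sep.join(cell if cell.strip() else "NA" for cell in cells)

print(fill_empty_cells("12\t\t7\t\n"))
```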