User Tools

Site Tools


howtos:workwithdata:importing_and_tabulating_record-based_datasets

Importing and Tabulating Record Based Datasets

The record dataFormat option for the create tool allows you to read in a source data file into a two dimensional variable consisting of a numerical record id and a user defined set of fields. This technique requires the format of the source data to have unique column descriptions with fields that are usually related in some way. This technique is useful for:

creating sets from source data;
converting text fields to integer codes;
extracting source data from a data set that is not continuous, and
tabulating or cross-tabulating variables.

All of the following examples refer to the data found in the image below

creating sets from source data

The create-fieldname parameter in the sample code below allows you to populate a pre-defined informant from your source data file. In this example the data is based on age and region, so we are intereseted in creating informants for these two parameters. We can either use these informants directly in the model structure or they can be mapped to an existing informant.

local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
	file=$fullFileName, firstLine=1, delimiter=",", \
	replaceWhiteSpaceInData="_", \
	indexTo-Column_A=Column_A, create-Column_A=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
	indexTo-Column_C=Column_C, create-Column_C=on, \
	takeField=Column_F, \
	takeField=Column_I)

Using the following display tool we see the elements of these informants in the tool console as seen below

display (Column_A[],Column_C[])

{{:howtos:workwithdata:record_format_ex1_rev1.jpg|
back to top

converting text fields to integer codes

Only numerical data can be read into a tool variable using record format. Any text that you have in your data set is simply treated as zero as seen by running the code sample below

local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
	file=$fullFileName, firstLine=1, delimiter=",", \
	replaceWhiteSpaceInData="_", \
	takeField=Column_A, \
	takeField=Column_C, \
	takeField=Column_I)

table (recordData[recordID,recordFields])

By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)

{{:howtos:workwithdata:record_format_ex2_1.jpg|

However the indexTo-fieldname parameter in the sample code below allows you to convert the text into integer codes allowing you to create an index that can be used as a variable map.

local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
	file=$fullFileName, firstLine=1, delimiter=",", \
	replaceWhiteSpaceInData="_", \
	takefield=Column_A, \
	indexTo-Column_C=Column_C, create-Column_C=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
	takeField=Column_I)

table (recordData[recordID,recordFields])

{{:howtos:workwithdata:record_format_ex2_2.jpg|

When used in conjuction with creating sets from source data you can use this text to create informant sets that can either be used directly in the model structure or mapped to an existing informant.

back to top

extracting source data from a data set that is not continuous

Source data can sometimes contain infomration that is not relevant to the project you are working on. The takeField parameter, in the sample code below, allows you to pick and choose the data you would like to extract from the source data file. (Note without any take fields the create tool will import the entire data set)

local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
	file=$fullFileName, firstLine=1, delimiter=",", \
	replaceWhiteSpaceInData="_", \
	takeField=Column_D, \
	takeField=Column_G, \
	takeField=Column_I)

table (recordData[recordID,recordFields])

By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)

Once you have imported your data you will need to extract the field(s) to populate individual model variables. For example,

You can extract column G inorder to feed these values directly into a model variable,

local columnsG[recordID,dataG] = extract (recordData[recordID,recordFields]; \
	recordFields:"Column_G", recordFields->dataG)

or you can also extract mulitple columns in order to manipulate the data in some way.

local columnsG_I[recID,dataG_I] = extract (recordData[recordID,recordFields]; \
	recordFields:"Column_G, Column_I", recordFields->dataG_I)

local columnsG_I_sum[recordID] = sum (columnsG_I[recordID,dataG_I]; dim=dataG_I)


back to top

tabulating or cross-tabulating variables

Usually when you are using the record format to import data into your model the data fields are related to each other in some way. Therefore the crosstab (or cross tabulation) tool is used in conjunction with this import method. Cross tabulation is often used to analyze the realtionship between two or more categroical variables. The sample code below illustrates how you could cross tabulate the number of times an age variable occurs in each region.

(Note column I is ignored in this instance becuase no set has been created for that variable, see creating sets [hyperlink] for more information)

local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
	file=$fullFileName, firstLine=1, delimiter=",", \
	replaceWhiteSpaceInData="_", \
	indexTo-Column_A=Column_A, create-Column_A=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
	indexTo-Column_C=Column_C, create-Column_C=on, \
	takeField=Column_I)

local recordDataRaw[colA, colC, colGI] = crosstab (recordData[recordID,recordFields]; \
	tabOn1=Column_A, \
	tabOn2=Column_C)

table (recordDataRaw[colA, colC, colGI])

By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)

back to top

howtos/workwithdata/importing_and_tabulating_record-based_datasets.txt · Last modified: 2011/06/24 18:55 by chris.strashok