====== Importing and Tabulating Record Based Datasets ======
The record dataFormat option for the create tool allows you to read in a source data file into a two dimensional variable consisting of a numerical record id and a user defined set of fields. This technique requires the format of the source data to have unique column descriptions with fields that are usually related in some way. This technique is useful for:
[[importing_and_tabulating_record-based_datasets#creating sets from source data|creating sets from source data]]; \\
[[importing_and_tabulating_record-based_datasets#converting text fields to integer codes|converting text fields to integer codes]]; \\
[[importing_and_tabulating_record-based_datasets#extracting source data from a data set that is not continuous|extracting source data from a data set that is not continuous]], and \\
[[importing_and_tabulating_record-based_datasets#tabulating or cross-tabulating variables|tabulating or cross-tabulating variables]]. \\
All of the following examples refer to the data found in the image below \\
{{:howtos:workwithdata:recordformat.jpg|}}
===== creating sets from source data =====
The create-fieldname parameter in the sample code below allows you to populate a pre-defined informant from your source data file. In this example the data is based on age and region, so we are intereseted in creating informants for these two parameters. We can either use these informants directly in the model structure or they can be [[howtos:toolcoding:using_maps|mapped]] to an existing informant.
local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
file=$fullFileName, firstLine=1, delimiter=",", \
replaceWhiteSpaceInData="_", \
indexTo-Column_A=Column_A, create-Column_A=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
indexTo-Column_C=Column_C, create-Column_C=on, \
takeField=Column_F, \
takeField=Column_I)
Using the following display tool we see the elements of these informants in the tool console as seen below
display (Column_A[],Column_C[])
{{:howtos:workwithdata:record-format-ex1.gif|{{:howtos:workwithdata:record_format_ex1_rev1.jpg|}}
\\
[[importing_and_tabulating_record-based_datasets#Importing and Tabulating Record Based Datasets|back to top]]
===== converting text fields to integer codes =====
Only numerical data can be read into a tool variable using record format. Any text that you have in your data set is simply treated as zero as seen by running the code sample below
local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
file=$fullFileName, firstLine=1, delimiter=",", \
replaceWhiteSpaceInData="_", \
takeField=Column_A, \
takeField=Column_C, \
takeField=Column_I)
table (recordData[recordID,recordFields])
By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)
{{:howtos:workwithdata:record-format-ex2_1.gif|{{:howtos:workwithdata:record_format_ex2_1.jpg|}}
However the indexTo-fieldname parameter in the sample code below allows you to convert the text into integer codes allowing you to create an index that can be used as a variable map.
local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
file=$fullFileName, firstLine=1, delimiter=",", \
replaceWhiteSpaceInData="_", \
takefield=Column_A, \
indexTo-Column_C=Column_C, create-Column_C=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
takeField=Column_I)
table (recordData[recordID,recordFields])
{{:howtos:workwithdata:record-format-ex2_2.gif|{{:howtos:workwithdata:record_format_ex2_2.jpg|}}
When used in conjuction with [[importing_and_tabulating_record-based_datasets#creating sets from source data|creating sets from source data]] you can use this text to create informant sets that can either be used directly in the model structure or mapped to an existing informant.
\\
\\
[[importing_and_tabulating_record-based_datasets#Importing and Tabulating Record Based Datasets|back to top]]
===== extracting source data from a data set that is not continuous =====
Source data can sometimes contain infomration that is not relevant to the project you are working on. The takeField parameter, in the sample code below, allows you to pick and choose the data you would like to extract from the source data file. (Note without any take fields the create tool will import the entire data set)
local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
file=$fullFileName, firstLine=1, delimiter=",", \
replaceWhiteSpaceInData="_", \
takeField=Column_D, \
takeField=Column_G, \
takeField=Column_I)
table (recordData[recordID,recordFields])
By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)
{{:howtos:workwithdata:record-format-ex3.gif|}}
Once you have imported your data you will need to extract the field(s) to populate individual model variables. For example,\\
You can extract column G inorder to feed these values directly into a model variable,
local columnsG[recordID,dataG] = extract (recordData[recordID,recordFields]; \
recordFields:"Column_G", recordFields->dataG)
{{:howtos:workwithdata:record-format-ex3_1.gif|}}
or you can also extract mulitple columns in order to manipulate the data in some way.
local columnsG_I[recID,dataG_I] = extract (recordData[recordID,recordFields]; \
recordFields:"Column_G, Column_I", recordFields->dataG_I)
local columnsG_I_sum[recordID] = sum (columnsG_I[recordID,dataG_I]; dim=dataG_I)
{{:howtos:workwithdata:record-format-ex3_2.gif|}}
\\
[[importing_and_tabulating_record-based_datasets#Importing and Tabulating Record Based Datasets|back to top]]
===== tabulating or cross-tabulating variables =====
Usually when you are using the record format to import data into your model the data fields are related to each other in some way. Therefore the crosstab (or cross tabulation) tool is used in conjunction with this import method. Cross tabulation is often used to analyze the realtionship between two or more categroical variables. The sample code below illustrates how you could cross tabulate the number of times an age variable occurs in each region.\\
(Note column I is ignored in this instance becuase no set has been created for that variable, see creating sets [hyperlink] for more information)
local recordData[recordID,recordFields] = create (; dataFormat=record, fields=recordFields, \
file=$fullFileName, firstLine=1, delimiter=",", \
replaceWhiteSpaceInData="_", \
indexTo-Column_A=Column_A, create-Column_A=on, \ (the create-Column parameter is only required if the corresponding informant has not already been created)
indexTo-Column_C=Column_C, create-Column_C=on, \
takeField=Column_I)
local recordDataRaw[colA, colC, colGI] = crosstab (recordData[recordID,recordFields]; \
tabOn1=Column_A, \
tabOn2=Column_C)
table (recordDataRaw[colA, colC, colGI])
By using the table tool you can see the values in the recordData variable in the Table window as seen below (by default Table hides all zero values)
{{:howtos:workwithdata:record-format-ex4.gif|}}
[[importing_and_tabulating_record-based_datasets#Importing and Tabulating Record Based Datasets|back to top]]
/*
Describe how to:
* import a table, with attribute columns of different units of measure
* convert continuous variables to bin levels
* convert text fields to integer codes
* tabulate or cross-tabulate, cumulating record counts or a weight value
What about tables a large number of rows? Describe workaround for long informants that won't load through Documenter?
Use import/create with format=record.
Use the created sets to make mappings.*/