  
However, in practice there are complicating factors such as:
  * The sets ((The term //set// is used here for simplicity. The term //informant// in whatIf terminology is the most general description for that which defines a classifying dimension for one or more objects. A //set// is a type of informant - there are also //sequence//- and //category//-type informants. //Category// in its narrow whatIf context - the definition of permissible combinations of multiple informants - should not be confused with its broader meaning of classifying entity, similar to an //element// within a //set//.)) which define the input object data are labeled and/or ordered differently from those of the source files.
  * The source and object sets are fundamentally different - therefore the source data require aggregation or splitting (generically called //mapping//).
  * A dimension of the target object - say, for example, a time dimension - may span multiple source files, requiring time-series assembly. Furthermore, there may be temporal gaps requiring interpolation or projection.
  * Multi-dimensionality can cause the internal structure of the source file(s) to be complex, or the number of source files to proliferate.
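The first two complications - relabeling and mapping by aggregation - can be sketched in plain Python (a hypothetical illustration only; this is not whatIf/TOOL code, and the category names and mapping are invented for the example):

```python
# Hypothetical sketch: mapping source-file categories onto a target
# object's set by aggregation. Names are invented for illustration.
from collections import defaultdict

# Source data keyed by the source file's own category labels.
source = {
    "hard coal": 10.0,
    "lignite": 5.0,
    "crude oil": 20.0,
    "natural gas": 15.0,
}

# Many-to-one mapping from source categories to the target set's elements.
mapping = {
    "hard coal": "coal",
    "lignite": "coal",
    "crude oil": "oil",
    "natural gas": "gas",
}

def aggregate(source, mapping):
    """Sum source values into the target elements defined by `mapping`."""
    target = defaultdict(float)
    for category, value in source.items():
        target[mapping[category]] += value
    return dict(target)

target = aggregate(source, mapping)
# e.g. {'coal': 15.0, 'oil': 20.0, 'gas': 15.0}
```

Splitting (one source category feeding several target elements) would run the mapping in the other direction, with weights; the aggregation direction shown here is the more common case.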
  
Therefore, some data processing is required to bridge the gap between the source data and the target object. This can be accomplished in a number of ways - one approach is shown in Figure 2. Here the source files are assembled and processed through a script (or //view//, in whatIf terminology) written in TOOL.
  
{{:howtos:workwithdata:import_process_2.png|Figure 2}} \\
**Figure 2 - Importing and processing source data with a view**
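The time-series assembly and gap interpolation listed among the complicating factors could be sketched as follows (again a hypothetical plain-Python illustration, not a TOOL view; the years and values are invented, and file contents are inlined to keep the sketch self-contained):

```python
# Hypothetical sketch: assemble a time series spanning several source
# files, then fill a temporal gap by linear interpolation.
files = [
    {2000: 1.0, 2001: 1.2},   # e.g. first historical source file
    {2004: 2.0, 2005: 2.1},   # e.g. second file (gap at 2002-2003)
]

def assemble(files):
    """Merge per-file {year: value} fragments into one series."""
    series = {}
    for f in files:
        series.update(f)
    return series

def fill_gaps(series):
    """Linearly interpolate missing years between known data points."""
    years = sorted(series)
    out = dict(series)
    for lo, hi in zip(years, years[1:]):
        for y in range(lo + 1, hi):
            frac = (y - lo) / (hi - lo)
            out[y] = series[lo] + frac * (series[hi] - series[lo])
    return out

ts = fill_gaps(assemble(files))
```

Projection beyond the last known year would need an extrapolation rule (constant, trend, etc.) rather than interpolation; that choice is model-specific and not shown here.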
  
Embedding a large amount of processing logic in a view - as opposed to a framework diagram - has drawbacks, one being reduced transparency. To mitigate this, the diagram structure can be "grown" further back to "meet" the source data, as exemplified in Figure 3. Note that import views ((Although multiple view arrows are shown, in practice the import procedures are likely to be condensed into a single view.)) are still required, but compared to the implied view logic of Figure 2, the views in Figure 3 are oriented more towards simply importing and less towards processing.
  
{{:howtos:workwithdata:import_process_3.png|}} \\
In some special cases, additional pre-processing is performed using another language or tool (e.g. awk, Perl, R). Or, if the pre-processing task is sufficiently large and complex, a separate whatIf model framework might be developed (often called a //database// model).
  
The problem described here is encountered at several points within the broader model development cycle - most intensively in the data assembly and calibration stage, involving historical data - but also during scenario creation with external forecasts and projections. The target objects are defined during the initial model design stage.
  
===== Import "channels" =====

  * reading data through Documenter for interactive use/prototyping/testing
  * creating views in SAMM (import views)

Most of the articles in this section are oriented towards ultimately getting data into diagram-based objects loaded in SAMM.

Point out some of the differences between the channels (e.g. stand-alone scripts don't have arrays of tool objects, indexes, etc.; availability of informants). FIXME
  
===== Considerations and best practices =====
  * Document source data origins in the diagram (variable description and notes fields).
  * Name time dimensions explicitly.
    * Use a convention for index naming (e.g. t2010_2015).
    * Name time informants with the data source in the name (e.g. cs_t19762006).
  * Import into objects (either diagram or view locals) in their native units of measure - i.e. no magic-number conversion hacks. Let TOOL's built-in unit handling and conversion do all the work.
  * Where possible, use //coordinate// data format. Explain why. Provide link. FIXME
  * Describe other considerations for view vs. "growing" diagram logic vs. stand-alone TOOL script pre-processor vs. database.
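The //coordinate// format recommended above can be illustrated in plain Python, on the assumption that it refers to the common long-form layout in which each record carries an explicit label for every dimension plus a single value (as opposed to a positional wide table); the region and year labels are invented for the example:

```python
# Hypothetical sketch: flattening a wide (nested) table into
# coordinate-style rows of (region, year, value). Because every row is
# self-describing, row order and missing cells do not silently corrupt
# the import - a key reason to prefer this layout.
wide = {
    "east": {2000: 3.0, 2001: 3.5},
    "west": {2000: 4.0, 2001: 4.2},
}

def to_coordinate(wide):
    """Emit one (region, year, value) row per cell of the wide table."""
    return [
        (region, year, value)
        for region, by_year in wide.items()
        for year, value in by_year.items()
    ]

rows = to_coordinate(wide)
# e.g. [('east', 2000, 3.0), ('east', 2001, 3.5), ...]
```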
  
howtos/workwithdata/overview_of_reading_data.txt · Last modified: 2011/06/01 20:20 by marcus.williams