howtos:workwithdata:overview_of_reading_data, current revision [2011/06/01 20:20] by marcus.williams (previous revision [2011/05/11 18:40] by shona.weldon)
  
However, in practice there are complicating factors such as:
  * The sets ((The term //set// is used here for simplicity. In whatIf terminology, //informant// is the most general term for anything that defines a classifying dimension of one or more objects. A //set// is one type of informant; there are also //sequence//- and //category//-type informants. //Category// in its narrow whatIf sense (the definition of permissible combinations of multiple informants) should not be confused with its broader meaning of a classifying entity, similar to an //element// within a //set//.)) that define the input object data are labeled and/or ordered differently from those of the source files.
  * The source and object sets are fundamentally different, so the source data require aggregation or splitting (generically called //mapping//).
  * A dimension of the target object (for example, a time dimension) may span multiple source files, requiring time-series assembly. Furthermore, there may be temporal gaps requiring interpolation or projection.
  * Multi-dimensionality can make the internal structure of the source file(s) complex, or cause the number of source files to proliferate.
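The //mapping// problem in the list above can be illustrated outside of TOOL. The following Python sketch is a hypothetical example only. The fuel labels, values and shares are made up, and none of it is whatIf or TOOL API. It shows three operations:
  * aggregation, as a many-to-one concordance;
  * splitting, as a one-to-many share table;
  * reordering into the order a target set defines.
Relabeling is simply an aggregation whose concordance happens to be one-to-one.

```python
import math
from collections import defaultdict

# Hypothetical source data, keyed by source-set element labels.
source = {"hard_coal": 120.0, "lignite": 30.0, "crude_oil": 200.0, "natural_gas": 150.0}

# Hypothetical concordance: each source element maps to exactly one target element.
aggregation_map = {
    "hard_coal": "coal",
    "lignite": "coal",
    "crude_oil": "oil",
    "natural_gas": "gas",
}


def aggregate(data, mapping):
    """Many-to-one mapping: sum source values into their target elements."""
    result = defaultdict(float)
    for label, value in data.items():
        result[mapping[label]] += value
    return dict(result)


def split(data, shares):
    """One-to-many mapping: distribute each source value over target elements
    using shares that must sum to 1 for every source element."""
    result = defaultdict(float)
    for label, value in data.items():
        if not math.isclose(sum(shares[label].values()), 1.0):
            raise ValueError(f"shares for {label!r} do not sum to 1")
        for target, share in shares[label].items():
            result[target] += value * share
    return dict(result)


def reorder(data, target_order):
    """Return values in the order the target set defines; fail loudly on gaps."""
    missing = [e for e in target_order if e not in data]
    if missing:
        raise KeyError(f"no data for target elements: {missing}")
    return [data[e] for e in target_order]


aggregated = aggregate(source, aggregation_map)
print(aggregated)  # {'coal': 150.0, 'oil': 200.0, 'gas': 150.0}
print(reorder(aggregated, ["gas", "oil", "coal"]))  # [150.0, 200.0, 150.0]
print(split({"oil": 200.0}, {"oil": {"oil_domestic": 0.25, "oil_imported": 0.75}}))
# {'oil_domestic': 50.0, 'oil_imported': 150.0}
```

Keeping the concordance and share tables as explicit data, rather than burying them in loops, makes the mapping reviewable. The same concern motivates "growing" diagram logic instead of hiding processing in a view.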
  
Some data processing is therefore required to bridge the gap between the source data and the target object. This can be done in several ways; one approach is shown in Figure 2, where the source files are assembled and processed through a script (a //view//, in whatIf terminology) written in TOOL.
  
{{:howtos:workwithdata:import_process_2.png|Figure 2}} \\
**Figure 2 - Importing and processing source data with a view**
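The assembly step in Figure 2 merges several source files into one time series and fills any gaps. It can be sketched in Python as follows. This is an illustration under stated assumptions, not a TOOL view:
  * the two CSV "files" are hypothetical in-memory strings with a year,value layout;
  * gaps are filled by straight-line interpolation between the nearest known years.
Years beyond the last known value are deliberately left empty. Filling them is projection rather than interpolation and usually calls for a separate, documented method.

```python
import csv
import io

# Hypothetical source files covering different periods; in practice these would
# be read from disk. Note the gap: 1995 and 1996 appear in neither file.
SOURCE_FILES = {
    "pop_1990_1994.csv": "year,value\n1990,100\n1991,102\n1992,104\n1993,106\n1994,108\n",
    "pop_1997_2000.csv": "year,value\n1997,114\n1998,116\n1999,118\n2000,120\n",
}


def read_series(text):
    """Parse one source file into a {year: value} dict."""
    return {int(row["year"]): float(row["value"]) for row in csv.DictReader(io.StringIO(text))}


def assemble(files):
    """Merge the per-file series, rejecting years that two files disagree on."""
    series = {}
    for name, text in files.items():
        for year, value in read_series(text).items():
            if year in series and series[year] != value:
                raise ValueError(f"{name}: conflicting value for {year}")
            series[year] = value
    return series


def fill_gaps(series, first, last):
    """Linearly interpolate missing years in [first, last] between known points.
    Years without a known point on both sides are left out (that would be projection)."""
    known = sorted(series)
    filled = {}
    for year in range(first, last + 1):
        if year in series:
            filled[year] = series[year]
            continue
        before = [y for y in known if y < year]
        after = [y for y in known if y > year]
        if not before or not after:
            continue
        y0, y1 = before[-1], after[0]
        weight = (year - y0) / (y1 - y0)
        filled[year] = series[y0] + weight * (series[y1] - series[y0])
    return filled


series = fill_gaps(assemble(SOURCE_FILES), 1990, 2002)
print(sorted(series))  # years 1990..2000; 2001 and 2002 would need projection
print(round(series[1995], 6), round(series[1996], 6))  # 110.0 112.0
```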
  
Embedding a large amount of processing logic in a view, as opposed to a framework diagram, has drawbacks, one being reduced transparency. To mitigate this, the diagram structure can be "grown" further back to "meet" the source data, as shown in Figure 3. Import views ((Although multiple view arrows are shown, in practice the import procedures are likely to be condensed into a single view.)) are still required, but compared to the view logic implied by Figure 2, the views in Figure 3 are oriented more towards simply importing and less towards processing.
  
{{:howtos:workwithdata:import_process_3.png|Figure 3}} \\
In some special cases, additional pre-processing is performed using another language or tool (e.g. awk, Perl, R). Alternatively, if the pre-processing task is sufficiently large and complex, a separate whatIf model framework might be developed (often called a //database// model).
  
The problem described here arises at several points in the broader model development cycle. It is most intensive in the data assembly and calibration stage, which involves historical data, but it also arises during scenario creation with external forecasts and projections. The target objects themselves are defined earlier, during the initial model design stage.
  
===== Import "channels" =====

Marcus to flesh out this and the section below. FIXME
  
  * Stand-alone TOOL scripts
  * Reading data through Documenter for interactive use, prototyping and testing
  * Views created in SAMM (import views)

Most of the articles in this section are oriented towards ultimately getting data into diagram-based objects loaded in SAMM.

Point out some of the differences between the channels (e.g. stand-alone scripts don't have arrays of TOOL objects, indexes, etc.; availability of informants).
  
===== Considerations and best practices =====
  * Document source data origins in the diagram (variable description and notes fields).
  * Name time dimensions explicitly.
    * Use a consistent convention for index naming (e.g. t2010_2015).
    * Include the data source in the names of time informants (e.g. cs_t19762006).
  * Import into objects (either diagram objects or view locals) in their native units of measure, i.e. no magic-number conversion hacks. Let TOOL's built-in unit handling and conversion do all the work.
  * Where possible use //coordinate// data format. Explain why. Provide link. FIXME.
  * Describe other considerations for view vs. "growing" diagram logic vs. stand-alone TOOL script pre-processor vs. database.
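This note concerns the //coordinate// data format recommended above. It assumes coordinate format means one record per data point, giving the element along each dimension followed by the value; check this against the whatIf documentation. Under that assumption, the following Python sketch suggests why the format is convenient:
  * records can arrive in any order;
  * absent combinations can simply be omitted;
  * every label is checked against the target sets on the way in, rather than relying on position in a table.
The sets, labels and default value are hypothetical.

```python
import csv
import io
from itertools import product

# Hypothetical coordinate-format source: one record per data point, giving the
# element along every dimension followed by the value. Rows are in no particular
# order, and combinations without data are simply absent.
COORD_TEXT = """region,fuel,year,value
west,gas,2011,7.5
east,coal,2010,3.0
west,coal,2010,4.0
east,gas,2011,6.0
"""

# Target sets in the order the target object expects (illustrative labels).
SETS = {
    "region": ["east", "west"],
    "fuel": ["coal", "gas"],
    "year": ["2010", "2011"],
}


def read_coordinates(text, sets, default=0.0):
    """Load coordinate records into a dict keyed by full coordinate tuples.

    Every label is validated against its set, duplicate coordinates are rejected,
    and points with no record keep `default`.
    """
    dims = list(sets)
    table = {key: default for key in product(*(sets[d] for d in dims))}
    seen = set()
    # start=2 because line 1 of the text is the header row
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        key = tuple(row[d] for d in dims)
        for dim, label in zip(dims, key):
            if label not in sets[dim]:
                raise ValueError(f"line {line_no}: unknown {dim} element {label!r}")
        if key in seen:
            raise ValueError(f"line {line_no}: duplicate coordinate {key}")
        seen.add(key)
        table[key] = float(row["value"])
    return table


data = read_coordinates(COORD_TEXT, SETS)
print(data[("west", "gas", "2011")])  # 7.5
print(data[("east", "gas", "2010")])  # 0.0 (no record, default used)
print(len(data))  # 8 = 2 regions x 2 fuels x 2 years
```

Filling absent points with a default of 0.0 is only one choice. Whether a missing point means zero or "not available" depends on the data, and silently treating it as zero is a common source of error.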
  
howtos/workwithdata/overview_of_reading_data.1305139226.txt.gz · Last modified: 2011/05/11 18:40 by shona.weldon