  
However, in practice there are complicating factors such as:
  * The sets ((The term //set// is used here for simplicity. The term //informant// in whatIf terminology is the most general description for that which defines a classifying dimension for one or more objects. A //set// is one type of informant - there are also //sequence//- and //category//-type informants. //Category// in its narrow whatIf context - the definition of permissible combinations of multiple informants - should not be confused with its broader meaning of a classifying entity, similar to an //element// within a //set//.)) which define the input object data are labeled and/or ordered differently from those of the source files.
  * The source and object sets are fundamentally different - therefore the source data require aggregation or splitting (generically called //mapping//).
  * A dimension of the target object - for example, a time dimension - may span multiple source files, requiring time-series assembly. Furthermore, there may be temporal gaps requiring interpolation or projection.
  * Multi-dimensionality can cause the internal structure of the source file(s) to be complex, or the number of source files to proliferate.
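These complications are mechanical in nature. The following is a minimal sketch in Python - not whatIf TOOL syntax, and all names and data are hypothetical - of three of the alignment steps: relabeling, aggregation (mapping), and filling a temporal gap by linear interpolation.

```python
# Source data keyed by source-file labels; the target object uses
# different labels for the same set. (All names/values are made up.)
source = {"Res": 10.0, "Comm": 6.0, "Ind": 4.0}

# 1. Relabel: map source labels onto the target set's labels.
label_map = {"Res": "residential", "Comm": "commercial", "Ind": "industrial"}
relabeled = {label_map[k]: v for k, v in source.items()}

# 2. Aggregate ("mapping"): the target set lumps commercial and industrial.
agg_map = {"residential": "residential",
           "commercial": "business", "industrial": "business"}
target = {}
for k, v in relabeled.items():
    target[agg_map[k]] = target.get(agg_map[k], 0.0) + v

# 3. Time-series assembly: values exist only at the endpoints, so the
#    gap years are filled by linear interpolation.
series = {2000: 100.0, 2005: 150.0}
filled = {}
for y in range(2000, 2006):
    if y in series:
        filled[y] = series[y]
    else:
        filled[y] = series[2000] + (series[2005] - series[2000]) * (y - 2000) / 5

print(target)         # {'residential': 10.0, 'business': 10.0}
print(filled[2003])   # 130.0
```

In a real import each of these steps would be expressed in view or diagram logic rather than a script; the sketch only illustrates the kind of work involved.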
  
**Figure 2 - Importing and processing source data with a view**
  
Embedding a large amount of processing logic in a view - as opposed to a framework diagram - has drawbacks, one being reduced transparency. To mitigate this, the diagram structure can be "grown" further back to "meet" the source data, as exemplified in Figure 3. Note that import views ((Although multiple view arrows are shown, in practice the import procedures are likely to be condensed into a single view.)) are still required, but compared to the implied view logic of Figure 2, the views in Figure 3 are oriented more towards simply importing and less towards processing.
  
{{:howtos:workwithdata:import_process_3.png|}} \\

In some special cases, additional pre-processing is performed using another language or tool (e.g. awk, Perl, R). Or, if the pre-processing task is sufficiently large and complex, a separate whatIf model framework might be developed (often called a //database// model).
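As a concrete illustration of such a pre-processing step - shown here in Python rather than awk, Perl, or R, with a hypothetical file layout - the sketch below unpivots a cross-tabulated CSV (one column per year) into one record per (sector, year, value) before import:

```python
import csv
import io

# Hypothetical cross-tabulated source file: one column per year.
raw = io.StringIO(
    "sector,2000,2001,2002\n"
    "residential,10,11,12\n"
    "business,20,22,24\n"
)

reader = csv.reader(raw)
header = next(reader)        # ['sector', '2000', '2001', '2002']
records = []
for row in reader:
    sector = row[0]
    for year, value in zip(header[1:], row[1:]):
        records.append((sector, int(year), float(value)))

# Each record now carries its labels explicitly.
print(records[0])            # ('residential', 2000, 10.0)
```

The same reshaping is a one-liner in R or a short awk script; the point is only that the heavy restructuring happens before the data reach the import view.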
  
The problem described here is encountered at several points within the broader model development cycle - most intensively in the data assembly and calibration stage, involving historical data - but also during scenario creation with external forecasts and projections. The target objects are defined during the initial model design stage.
  
===== Import "channels" =====
Marcus to flesh out this and the section below. FIXME
  
  * stand-alone TOOL scripts
  
Most of the articles in this section are oriented towards ultimately getting data into diagram-based objects loaded in SAMM.
Point out some of the differences between the channels (e.g. stand-alone scripts don't have arrays of tool objects, indexes, etc.; availability of informants). FIXME
  
===== Considerations and best practices =====
  * Document source data origins in the diagram (variable description and notes fields).
  * Name time dimensions explicitly.
    * Convention for index naming (e.g. t2010_2015).
    * Name time informants with the data source in the name (e.g. cs_t19762006).
  * Import into objects (either diagram or view locals) in their native units of measure - i.e. no magic-number conversion hacks. Let TOOL's built-in unit handling and conversion do all the work.
  * Where possible use //coordinate// data format. Explain why. Provide link. FIXME.
  * Describe other considerations for view vs. "growing" diagram logic vs. stand-alone TOOL script pre-processor vs. database.
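One reason //coordinate// format (one row per combination of labels, plus a value column) is generally preferred can be sketched generically - in Python, not TOOL syntax, with hypothetical data: each value carries its own labels, so rows may arrive in any order and sparse data is unambiguous, whereas a positional cross-tab silently breaks if columns are reordered or a combination is missing.

```python
# Hypothetical coordinate-format rows: (sector, year, value).
rows = [
    ("business", 2001, 22.0),        # arbitrary order...
    ("residential", 2000, 10.0),
    ("residential", 2001, 11.0),
    # ("business", 2000, ...) is simply absent - sparsity is explicit.
]

# Index by label tuple; position in the file never matters.
data = {(sector, year): value for sector, year, value in rows}

print(data[("residential", 2001)])   # 11.0
print(("business", 2000) in data)    # False - a visible gap, not a shifted value
```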
  
howtos/workwithdata/overview_of_reading_data.1306420565.txt.gz · Last modified: 2011/05/26 14:36 by marcus.williams