However, in practice there are complicating factors such as:
  * The sets ((The term //set// is used here for simplicity. The term //informant// in whatIf terminology is the most general description for that which defines a classifying dimension for one or more objects. A //set// is a type of informant - there are also //sequence// and //category// -type informants. //Category// in its narrow whatIf context - the definition of permissible combinations of multiple informants - should not be confused with its broader meaning of classifying entity, similar to an //element// within a //set//.)) which define the input object data are labeled and/or ordered differently from those of the source files.
  * The source and object sets are fundamentally different - therefore the source data require aggregation or splitting (generically called //mapping//).
  * A dimension of the target object - say, for example, a time dimension - may span multiple source files, requiring time-series assembly. Furthermore, there may be temporal gaps requiring interpolation or projection.
  * Multi-dimensionality can cause the internal structure of the source file(s) to be complex, or the number of source files to proliferate.
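The mapping and gap-filling factors above can be illustrated with a minimal sketch. This is plain Python, not whatIf/TOOL syntax, and the fuel labels and years are invented for the example:

```python
# Hypothetical illustration (not whatIf/TOOL syntax) of two of the
# complicating factors: aggregating source categories onto the model's
# own set, and interpolating a temporal gap in an assembled series.

def aggregate(source, mapping):
    """Sum source values into object-set elements per a many-to-one mapping."""
    out = {}
    for src_label, value in source.items():
        obj_label = mapping[src_label]  # e.g. detailed fuel -> broad fuel group
        out[obj_label] = out.get(obj_label, 0.0) + value
    return out

def interpolate_gaps(series):
    """Linearly fill interior None points in a yearly series (endpoints known)."""
    years = sorted(series)
    filled = dict(series)
    for i, y in enumerate(years):
        if filled[y] is None:
            # nearest known neighbours on each side
            lo = max(yy for yy in years[:i] if filled[yy] is not None)
            hi = min(yy for yy in years[i + 1:] if filled[yy] is not None)
            w = (y - lo) / (hi - lo)
            filled[y] = filled[lo] + w * (filled[hi] - filled[lo])
    return filled

# Source data uses detailed labels; the model set has only two elements.
source = {"anthracite": 2.0, "bituminous": 3.0, "crude": 5.0}
mapping = {"anthracite": "coal", "bituminous": "coal", "crude": "oil"}
print(aggregate(source, mapping))          # {'coal': 5.0, 'oil': 5.0}
print(interpolate_gaps({2000: 10.0, 2001: None, 2002: 14.0}))
# {2000: 10.0, 2001: 12.0, 2002: 14.0}
```

Splitting (the inverse of aggregation) would additionally need shares or weights per target element, which is one reason mapping logic tends to accumulate in the import stage.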
  
**Figure 2 - Importing and processing source data with a view**
  
Embedding a large amount of processing logic in a view - as opposed to a framework diagram - has drawbacks, one being reduced transparency. To mitigate this, the diagram structure can be "grown" further back to "meet" the source data as exemplified in Figure 3. Note that import views ((Although multiple view arrows are shown, in practice the import procedures are likely to be condensed into a single view.)) are still required, but compared to the implied view logic of Figure 2, the views in Figure 3 are oriented more towards simply importing and less towards processing.
  
{{:howtos:workwithdata:import_process_3.png|}} \\
In some special cases, additional pre-processing is performed using another language or tool (e.g. awk, PERL, R). Or, if the pre-processing task is sufficiently large and complex, a separate whatIf model framework might be developed (often called a //database// model).
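As a sketch of the kind of lightweight external pre-processing meant here - shown in Python rather than awk, with an entirely hypothetical input layout - a typical step is stripping comment lines and keeping only the columns the import view will read:

```python
# Minimal sketch of an awk-style pre-processing filter (hypothetical
# input layout): drop comment/blank lines and project selected
# whitespace-separated columns before the file reaches the import view.

def preprocess(lines, keep_cols):
    """Filter comment lines and keep only the columns listed in keep_cols."""
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blanks, as an awk pattern might
        fields = line.split()
        out.append(tuple(fields[i] for i in keep_cols))
    return out

raw = [
    "# source: national statistics extract",
    "2000 coal 2.0 footnote1",
    "2001 coal 3.0 footnote2",
]
print(preprocess(raw, [0, 2]))  # [('2000', '2.0'), ('2001', '3.0')]
```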
  
The problem described here is encountered at several points within the broader model development cycle - most intensively in the data assembly and calibration stage, involving historical data - but also during scenario creation with external forecasts and projections. The target objects are defined during the initial model design stage.
  
===== Import "channels" =====
  
Most of the articles in this section are oriented towards ultimately getting data into diagram-based objects loaded in SAMM.

Point out some of the differences between the channels (e.g. stand-alone scripts don't have arrays of tool objects, indexes, etc.; availability of informants). FIXME
  
===== Considerations and best practices =====
  * Document source data origins in the diagram (variable description and notes fields).
  * Name time dimensions explicitly.
    * Convention for index naming (e.g. t2010_2015).
    * Name time informants with the data source in the name (e.g. cs_t19762006).
  * Import into objects (either diagram or view locals) in their native units of measure - i.e. no magic number conversion hacks. Let TOOL's built-in unit handling and conversion do all the work.
  * Where possible use //coordinate// data format. Explain why. Provide link. FIXME
  * Describe other considerations for view vs. "growing" diagram logic vs. stand-alone TOOL script pre-processor vs. database.
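One likely motivation for coordinate format can be sketched as follows - this is an illustration in plain Python, assuming "coordinate" means one self-describing (labels..., value) record per data point, with invented labels:

```python
# Illustration only: "coordinate" format as one (label, ..., value)
# record per data point, versus a positional table whose meaning
# depends on implicit row/column ordering.

positional = [            # meaning depends on an implicit ordering of
    [2.0, 3.0],           # rows (fuels?) and columns (years?) that can
    [5.0, 1.0],           # silently drift out of sync with the model's sets
]

coordinate = [            # self-describing: robust to set reordering
    ("coal", 2000, 2.0),
    ("coal", 2001, 3.0),
    ("oil",  2000, 5.0),
    ("oil",  2001, 1.0),
]

def lookup(records, fuel, year):
    """Find a value by its labels, independent of storage order."""
    return next(v for f, y, v in records if f == fuel and y == year)

print(lookup(coordinate, "oil", 2000))  # 5.0
```

Because each record carries its own labels, coordinate data survive relabeled or reordered sets and sparse dimensions - exactly the complicating factors listed earlier.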
  
howtos/workwithdata/overview_of_reading_data.txt · Last modified: 2011/06/01 20:20 by marcus.williams