Tune your TOOL code for memory and/or runtime constraints

Background Info

TOOL was originally built when memory was scarce and processors were slow. Each tool was optimized for performance in that world, which meant dimensions had to sit in fixed positions. A single input tool usually required its key dimension to be on the far right, and binary tools (operating on 2 inputs) required matching dimensions in specific locations. Reorders were scattered throughout the code to accommodate these requirements.

Over the years processors and memory improved, and the more commonly used tools that operate on one key dimension were updated to take advantage of the better machines by allowing the key dimension to be anywhere. This reduces the need for reorders and makes TOOL code much easier to read and write.

Recently the “binary” type tools, which operate on 2 inputs that interact with each other, have been improved too, so the dimensions of the two inputs no longer have to be in matching order. This has really improved code readability, but at a cost in memory.

Today TOOL code is much easier to read and write, which is ideal. As long as your objects have a “reasonable” number of elements (say 100M or less) you don't need to think about TOOL memory efficiency. If you have more elements, or you are running on a memory-constrained machine, you do have to concern yourself with efficiency. Read on.

General Performance Tuning Info

Facts for a 32-bit machine:

  • The actual size of a TOOL object is usually 8 * number of elements, since 8 bytes are required for each double.
    • Bringing a whole TOOL object into memory requires enough memory to cover that size (we don't always bring the full thing into memory though!)
    • A TOOL object always refers to a file, whose size is the compressed size of the TOOL object (compression depends on the number of 0's in the object)
  • The maximum signed integer is 2^31 - 1 = 2147483647
  • In some cases the max file size on the server is also limited to about 2^31 = 2147483648 bytes, and each TOOL object is saved as a file.

These are the limitations we have until we move to 64-bit machines and optimize the TOOL language to use them.
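As a rough illustration of the sizing rule above, the following Python sketch estimates the uncompressed size of an object and compares it with the 2^31 figure. This is not TOOL code: the names `full_size_bytes` and `fits_under_limit` are hypothetical helpers, and the 8 bytes per element is simply the rule of thumb from the list above.

```python
BYTES_PER_ELEMENT = 8   # rule of thumb from the list above: one double per element
LIMIT_32BIT = 2**31     # 2147483648, the 2^31 figure from the list above


def full_size_bytes(num_elements):
    """Uncompressed size of an object if it were held entirely in memory."""
    return num_elements * BYTES_PER_ELEMENT


def fits_under_limit(num_elements):
    """True if the full object stays below the 2^31-byte figure."""
    return full_size_bytes(num_elements) < LIMIT_32BIT


# 100M elements is the "reasonable" size mentioned in the background section.
print(full_size_bytes(100_000_000))    # 800000000
print(fits_under_limit(100_000_000))   # True
print(fits_under_limit(300_000_000))   # False
```

Under these assumptions an object crosses the 2^31-byte mark at roughly 268M elements, before compression is taken into account.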

Any tool needs memory for all its inputs AND outputs, but not always enough for the full objects to be in memory. How much of each object is pulled into memory depends on how its dimensions are ordered. Read on!

Specific TOOL Performance Tuning

map, mapcat, sum, (single input tools with key dimension)

These tools can have the “key” dimension (usually specified with dim=) anywhere in the object, but the further to the right it is, the less of the input object has to be in memory at a time.

Example

Given object A[a,b,c,d] on which you want to run a single input tool with a key dimension:

If the key dimension is d, the tool brings y elements of the object into memory at a time, where y = extent(d). It processes that section and then moves on to the next section, also of size extent(d). This is repeated x times, where x = extent(a) * extent(b) * extent(c).

If the key dimension is b, the tool brings y elements of the object into memory at a time, where y = extent(b) * extent(c) * extent(d). It processes that section and then moves on to the next section. This is repeated x times, where x = extent(a).
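The two cases above follow one rule: the section held in memory covers the key dimension and everything to its right, and the number of passes is the product of the extents to its left. A minimal Python sketch of that arithmetic follows. It is not TOOL code; `chunk_plan` is a hypothetical helper and the extents are made-up values for illustration.

```python
from math import prod


def chunk_plan(extents, key_index):
    """Return (elements_per_section, number_of_sections) for a key dimension.

    extents   -- list of dimension extents, leftmost dimension first
    key_index -- 0-based position of the key dimension in that list
    """
    # The key dimension and every dimension to its right are read together.
    elements_per_section = prod(extents[key_index:])
    # One pass is needed for each combination of the dimensions to the left.
    number_of_sections = prod(extents[:key_index])
    return elements_per_section, number_of_sections


# Hypothetical extents for A[a,b,c,d]
extents = [10, 20, 30, 40]

print(chunk_plan(extents, 3))   # key dimension d: (40, 6000)
print(chunk_plan(extents, 1))   # key dimension b: (24000, 10)
```

With the key dimension on the far right each section is small and there are many passes. With the key dimension on the far left the single section is the whole object.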

Case Study: dimension mapping of 116M elem object

Here's a real-world example and a test that was done.

Input object:

tmp1[]: array of real numbers, single precision
 version: 3
 desc:    local tmp1[] =
 dim1:    SET; stateLOT; 8
 dim2:    SET; SD96; 58
 dim3:    SET; lndCByIrr; 2
 dim4:    CAT; agrCACat: agrRegType.crop; 48
 dim5:    SEQ; yearbuilt: 1856:2001:5; year
 dim6:    SET; agrCondType; 3
 dim7:    SEQ; time: 1861:2001:5; year
 units:   tonne / hectare
   scientific measure: tonne / hectare
   SI signature:       m^-2 kg 
 data:    116259840 elements, empty

Final output object:

tmp2[]: array of real numbers, single precision
 version: 3
 desc:    local tmp2[] = map (tmp1[];stateLOT->state)
 dim1:    SET; state; 9
 dim2:    SET; SD96; 58
 dim3:    SET; lndCByIrr; 2
 dim4:    CAT; agrCACat: agrRegType.crop; 48
 dim5:    SEQ; yearbuilt: 1856:2001:5; year
 dim6:    SET; agrCondType; 3
 dim7:    SEQ; time: 1861:2001:5; year
 units:   tonne / hectare
   scientific measure: tonne / hectare
   SI signature:       m^-2 kg 
 data:    130792320 elements, empty

Test1 - map the first dimension (stateLOT) using a dimension mapping:

local tmp2[] = map (tmp1[]; stateLOT->state)

Test2 - reorder the stateLOT dimension to the far right, do the dimension mapping, and reorder back:

local tmp1Reord[] = reorder (tmp1[]; 2,3,4,5,6,7,1)
local tmp2Reord[] = map (tmp1Reord[]; stateLOT->state)
local tmp3Reord[] = reorder (tmp2Reord[]; 7,1,2,3,4,5,6)
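The reorder arguments in Test2 can be sanity-checked outside TOOL. The sketch below is plain Python, not TOOL code. It assumes that position i of the reorder argument names the old dimension that ends up in new position i, which is the reading consistent with stateLOT landing on the far right. The `reorder` helper and the list-of-names representation are hypothetical.

```python
def reorder(dims, order):
    """Return dims rearranged so that new position i holds old dimension order[i].

    order uses 1-based dimension numbers, as in the reorder calls above.
    """
    return [dims[n - 1] for n in order]


tmp1 = ["stateLOT", "SD96", "lndCByIrr", "agrCACat",
        "yearbuilt", "agrCondType", "time"]

tmp1_reord = reorder(tmp1, [2, 3, 4, 5, 6, 7, 1])
print(tmp1_reord[-1])   # stateLOT is now the rightmost dimension

# The map step replaces stateLOT with state; its position does not change.
tmp2_reord = ["state" if d == "stateLOT" else d for d in tmp1_reord]

tmp3_reord = reorder(tmp2_reord, [7, 1, 2, 3, 4, 5, 6])
print(tmp3_reord)       # state is back in the first position
```

Under that assumption the second permutation is the inverse of the first, so the final object has the same dimension order as the output of Test1.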

Runtime comparisons:

  • Test1 - completed in 11 min 23 sec
  • Test2 - completed in 2 min 40 sec
    • first reorder - 23 sec
    • map call - 25 sec
    • second reorder - 1 min 52 sec
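To put the timings above side by side, here is a small Python calculation. It uses only the durations listed above; the helper `to_seconds` is hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' duration to seconds."""
    return minutes * 60 + seconds


test1_seconds = to_seconds(11, 23)

# Test2 is the sum of its three steps: first reorder, map call, second reorder.
test2_steps = [to_seconds(0, 23), to_seconds(0, 25), to_seconds(1, 52)]
test2_seconds = sum(test2_steps)

print(test1_seconds, test2_seconds)               # 683 160
print(round(test1_seconds / test2_seconds, 1))    # 4.3
print(round(test2_steps[2] / test2_seconds, 2))   # 0.7
```

In this test the reorder-map-reorder sequence was roughly 4.3 times faster than mapping the leftmost dimension directly, and the second reorder accounted for about 70% of the Test2 runtime.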

multiply, divide, add, subtract, (2 input tools)

FIXME

– Again order matters

– Basically, if there are matching dimensions, keep them together in the LEFT-most positions

insert, reorder

FIXME

– These tools require bringing the WHOLE input and output object into memory
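As a rough sizing sketch for these tools, the Python below adds the full input and full output sizes together. It is not TOOL code and `whole_object_memory` is a hypothetical helper. It assumes the 8 bytes per element rule of thumb from the general section; the case-study objects are listed as single precision, so the real figure may well be lower.

```python
BYTES_PER_ELEMENT = 8   # assumption: rule of thumb from the general section


def whole_object_memory(input_elements, output_elements):
    """Bytes needed when both the full input and full output are in memory."""
    return (input_elements + output_elements) * BYTES_PER_ELEMENT


# A reorder does not change the element count, so input and output match.
# 130792320 is the element count of the mapped object in the case study.
elements = 130792320

needed = whole_object_memory(elements, elements)
print(needed)             # 2092677120
print(needed < 2**31)     # True
```

Under this assumption a reorder of the 130M element object needs about 2.09 billion bytes, which is close to the 2^31 figure.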

howtos/toolcoding/performance_tuning.1270139919.txt.gz · Last modified: 2010/04/01 16:38 by shona.weldon