Tune your TOOL code for memory and/or runtime constraints

Background Info

TOOL was originally built when memory was scarce and processors were slow. Each tool was optimized for performance in that environment, which meant dimensions had to sit in fixed positions. A single-input tool usually required its key dimension to be on the far right, and binary tools (operating on 2 inputs) required matching dimensions in specific locations. Reorders were scattered throughout the code to accommodate these requirements.

Over the years processors and memory improved, and the more commonly used tools that operate on one key dimension were updated to allow the key dimension to be anywhere. This reduces the need for reorders and makes TOOL code much easier to read and write.

Recently the “binary” tools, which operate on 2 inputs that interact with each other, have been improved as well, so the order of dimensions no longer has to match in any way. This has greatly improved code readability, but at a cost in memory.

Today TOOL code is much easier to read and write, which is ideal. As long as your objects have a “reasonable” number of elements (say 100M or less), you don't need to think about TOOL memory efficiency. If you have more elements, or you are running on a memory-constrained machine, you do have to concern yourself with efficiency. Read on.

General Performance Tuning Info

Facts for a 32bit machine:

  • The actual size of a TOOL object is usually 8 * number of elements, since 8 bytes are required for each double.
    • To bring a whole TOOL object into memory, you need enough memory available on your machine to hold your biggest TOOL object (although TOOL does not always bring the full object into memory, as seen in the examples below).
    • A TOOL object always refers to a file, which holds the compressed form of the TOOL object (the compression achieved depends on the number of zeros in the object).
  • The maximum signed integer is 2^31 - 1 = 2147483647.
  • In some cases the maximum file size on the server is also about 2^31 bytes (2 GiB), and each TOOL object is saved as a file.

These are the limitations we have until we move to 64bit machines and optimize the TOOL language to use it.
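As a rough illustration of the arithmetic above, the following Python sketch estimates the uncompressed size of an object and checks it against the signed 32-bit limit. The function names are hypothetical and are not part of TOOL; the 8 bytes per element figure is the one stated above, and the file actually written to disk is compressed, so it will usually be smaller than this estimate.

```python
# Sketch only: estimates the in-memory (uncompressed) size of a TOOL object.
# Assumes 8 bytes per element, as stated in the facts above.

BYTES_PER_ELEMENT = 8
MAX_SIGNED_32BIT = 2**31 - 1  # largest signed 32-bit integer


def uncompressed_bytes(num_elements):
    """Return the approximate uncompressed size in bytes."""
    return BYTES_PER_ELEMENT * num_elements


def fits_32bit_limit(num_elements):
    """Return True if the uncompressed size stays within the 32-bit limit."""
    return uncompressed_bytes(num_elements) <= MAX_SIGNED_32BIT


if __name__ == "__main__":
    for n in (100_000_000, 300_000_000):
        print(n, uncompressed_bytes(n), fits_32bit_limit(n))
```

Under these assumptions a 100M element object needs about 800 MB uncompressed and stays under the limit, while a 300M element object would not.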

Any tool needs memory for all its inputs AND outputs, but not always enough for the full objects to be in memory. How much of each object is pulled into memory depends on how the object's dimensions are ordered. Read on!

Specific TOOL Performance Tuning

map, mapcat, sum, (single input tools with key dimension)

These tools can have the “key” dimension (usually specified with dim=) anywhere in the object. However, the further to the right this dimension is, the less of the input object has to be read into memory at one time.

Example

The input object is A[a,b,c,d]:

If the key dimension is d, TOOL will bring all of the elements contained in dimension d into memory. The memory requirement can be written as memory = extent[d]. It will process that section and then move on to the next section, also of size extent[d]. This will be repeated x times where x = extent[a] * extent[b] * extent[c].

If the key dimension is b, TOOL will bring all of the elements contained in the dimensions b, c and d into memory. The memory requirement can be written as memory = extent[b] * extent[c] * extent[d]. It will process that section and then move on to the next section. This will be repeated x times where x = extent[a].
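The rule in this example can be sketched in Python as follows. This is only a model of the description above, not TOOL's actual implementation; `single_input_footprint` is a hypothetical helper, and the extents used in the demo are made-up numbers.

```python
from math import prod


def single_input_footprint(extents, key_index):
    """Model the section size and repeat count for a single-input tool.

    extents:   dimension sizes, listed left to right
    key_index: 0-based position of the key dimension

    Returns (section_elements, repeats): the key dimension and everything
    to its right is held in memory at once, and that section is processed
    once per combination of the dimensions to its left.
    """
    section_elements = prod(extents[key_index:])
    repeats = prod(extents[:key_index])
    return section_elements, repeats


if __name__ == "__main__":
    # Hypothetical extents for A[a,b,c,d]
    extents = [4, 5, 6, 7]
    print(single_input_footprint(extents, 3))  # key dimension d
    print(single_input_footprint(extents, 1))  # key dimension b
```

With these made-up extents, a key dimension of d gives sections of 7 elements repeated 120 times, while a key dimension of b gives sections of 210 elements repeated 4 times.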

Case Study: dimension mapping of 116M elem object

Here's a real-world example and a test that was done.

Input object:

tmp1[]: array of real numbers, single precision
 version: 3
 desc:    local tmp1[] =
 dim1:    SET; stateLOT; 8
 dim2:    SET; SD96; 58
 dim3:    SET; lndCByIrr; 2
 dim4:    CAT; agrCACat: agrRegType.crop; 48
 dim5:    SEQ; yearbuilt: 1856:2001:5; year
 dim6:    SET; agrCondType; 3
 dim7:    SEQ; time: 1861:2001:5; year
 units:   tonne / hectare
   scientific measure: tonne / hectare
   SI signature:       m^-2 kg 
 data:    116259840 elements, empty

Final output object:

tmp2[]: array of real numbers, single precision
 version: 3
 desc:    local tmp2[] = map (tmp1[];stateLOT->state)
 dim1:    SET; state; 9
 dim2:    SET; SD96; 58
 dim3:    SET; lndCByIrr; 2
 dim4:    CAT; agrCACat: agrRegType.crop; 48
 dim5:    SEQ; yearbuilt: 1856:2001:5; year
 dim6:    SET; agrCondType; 3
 dim7:    SEQ; time: 1861:2001:5; year
 units:   tonne / hectare
   scientific measure: tonne / hectare
   SI signature:       m^-2 kg 
 data:    130792320 elements, empty

Test1 - map the first dimension (stateLOT) using a dimension mapping:

local tmp2[] = map (tmp1[]; stateLOT->state)

Test2 - reorder the stateLOT dimension to the far right, do the dimension mapping, and reorder back:

local tmp1Reord[] = reorder (tmp1[]; 2,3,4,5,6,7,1)
local tmp2Reord[] = map (tmp1Reord[]; stateLOT->state)
local tmp3Reord[] = reorder (tmp2Reord[]; 7,1,2,3,4,5,6)

Runtime comparisons:

  • Test1 - completed in 11 min 23 sec
  • Test2 - completed in 2 min 40 sec
    • first reorder - 23 sec
    • map call - 25 sec
    • second reorder - 1 min 52 sec
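To connect these timings to the memory rule for single-input tools, the Python sketch below works out the section sizes for both tests. It is an illustration only: the extents 30 and 29 for yearbuilt and time are derived from the SEQ ranges in the object descriptions, assuming both endpoints are included (their product does match the 116259840 elements reported for tmp1), and the variable names are hypothetical.

```python
from math import prod


def seq_len(start, stop, step):
    """Length of a sequence start:stop:step, assuming inclusive endpoints."""
    return (stop - start) // step + 1


# tmp1 dimensions, left to right: stateLOT, SD96, lndCByIrr, agrCACat,
# yearbuilt, agrCondType, time
tmp1_extents = [8, 58, 2, 48, seq_len(1856, 2001, 5), 3, seq_len(1861, 2001, 5)]

# Test1: stateLOT is the key dimension and sits at the far left, so the
# section held in memory covers every dimension.
test1_section = prod(tmp1_extents[0:])
test1_repeats = 1

# Test2: stateLOT is reordered to the far right before the map call.
reordered = tmp1_extents[1:] + tmp1_extents[:1]
test2_section = reordered[-1]
test2_repeats = prod(reordered[:-1])

# Runtimes reported above, converted to seconds.
test1_seconds = 11 * 60 + 23
test2_seconds = 23 + 25 + (1 * 60 + 52)

if __name__ == "__main__":
    print("Test1 section:", test1_section, "repeats:", test1_repeats)
    print("Test2 section:", test2_section, "repeats:", test2_repeats)
    print("Speedup:", test1_seconds / test2_seconds)
```

Under this model Test1 works on the whole 116259840 element object as a single section, while the map call in Test2 works on sections of 8 elements. Even with the cost of the two reorders included, Test2 ran roughly 4 times faster.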

multiply, divide, add, subtract, (2 input tools)

When looking at the inputs for the multiply, divide, add and subtract tools, TOOL examines the dimensions from left to right. If the dimensions match (i.e. a = a), TOOL will loop over these dimensions until it finds a non-matching dimension. Once a non-matching dimension is found, that dimension and all of the dimensions after it need to be taken into memory.

Example

The input objects are A[a,b,c,d] and B[e,f,a,b]:

If the input objects are used as they are shown above, TOOL will bring in all of the elements contained in the dimensions a, b, c, d, e and f into memory. The memory requirement can be written as memory = extent[a] * extent[b] * extent[c] * extent[d] * extent[e] * extent[f].

However, if we reorder the dimensions of object B to look like B[a,b,e,f], then TOOL will loop over the matching dimensions a and b and bring only the elements contained in the dimensions c, d, e and f into memory. The memory requirement can be written as memory = extent[c] * extent[d] * extent[e] * extent[f]. This will be repeated x times where x = extent[a] * extent[b].
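The left-to-right matching rule can be modelled with a short Python sketch. This follows the description in this section rather than TOOL's source code; `binary_footprint` is a hypothetical helper and the extents in the demo are made-up numbers.

```python
from math import prod


def binary_footprint(dims_a, dims_b, extent):
    """Model the section size and repeat count for a 2-input tool.

    dims_a, dims_b: dimension names of each input, listed left to right
    extent:         dict mapping dimension name to its size

    Returns (section_elements, repeats).
    """
    # Count the leading dimensions that match position by position.
    matched = 0
    for da, db in zip(dims_a, dims_b):
        if da != db:
            break
        matched += 1

    looped = dims_a[:matched]
    # Everything from the first mismatch onward, in either input, is held
    # in memory; a dimension shared by both inputs is counted once.
    in_memory = list(dict.fromkeys(dims_a[matched:] + dims_b[matched:]))

    section_elements = prod(extent[d] for d in in_memory)
    repeats = prod(extent[d] for d in looped)
    return section_elements, repeats


if __name__ == "__main__":
    # Hypothetical extents
    extent = {"a": 4, "b": 5, "c": 6, "d": 7, "e": 2, "f": 3}
    print(binary_footprint(["a", "b", "c", "d"], ["e", "f", "a", "b"], extent))
    print(binary_footprint(["a", "b", "c", "d"], ["a", "b", "e", "f"], extent))
```

With these made-up extents, the original ordering gives a single section of 5040 elements, while reordering B to B[a,b,e,f] gives sections of 252 elements repeated 20 times.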

insert

Insert is a unique tool because the shape of the input object and the output object need to be the same, to ensure that you are inserting apples into apples. Because of this, TOOL reads the dimensions of both the input and the output into memory.

Example

The input object is A[a,b] and the output object is B[c,d].

TOOL will bring all of the elements contained in the dimensions a, b, c and d into memory. The memory requirement can be written as memory = extent[a] * extent[b] * extent[c] * extent[d].
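For a sense of scale, the formula above can be evaluated with a few lines of Python. This is just the arithmetic stated in this example; `insert_footprint` is a hypothetical helper and the extents are made-up numbers.

```python
from math import prod


def insert_footprint(input_extents, output_extents):
    """Evaluate the insert memory formula: the product of every extent
    in both the input object and the output object."""
    return prod(input_extents) * prod(output_extents)


if __name__ == "__main__":
    # Hypothetical extents for A[a,b] and B[c,d]
    print(insert_footprint([10, 20], [10, 20]))
```

Because every extent is multiplied together, even modest dimensions add up quickly: two 10 by 20 objects already give 40000 under this formula.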

howtos/toolcoding/performance_tuning.txt · Last modified: 2011/06/22 21:01 by chris.strashok