efficiency change by dynamic allocation in microprocessor

Question>
Hi,
I need to write a scientific program which uses linear algebra with many
matrices.

I would like to know whether it is better to make the matrix class using dynamic
memory allocation (new/delete or malloc/free) or using two-D arrays with a
fixed (big enough) size.

Of course, my focus is efficiency in terms of speed.
How much of a slowdown would I experience if I used a matrix class with
dynamic allocation?

If the slowdown is big enough, where can I get some useful
information about matrix computation in embedded systems?

c.f.> I have already made a matrix class using new/delete and am using it for my
project. However, I think it does not have enough performance;
it seems to be slower than I expected.
I am using a Samsung S3C2410 (ARM920T core with 16K cache and MMU).

        
This message was sent using the comp.arch.embedded web interface on
www.EmbeddedRelated.com

Re: efficiency change by dynamic allocation in microprocessor


If you know the size of your arrays, use the stack. Dynamic memory uses
the heap and it will have a performance impact both in memory
allocation time and data access time. Only use dynamic memory
allocation if you have a good reason for using it.
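As a minimal sketch of the two options (type names and sizes are hypothetical, not from the original posts): a fixed-capacity matrix can live on the stack or in static storage with zero allocation cost, while a heap-based one pays for new/delete on every construction and destruction.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity matrix: no heap traffic, size known up front.
struct FixedMatrix {
    int rows, cols;
    double m[16][16];   // capacity chosen at compile time for the worst case
};

// Heap-based alternative: every construction goes through the allocator.
// (Copying is omitted for brevity; a real class would need copy control.)
struct HeapMatrix {
    int rows, cols;
    double* m;
    HeapMatrix(int r, int c) : rows(r), cols(c), m(new double[r * c]) {}
    ~HeapMatrix() { delete[] m; }
};

double sum_fixed(const FixedMatrix& a) {
    double s = 0.0;
    for (int i = 0; i < a.rows; ++i)
        for (int j = 0; j < a.cols; ++j)
            s += a.m[i][j];
    return s;
}
```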
Regards
db


Re: efficiency change by dynamic allocation in microprocessor



This of course depends how often the new/delete is called. If
temporary objects are created for each matrix operation, the dynamic
memory handling could easily consume most of the time. The advantage
with allocating arrays of the correct size is that all elements are
close together and with cache and/or virtual memory, quite high hit
rates can be achieved with few cache conflicts.

A large fixed 2-D array, on the other hand, has no run-time allocation
overhead. However, if only part of each row and column is used, the actual
data is scattered around the big array in small segments, which can be bad
for cache performance. At least make sure that each segment starts at the
beginning of a cache line.

Especially with primitive cache organizations, such as a direct-mapped
(one-to-one) cache, a given physical address can only be loaded into one
specific cache location. If the fixed 2-D array allocation is done
carelessly, two adjacent rows or columns that lie in widely separated
memory regions could map to the same cache location, causing frequent
reloads and dramatically dropping performance. Using array dimensions that
are _not_ a power of 2 usually avoids this problem. In any case, it is
important to know how the cache system works on a specific processor and
design the table allocation accordingly.
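To illustrate the stride trick (the sizes here are hypothetical, picked only for the example): store the matrix row-major with a physical row length that is slightly larger than a power of two, so that consecutive elements of one column no longer map to the same line of a small direct-mapped cache.

```cpp
#include <cstddef>

// With 1024 doubles per row (a power of two), consecutive column elements
// are exactly 8 KB apart and can all collide in a small direct-mapped
// cache. Padding each row by a few elements breaks that alignment.
const int N = 1024;                    // logical row length (power of two)
const int PAD = 4;                     // extra doubles per row (hypothetical)
const int STRIDE = N + PAD;            // physical row length, not a power of two

// Index of element (i, j) in the padded row-major layout.
inline std::size_t idx(int i, int j) {
    return static_cast<std::size_t>(i) * STRIDE + j;
}
```

The padding wastes `PAD` elements per row, but column walks now touch addresses 8224 bytes apart instead of 8192, which spreads them across different cache locations.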

One way around this would be to allocate just a big 1-D vector and at
each matrix reference calculate the actual 1-D element index. In this
way all the used elements would be close together. For each matrix,
define a macro

#define A(i,j)    a[(i)*a_dim1+(j)]

To compare the performance with fixed 2-D allocation or dynamically
allocated 2-D, simply redefine

#define A(i,j)    a[i][j]

or even

#define A(i,j)    this->a[i][j]


By defining macros for each matrix, you can always write the program
nearly in good old FORTRAN style :-) e.g. like

      C(1,4) = A(2,4) + B(3,1) ;

After performance tests the best implementation could be selected,
without making the program hard to read.
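Putting the pieces together, a minimal self-contained sketch of the 1-D backing-store variant (array names follow the macros above; the dimension is a hypothetical small value for the example):

```cpp
// 1-D backing stores with 2-D style access via macros, as suggested above.
enum { A_DIM1 = 8 };

static double a[A_DIM1 * A_DIM1];
static double b[A_DIM1 * A_DIM1];
static double c[A_DIM1 * A_DIM1];

#define A(i,j) a[(i)*A_DIM1+(j)]
#define B(i,j) b[(i)*A_DIM1+(j)]
#define C(i,j) c[(i)*A_DIM1+(j)]

double demo(void)
{
    A(2,4) = 1.5;
    B(3,1) = 2.5;
    C(1,4) = A(2,4) + B(3,1);   // FORTRAN-style element access
    return C(1,4);
}
```

Swapping in `#define A(i,j) a[i][j]` (and a 2-D `a`) changes the storage scheme without touching `demo` at all, which is exactly what makes the performance comparison cheap.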

Paul


Re: efficiency change by dynamic allocation in microprocessor
Thank you for your kind advice.

I have been programming a matrix library with fixed-size arrays.

For example, I have made a structure like this:

typedef struct {
    int row;
    int col;
    double M[1000];
} Matrix;
 
And I have used the matrices as follows:

Matrix m_A, m_B, m_C;
..
void function_name(Matrix &A, Matrix &B)
{
..
}
int main(void)
{
..
function_name(m_A, m_B);
..
return 0;
}

Of course, I am afraid of an increasing cache miss rate caused by the large
number of matrix structures used (each with a big -1000-element- array).

Do you mean I should use just one array, such as A[100000], and then use
the appropriate portion of that array for each matrix with an indexing
technique?

Or is my method right?

regards.
J.J.



        

Re: efficiency change by dynamic allocation in microprocessor


Since you are apparently already doing your own array subscripting
based on the array size, you have eliminated the potential cache miss
problem.

No, that isn't necessary.  You might get some small cache advantage
from packing all arrays within a single area, but not as big as
running the rows together, as you have apparently done.

Since you posted in comp.arch.embedded, it is worth noting that many
embedded programmers avoid dynamic memory allocation whenever
possible, due to the potential for memory fragmentation causing
problems when run for days / weeks / months without restart.
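A common compromise in that situation (this is a sketch with a hypothetical pool size, not anything from the thread) is a fixed pool of matrix slots handed out and returned in O(1), so there is no heap use after startup and therefore nothing to fragment over weeks of uptime:

```cpp
#include <cstddef>

// Sketch of a fixed pool of matrix buffers. Pool size and per-matrix
// capacity are hypothetical; tune both to the application's worst case.
enum { POOL_SLOTS = 8, MAT_CAP = 1000 };

struct MatrixSlot {
    int rows, cols;
    double m[MAT_CAP];
};

static MatrixSlot pool[POOL_SLOTS];
static bool in_use[POOL_SLOTS];

// Hand out the first free slot, or nullptr when the pool is exhausted:
// a hard, predictable failure instead of gradual fragmentation.
MatrixSlot* pool_alloc(int rows, int cols) {
    for (int i = 0; i < POOL_SLOTS; ++i) {
        if (!in_use[i]) {
            in_use[i] = true;
            pool[i].rows = rows;
            pool[i].cols = cols;
            return &pool[i];
        }
    }
    return nullptr;
}

// Return a slot to the pool; the caller must not use it afterwards.
void pool_free(MatrixSlot* s) {
    in_use[s - pool] = false;
}
```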

Thad
