efficiency impact of dynamic allocation on a microprocessor


Hi, I need to write a scientific program that uses linear algebra with many matrices.

I would like to know whether it is better to implement a matrix class using dynamic memory allocation (new/delete or malloc/free) or to use 2-D arrays with a fixed (big enough) size.

Of course, my focus is efficiency in terms of speed. How much of a slowdown would I experience if I use a matrix class with dynamic allocation?

If the slowdown is big enough, where can I get some useful information about matrix computation in embedded systems?

c.f.> I already made a matrix class using new/delete and am using it for my project. However, I think it does not have enough performance; it seems to be slower than I expected. I am using a Samsung S3C2410 (ARM920T core with 16K cache and MMU).

This message was sent using the comp.arch.embedded web interface.



If you know the size of your arrays, use the stack. Dynamic memory uses the heap and it will have a performance impact both in memory allocation time and data access time. Only use dynamic memory allocation if you have a good reason for using it. Regards db
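To make the trade-off concrete, here is a minimal sketch (all names and sizes are illustrative, not from the original posts) contrasting a fixed-size matrix that can live on the stack with a heap-allocated one that pays for new[]/delete[] on every construction and destruction:

```cpp
#include <cstddef>

// Fixed-size matrix: no allocation cost, storage is contiguous, and it can
// live on the stack or in static storage.
struct FixedMatrix {
    enum { N = 32 };        // illustrative dimension
    double m[N][N];
};

double sum_fixed(const FixedMatrix &a) {
    double s = 0.0;
    for (int i = 0; i < FixedMatrix::N; ++i)
        for (int j = 0; j < FixedMatrix::N; ++j)
            s += a.m[i][j];
    return s;
}

// Heap-backed matrix: every construction calls new[], every destruction
// calls delete[]. If temporaries are created per operation, this dominates.
struct HeapMatrix {
    std::size_t rows, cols;
    double *m;
    HeapMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), m(new double[r * c]()) {}  // value-initialized to 0.0
    ~HeapMatrix() { delete[] m; }
private:
    HeapMatrix(const HeapMatrix &);             // non-copyable to avoid
    HeapMatrix &operator=(const HeapMatrix &);  // double-delete bugs
};
```

The point of the sketch: the fixed version's cost is purely the arithmetic, while the heap version adds allocator calls whose cost depends on the heap implementation and fragmentation state.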


This of course depends how often the new/delete is called. If temporary objects are created for each matrix operation, the dynamic memory handling could easily consume most of the time. The advantage with allocating arrays of the correct size is that all elements are close together and with cache and/or virtual memory, quite high hit rates can be achieved with few cache conflicts.

A large fixed 2-D array does not have run-time allocation overhead. However, if only part of each row and column is used, the actual data is scattered around the big array in small segments, which can be bad for cache performance. At least make sure that each segment starts at the beginning of a cache line.

Especially with primitive cache organizations, such as a direct-mapped (one-to-one) cache, a given physical address can only be loaded into one specific cache location. If the fixed 2-D array allocation is done carelessly, two adjacent rows or columns that are in widely separated memory segments could map to the same cache location, causing frequent reloads and dramatically dropping performance. Using array dimensions that are _not_ a power of 2 usually avoids this problem. In any case, it is important to know how the cache system works on a specific processor and design the table allocation accordingly.
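The usual trick for avoiding power-of-2 strides is to pad the leading dimension by one element. A minimal sketch (sizes are illustrative; the exact padding that helps depends on the cache geometry of the specific processor):

```cpp
// With a direct-mapped cache, rows whose stride in bytes is a power of two
// can all land in the same cache location, so walking down a column evicts
// the line on every access. Padding the leading dimension by one element
// staggers where consecutive rows fall in the cache.
enum { ROWS = 256, COLS = 256 };   // 256 doubles per row: a power-of-2 stride
enum { LDA = COLS + 1 };           // padded leading dimension (257)

static double a[ROWS * LDA];       // row i starts at a[i * LDA]

double *row_start(int i) { return &a[i * LDA]; }
```

Only the stride changes; element (i, j) is still addressed as a[i * LDA + j], and the last element of each row is simply unused padding.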

One way around this would be to allocate just a big 1-D vector and, at each matrix reference, calculate the actual 1-D element index. That way all the used elements are close together. For each matrix, define a macro

#define A(i,j) a[(i)*a_dim1+(j)]

To compare the performance with fixed 2-D allocation or dynamically allocated 2-D, simply redefine

#define A(i,j) a[i][j]

or even

#define A(i,j) this->a[i][j]

By defining macros for each matrix, you could always write the program nearly in good old FORTRAN style :-) e.g. like

C(1,4) = A(2,4) + B(3,1) ;

After performance tests the best implementation could be selected, without making the program hard to read.


Paul Keinanen

Thank you for your kind advice.

I have been programming a matrix library with fixed-size arrays.

For example, I have made a structure like this:

typedef struct {
    int row;
    int col;
    double M[1000];
} Matrix;

And I have used the matrix as follows:

Matrix m_A, m_B, m_C;

void function_name(Matrix *A, Matrix *B)
{
    /* ... */
}

int main(void)
{
    /* ... */
    function_name(&m_A, &m_B);
    /* ... */
    return 0;
}

Of course, I am afraid of an increasing cache miss rate caused by the large number of matrix structures used (each with a big, 1000-element array).

Do you mean I should use just one array such as A[100000], and then utilize adequate portions of the array for each matrix using an indexing technique?

Or is my method right?

regards. J.J.



Since you are apparently already doing your own array subscripting based on the array size, you have eliminated the potential cache miss problem.

No, that isn't necessary. You might get some small cache advantage from packing all arrays within a single area, but not as big as running the rows together, as you have apparently done.

Since you posted in comp.arch.embedded, it is worth noting that many embedded programmers avoid dynamic memory allocation whenever possible, due to the potential for memory fragmentation causing problems when run for days / weeks / months without restart.


Thad Smith
