Hi, I need to write a scientific program that does linear algebra with many matrices.
I would like to know whether it is better to build a matrix class using dynamic memory allocation (new/delete or malloc/free) or to use 2-D arrays with a fixed (big enough) size.
Of course, my focus is efficiency in terms of speed. How much of a slowdown should I expect if my matrix class uses dynamic allocation?
If the slowdown is significant, where can I find useful information about matrix computation on embedded systems?
c.f.> I have already written a matrix class using new/delete and am using it in my project. However, it does not seem to perform well enough; it is slower than I expected. I am using a Samsung S3C2410 (ARM920T core with 16K cache and MMU).
If you know the size of your arrays, use the stack. Dynamic memory uses the heap, and it carries a performance cost both in allocation time and in data access time. Only use dynamic memory allocation if you have a good reason for it. Regards db
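To make the contrast concrete, here is a minimal sketch of the two approaches being discussed. Both class names and the layout are hypothetical, not the poster's actual code: the fixed-size version needs no allocation call at all, while the heap version pays for one new[] per object.

```cpp
#include <cstddef>

// Fixed-size matrix: can live on the stack or in static storage, so there is
// no allocation call at all and the data is one contiguous block.
template <std::size_t Rows, std::size_t Cols>
struct StackMatrix {
    double a[Rows][Cols];
    double& operator()(std::size_t i, std::size_t j) { return a[i][j]; }
};

// Heap-based alternative: one new[] at construction, one delete[] at
// destruction. Copying is disabled to keep the sketch short and safe.
struct HeapMatrix {
    std::size_t rows, cols;
    double* a;
    HeapMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), a(new double[r * c]()) {}
    HeapMatrix(const HeapMatrix&) = delete;
    HeapMatrix& operator=(const HeapMatrix&) = delete;
    ~HeapMatrix() { delete[] a; }
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
};
```

Note that once a HeapMatrix is constructed, element access through operator() costs the same multiply-and-add either way; the difference is in construction/destruction traffic.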
This of course depends on how often new/delete is called. If temporary objects are created for each matrix operation, the dynamic memory handling can easily consume most of the time. The advantage of allocating arrays of exactly the correct size is that all elements are close together, so with a cache and/or virtual memory, quite high hit rates can be achieved with few cache conflicts.
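A sketch of that first point, using a hypothetical minimal Matrix class: an operator+ that returns a fresh matrix allocates (and later frees) a temporary on every call, whereas an add-into routine writing to a caller-supplied result does no allocation at all inside a loop.

```cpp
#include <cstddef>
#include <vector>

struct Matrix {
    std::size_t rows, cols;
    std::vector<double> a;   // one contiguous block of rows*cols elements
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
    double  operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Allocates a temporary result on every call: fine occasionally,
// costly inside a tight loop.
Matrix operator+(const Matrix& x, const Matrix& y) {
    Matrix r(x.rows, x.cols);
    for (std::size_t k = 0; k < r.a.size(); ++k) r.a[k] = x.a[k] + y.a[k];
    return r;
}

// No allocation at all: the caller preallocates r once and reuses it.
void add_into(const Matrix& x, const Matrix& y, Matrix& r) {
    for (std::size_t k = 0; k < r.a.size(); ++k) r.a[k] = x.a[k] + y.a[k];
}
```

In a loop executed thousands of times, the add_into form keeps the allocator completely out of the inner loop.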
A large fixed 2-D array has no run-time allocation overhead. However, if only part of each row and column is used, the actual data is scattered around the big array in small segments, which can be bad for cache performance. At least make sure that each segment starts at the beginning of a cache line.
Especially with primitive cache organisations, such as a direct-mapped (one-to-one) cache, a given physical address can only be loaded into one specific cache location. If the fixed 2-D array allocation is done carelessly, two adjacent rows or columns, sitting in widely separated memory segments, could map to the same cache location, causing frequent reloads and dramatically dropping the performance. Using array dimensions that are _not_ a power of 2 usually avoids this problem. In any case, it is important to know how the cache system works on the specific processor and design the table allocation accordingly.
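The non-power-of-2 trick can be sketched like this. The sizes are purely illustrative (not tuned to any particular chip): if the natural row length is a power of two, padding each row by a couple of elements changes the stride so that vertically adjacent elements no longer fall on the same direct-mapped cache location.

```cpp
#include <cstddef>

const std::size_t N    = 256;      // logical row length (power of 2: bad stride)
const std::size_t PAD  = 2;        // pad each row by a few unused elements
const std::size_t LDIM = N + PAD;  // leading dimension actually used in indexing

double a[N * LDIM];                // storage includes the padding

// Element (i,j) of the logical N x N matrix; the pad elements are never touched.
inline double& at(std::size_t i, std::size_t j) { return a[i * LDIM + j]; }
```

The memory cost is PAD extra elements per row, usually a cheap trade against cache conflict misses in column-wise loops.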
One way around this would be to allocate just a big 1-D vector and at each matrix reference calculate the actual 1-D element index. In this way all the used elements would be close together. For each matrix, define a macro
#define A(i,j) a[(i)*a_dim1+(j)]
To compare the performance against a fixed 2-D array or a dynamically allocated 2-D array, simply redefine it as
#define A(i,j) a[i][j]
or, for a class member,
#define A(i,j) this->a[i][j]
By defining such a macro for each matrix, you could always write the program nearly in good old FORTRAN style :-) e.g. like
C(1,4) = A(2,4) + B(3,1) ;
After performance tests the best implementation could be selected, without making the program hard to read.
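Putting the macro idea together, a small compilable sketch might look like the following. The names a, b, c and a_dim1 follow the macros quoted above; for brevity this sketch shares one leading dimension across all three matrices, which is an assumption, not part of the original suggestion.

```cpp
// Illustrative sizes: three 8 x a_dim1 matrices in flat 1-D storage.
const int a_dim1 = 8;   // leading dimension (row length) shared here

double a[8 * a_dim1];
double b[8 * a_dim1];
double c[8 * a_dim1];

// One macro per matrix; switching storage scheme later only means
// changing these #defines, not the arithmetic written with them.
#define A(i,j) a[(i)*a_dim1+(j)]
#define B(i,j) b[(i)*a_dim1+(j)]
#define C(i,j) c[(i)*a_dim1+(j)]
```

With these in place, `C(1,4) = A(2,4) + B(3,1);` compiles directly, and a later switch to `a[i][j]` or `this->a[i][j]` leaves every such line untouched.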
Since you are apparently already doing your own array subscripting based on the array size, you have eliminated the potential cache miss problem.
No, that isn't necessary. You might get some small cache advantage from packing all arrays within a single area, but not as big as running the rows together, as you have apparently done.
Since you posted in comp.arch.embedded, it is worth noting that many embedded programmers avoid dynamic memory allocation whenever possible, due to the potential for memory fragmentation causing problems when run for days / weeks / months without restart.
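One common compromise when some dynamism is unavoidable is a fixed-size block pool, allocated once at startup: because every block is the same size, freeing and reallocating in any order can never fragment the arena. This is a generic sketch with made-up sizes, not a drop-in replacement for any particular allocator; real code would add overflow/double-free checks.

```cpp
#include <cstddef>

const std::size_t BLOCK_SIZE = 64 * sizeof(double);  // room for one 8x8 matrix
const std::size_t NUM_BLOCKS = 32;

static alignas(double) unsigned char arena[NUM_BLOCKS][BLOCK_SIZE];
static void* free_list[NUM_BLOCKS];   // simple stack of free blocks
static std::size_t free_top;

void pool_init() {
    for (std::size_t i = 0; i < NUM_BLOCKS; ++i) free_list[i] = arena[i];
    free_top = NUM_BLOCKS;
}

// Returns a BLOCK_SIZE chunk, or nullptr when the pool is exhausted.
void* pool_alloc() { return free_top ? free_list[--free_top] : nullptr; }

// Caller must pass back a pointer previously obtained from pool_alloc().
void pool_free(void* p) { free_list[free_top++] = p; }
```

Allocation and deallocation are both O(1) with no searching, which also makes the timing predictable, another property embedded code often cares about.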