Parallel DFT Implementation and Benchmarking in Cluster Architecture
Abstract
For the execution of very large DFTs
(Discrete Fourier Transforms), where the
problem size is limited by the memory
available to a single processor, it is convenient
to use the larger memory afforded by
cluster architectures. In this work we developed several
parallel MATLAB implementations of a one-dimensional DFT
computed through a two-dimensional DFT,
which was coded using the row-column algorithm.
One version of the code has client-based pre- and
post-processing stages, with the cluster master node
serving as the client computer. Since the pre- and
post-processing involve matrix transpositions,
which can pose memory limitations for large data
sets, we also developed an implementation that distributes
the data directly from disk to the cluster cores. This
second approach allowed us to quadruple the
length of the largest signal that we could process on the
cluster architecture. Larger core memory in the
nodes should allow even larger increases in signal
size. We benchmarked both implementations and
performed scalability studies using up to 64 cores.
Key Terms - FFT, Large Signals, Parallel
Processing, Row-Column Algorithm
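
As an illustration of how a one-dimensional DFT can be computed through a two-dimensional, row-column pass, the following is a minimal serial sketch in Python rather than the parallel MATLAB code of this work. It assumes the standard factorization N = n1 * n2 with a twiddle-factor multiplication between the row and column transforms; the function names `dft` and `dft_row_column`, the column-major input layout, and the use of a direct O(n^2) DFT as the building block are all assumptions made for the example. In a cluster setting the independent row and column transforms are the natural units to distribute across cores, and the reordering steps correspond to the matrix transpositions mentioned above.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT of a sequence; stands in for any 1-D FFT routine."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_row_column(x, n1, n2):
    """Length n1*n2 DFT of x computed via row and column DFTs (hypothetical helper)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Pre-processing: view x as an n1-by-n2 matrix filled column by column,
    # so that a[r][c] = x[r + n1 * c].
    a = [[x[r + n1 * c] for c in range(n2)] for r in range(n1)]

    # Step 1: length-n2 DFT along every row (rows are independent).
    b = [dft(row) for row in a]

    # Step 2: multiply entry (r, k2) by the twiddle factor exp(-2*pi*i*r*k2/n).
    for r in range(n1):
        for k2 in range(n2):
            b[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)

    # Step 3: length-n1 DFT along every column (columns are independent).
    cols = [dft([b[r][k2] for r in range(n1)]) for k2 in range(n2)]

    # Post-processing: cols[k2][k1] holds X[n2 * k1 + k2]; reading the result
    # row by row is a transposition relative to the column-wise input.
    return [cols[k2][k1] for k1 in range(n1) for k2 in range(n2)]
```

Calling `dft_row_column(x, 3, 4)` on a 12-sample signal should agree with `dft(x)` up to floating-point rounding, which gives a quick check of the index bookkeeping before any data is distributed.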