4-Valued spectral transforms implementation on GPU with Tensor Cores

Peer-reviewed

Marković, Ivica; Stojković, Suzana

The Journal of supercomputing, 2023/1, Volume: 79, Issue: 1

Journal Article

A spectral transform maps a function from one domain into an appropriate function in another domain where certain characteristics of the function are clearly visible. Spectral transforms have great importance in signal analysis, image processing, logic design, etc. The main problem with spectral transforms is their exponential computational complexity. In the case of discrete functions, spectral transform computation comes down to multiplying the transform matrix and the truth vector of the function. Most of the previously developed algorithms for spectral transform computation are based on the fast Fourier transform algorithm, some use a compact representation of the functions (such as decision diagrams), and some use special single instruction multiple data hardware structures (such as graphics processing units). In recent years, a special type of graphics processing unit with Tensor Cores has been developed for matrix multiplication. These units usually support matrix operations on limited data types and matrix dimensions. In this paper, we propose algorithms for 4-valued Reed–Muller–Fourier and Vilenkin–Chrestenson transforms on the Tensor Cores hardware. Our solution is a customization of the Cooley–Tukey algorithm for execution on the hardware with specified limitations. Computation times of spectral transforms by the proposed algorithm are compared with computation times of the same transforms on a central processing unit by using serial and parallel algorithms, and on a standard graphics processing unit. The described experiments showed that, for a large number of variables, both implementations that are executed on graphics processing units are significantly more efficient than those that are executed on a central processing unit.
If only the implementations on graphics processing units are compared, for functions of 14 variables, the Tensor Cores implementation of the Reed–Muller–Fourier transform is 2.03 times faster than the standard one, and the Tensor Cores implementation of the Vilenkin–Chrestenson transform is 1.5 times faster. The poorer results obtained for the Vilenkin–Chrestenson transform are due to the limited set of data types provided by the NVIDIA Turing graphics processing units that were used in the experiments. Therefore, one integer spectral coefficient is represented by 4-byte values. Regardless, the proposed algorithms and the Tensor Cores architecture have proven to be a good solution for spectral transform calculations.
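The abstract's central idea, that the spectrum is the product of a transform matrix and the truth vector and that a Cooley–Tukey style factorization avoids building the full matrix, can be illustrated with a small CPU-only sketch in plain Python. It is not the authors' Tensor Cores implementation. The basic matrices `VC1` and `RMF1` are assumptions taken from common textbook conventions (some texts use the complex conjugate of `VC1`), and the function names are hypothetical.

```python
# Illustrative CPU sketch; NOT the paper's Tensor Core algorithm.
# Assumption: basic 4-valued Vilenkin-Chrestenson matrix = order-4 DFT matrix
# (some texts use its complex conjugate).
VC1 = [[1, 1, 1, 1],
       [1, -1j, -1, 1j],
       [1, -1, 1, -1],
       [1, 1j, -1, -1j]]

# Assumption: basic 4-valued Reed-Muller-Fourier matrix as commonly given in
# the literature; arithmetic is modulo 4 and the matrix is self-inverse.
RMF1 = [[1, 0, 0, 0],
        [1, 3, 0, 0],
        [1, 2, 1, 0],
        [1, 1, 3, 3]]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def naive_transform(basic, f, n):
    """Build the full 4^n x 4^n matrix and multiply it by the truth vector."""
    m = basic
    for _ in range(n - 1):
        m = kron(m, basic)
    return [sum(m[r][c] * f[c] for c in range(len(f))) for r in range(len(f))]


def fast_transform(basic, f, modulus=None):
    """Cooley-Tukey style evaluation: one pass per variable, each pass
    applying the 4x4 basic matrix to groups of four strided elements."""
    out = list(f)
    size = len(out)          # must be a power of 4
    stride = 1
    while stride < size:
        for start in range(0, size, 4 * stride):
            for offset in range(start, start + stride):
                block = [out[offset + k * stride] for k in range(4)]
                for r in range(4):
                    out[offset + r * stride] = sum(
                        basic[r][k] * block[k] for k in range(4))
        stride *= 4
    if modulus is not None:
        out = [v % modulus for v in out]
    return out


# Truth vector of a hypothetical 4-valued function of n = 2 variables.
f = [0, 1, 2, 3, 3, 2, 1, 0, 1, 1, 1, 1, 2, 0, 2, 0]
vc_spectrum = fast_transform(VC1, f)
rmf_spectrum = fast_transform(RMF1, f, modulus=4)
```

For n variables the naive route touches all 4^n × 4^n matrix entries, while the factored route performs n passes over the 4^n-element vector. Each pass consists of many independent small matrix products, which is the kind of work that matrix-multiplication hardware is built for.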
