Peer-reviewed
  • Software for Component-by-C...
    Begaev, A. A.; Salnikov, A. N.

    Lobachevskii Journal of Mathematics, 09/2023, Vol. 44, No. 9
    Journal Article

    Information about the delays between nodes of a computing cluster can potentially help reduce the running time of a distributed application. However, it is not always possible to obtain this information for the entire supercomputer. The paper proposes a method for load testing the communication environment of a computing cluster and estimating transmission delays by sending messages through it. Using data obtained from a subset of the cluster's nodes, the method extends delay information to the untested part of the cluster according to the topology of the cluster interconnect. It automatically searches for pairs of computing nodes whose delays are expected to behave similarly. It then estimates delay values from the results of this similar-pair search together with delays previously measured for a subset of node-to-node connections. The method was tested on the K60 computing cluster (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences, KIAM RAS), on BlueGene/P (Lomonosov Moscow State University), and on clusters at the Federal Research Center “Computer Science and Control” RAS.
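    The abstract describes filling in delays for untested node pairs from measured delays of "similar" pairs. The following is a minimal Python sketch of that general idea only, not the authors' method. Everything in it is a hypothetical assumption for illustration: the two-level switch topology (`SWITCH_OF`), the node names, the `topology_key` similarity rule (same leaf switch vs. different leaf switches), averaging over similar pairs, and delays given as integer nanoseconds.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical topology: each node hangs off one leaf switch of a two-level tree.
# This mapping is an illustrative assumption, not data from the paper.
SWITCH_OF = {"n0": 0, "n1": 0, "n2": 1, "n3": 1, "n4": 2, "n5": 2}


def topology_key(a, b):
    """Hypothetical similarity rule: two pairs count as 'similar' when both
    stay within one leaf switch, or when both cross between leaf switches."""
    return "intra-switch" if SWITCH_OF[a] == SWITCH_OF[b] else "inter-switch"


def estimate_delays(nodes, measured):
    """Return a delay for every node pair.

    `measured` maps frozenset({a, b}) -> measured delay (ns). Measured pairs keep
    their value; untested pairs get the mean delay of measured pairs with the same
    topology key, or None when no similar pair was measured.
    """
    # Group measured delays by their (hypothetical) similarity class.
    by_key = defaultdict(list)
    for pair, delay in measured.items():
        a, b = tuple(pair)
        by_key[topology_key(a, b)].append(delay)
    averages = {key: mean(vals) for key, vals in by_key.items()}

    # Visit every unordered pair once and fill in the gaps.
    estimates = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            pair = frozenset((a, b))
            if pair in measured:
                estimates[pair] = measured[pair]
            else:
                estimates[pair] = averages.get(topology_key(a, b))
    return estimates


# Illustrative input: only 4 of the 15 possible pairs were "measured".
nodes = sorted(SWITCH_OF)
measured = {
    frozenset(("n0", "n1")): 1000,  # intra-switch
    frozenset(("n2", "n3")): 1200,  # intra-switch
    frozenset(("n0", "n2")): 3000,  # inter-switch
    frozenset(("n1", "n4")): 3400,  # inter-switch
}
est = estimate_delays(nodes, measured)
print(est[frozenset(("n4", "n5"))])  # 1100 (untested, intra-switch)
print(est[frozenset(("n3", "n5"))])  # 3200 (untested, inter-switch)
```

    A real estimator would need a similarity criterion derived from the actual interconnect topology and from the measurements themselves. Plain averaging is used here only because it is the simplest way to show measured values being propagated to untested pairs.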