The Fast TracKer (FTK) is an ATLAS trigger upgrade built for full-event, low-latency, high-rate tracking. The FTK core, made of 9U VME boards, performs the most demanding computational task. The ...associative memory board (AMB) serial link processor and the auxiliary card (AUX), plugged on the front and back sides of the same VME slot, constitute the processing unit (PU), which finds tracks using hits from eight layers of the inner detector. The PU works in pipeline with the second stage board (SSB), which finds 12-layer tracks by adding extra hits to the identified tracks. In the designed configuration, 16 PUs and four SSBs are installed in a VME crate. The high power consumption of the AMB, AUX, and SSB (respectively, of about 250, 70, and 160 W per board) required the development of a custom cooling system. Even though the expected power consumption for each VME crate of the FTK system is high compared with a common VME setup, the 8 FTK core crates will use ≈60 kW, which is just a fraction of the power and the space needed for a CPU farm performing the same task. We report on the integration of 32 PUs and eight SSBs inside the FTK system, on the infrastructures needed to run and cool them, and on the tests performed to verify the system processing rate and the temperature stability at a safe value.
A multi-core FPGA-based 2D-clustering implementation for real-time image processing is presented in this paper. The clustering algorithm is using a moving window technique to reduce the time and data ...required for the cluster identification process. The implementation is fully generic, with an adjustable detection window size. A fundamental characteristic of the implementation is that multiple clustering cores can be instantiated. Each core can work on a different identification window that processes data of independent "images" in parallel, thus, increasing performance by exploiting more FPGA resources. The algorithm and implementation are developed for the Fast TracKer processor for the trigger upgrade of the ATLAS experiment but their generic design makes them easily adjustable to other demanding image processing applications that require real-time pixel clustering.
A high-performance "pattern matching" implementation based on the Associative Memory (AM) system is presented. It is designed to solve the real-time hit-to-track association problem for particles ...produced in high-energy physics experiments at hadron colliders. The processing time of pattern recognition in CPU-based algorithms increases rapidly with the detector occupancy due to the limited computing power and input-output capacity of hardware available on the market. The AM system presented here solves the problem by being able to process even the most complex hadron collider events produced at a rate of 100 kHz with an average latency smaller than 10 μs. The board built for this goal is able to execute ~12 petabyte comparisons per second, with peak power consumption below 250 W, uniformly distributed on the large area of the board.
We present a fast general-purpose algorithm for high-throughput clustering of data “with a two-dimensional organization”. The algorithm is designed to be implemented with FPGAs or custom electronics. ...The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes this algorithm especially well suited for problems where the data have high density, e.g. in the case of tracking devices working under high-luminosity condition such as those of LHC or super-LHC.
The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has much broader field of applications. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. For example, in case of spatial measurement with silicon pixel detectors, the second step performs center of charge calculation. Applications can thus be foreseen to other detectors and other scientific fields ranging from HEP calorimeters to medical imaging.
An additional advantage of this two steps approach is that the typical clustering related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and it enables it to perform sophisticated calculations achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions on the kind of clustering to be performed are made.
The associative memory (AM) system of fast tracker (FTK) processor has been designed for the tracking trigger upgrade to the ATLAS detector at the Conseil Europeen Pour La Recherche Nucleaire large ...hadron collider. The system performs pattern matching (PM) using the detector hits of particles in the ATLAS silicon tracker. The AM system is the main processing element of FTK and is mainly based on the use of application-specified integrated circuits (ASICs) (AM chips) designed to execute PM with a high degree of parallelism. It finds track candidates at low resolution which become seeds for a full resolution track fitting. The AM system implementation is based on a collection of large 9U Versa Module Europa (VME) boards, named "serial link processors" (AMBSLPs). On these boards, a huge traffic of data is implemented on a network of 900 2-Gb/s serial links. The complete AM-based processor consumes much less power (~50 kW) than its CPU equivalent and its size is much smaller. The AMBSLP has a power consumption of ~250 W and there will be 16 of them in a crate. This results in unusually large power consumption for a VME crate and the need for complex custom infrastructure in order to have sufficient cooling. This paper reports on the design and testing of the infrastructures needed to run and cool the system which will include 16 AMBSLPs in the same crate, the integration of the AMBSLP inside a first FTK slice, the performance of the produced prototypes (both hardware and firmware), as well as their tests in the global FTK integration. This is an important milestone to be satisfied before the FTK production.
The authors describe a VLSI processor for pattern recognition based on content addressable memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. A large ...CAM bank stores all trajectories of interest and extracts the ones compatible with a given event. This task is naturally parallelized by a CAM architecture able to output identified trajectories, searching for matches on 96-bit wide patterns, in just a few 40-MHz clock cycles. We have developed this device (called the AMchip03 processor) for the silicon vertex trigger (SVT) upgrade at the Collider Detector experiment at Fermilab (CDF) using a standard-cell VLSI design methodology. This approach provides excellent pattern density, while sparing many of the complexities and risks associated to a full-custom design. The cost/performance ratio is better by well more than one order of magnitude than an FPGA-based design. This processor has a flexible and easily configurable structure that makes it suitable for applications in other experimental environments. They look forward to sharing this technology