Multi-Processor System-on-Chip 2. Liliana Andrade


high end, on the other hand, is set at 2.63 Gb/s per 400 MHz BW channel and spatial data layer. Note that we deliberately emphasize the "per channel, per layer", since handsets have many operating modes, and some of those modes may require a multitude of channels or layers to be active simultaneously, for example, carrier aggregation (CA). The 5G standard allows modes that support a multitude of each, further increasing the effective number of RBs communicated. The extra RBs can be used to increase the overall throughput or to increase redundancy by sending the same data on another channel frequency or layer. As of now, 5G supports up to 4 × 400 MHz CA (3GPP 2019d) in its upper-end mode and a MIMO extension of up to 8 data layers (3GPP 2019e). Whether both extension modes can be combined within 5G is not clear, since the two are often used with opposite goals: data layer extensions conserve spectrum and provide throughput by reusing the same spectrum on a different link, while CA extensions use excess unused spectrum to provide throughput. However, moving toward 6G, we cannot dismiss the possibility that combining the two could serve as a further means of increasing throughput.

      With this in mind, let us construct two high-end use cases: the first using only the CA extension, and the second combining the CA extension with multiple data layers. Calculating these, we obtain 10.52 Gb/s and 84.16 Gb/s, respectively. Such rates could be used, for example, for large file transfers. The gap between the low-end corner and the two high-end throughput corners is a factor of approximately 5 × 10⁴ and 4 × 10⁵, respectively. Therefore, the system needs to deal with vastly varying data processing loads during operation, highlighting the need for flexibility of the compute engine.
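      As a quick sanity check of these corner figures, the short Python sketch below reproduces the arithmetic from the numbers quoted above and in the table that follows (2.63 Gb/s per channel and layer, 4×CA, 8 MIMO layers, 0.2 Mb/s low end); it is purely illustrative and not part of the specification.

```python
# Back-of-the-envelope check of the throughput corners discussed above.
# Figures taken from the text and the specification summary table:
# 2.63 Gb/s per 400 MHz channel and layer, 4x CA, 8 MIMO layers, 0.2 Mb/s low end.

PER_CHANNEL_GBPS = 2.63      # Gb/s per 400 MHz channel and spatial layer
CA_CHANNELS      = 4         # 4 x 400 MHz carrier aggregation
MIMO_LAYERS      = 8         # 8 spatial data layers
LOW_END_GBPS     = 0.2e-3    # 0.2 Mb/s low-end LTE legacy use case

ca_high_end  = PER_CHANNEL_GBPS * CA_CHANNELS                 # ~10.52 Gb/s
mimo_ca_high = PER_CHANNEL_GBPS * CA_CHANNELS * MIMO_LAYERS   # ~84.16 Gb/s

print(f"CA high end:      {ca_high_end:.2f} Gb/s "
      f"(~{ca_high_end / LOW_END_GBPS:.0e}x the low end)")
print(f"MIMO+CA high end: {mimo_ca_high:.2f} Gb/s "
      f"(~{mimo_ca_high / LOW_END_GBPS:.0e}x the low end)")
```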

      1.2.2.3. Specification summary

Use case                                 | Configuration                | Subcarriers per channel | RBs per channel | Total RBs per TTI | Total RBs per ms | Throughput | TTI [μs]
Low-end LTE legacy (3GPP 2019a, b)       | –                            | 72                      | 6               | 6                 | 6                | 0.2 Mb/s   | 1,000
CA high-end FR2 (3GPP 2019d, f)          | 4×CA, µ = 3, 400 MHz         | 3,168                   | 264             | 1,056             | 8,448            | 10.52 Gb/s | 125
MIMO CA high-end FR2 (3GPP 2019d, e, f)  | 8×8, 4×CA, µ = 3, 400 MHz    | 3,168                   | 264             | 8,448             | 67,584           | 84.16 Gb/s | 125

      1.2.3. Outcome of workloads

      We see that the 3GPP specifications follow the trend and vision of 5G laid out in section 1.2.1, incorporating the variability of workloads as the central paradigm.

      With throughput requirements varying by several orders of magnitude, a homogeneous HW solution would be very inefficient for both high-end and low-end use cases. Rather, a heterogeneous HW architecture, a mixture of HW accelerator engines, banks of programmable processing elements and supporting memory systems, would be efficient. Accelerator engines such as dedicated (application-specific) HW accelerators and ASIPs are ideal for extreme high-end use cases and for easy-to-scale, low-variability algorithms or processing steps, owing to their speed and their low energy consumption per data point. Banks of programmable processing elements such as vDSPs (SIMD cores with a signal processing-oriented instruction set architecture) and generic scalar reduced instruction set computer (RISC) cores, in contrast, are ideal for moderate-high to low-end use cases and for processing steps that require flexibility, for example, selecting which algorithm from a set to run based on the device's situational parameters and environmental conditions. Such HW is well suited to highly variable loads because HW modules can be powered on and off according to the current load. For example, if enough compute resources are available on the vDSPs, i.e. idle cycles, we could run the communication kernels on the vDSPs in a time-multiplexed manner and keep the HW accelerators powered off (as sketched below).
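      To make the load-based placement idea concrete, the sketch below shows one way such a decision could look at runtime. It is a minimal, purely illustrative Python model, assuming a hypothetical runtime that knows the idle vDSP cycles available per TTI and the cycle cost of each communication kernel; the kernel names and cycle counts are invented for illustration and do not come from the chapter.

```python
# Illustrative sketch of load-adaptive kernel placement on a heterogeneous MPSoC.
# All names and numbers are hypothetical; the point is only the decision logic:
# run kernels time-multiplexed on the vDSPs while idle cycles suffice,
# and power on a dedicated HW accelerator only when they do not.

from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    cycles_per_tti: int   # vDSP cycles this kernel needs per TTI

def place_kernels(kernels, vdsp_idle_cycles_per_tti):
    """Return a mapping of kernel name -> placement decision."""
    placement = {}
    remaining = vdsp_idle_cycles_per_tti
    for k in sorted(kernels, key=lambda k: k.cycles_per_tti):
        if k.cycles_per_tti <= remaining:
            # Enough idle vDSP cycles: time-multiplex the kernel in SW.
            placement[k.name] = "vDSP (time-multiplexed)"
            remaining -= k.cycles_per_tti
        else:
            # Not enough idle vDSP cycles: wake up the dedicated accelerator.
            placement[k.name] = "HW accelerator (powered on)"
    return placement

# Low-end mode: plenty of idle vDSP cycles, accelerators stay off.
low_end = place_kernels([Kernel("FFT", 20_000), Kernel("equalizer", 15_000)],
                        vdsp_idle_cycles_per_tti=100_000)
# High-end mode: the load exceeds the idle cycles, accelerators are enabled.
high_end = place_kernels([Kernel("FFT", 400_000), Kernel("equalizer", 300_000)],
                         vdsp_idle_cycles_per_tti=100_000)
print(low_end)
print(high_end)
```

      In a real system, such a decision would also weigh the energy cost of waking an accelerator against keeping it powered, but the basic trade-off between flexible programmable cores and efficient dedicated engines is the same.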

      Figure 1.5. Tiled “Kachel” MPSoC with decentralized tightly coupled memories

      Figure 1.6. Heterogeneous MPSoC with a central shared memory architecture

      Without discussing the two layouts further, since both have their advantages, let us delve into their common thread and analyze the combined effect of workloads and algorithms on HW provisioning requirements, and possibly confirm our hypothesis that a heterogeneous MPSoC is required for an efficient, future-proof solution.