• LETTER •



March 2025, Vol. 68, Iss. 3, 139301:1–139301:2 https://doi.org/10.1007/s11432-024-4277-0

## Fast construction and exploration of performance-cost design space for belief propagation polar decoders

Chao JI<sup>1,2</sup>, You YOU<sup>2</sup>, Weikang QIAN<sup>3</sup>, Yongming HUANG<sup>1,2</sup> & Chuan ZHANG<sup>1,2\*</sup>

<sup>1</sup>Laboratory of Efficient Architectures for Digital-communication and Signal-processing (LEADS),

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China

<sup>2</sup>Purple Mountain Laboratories, Nanjing 211111, China

<sup>3</sup>University of Michigan-Shanghai Jiao Tong University Joint Institute,

Ministry of Education Key Laboratory of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China

Received 31 May 2024/Revised 11 September 2024/Accepted 20 January 2025/Published online 13 February 2025

Citation Ji C, You Y, Qian W K, et al. Fast construction and exploration of performance-cost design space for belief propagation polar decoders. Sci China Inf Sci, 2025, 68(3): 139301, https://doi.org/10.1007/s11432-024-4277-0

Considering polar codes in the channel coding field, belief propagation (BP) polar decoding stands out as a potential solution for high-throughput applications due to its inherent advantage on high parallelism, and also emerging other improved BP-based decoding such as BP flip (BPF) [1,2] and BP list (BPL) [3,4] polar decoders. Choosing the original BP polar decoding as a running example, we propose an autogeneration framework of BP polar decoders in the previous work [5], but obtaining the optimal constraint-compatible designs still relies on the post-synthesis results.

Typically, multiple iterations of the full synthesis flow going through circuit translation, logic optimization, and gate mapping are performed utilizing existing EDA tools to evaluate each generated decoder and obtain a numerical report. However, if a substantial number of complex designs are evaluated, the design space construction would be timeconsuming. Furthermore, the design space exploration of autogenerated decoders merely evaluates hardware metrics without algorithm metrics. At the same time, the design space that comprises feasible solutions expands exponentially with the extensive assortment of design parameters, which brings increasing challenges to finding the solutions that satisfy the stipulated constraints and construct the set of optimal designs without exhaustive exploration. Therefore, it is important to achieve a time-saving and multidimension evaluation that incorporates both algorithm and hardware, and also improves the efficiency of design space exploration.

In this study, we propose a fast and reliable evaluation of performance-cost metrics for BP polar decoders. Our contributions are summarized as follows. (1) We introduce one algorithm performance metric related to the decoding frame error rate (FER), and employ the NAND gate as a universal constitution unit to evaluate one hardware performance metric concerning the throughput as well as an other hardware cost metric concerning the overhead. (2) We quickly construct the performance-cost design space of BP polar decoders utilizing the three metrics with acceptable evaluation errors, thus reducing expensive time and design costs. (3) We model the performance-cost design space exploration as a nonlinear discrete integer multi-objective optimization problem and efficiently solve the Pareto optimal set, thus improving the co-design efficiency of the algorithm and hardware.

Baseline architecture. The study takes the single-column implementation of BP polar decoders in [5] as the baseline architecture to evaluate the performance-cost metrics. Note that the early termination unit of the baseline architecture is removed and all simulations with error-correction performance are performed at the same fixed iterations to fairly evaluate signal-to-noise ratio (SNR) of parameterized decoders. The figure of the baseline architecture can be found in Appendix A.

Equivalent gates. In terms of ASIC-based synthesis results in digital circuits, equivalent gates are typically used to evaluate the area. A common strategy is to select the 2-input NAND gate as the equivalent gate, and the entire consumption of equivalent gates can be computed by dividing the synthesized area by the area of a single 2-input NAND gate. Different from the existing NAND-gate-based approaches, we divide digital circuits into two categories: combinational logic and sequential logic, then convert each part into a combination of 2-input NAND gates based on the mapping relationship between logic gates in this work. Detailed information on the logic gate mapping can be found in Appendix B.

Performance-cost metric evaluation. We define FSNR as one algorithm performance metric denoting the SNR at a target FER of  $10^{-3}$ . We configure binary phase-shift keying (BPSK) modulation, additive white Gaussian noise

<sup>\*</sup> Corresponding author (email: chzhang@seu.edu.cn)



Figure 1 (Color online) (a) Three-dimension design space exploration of BP polar decoders; (b) verification of constraint-satisfying solutions

(AWGN) channel, Gaussian approximation (GA) construction, and the two-dimension offset min-sum (OMS) BP decoding with 7-bit quantization and 15 iterations to simulate the FER performance of polar codes. Under the constraint of FER =  $10^{-3}$ , we calculate the approximate FSNR for those simulated decoders that have different values of code length N and the number of information bits K. Based on the least-squares fitting, we determine the approximate expression between FSNR and N in

$$FSNR = (11.3 + 0.4(\log_2 N - 6))K/N - 0.9 - 0.6(\log_2 N - 6).$$
(1)

Detailed information on how to determine FSNR can be found in Appendix C.1.

To evaluate hardware metrics of BP polar decoders, we propose a NAND-gate-based method to approximate the NAND gate consumption and data path delay, where the former and the latter are measured on the consumed number and the passed number of 2-input NAND gates, respectively. Considering all units listed in the baseline architecture, we approximately derive the NAND gate consumption and the data path delay related to each unit. Then, we sum the results of all associated modules to obtain the NAND gate consumption NAND(N, M) and the critical path delay Delay(N, M) of BP polar decoders with code length N and decoding parallelism M. Detailed information on how to determine NAND(N, M) and Delay(N, M) can be found in Appendix C.2.

*Experiments.* To validate the efficacy of our proposed evaluation strategy for hardware metrics, we present the estimated results of NAND(N, M) and Delay(N, M) under different values of N and M. The comparisons with the EDA tools-based method demonstrate that our approach provides a low estimation error and also achieves a good area and delay evaluation for BP polar decoders. Detailed information on the experiments can be found in Appendix D.

Design space exploration. We introduce GATE (NAND gate consumption) as one hardware cost metric and TP (information throughput) as one hardware performance metric. Then, we obtain the multi-objective optimization model of BP polar decoders with the constraints on N, M, and K in

Goals:  $\min FSNR = f_1(N, K),$ min GATE =  $f_2(N, M)$ ,  $\max \mathrm{TP} = f_3(N, M, K);$ (2)Constraints:  $2 \leq M \leq N$ ,  $N/4 \leq K \leq 3N/4$ ,

N, M are the power of 2, K is an integer,

where  $f_1(\cdot)$  refers to (1),  $f_2(\cdot)$  relies on NAND(N, M), and  $f_3(\cdot)$  relies on Delay(N, M). Detailed information on how to determine  $f_1(\cdot)$ ,  $f_2(\cdot)$ , and  $f_3(\cdot)$  can be found in Appendix E.1.

We evaluate the decoders under  $N = \{128, 256, 512\}$ and present the design space of  $\{\mathrm{TP},\mathrm{GATE},\mathrm{FSNR}\}$  in Figure 1(a), where blue and red solid circles also represent the feasible points and Pareto optimal points. Notably, each feasible point with different metric values is represented by unique design parameters of N, M, and K. Imposing additional requirements such as TP  $\ge 10^7$  bps, GATE  $\leq 10^5$  gates and FSNR  $\leq 1.45$  dB at FER =  $10^{-3}$ , we determine the decoders with {Solution 1: N = 128, M =32, K = 32 and {Solution 2: N = 128, M = 64, K = 32} as the matched solutions. To validate the found decoders, we list the implementation results of TP, GATE, and FSNR for the two designs in Figure 1(b), where the three metrics all meet the predefined constraints. Detailed information on the design of space exploration can be found in Appendix E.2.

Conclusion. We propose a fast and reliable performancecost evaluation and exploration for BP polar decoders, which incorporates both algorithm and hardware metrics. Future work would focus on introducing more performance-cost metrics to achieve a high-dimension design space exploration and proposing other effective estimation approaches of hardware metrics to better emulate the realistic behavior of the synthesis flow.

Acknowledgements This work was supported in part by National Natural Science Foundation of China (Grant Nos. 62122020, 62331009, 62401642), Jiangsu Provincial Natural Science Foundation (Grant Nos. BG2024004, BK20243059, BM2023016), Fundamental Research Funds for the Central Universities, and Postgraduate Research and Practice Innovation Program of Jiangsu Province (Grant No. KYCX21\_0105).

Supporting information Appendixes A-E. The supporting information is available online at info.scichina.com and link. springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

## References

- Yu Y, Pan Z, Liu N, et al. Belief propagation bit-flip de-coder for polar codes. IEEE Access, 2019, 7: 10937–10946 1
- Shen Y, Song W, Ji H, et al. Improved belief propagation polar decoders with bit-flipping algorithms. IEEE Trans Commun, 2020, 68: 6699–6713
- Elkelesh A, Ebada M, Cammerer S, et al. Belief propa-3 gation list decoding of polar codes. IEEE Commun Lett, 2018, 22: 1536–1539 Ren Y, Shen Y, Zhang L, et al. High-throughput and flex-
- 4 lible belief propagation list decoder for polar codes. IEEE Trans Signal Process, 2024, 72: 1158–1174 Ji C, Shen Y, Zhang Z, et al. Autogeneration of pipelined belief propagation polar decoders. IEEE Trans VLSI Syst,
- 5 2020. 28: 1703-1716