## **SCIENCE CHINA** Information Sciences



• RESEARCH PAPER •

February 2023, Vol. 66 122405:1–122405:10 https://doi.org/10.1007/s11432-021-3483-6

# Knowledge-based neural network SPICE modeling for MOSFETs and its application on 2D material field-effect transistors

Guodong QI<sup>1†</sup>, Xinyu CHEN<sup>2†</sup>, Guangxi HU<sup>1†</sup>, Peng ZHOU<sup>2,3\*</sup>, Wenzhong BAO<sup>2,3\*</sup> & Ye LU<sup>1,3\*</sup>

<sup>1</sup>State Key Laboratory of ASIC and System, School of Information Science and Technology, Fudan University, Shanghai 200433, China;

<sup>2</sup>State Key Laboratory of ASIC and System, School of Microelectronics, Fudan University, Shanghai 200433, China;

<sup>3</sup>Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 200433, China

Received 27 December 2021/Revised 9 February 2022/Accepted 13 April 2022/Published online 12 January 2023

**Abstract** As the traditional scaling of silicon metal-oxide-semiconductor field-effect transistors (MOSFETs) reaches its physical limit, research efforts on novel semiconductor devices are increasingly desired. To enable the joint optimization of early-stage circuit design and process of novel devices, the rapid creation of an accurate compact model of these devices with the capability to cover process variations is required. In this work, a knowledge-based neural network (KNN) modeling method is proposed. This method separates the geometrical variables from the other input variables of the device: the geometrical variables are modeled with physics-based analytical equations, while the remaining part is modeled by an artificial neural network. The KNN model takes advantage of the automated numerical fitting capability of the neural network and the geometrical scalability from device physics. The created KNN model is first validated with silicon MOSFET data from the industry standard BSIM6 and shows more than 20% accuracy improvement as compared with the traditional neural network model. Furthermore, MoS<sub>2</sub> field-effect transistors and circuits, such as ring oscillators, standard cells, and logic functional circuits, are experimentally fabricated for model verification. The results show that the KNN model is capable of predicting the electrical characteristics of devices beyond the measurement geometry and facilitates accurate statistical circuit simulations with respect to experimental data. This work paves the way for future circuit designs and simulations of novel semiconductor devices.

**Keywords** knowledge-based neural network, MOSFET, 2D material FETs, Monte Carlo simulations, circuit benchmark

Citation Qi G D, Chen X Y, Hu G X, et al. Knowledge-based neural network SPICE modeling for MOSFETs and its application on 2D material field-effect transistors. Sci China Inf Sci, 2023, 66(2): 122405, https://doi.org/10.1007/s11432-021-3483-6

### 1 Introduction

Semiconductor devices are traditionally modeled with analytical equations for circuit simulations. However, as devices approach the nanometer scale, their underlying physics becomes much more complicated, making them difficult to model with solely physics-based compact models. In addition, the actual electrical properties of a device vary from case to case due to process variations [1].

In contrast, a neural network compact model can provide a high-accuracy numerical model with a short turnaround time, which is critical for the joint optimization of early-stage design and technology for novel devices based on emerging semiconductors [2-4]. However, many experiments have demonstrated that the current neural network modeling method still suffers from several major limitations [5-7]: (1) the accuracy of the created model typically depends on the amount of available data, which increases the burden of electrical measurements; and (2) the modeling method is entirely based on mathematical

 $<sup>\</sup>label{eq:corresponding} \ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensurema
th{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^{\circ}}\ensuremath{^$ 

 $<sup>\</sup>dagger\,\mathrm{Qi}$  G D, Chen X Y, and Hu G X have the same contribution to this work.

functions that lack physical meanings, which restricts the model scalability. In other words, the model output that exceeds the range of the measured training data can be largely unphysical. This further hinders statistical circuit simulations. Due to these drawbacks, the application of present neural network modeling methods is limited.

In this work, a knowledge-based neural network (KNN) modeling method is proposed to alleviate the aforementioned issues. To begin with, the dependence on the geometrical parameters of a transistor, such as the channel width (W) and length (L), is extracted according to a well-defined physical formula. Then, the formula is multiplied by the output of an artificial neural network (ANN) that models the remaining input variables. Hereafter, the traditional ANN modeling method is referred to as "TNN", in contrast to the new "KNN" method. The KNN method is formulated and applied to silicon MOSFETs and novel 2D material MoS<sub>2</sub> FETs. Simulation results of the KNN-based model match well with the experimental measurements at the device and circuit levels. It is also demonstrated that the proposed method can achieve a higher accuracy with much less training data, as compared with the TNN-based method. The proposed method also exhibits better scalability to enable accurate statistical simulations.

### 2 Proposed modeling method and device fabrication

#### 2.1 KNN modeling method

In the TNN device model, geometric variables (W, L, ...) and electrical parameters  $(V_{gs}, V_{ds}, ...)$  are all used as inputs to the neural network as in

$$I_{\rm ds} = F(W, L, \dots, V_{\rm gs}, V_{\rm ds}, \dots). \tag{1}$$

In this KNN method, the geometric variables are presented in one function, whereas the other input variables, such as bias conditions, are presented in another function. Eq. (1) is rewritten as

$$I_{\rm ds} = g(W, L, \ldots) f(V_{\rm gs}, V_{\rm ds}, \ldots). \tag{2}$$

Therefore, geometric parameters, such as W and L, are first extracted in the KNN method as a function  $g(W, L, \ldots)$ . The purpose of this preprocessing procedure is to separate the geometric parameters from the other parameters. Then, the preprocessed device electrical characteristics, such as terminal current  $I_{ds}$  and charge Q as a function of the input bias voltage, are modeled by a multigradient neural network algorithm [3] in the function  $f(V_{gs}, V_{ds}, \ldots)$ . Finally, the extracted physical parameters W and L are mapped back so that the drain current can be expressed as (2). As a general example, the relationship between the drain current and geometric parameters (W and L) of a MOSFET is given by

$$g(W, L, \ldots) = W^a / L^b, \tag{3}$$

where a and b are fitting parameters, both equal to 1 for the ideal long-channel silicon MOSFET. When layout effects and parasitic resistance are taken into account, the drain current is reduced to some extent; hence, a is usually smaller than 1. The parameter b is close to 0.5 in the ballistic transport regime and close to 1 in the drift-diffusion transport regime [8,9]. Therefore, in our model, the values of a and b lie in the range between 0.5 and 1. When the geometric size of a MOSFET falls outside the range of the training (measurement) data, Eq. (3) ensures that the model output remains physical.
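As a minimal sketch of how Eqs. (2) and (3) compose, the Python snippet below multiplies the geometry factor by a bias-dependent function. The names `g_geometry`, `f_bias`, and `knn_ids` are hypothetical, and `f_bias` is a toy smooth expression that merely stands in for the trained neural network; it is not the multigradient network of [3].

```python
import math


def g_geometry(W, L, a=1.0, b=1.0):
    """Geometry factor of Eq. (3): g = W**a / L**b."""
    if W <= 0 or L <= 0:
        raise ValueError("W and L must be positive")
    return W ** a / L ** b


def f_bias(vgs, vds, vth=0.5, k=1e-4):
    """Hypothetical stand-in for the trained ANN f(Vgs, Vds).

    A toy smooth expression chosen only so the sketch runs;
    vth and k are made-up illustration values.
    """
    # Softplus-smoothed overdrive voltage (always positive).
    vov = 0.05 * math.log1p(math.exp((vgs - vth) / 0.05))
    return k * vov ** 2 * math.tanh(vds / vov)


def knn_ids(W, L, vgs, vds, a=1.0, b=1.0):
    """Eq. (2): drain current = geometry factor times bias function."""
    return g_geometry(W, L, a, b) * f_bias(vgs, vds)


# Same bias point, width inside (50) and outside (100) a 10-50 training range.
i_50 = knn_ids(50.0, 10.0, vgs=1.0, vds=1.0)
i_100 = knn_ids(100.0, 10.0, vgs=1.0, vds=1.0)
print(i_100 / i_50)
```

With a = 1, doubling W at fixed L and bias doubles the predicted current regardless of what the bias function returns, which is how the geometry factor carries the model beyond the trained sizes.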

#### 2.2 MoS<sub>2</sub> device fabrication

The devices are fabricated on a continuous monolayer  $MoS_2$  film on a wafer-scale sapphire substrate. All electrodes (source/drain/gate) are patterned through regular photolithography. After patterning the shape of the source and drain contacts, 40 nm Au is deposited with electron-beam (E-beam) evaporation.  $CF_4$  plasma etching is performed to define the channel geometry, followed by the dielectric layer deposition (2 nm SiO<sub>2</sub> and 20 nm HfO<sub>2</sub>) using E-beam evaporation and atomic layer deposition. Finally, 40 nm Au is deposited via E-beam evaporation to form top-gate electrodes following another lithography step. For the MoS<sub>2</sub> circuit fabrication, an additional via hole layer between the source/drain contact and top-gate electrodes is added.  $SF_6$  plasma etching is employed to define the via holes, followed by the deposition of 25 nm Au.



Figure 1 (Color online) Comparisons of  $I_{ds}$ - $V_{gs}$  predicted results with the TNN and KNN for the devices with different W/L ( $\mu$ m/ $\mu$ m): (a) 35/10 and (b) 100/10. Comparisons of the  $C_{gg}$ - $V_{g}$  predicted results with the TNN and KNN for the devices with different W/L ( $\mu$ m/ $\mu$ m): (c) 35/10 and (d) 100/10.

### 3 Results and discussion

#### 3.1 Model verification with the traditional MOSFET

In this work, long-channel and short-channel silicon-based MOSFETs are considered to validate the KNN modeling method. The length of the long-channel device is selected as 10  $\mu$ m, while the length of the short-channel device is 0.08  $\mu$ m. The experimental training data are generated with the industry standard silicon MOSFET BSIM6 model [10].

#### 3.1.1 Long-channel MOSFETs

For the MOSFET with a channel length of 10  $\mu$m, carrier transport is dominated by drift and diffusion. According to the KNN modeling method introduced in Subsection 2.1, *a* and *b* in (3) take a value close to 1.
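The text treats a and b as fitting parameters without spelling out the extraction procedure. One plausible way to check that a is close to 1, sketched below under that assumption, is a least-squares fit of log I versus log W at fixed L and bias. The function name `fit_width_exponent` and the synthetic currents are illustrative only.

```python
import math


def fit_width_exponent(widths, currents):
    """Least-squares slope of log(I) versus log(W) at fixed L and bias.

    If I is proportional to W**a, the slope equals a.
    """
    if len(widths) != len(currents) or len(widths) < 2:
        raise ValueError("need at least two (W, I) pairs of equal length")
    xs = [math.log(w) for w in widths]
    ys = [math.log(i) for i in currents]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic currents (made-up values) generated with a known exponent.
widths_um = [10.0, 20.0, 30.0, 40.0, 50.0]
true_a = 0.95
currents_a = [2e-6 * w ** true_a for w in widths_um]

a_hat = fit_width_exponent(widths_um, currents_a)
print(f"fitted a = {a_hat:.4f}")
```

The same fit applied to log I versus log L at fixed W would give -b.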

The data of MOSFETs with different W/L (µm/µm) (10/10, 20/10, 30/10, 40/10, and 50/10) are used as training data for the TNN and KNN modeling methods, while the data of a MOSFET with W/L of 100/10 are used as the benchmark. Figures 1(a)–(d) show the fitting results of *I*-V and *C*-V with the TNN and KNN methods. Both methods can effectively fit the *I*-V and *C*-V characteristics when W/L (µm/µm) is 35/10, which lies within the training range. However, when W/L is 100/10, the predicted results with the TNN method seriously deviate from the target, with an average error of 22.27% for *I*-V and 29.24% for *C*-V. By contrast, the average error with the KNN method is only 1.06% for *I*-V and 2.01% for *C*-V.
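The text does not define "average error". A common choice, assumed here purely for illustration, is the mean absolute relative error over the sampled bias points. The function name and the sample values below are made up and are not taken from the reported data.

```python
def mean_relative_error_percent(predicted, reference):
    """Mean of |predicted - reference| / |reference|, in percent."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        if r == 0:
            raise ValueError("reference values must be non-zero")
        total += abs(p - r) / abs(r)
    return 100.0 * total / len(reference)


# Made-up drain currents (A) at three bias points.
reference_ids = [1.0e-6, 2.0e-6, 4.0e-6]
predicted_ids = [1.1e-6, 1.8e-6, 4.0e-6]

err = mean_relative_error_percent(predicted_ids, reference_ids)
print(f"average error = {err:.2f}%")
```

A relative metric of this kind weights subthreshold and on-state points equally, whereas an absolute metric would be dominated by the on-state current.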

For the TNN, to obtain the electrical characteristics of different sizes, the corresponding data must be incorporated into the neural network for training. The TNN model has very accurate prediction results within the training range (e.g., W/L is 35  $\mu$ m/10  $\mu$ m), but it shows poor prediction accuracy out of the training range (e.g., W/L is 100  $\mu$ m/10  $\mu$ m). The KNN combines the advantages of neural networks and physical modeling, and achieves accurate fitting based on neural networks. Moreover, the KNN




Figure 2 (Color online) Comparisons of the  $I_{\rm ds}$ - $V_{\rm gs}$  predicted results with the TNN and KNN for the devices with different W/L ( $\mu$ m/ $\mu$ m): (a) 0.095/0.08 and (b) 0.2/0.08. Comparisons of the  $C_{\rm gg}$ - $V_{\rm g}$  predicted results with the TNN and KNN for the devices with different W/L ( $\mu$ m/ $\mu$ m): (c) 0.095/0.08 and (d) 0.2/0.08.

model can predict the data that are not involved in training by using physical methods. Therefore, more accurate results can be obtained with the KNN method than the TNN method when the data are out of the training data range.

#### 3.1.2 Short-channel MOSFETs

For the MOSFET with a channel length of 0.08  $\mu$m, the layout proximity effect is prominent [11, 12], and the electrical properties of a transistor will be less dependent on the geometric size. Under this circumstance, the values of a and b in (3) should be chosen close to 0.5. The data of different W/L ($\mu$m/$\mu$m) (0.08/0.08, 0.09/0.08, 0.1/0.08, 0.11/0.08, and 0.12/0.08) are used as training data to obtain the TNN and KNN models, and the data of W/L ($\mu$m/$\mu$m) of 0.2/0.08 are used as the benchmark. In Figures 2(a)–(d), the fitting results of the TNN and KNN to the *I-V* and *C-V* data generated from the BSIM6 model are compared.

Figure 2(a) shows that the two methods can accurately capture the results for the MOSFET with W/L (µm/µm) of 0.095/0.08, where the values of W and L fall into the training data range. However, as Figure 2(b) demonstrates, for the MOSFET with W/L (µm/µm) of 0.2/0.08, where the value of W is out of the training data range, the KNN can still model the benchmark data well, while the TNN significantly deviates, especially for the on-state drive current. Compared with the benchmark data, the TNN method shows an average error of 34.07% for *I-V* and 13.80% for *C-V*, whereas for the KNN method, the average error is only 8.73% for *I-V* and 3.29% for *C-V*.

Similar to the long-channel MOSFET, in the short-channel MOSFET, the TNN cannot fit the data with W/L of 0.2 µm/0.08 µm, which is not included in the training data. However, the KNN can not only fit the data within the training range (e.g., W/L is 0.095 µm/0.08 µm) but also predict the data out of the training range (e.g., W/L is 0.2 µm/0.08 µm). Therefore, the KNN method is much more scalable than the TNN method outside the training data range for the short-channel MOSFET.




Figure 3 (Color online) Comparisons of the results simulated by circuits with the TNN and KNN for devices with different W/L ($\mu$m/$\mu$m). (a) Schematic diagram of a two-stage inverter chain; (b) average delay of the Monte Carlo simulation vs. different W/L MOSFETs; (c) schematic diagram of a 17-stage ring oscillator circuit; (d) delay per stage vs. device model with different W/L.

#### 3.1.3 Benchmark in circuit simulation

To further test the KNN method on the circuit level, we perform a statistical Monte Carlo simulation [13, 14] in a ring oscillator (RO) circuit for silicon CMOS, and the simulation results are presented in Figure 3.

In the Monte Carlo simulation, five sets of devices with W/L ($\mu$m/$\mu$m) of 20/10, 40/10, 60/10, 80/10, and 100/10 are considered. The standard deviation of W and L is assumed to be 3%. The variations of W and L are given by (4), where  $R_{\rm an}$  is a random value that follows a Gaussian distribution and lies in the range between -1 and 1.

$$\Delta \text{Size} = 0.03 \times R_{\text{an}} \times \text{Size}, \quad \text{Size} \in \{W, L\}. \tag{4}$$

Twenty samples are used in the Monte Carlo simulation. The input signal enters a two-stage inverter chain, and the delay between the output and input is then obtained. Figure 3(a) shows the schematic of the inverter chain, and Figure 3(b) shows the average delay from the Monte Carlo simulation for different W/L. For MOSFETs with W/L of 20/10 and 40/10, the circuit delays resulting from the TNN and KNN methods match well with the benchmark simulations. However, when W/L exceeds the training data range, the accuracy of the TNN method seriously deteriorates. In particular, for W/L of 100/10, the TNN method predicts the delay with an error of 36.18%, while the KNN method predicts it with an error of only 1.18%. The model is also implemented in Verilog-A code, and a 17-stage RO is simulated to test it. The schematic diagram is shown in Figure 3(c). Figure 3(d) shows the delay per stage in the RO circuit obtained with the TNN and KNN methods for MOSFETs with different W/L. The findings show that the KNN method predicts the data with great accuracy when W/L is 100/10, with a 10% improvement in accuracy as compared with the TNN method.
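The size perturbation of Eq. (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: $R_{\rm an}$ is read here as a standard normal draw that is redrawn until it falls in [-1, 1], so 3% acts as the maximum relative deviation; the function names and the seed are hypothetical.

```python
import random


def truncated_gauss(rng, low=-1.0, high=1.0):
    """Standard normal draw, redrawn until it lies in [low, high].

    Assumed reading of R_an in Eq. (4); the source does not say how
    the Gaussian value is confined to the range -1 to 1.
    """
    while True:
        r = rng.gauss(0.0, 1.0)
        if low <= r <= high:
            return r


def perturb_size(size, rng, spread=0.03):
    """Eq. (4): size + 0.03 * R_an * size."""
    return size + spread * truncated_gauss(rng) * size


def sample_devices(W, L, n_samples=20, seed=0):
    """Return n_samples perturbed (W, L) pairs for one nominal geometry."""
    rng = random.Random(seed)
    return [(perturb_size(W, rng), perturb_size(L, rng))
            for _ in range(n_samples)]


# Twenty Monte Carlo samples around the nominal 100/10 device.
samples = sample_devices(100.0, 10.0)
for w, l in samples[:3]:
    print(f"W = {w:.3f} um, L = {l:.3f} um")
```

Each sampled (W, L) pair would then be passed to the device model, and the circuit delay averaged over the samples.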

#### 3.2 Application of the model in MoS<sub>2</sub> FETs and circuits

As the silicon-based MOSFET scales down to its physical limit, novel MOSFETs with new materials and/or new structures have emerged [15–18]. Among these novel devices, semiconducting transition metal dichalcogenides, especially molybdenum disulfide  $(MoS_2)$, have attracted great attention in the academic and industrial communities [19–23]. Therefore, it is meaningful to demonstrate our modeling methodology and its simulation capability on experimentally fabricated  $MoS_2$  FETs.



Figure 4 (Color online) (a) Schematic diagram of a MoS<sub>2</sub> FET. (b) Raman and (c) PL spectra for the monolayer MoS<sub>2</sub>.



Figure 5 (Color online) Model of the MoS<sub>2</sub> FET based on the KNN method. Fitting results with the KNN method for the  $I_{ds}$-$V_{gs}$  data of the MoS<sub>2</sub> MOSFET. The geometric sizes W/L (µm/µm) are (a) 60/40, (b) 90/20, and (c) 30/20. (d) Fitting results with the KNN method for the C-V data of the MoS<sub>2</sub> MOSFET. W and L are 30 and 20 µm, respectively.

The schematic diagram of a MoS<sub>2</sub> FET is shown in Figure 4(a), where the channel is monolayer MoS<sub>2</sub>, the source and drain electrodes are 40 nm Au, and the insulating layer is 20 nm HfO<sub>2</sub>. The MoS<sub>2</sub> film is characterized by Raman and photoluminescence (PL) spectroscopy. The Raman spectrum in Figure 4(b) shows that the difference between the  $E_{2g}^1$  and  $A_{1g}$  peaks is approximately 19 cm<sup>-1</sup>, indicating that the MoS<sub>2</sub> film is a monolayer. The PL spectrum displayed in Figure 4(c) exhibits an A-exciton peak of 1.875 eV, which is consistent with the direct bandgap of the monolayer MoS<sub>2</sub> film. The electrical characterizations of MoS<sub>2</sub> MOSFETs and circuits are performed with an Agilent B1500A semiconductor analyzer. To investigate the dynamic response of the circuit units and RO, the input signals are generated by an Agilent 33622A arbitrary waveform generator, and the output signals are captured by a RIGOL DS1054Z digital oscilloscope and an Agilent B1500A semiconductor analyzer. Three sets of W/L (µm/µm) (30/20, 90/20, and 60/40) are involved. Among them, the data of 60/40 and 90/20 are used for modeling, while the data of 30/20 are used for model validation.

The fitting results with the KNN method are shown in Figures 5(a)-(d). The KNN method can fit the data of the MoS<sub>2</sub> MOSFETs with W/L of 60/40 and 90/20, as demonstrated in Figures 5(a) and (b). Moreover, when W/L is 30/20, which is out of the measurement data range, the results obtained by the KNN method still match the experimental results with minimal error, as presented in




Figure 6 (Color online) Statistical comparison between the modeling and experimental data. (a) Optical photograph of the MoS<sub>2</sub> MOSFET with W/L (30/20); (b)  $I_{ds}$ - $V_{gs}$  characteristics of 30 samples; (c) probability density distribution of the saturation current ( $I_{d\_sat}$ ); (d) probability density distribution of the threshold voltage ( $V_{th}$ ).

Figure 5(c). To test the validity of the KNN method for the C-V characteristics of the MoS<sub>2</sub> MOSFET, the data of the MoS<sub>2</sub> MOSFET with W/L (30/20) are fitted. The model results are presented in Figure 5(d), which agree well with the experimental results. The geometric parameters, such as the length and width, may vary to some extent due to fabrication process variations and lead to the variation of the device's electrical characteristics. The variations of the electrical characteristics with respect to the skewness of the geometric parameters are modeled with the KNN method, and the results are compared with other experimental results [24, 25]. A total of 30 samples are used, and the skew ratios of W, L, threshold voltage ( $V_{\rm th}$ ), and carrier mobility ( $\mu_n$ ) are 1%.

Figure 6 statistically compares the modeling and experimental data for MoS<sub>2</sub> FETs. Figure 6(a) shows the optical photograph of MoS<sub>2</sub> FETs with W/L of 30/20. Figure 6(b) presents the experimental data and KNN predicted results of the  $I_{ds}$-$V_{gs}$  characteristics for 30 MoS<sub>2</sub> MOSFETs. Here, W/L (µm/µm) of 30/20 is out of the KNN training data range. The  $I_{ds}$-$V_{gs}$  characteristics modeled by the KNN match quite well with the experimental data when W and L are assumed to be the variation sources. The probability density function represents the random variable's probability distribution, and the expectation and deviation values are represented with  $\lambda$  and  $\sigma$, respectively. In Figures 6(c) and (d), the distributions of the saturation current  $I_{d\_sat}$  and threshold voltage  $V_{\rm th}$  predicted with the KNN method are compared with those from the experiments. The KNN predicts that the average  $I_{d\_sat}$  of the MoS<sub>2</sub> MOSFETs is 0.1683 µA with a deviation of 0.0125 µA, while the experiments show that the average  $I_{d\_sat}$  is 0.1679 µA with a deviation of 0.0180 µA. As for  $V_{\rm th}$  of the MoS<sub>2</sub> MOSFETs, the KNN prediction and experiments give average values of 1.365 and 1.348 V and deviation values of 0.058 and 0.041 V, respectively. The average values of  $I_{d\_sat}$  and  $V_{\rm th}$  predicted with the KNN are close to those of the experiments, and the errors are less than 2%. However, the modeling results for the deviation values of  $I_{d\_sat}$  and  $V_{\rm th}$  do not match well with the experiment. The reason may be that other variation sources, such as the Schottky barrier and gate dielectric uniformity, are not considered in the current KNN modeling method [26, 27].
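The agreement quoted above can be checked with a short calculation. The sketch below uses only the mean and deviation values reported in the text; the helper name `relative_error_percent` and the dictionary layout are illustrative choices.

```python
def relative_error_percent(model, measured):
    """Absolute relative difference between model and measurement, in percent."""
    return 100.0 * abs(model - measured) / abs(measured)


# (KNN prediction, experiment) pairs as reported in the text.
reported = {
    "Id_sat mean (uA)": (0.1683, 0.1679),
    "Id_sat deviation (uA)": (0.0125, 0.0180),
    "Vth mean (V)": (1.365, 1.348),
    "Vth deviation (V)": (0.058, 0.041),
}

errors = {name: relative_error_percent(model, measured)
          for name, (model, measured) in reported.items()}

for name, value in errors.items():
    print(f"{name}: {value:.2f}%")
```

The two mean values differ from the measurements by roughly 0.2% and 1.3%, consistent with the stated bound of 2%, whereas the deviation values differ by roughly 31% and 41%, which quantifies the mismatch attributed to unmodeled variation sources.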

Furthermore, statistical comparisons between the model and experiments are performed for logic circuits based on  $MoS_2$  FETs. The optical photograph and schematic views of the  $MoS_2$  inverter are shown




Figure 7 (Color online) Inverter based on  $MoS_2$  FETs: (a) optical photograph, (b) schematic, and (c) simulation and experimental VTC results of 30 samples. (d) Probability density vs. switching threshold voltages of the same samples. Five-stage  $MoS_2$  ring oscillator: (e) optical photograph, (f) schematic, and (g) simulation and experimental output waveforms.



Figure 8 (Color online) Optical photograph, truth tables, measurement, and simulation results (results\_m: measurement results (in yellow); results\_s: simulation results (in green)) for four basic logic circuits. (a) NAND, (b) NOR, (c) DFF, and (d) HALF-ADDER.

in Figures 7(a) and (b). The transfer characteristics measured on 30 experimental samples are compared with the outputs of Monte Carlo simulations based on the KNN models in Figure 7(c), and the two show good agreement. Figure 7(d) shows the statistical distribution of the switching threshold voltages (STVs). The simulated average value of the STV is 0.637 V, while that of the experiments is 0.567 V. The deviation values of the STV for the simulation and experiments are 0.123 and 0.117 V, respectively. This difference is possibly caused by variation sources in the experiment, such as gate dielectric variations, that have not been accounted for in the model. Figures 7(e) and (f) show the

optical photograph and schematic views of the five-stage  $MoS_2$  RO. The output results generated with the experiment and simulation are shown in Figure 7(g). The experiments show that the frequency of the RO circuit is 19.5 kHz, while the simulated frequency of the circuit is 19.9 kHz, which is in good agreement. This finding validates the accuracy of the KNN device model.
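To put the two frequencies in perspective, the sketch below converts them into an average per-stage delay using the standard first-order ring oscillator relation $f = 1/(2 N t_d)$, which assumes N identical stages. The per-stage delays computed here are derived quantities, not values reported in the text, and the function name is hypothetical.

```python
def stage_delay_from_frequency(freq_hz, n_stages):
    """Average per-stage delay t_d = 1 / (2 * N * f) of an N-stage ring oscillator."""
    if freq_hz <= 0 or n_stages <= 0:
        raise ValueError("frequency and stage count must be positive")
    return 1.0 / (2.0 * n_stages * freq_hz)


N_STAGES = 5          # five-stage MoS2 ring oscillator
f_measured = 19.5e3   # Hz, experiment
f_simulated = 19.9e3  # Hz, KNN-based simulation

td_measured = stage_delay_from_frequency(f_measured, N_STAGES)
td_simulated = stage_delay_from_frequency(f_simulated, N_STAGES)
freq_mismatch = 100.0 * abs(f_simulated - f_measured) / f_measured

print(f"measured stage delay:  {td_measured * 1e6:.3f} us")
print(f"simulated stage delay: {td_simulated * 1e6:.3f} us")
print(f"frequency mismatch:    {freq_mismatch:.2f}%")
```

Under this relation the measured and simulated stage delays are both close to 5 µs, and the frequency mismatch is about 2%.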

To pave the way for the use of  $MoS_2$  FETs and the corresponding KNN models in more general digital circuit designs, four basic logic cells based on the  $MoS_2$  FETs are fabricated and simulated with the KNN model in this work, as shown in Figure 8. Figures 8(a)–(d) show the optical photographs, truth tables, and measured and simulated waveforms of the NAND, NOR, D flip-flop (DFF), and HALF-ADDER circuits. The simulation results agree well with the experimental data for all cases, which further validates that the KNN model can be used for future 2D material FET circuit simulation and design efforts.

### 4 Conclusion

In this work, a device compact model based on the KNN methodology is proposed. The KNN combines the physical behavior of device geometrical scaling with the ANN fitting capability. This methodology can greatly improve on the accuracy and scalability of the TNN method and further enables accurate statistical simulations of geometrical variations. First, the KNN methodology is validated with silicon MOSFET data generated with the industry standard BSIM6. Then, it is validated against novel  $MoS_2$  FETs and circuits. All the model results agree well with the experiments. The results demonstrate that the KNN not only captures the electrical characteristics of devices and circuits with high precision but is also suitable for statistical analysis using Monte Carlo simulations. This work provides a feasible solution for fast compact modeling of novel semiconductor devices, which may facilitate the joint optimization of the early-stage circuit design and process technology.

Acknowledgements This work was supported in part by National Key Research and Development Program (Grant No. 2021YFA1200500), Innovation Program of Shanghai Municipal Education Commission (Grant No. 2021-01-07-00-07-E00077), Shanghai Municipal Science and Technology Commission (Grant No. 21DZ1100900), Shanghai Pujiang Program (Grant No. 20PJ1400900), Natural Science Foundation of Shanghai (Grant No. 22ZR1403500), and Young Scientist Project of MOE Innovation Platform.

#### References

- 1 Kuhn K. Variability in nanoscale CMOS technology. Sci China Inf Sci, 2011, 54: 936–945
- 2 Wang J, Kim Y H, Ryu J, et al. Artificial neural network-based compact modeling methodology for advanced transistors. IEEE Trans Electron Devices, 2021, 68: 1318–1325
- 3 Yang Q H, Qi G D, Gan W Z, et al. Transistor compact model based on multigradient neural network and its application in SPICE circuit simulations for gate-all-around Si cold source FETs. IEEE Trans Electron Devices, 2021, 68: 4181–4188
- 4 Xu J J, Yagoub M C E, Ding R T, et al. Exact adjoint sensitivity analysis for neural-based microwave modeling and design. IEEE Trans Microwave Theor Techn, 2003, 51: 226–237
- 5 Abo-Elhadeed A F. Modeling ballistic double gate MOSFETs using neural networks approach. In: Proceedings of the 8th Spanish Conference on Electron Devices, 2011. 1–4
- 6 Fang M, He J, Zhang X K, et al. Neural network method to model nanoscale MOSFET characteristics. J Comput Theor Nanosci, 2012, 9: 2037–2041
- 7 Lamamra K, Berrah S. Modeling of MOSFET transistor by MLP Neural Networks. In: Proceedings of International Conference on Electrical Engineering and Control Applications, 2017. 407–415
- 8 Martinie S, Le Carval G, Munteanu D, et al. Impact of ballistic and quasi-ballistic transport on performances of double-gate MOSFET-based circuits. IEEE Trans Electron Dev, 2008, 55: 2443–2453
- 9 Natori K. Ballistic metal-oxide-semiconductor field effect transistor. J Appl Phys, 1994, 76: 4879–4890
- 10 Agarwal H, Gupta C, Dey S, et al. Anomalous transconductance in long channel halo implanted MOSFETs: analysis and modeling. IEEE Trans Electron Dev, 2017, 64: 376–383
- 11 Aikawa H, Sanuki T, Sakata A, et al. Compact model for layout dependent variability. In: Proceedings of IEEE International Electron Devices Meeting, 2009. 1–4
- 12 Choi Y S, Lian G, Vartuli C, et al. Layout variation effects in advanced MOSFETs: STI-induced embedded SiGe strain relaxation and dual-stress-liner boundary proximity effect. IEEE Trans Electron Dev, 2010, 57: 2886–2891
- 13 Frank D J, Laux S E, Fischetti M V. Monte Carlo simulation of a 30 nm dual-gate MOSFET: how short can Si go? In: Proceedings of International Technical Digest on Electron Devices Meeting, 1992. 553–556
- 14 Chow J C L, Leung M K K. Monte Carlo simulation of MOSFET dosimeter for electron backscatter using the GEANT4 code. Med Phys, 2008, 35: 2383–2390
- 15 Desai S B, Madhvapathy S R, Sachid A B, et al. MoS<sub>2</sub> transistors with 1-nanometer gate lengths. Science, 2016, 354: 99–102
- 16 Theis T N, Solomon P M. It's time to reinvent the transistor! Science, 2010, 327: 1600–1601
- 17 Franklin A D. Nanomaterials in transistors: from high-performance to thin-film applications. Science, 2015, 349: 2750

- 18 Lundstrom M. Moore's law forever? Science, 2003, 299: 210–211
- 19 Yu L, El-Damak D, Radhakrishna U, et al. Design, modeling, and fabrication of chemical vapor deposition grown MoS<sub>2</sub> circuits with E-mode FETs for large-area electronics. Nano Lett, 2016, 16: 6349–6356
- 20 Chen X Y, Xie Y F, Sheng Y C, et al. Wafer-scale functional circuits based on two dimensional semiconductors with fabrication optimized by machine learning. Nat Commun, 2021, 12: 5953
- 21 Ma S L, Wu T X, Chen X Y, et al. An artificial neural network chip based on two-dimensional semiconductor. Sci Bull, 2022, 67: 270–277
- 22 Li X F, Gao T T, Wu Y Q. Development of two-dimensional materials for electronic applications. Sci China Inf Sci, 2016, 59: 061405
- 23 Tang H W, Zhang H M, Chen X Y, et al. Recent progress in devices and circuits based on wafer-scale transition metal dichalcogenides. Sci China Inf Sci, 2019, 62: 220401
- 24 Wang R S, Yu T, Huang R, et al. Impacts of short-channel effects on the random threshold voltage variation in nanoscale transistors. Sci China Inf Sci, 2013, 56: 062403
- 25 Takeuchi K, Fukai T, Tsunomura T, et al. Understanding random threshold voltage fluctuation by comparing multiple fabs and technologies. In: Proceedings of IEEE International Electron Devices Meeting, 2007. 467–470
- 26 Chen J R, Odenthal P M, Swartz A G, et al. Control of Schottky barriers in single layer MoS<sub>2</sub> transistors with ferromagnetic contacts. Nano Lett, 2013, 13: 3106–3110
- 27 Kaushik N, Nipane A, Basheer F, et al. Schottky barrier heights for Au and Pd contacts to MoS<sub>2</sub>. Appl Phys Lett, 2014, 105: 113505