6.1 Overview
6.2 Meshing Controls
6.3 Numeric Controls
6.4 Linear Solvers
6.5 Parallelization
6.6 Understanding the Log File
In addition to 2D simulation capabilities, Sentaurus Device can simulate the thermoelectrical behavior of devices with native 3D and cylindrical device geometries.
Typically, 3D device simulations are more computationally expensive than 2D ones, which requires special solutions to make them run more efficiently.
Some of these solutions, discussed in the following subsections, are: appropriate meshing controls, numeric controls, a suitable choice of linear solver, and parallelization.
The corresponding example, where most of the solutions described in this section are applied, can be found in the Applications_Library/GettingStarted/sdevice/3Ddiode_demo directory.
Mesh generation, which produces grids suitable for 3D device simulations, might not be an easy task. See Section 2. Using the Axis-Aligned Mesh Generator for detailed descriptions of different aspects of mesh generation using Sentaurus Mesh.
This section summarizes some recommendations for those who are starting with 3D mesh generation:
For example, explicit mesh lines (cut planes) at given y-coordinates can be enforced in the AxisAligned section of the Sentaurus Mesh command file:

AxisAligned { yCuts=(1.03, 3.09, 5.15, 7.21, 9.27) }
The high-level Sentaurus Structure Editor command to produce the corresponding Delaunizer section is shown here:
(sdesnmesh:delaunizer
   "maxPoints" 1e6
   "maxConnectivity" 100
   "delaunayTolerance" 0.
   "sliverRemovalAlgorithm" 2
   "storeDelaunayWeight" #t
)
Numeric controls can be split into different groups: general controls, nonlinear Newton solver controls, linear solver controls, and parallelization controls.
The first two groups are discussed here. The last two groups are discussed in Section 6.4 Linear Solvers and Section 6.5 Parallelization.
The full list of controls is demonstrated in the sdevice_des.cmd file from the Applications_Library/GettingStarted/sdevice/3Ddiode_demo project.
Math {
   Transient = BE
   eMobilityAveraging = ElementEdge
   hMobilityAveraging = ElementEdge
   CDensityMin = 0.
   ElementVolumeAvalanche
   AvalFlatElementExclusion = 1.
   ParallelToInterfaceInBoundaryLayer(FullLayer -ExternalBoundary)
   ComputeGradQuasiFermiAtContacts= UseQuasiFermi
   BM_ExtendedPrecision
   WeightedVoronoiBox
   AutoCNPMinStepFactor = 0
   AutoNPMinStepFactor = 0
   -PlotLoadable
   SimStats
   ExitOnFailure
   ...
}
The general controls are shown in the Math section above; the most notable of them are discussed next.
Use the AvalFlatElementExclusion keyword with caution because, for large angle values, some "good" elements could be filtered out, which can lead to an overestimation of the breakdown voltage. Typically, the specified angle should not exceed 1° or 2°.
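As a minimal sketch, the element-based avalanche controls from the Math section above can be combined as follows (the 1° exclusion angle matches the value reported in the log file in Section 6.6):

Math {
   ElementVolumeAvalanche
   * exclusion angle in degrees; keep small (1 or 2 degrees)
   AvalFlatElementExclusion = 1.
}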
Math { ...
   Digits = 5
   ErrRef(electron) = 1e8
   ErrRef(hole) = 1e8
   Iterations = 10
   NotDamped = 100
   RHSMin = 1e-8
   EquilibriumSolution(Iterations=100)
   Extrapolate
   RefDens_eGradQuasiFermi_ElectricField = 1e8
   RefDens_hGradQuasiFermi_ElectricField = 1e8
}
For more details about the nonlinear Newton solver, see Section 9. Nonlinear Newton Solver. The most important controls are discussed here:
Specifying Digits=5 should be sufficient for most cases. For wide-bandgap device simulations, smaller values for the carrier density ErrRef controls should be used because of the much smaller intrinsic carrier densities in those materials.
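As a hedged illustration for a wide-bandgap (for example, SiC) device, the ErrRef values could be reduced as follows; the numbers are illustrative only, not calibrated recommendations:

Math {
   Digits = 5
   * illustrative: reduced from the 1e8 used above
   ErrRef(electron) = 1e3
   ErrRef(hole) = 1e3
}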
If more iterations are required for a specific task, such as solving the initial guess, the number of iterations can be explicitly specified as a Coupled command option:
Coupled (iterations=100) { Poisson }
Having an electrode crossing a p-n junction or a heterointerface can lead to an incorrect solution along such a contact due to the charge neutrality condition imposed at Ohmic electrodes. To avoid this, the EqOhmic option can be specified for an electrode, which instructs the simulator not to impose the charge neutrality condition. This option is useful for both 2D and 3D device simulations.
If the EqOhmic option is activated for the electrode:

Electrode { { Name="source" Voltage= 0.0 EqOhmic } }

Sentaurus Device implicitly solves the Poisson equation at the very beginning to produce the equilibrium state. This usually requires a much larger number of iterations than specified by Iterations. To ensure there are enough iterations allowed to solve the initial condition, the EquilibriumSolution command should be specified with the dedicated options (such as Iterations, LineSearchDamping, and Digits) relevant to this command only.
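A minimal sketch of such a specification, using only the options named above (all values are illustrative):

Math {
   EquilibriumSolution( Iterations=200 LineSearchDamping=1e-4 Digits=4 )
}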
The Extrapolate option can also be specified directly in the Quasistationary statement:

Quasistationary (
   Goal { ... }
   Extrapolate
) { Coupled { Poisson Electron Hole } }
For relatively large reference densities, with values greater than ni(T), you might expect a breakdown voltage shift due to the difference between the actual gradQuasiFermi and electric-field driving-force values, which enter the computation of the avalanche generation term. To avoid this, apply driving-force smoothing to the high-field velocity saturation model only, which is activated using the RefDens_e|hGradQuasiFermi_ElectricField_HFS command options.
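A hedged sketch of this selective smoothing, using the option names given above with values matching the earlier example:

Math {
   RefDens_eGradQuasiFermi_ElectricField_HFS = 1e8
   RefDens_hGradQuasiFermi_ElectricField_HFS = 1e8
}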
While running a simulation on a real or virtual timescale, Sentaurus Device uses a dedicated strategy to control the timescale resolution. Two scaling factors are used for this, Increment and Decrement, as shown in Figure 7.
It is recommended to apply a smaller value for Increment than for Decrement, to avoid back-and-forth time-step jumping in the case of nonconvergence.
For transient simulations, the Increment and Decrement factor values are used exactly as specified in the Transient section:
Transient (
InitialTime=0 FinalTime=100e-11
InitialStep=1e-11 Increment= 1.4 Decrement= 2
MaxStep=5e-11 MinStep=1e-15
){ Coupled{ Poisson Electron Hole } }
By default, the same value (2) is used for both Increment and Decrement.
For steady-state simulations, the Increment value is additionally scaled according to the number of Newton iterations used in the previous step.
Therefore, you might observe different time-step values reported in the log file, compared to what has been specified in the command file.
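As a hedged sketch, a Quasistationary ramp with explicitly specified stepping controls might look as follows (the goal name and all values are illustrative):

Quasistationary (
   InitialStep=1e-3 MaxStep=0.05 MinStep=1e-6
   Increment=1.4 Decrement=2
   Goal { Name="anode" Voltage=100 }
){ Coupled { Poisson Electron Hole } }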
Different linear solvers are available in Sentaurus Device: Blocked, SUPER, PARDISO, and ILS.
The keyword Method specifies which solver Sentaurus Device will use. By default, the SUPER linear solver is used. Table 1 summarizes different aspects of each linear solver.
Solver | Type | Thread parallelism | Extended precision support | Recommended for | Speciality |
---|---|---|---|---|---|
Blocked | Not a real solver, but a strategy to solve complex matrices in mixed mode | no | – | Circuit and mixed-mode device simulations | Mixed-mode circuit and device simulations |
SUPER | Direct, sparse matrices | no | yes | Medium 2D tasks, wide-bandgap devices | Robust, very accurate, no pivoting |
PARDISO | Direct, sparse matrices | yes | yes* | Large 2D devices, small- and medium-sized 3D silicon devices | ~2 times faster than SUPER, well parallelized |
ILS | Iterative, sparse matrices | yes | yes | Large 2D and 3D tasks, wide-bandgap devices | Adjustable behavior, slight superlinear parallel scaling |
*Combining extended-precision arithmetic and parallelization degrades the performance of PARDISO because the BLAS3 fast matrix multiplication library (only available for the standard precision arithmetic) is not applied in this case.
Math {
   Method= Blocked SubMethod= ILS(set=1)
   ACMethod= Blocked ACSubMethod= ILS(set=2)
   PrintLinearSolver
}
Use the Blocked solver (Method=Blocked) when single or multiple numerically simulated 2D or 3D devices are combined within a circuit. This is called mixed-mode operation, which is described in Section 3. Mixed Mode.
The concept of block decomposition is illustrated in Figure 8. Here, two devices (Device 1 and Device 2) are simulated numerically and coupled using the circuit. The full matrix is decomposed into blocks, which include the sparse matrices of the numerically simulated devices ("1" and "2") coupled to the circuit matrix "c" using the "a" and "b" matrices. Each "1" and "2" matrix is solved using the method specified by the SubMethod keyword, and the circuit matrix "c" is solved by the internal direct solver.
Use the option PrintLinearSolver of the Math section to obtain additional information regarding the linear solvers in the log file.
To specify the SUPER direct linear solver for single-device simulations, use:
Math { Method= Super }
To specify SUPER for mixed-mode device simulations, use:
Math { Method= Blocked SubMethod= Super }
If no solver is specified in the Math section, the SUPER solver will be used. The default solver settings for SUPER are optimized to address most standard simulation needs. Modifying the direct solver settings is not recommended.
For direct linear solvers, the approximate memory footprint M(N) increases with the number of nodes N between ~ c x N x log(N) (ideal case) and ~ c x N^(3/2) (realistic cases). Correspondingly, the number of arithmetic operations A(N) increases with the number of nodes N between ~ c x N^(3/2) (ideal case) and ~ c x N^(5/2) (realistic cases). The proportionality constant c reflects specific properties of the mesh structure and equations (that is, the linear system matrix portrait).
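As a quick worked illustration of these scaling laws: under the realistic-case estimates, doubling the node count N increases the memory footprint by roughly 2^(3/2) ≈ 2.8 times and the operation count by roughly 2^(5/2) ≈ 5.7 times, which is why direct solvers become impractical for large 3D meshes.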
For 2D tasks, especially wide-bandgap device simulations, SUPER provides the best overall solution performance. It is generally not suitable for 3D device simulations because it lacks parallelization and its memory consumption increases drastically with the node count.
To specify the Parallel Direct Solver (PARDISO) for single-device simulations, use:
Math { Method= ParDiSo[(options)] }
To specify PARDISO for mixed-mode device simulations, use:
Math { Method= Blocked SubMethod= ParDiSo[(options)] }
Unlike SUPER, PARDISO takes full advantage of shared-memory multiprocessor computers while being run on multiple CPUs in parallel. The default solver settings for PARDISO are optimized to address most standard simulation needs.
Several customizations are available for PARDISO (see Table 2).
Option | Description | Default |
---|---|---|
NonsymmetricPermutation | Computes initial nonsymmetric matrix permutation and scaling, which places large matrix entries on the diagonal. | on |
IterativeRefinement | Performs up to two iterative refinement steps to improve the accuracy of the solution. | off |
MultipleRHS | Solves linear systems with multiple right-hand sides. This option applies to AC analysis only. | off |
RecomputeNonsymmetricPermutation | Computes nonsymmetric matrix permutation and scaling before each factorization. | off |
Using -NonsymmetricPermutation (that is, disabling the permutation) results in better speed but less accuracy and is not recommended.
Switching on IterativeRefinement and RecomputeNonsymmetricPermutation targets better accuracy, but typically slows down the system solve.
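For example, a hedged sketch of enabling the accuracy-oriented options from Table 2, following the Method= ParDiSo[(options)] form shown above:

Math {
   Method= ParDiSo( IterativeRefinement RecomputeNonsymmetricPermutation )
}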
Running in single-CPU mode, PARDISO outperforms SUPER with respect to runtime and memory requirements. Therefore, you can consider it for standard 2D simulation tasks. For tasks that require higher accuracy (for example, wide-bandgap device simulations), these customization options might be required, which can make SUPER a more favorable choice.
Having almost superlinear parallel scaling capabilities, PARDISO is also well suited for medium-sized 3D device simulations. You might expect up to a ~12 times speedup with PARDISO while running 3D simulations in parallel on a machine equipped with 16 or more CPUs.
To specify the Iterative Linear Solver (ILS) for single-device simulations, use:
Math { Method= ILS[(set=<n>)] [ILSrc= "set(<n>) {...};"] }
To specify ILS for mixed-mode device simulations, use:
Math { Method= Blocked SubMethod= ILS[(set=<n>)] [ILSrc= "set(<n>) {...};"] }
Here, <n> indicates the set number, which is referred to in the subsequent ILSrc statement.
Compared to the direct method, the iterative approach reduces memory requirements for large simulation tasks dramatically. Therefore, ILS is the preferred solver for device simulations where the mesh node count exceeds the limit suitable for direct linear solvers.
ILS operation consists of the following steps: matrix ordering, preconditioning, and the iterative solution itself.
Each step is controlled by different parameters, which are combined within groups. Some dedicated parameter sets, which target different applications, are hardcoded and do not need an explicit set definition, as shown in Figure 9. To activate any of these sets, specify the set number, for example:
Math { Method= ILS(set=2) }
In addition, you can define custom sets using the ILSrc keyword. Table 3 refers to several ILS sets, which were established for different applications. To use these sets, they must be defined explicitly using the ILSrc command in the Math section.
Application | ILSrc set |
---|---|
3D device off-state breakdown | ILSrc= "set (5) { iterative (gmres(150), tolrel=1e-10, maxit=300); preconditioning(ilut(1e-5,-1),left); ordering(symmetric=nd, nonsymmetric=mpsilst); options(verbose=0, refineresidual=10); };" |
SiC device simulations | ILSrc= "set (6) { iterative(gmres(100), tolrel=1e-10, tolunprec=1e-4, tolabs=0, maxit=200); preconditioning(ilut(1.5e-6,-1), left); ordering(symmetric=nd, nonsymmetric=mpsilst); options(compact=yes, linscale=0, refineresidual=10, verbose=0); };" |
III–N device simulations | ILSrc= "set (7) { iterative(gmres(100), tolrel=1e-12, tolunprec=1e-8, tolabs=0, maxit=200); preconditioning(ilut(1e-8,-1),left); ordering ( symmetric=nd, nonsymmetric=mpsilst ); options( compact=yes, linscale=0, refineresidual=10, verbose=0); };" |
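For example, a hedged combination of a set definition from Table 3 with its activation (both lines follow the forms shown above):

Math {
   Method= ILS(set=5)
   ILSrc= "set (5) { iterative (gmres(150), tolrel=1e-10, maxit=300); preconditioning(ilut(1e-5,-1),left); ordering(symmetric=nd, nonsymmetric=mpsilst); options(verbose=0, refineresidual=10); };"
}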
The most relevant parameters for adjusting ILS behavior are:
preconditioning(ilut(ε,-1), left);

Elements smaller than ε are removed during elimination. Smaller values of ε lead to denser triangular factors L and U of the preconditioner, which increases the accuracy of the preconditioning step. Consequently, there is a cost and accuracy trade-off when selecting the ε value.
iterative (gmres(150), tolrel=1e-10, maxit=300);

GMRES stores a given number of backvectors. If the solution does not converge within a given restart number, GMRES restarts the iteration. A higher number of backvectors improves convergence but increases memory use.
iterative (gmres(150), tolrel=1e-10, maxit=300);

The iterative solver stops when the residual error becomes less than a given normalized value (tolrel) or when the maximum number of iterations (maxit) is reached.
options(verbose=0, refineresidual=10);

Convergence can be further improved by forcing the iterative solver to perform additional iterations (refineresidual) after the stopping criteria are fulfilled.
Math {
   NumberOfThreads= <n>
   ParallelLicense (Wait)
   Wallclock
}
Sentaurus Device uses thread parallelism to accelerate simulations on shared-memory computers. Table 4 lists the parallelized components of Sentaurus Device.
Task | Description |
---|---|
Box discretization | Computation of box method coefficients. |
Matrix assembly | Computation of mobility, avalanche, current density, energy flux; assembly of Poisson, continuity, lattice, and temperature equations. |
Linear system solve | PARDISO and ILS solvers contribute to thread parallelism. |
The value of the keyword NumberOfThreads declares the number of threads to be allocated and used for parallelization. It must not exceed the physical number of CPUs available on a machine or a cluster node where the simulation is run.
Do not use NumberOfThreads= maximum, which would allocate all CPUs on a machine. At least one CPU must remain free to allow system tasks to run. Otherwise, instead of a speedup, performance degradation will be observed.
The ParallelLicense (Wait) command instructs the simulator to wait until a parallel license becomes available.
To display wallclock times (the real time between process invocation and termination) rather than CPU times, use the Wallclock option.
One parallel license supports parallelization on four CPUs. Therefore, to run on 16 CPUs, for example, four parallel licenses are required.
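As a hedged example combining the options above, the following requests eight threads (which, at four CPUs per parallel license, consumes two parallel licenses):

Math {
   * assumes a machine with more than eight CPUs
   NumberOfThreads= 8
   * wait until the required parallel licenses are available
   ParallelLicense (Wait)
   * report wallclock rather than CPU times
   Wallclock
}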
To investigate different task contributions to the overall speedup, their parallelization can be independently enabled using the following commands:
Math {
   NumberOfAssemblyThreads = <n>
   NumberOfSolverThreads = <n>
}
To see how parallelization contributes to a simulation speedup, run a simulation in single-thread mode first, and then rerun it in parallel mode with a value greater than 1 assigned to NumberOfThreads. Then, feed both log files to the sdevicestat stand-alone tool, which produces cumulative reports showing the accumulated time spent on each particular task, for example:
> sdevicestat n123_des.log
...
Total CPU-time : 118.04 s
Rhs-time       : 46.92 % ( 55.39 s )
Jacobian-time  :  8.38 % (  9.89 s )
Solve-time     : 43.63 % ( 51.50 s )
Figure 10 explains the meaning of the speedup estimations shown in Figure 11.
Figure 11 shows the comparison of 3D simulation results for a structure with ~1M mesh nodes, running in parallel on different numbers of CPUs using PARDISO and ILS.
To run Sentaurus Device in parallel mode inside Sentaurus Workbench, an automatic allocation feature is available, which synchronizes the number of requested parallel slots between a simulation job and a scheduler.
To activate this feature, the following command must be placed in a user tooldb.tcl file:
set WB_tool(sdevice,parallel,mode) tool
For more details about this Sentaurus Workbench feature, see Section 2.3 Running Projects.
When starting your 3D simulation for the first time, it is useful to look in the log file to better understand the details of the mesh on which you run your device simulations.
All the commands you specified in the Math section of the command file should be reflected in the log file:
Model: Math Switched on
Parameter: WeightedVoronoiBox = on
Parameter: BM_ExtendedPrecision = on
Parameter: AutoNPMinStepFactor = 0
Parameter: AutoCNPMinStepFactor = 0
Model: SimStats Switched on
Parameter: ElementVolumeAvalanche = on
Parameter: AvalFlatElementExclusion = 1 deg
Parameter: ComputeGradQuasiFermiAtContacts = UseQuasiFermi
Parameter: RefDens_eGradQuasiFermi_ElectricField = 100000000 cm^-3
Parameter: RefDens_hGradQuasiFermi_ElectricField = 100000000 cm^-3
At the beginning of the simulation, the BoxMethod analyzes the input grid and prints out detailed information about it:
NumberOfVertices            = 10935
NumberOfElements            = 50688
NumberOfTetrahedrons        = 50688 (100.0 %)
NumberOfObtuseElements      = 16896 (33.33 %)
NumberOfNonDelaunayElements = 0     ( 0.00 %)
The most important indicator of 3D mesh quality is how well the generated mesh fulfills the Delaunay criterion. The key quantities are DeltaVolume, which indicates the difference between the real physical device volume and the volume computed by BoxMethod, and the number of non-Delaunay mesh elements, which is reported independently for each region inside the device. DeltaVolume should be as small as possible for semiconductor regions, and the number of non-Delaunay elements should be as low as possible.
For more details, see Section 7. Sentaurus Device at Runtime.
Copyright © 2017 Synopsys, Inc. All rights reserved.