#### FUTURE OF SEMICONDUCTOR HARDWARE

ASCENT: Applications and Systems-driven Center for Energy-Efficient integrated Nano Technologies

**Suman Datta**, Sayeef Salahuddin, Muhannad Bakir, Jeffrey Bokor, Kyeongjae Cho, Supriyo Datta, Patrick Fay, Steve George, Ken Goodson, Adam Hock, Sharon Hu, Subramanian Iyer, Debdeep Jena, Siddharth Joshi, Asif Khan, Andy Kummel, Umesh Mishra, Azad Naeemi, Michael Niemier, Eric Pop, Ramesh Ramamoorthy, Dan Ralph, Arijit Raychowdhury, Darrell Schlom, Madhavan Swaminathan, Jianping Wang, Shan Wang, Chuck Winter, Peide Ye, and Shimeng Yu



#### **Compute Performance Gain Comes from**



60% of the compute performance gain over the last decade came from process technology advancement

Source: Lisa Su (AMD), ERI DARPA Summit, 2019





Scaling continues enabled by innovation in materials, devices, process integration & DTCO

S. Salahuddin, K. Ni and S. Datta, "The era of hyper-scaling in electronics," Nature Electronics, 2018



#### **Memory Scaling**



Scaling continues enabled by devices, structures and process integration innovation

K. Kim, "The Past, Present, and Future," IEDM 2021



#### **Roofline Performance**



Arithmetic Intensity = FLOPS / Bytes (moved)

- **Compute Bound**: logic transistor performance, improve wire RC, stack more layers with high intertier via density
- **Memory bandwidth Bound**: Internal: Memory layer stacking with high TSV density, External: core to HBM interconnect using Si interposer
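The two regimes above follow from the standard roofline bound: attainable throughput is the smaller of the compute roof and the product of arithmetic intensity and memory bandwidth. The sketch below illustrates this; the peak throughput and bandwidth values are hypothetical and do not describe any machine from the slides.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gb_s):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gb_s)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / bandwidth_gb_s


# Hypothetical machine parameters, chosen only to keep the arithmetic simple.
PEAK_GFLOPS = 10_000.0
BANDWIDTH_GB_S = 1_000.0

for ai in (0.5, 10.0, 50.0):
    if ai < ridge_point(PEAK_GFLOPS, BANDWIDTH_GB_S):
        regime = "memory-bandwidth bound"
    else:
        regime = "compute bound"
    perf = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GB_S)
    print(f"AI = {ai:5.1f} FLOP/byte -> {perf:8.1f} GFLOP/s ({regime})")
```

Kernels to the left of the ridge point benefit from more memory bandwidth (layer stacking, HBM interconnect); kernels to the right benefit from faster logic and wiring.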

C Yang, T Kurth, S Williams "Hierarchical Roofline analysis for GPUs:.....", Concurrency and Computation: Practice and Experience (CCPE), Aug 2019



## Go vertical, Augment charge with spin, Embrace heterogeneity, Compute with memory



Design, build and benchmark semiconductor prototypes to establish technology roadmap



#### **BEOL Transistors (n-type)**

#### **Monolithic 3D**



#### Integration at

- Transistor level
- Gate level
- Block level

*Highlights:* ML MoS<sub>2</sub> growth on trench sidewall below 550 °C; $R_{\rm C} \approx 190\ \Omega \cdot \mu m$ (In/Au) and $220\ \Omega \cdot \mu m$ (Sn/Au)

**2D Materials** 

[Figure: monolayer MoS<sub>2</sub> nFET with X/Au contacts (X = In or Sn) on SiO<sub>2</sub> (50 nm); dimension labels 25 nm and 10 nm]

Challenges: Defect control in ML TMD and dielectric interface

#### **Amorphous Oxide Semiconductors**



Highlights:  $L_{\rm G}$  = 50nm, EOT= 0.8nm, SS 67 mV/dec,  $R_{\rm C} \approx 500 \ \Omega \cdot \mu m$ ,  $I_{\rm D} \approx 720 \ \mu A/\mu m$ ; improved V<sub>T</sub> stability


Challenges: Defect control in oxide thin films

#### BEOL Memory (Monolithic 3D eDRAM)

2T (capacitorless) DRAM memory with sub-femtoampere I<sub>off</sub> BEOL oxide FETs

[Figure: monolithic 3D stack showing BEOL memory, thermal management, dense interconnect & via, BEOL logic, and FEOL logic]

**Monolithic 3D**

#### Integration at

- Transistor level
- Gate level
- Block level

Highlights: Demonstration of 2T gain cell embedded DRAM (eDRAM) exhibiting cell-level leakage current of $\sim 1 \times 10^{-15}$ A/µm at 25 °C and $\sim 1 \times 10^{-14}$ A/µm at 85 °C

Challenges: Stability/Variability of IWO FETs without affecting mobility
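To see why the leakage figures above matter for a capacitorless gain cell, a rough retention estimate can be made from t ≈ C·ΔV / (I_leak·W). This is only a back-of-the-envelope sketch: the storage-node capacitance, tolerable voltage droop, and transistor width below are hypothetical values, not numbers from the slides. Only the two leakage currents come from the highlight.

```python
def retention_time_s(c_storage_f, delta_v, leak_a_per_um, width_um):
    """Time for leakage to discharge the storage node by delta_v volts."""
    return c_storage_f * delta_v / (leak_a_per_um * width_um)


# Leakage per unit width, taken from the highlight above.
LEAK_25C_A_PER_UM = 1e-15
LEAK_85C_A_PER_UM = 1e-14

# Hypothetical cell parameters (assumptions for illustration only).
C_STORAGE_F = 1e-15   # 1 fF storage node
DELTA_V = 0.1         # 100 mV of tolerable droop
WIDTH_UM = 1.0        # 1 um wide write transistor

t_25 = retention_time_s(C_STORAGE_F, DELTA_V, LEAK_25C_A_PER_UM, WIDTH_UM)
t_85 = retention_time_s(C_STORAGE_F, DELTA_V, LEAK_85C_A_PER_UM, WIDTH_UM)
print(f"retention at 25 C: {t_25:.3g} s")
print(f"retention at 85 C: {t_85:.3g} s")
```

Whatever cell parameters are assumed, the tenfold rise in leakage from 25 °C to 85 °C shortens the estimated retention time by the same factor.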



#### **BEOL Memory (Monolithic 3D TCAM)**

#### 2 layers of Ferroelectric FETs as TCAMs for few shot learning




#### Integration at

- Transistor level
- Gate level
- Block level

Highlights: M3D TCAM array demonstrating in situ compute of Hamming distance and 3-way 3-shot learning with 20-bit feature vectors

Challenges: Fabricate arrays for larger statistics; SoC-compatible voltage; V<sub>T</sub> drift
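The few-shot classification highlighted above can be mimicked in software as a nearest-neighbour search under Hamming distance. The sketch below is a functional analogy only, not the hardware method: a TCAM evaluates all stored rows in parallel, while this loop is sequential. The stored 20-bit vectors, class labels, and query are made up for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def classify(query: int, support: dict) -> tuple:
    """Return (label, distance) of the stored vector closest to the query."""
    best_label, best_dist = None, None
    for label, vectors in support.items():
        for vec in vectors:
            dist = hamming(query, vec)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


def bits(s: str) -> int:
    """Parse a 20-character bit string into an integer."""
    assert len(s) == 20
    return int(s, 2)


# Hypothetical 3-way, 3-shot support set of 20-bit feature vectors.
SUPPORT = {
    "A": [bits("1" * 10 + "0" * 10), bits("1" * 9 + "0" * 11), bits("1" * 11 + "0" * 9)],
    "B": [bits("0" * 10 + "1" * 10), bits("0" * 9 + "1" * 11), bits("0" * 11 + "1" * 9)],
    "C": [bits("10" * 10), bits("10" * 9 + "11"), bits("10" * 9 + "00")],
}

# Hypothetical query: differs from the first class-A vector in one bit.
QUERY = bits("1" * 10 + "0" * 9 + "1")
print(classify(QUERY, SUPPORT))
```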

#### **BEOL Chiplets**

**Polylithic 3D** 

#### **3D Integrated Chiplets Encapsulation (3D ICE)**

#### Transfer of SiO2-reconstituted-tier on a glass wafer








'Sea of Chiplets' of varying dimensions after encapsulation

SiO<sub>2</sub>-reconstituted tier with through-oxide vias (TOV) and RDL

- Fills gap between M3D and heterogeneous packaging
- SiO<sub>2</sub>-reconstituted tier for BEOL-level polylithic integration

Highlights: Demonstrated  $SiO_2$ -reconstituted Tier with via and RDL transferred onto glass substrate.

Challenges: Staying competitive with industrial 3DIC approaches; thermal management

#### Augment Charge with Spin

[Figure: memory hierarchy (Reg., SRAM, cache eDRAM, DRAM, X-point, NAND/HDD) arranged by speed and size, with cell area in µm<sup>2</sup> (ticks at 0.05 and 0.005)]

#### Beyond STT-MRAM



Noriyuki Sato, Ian Young (Intel) 2020 VLSI Symposium

Explore conventional and new symmetry materials to switch magnet (by current / voltage) more efficiently than STT

#### Spin Orbit Torque (SOT) Memory



[Figure axis: spin torque efficiency, $\xi_{\text{DL}}$]

#### **Embrace Heterogeneity**




Doug Yu (TSMC) 2019 IEDM Evening Panel

It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected - Gordon Moore, 1965

#### **Embrace Heterogeneity**

[Figure: heterogeneous fabric (ASCENT) compared with today's HI trends (SoIC™, WoW, CoWoS, InFO, fan-out panel-level packaging). Panels include an antenna array with chips and passives embedded in polymer and glass using a glass interposer, logic/HBM stacks on a Si interposer, and a package-integrated 48-to-1 V voltage converter and regulator. Trend labels: smaller bump/bond pitch, larger package size, 3D integration.]


#### **Compute with Memory**



Explore specialized hardware for solving computationally hard problems




#### The future of data-centric compute












#### www.src.org/program/jump

Semiconductor Research Corporation



SIA-SRC Webinar on the Collaboration towards Decadal Plan Goals: Advances and Challenges in Semiconductor Hardware

#### Successes and Learnings from JUMP ComSenTer

Dr. Farhana Sheikh, Intel Corporation


## Tech has never been more important to humanity

#### Computing has become pervasive, and the entire world is becoming

*Revolutionary technologies to address high-frequency radar, sensing, and communications: systems-to-circuits, beyond-CMOS devices, CMOS optimizations for sub-THz circuits – convergence of communications, sensing, and compute* 

#### ComSenTer System Highlights:

- 140GHz Channel measurements up to 100m in urban environments, 1GHz BW
- Digital MU-MIMO: low-cost, energy-efficient digital beamforming  $\rightarrow$  adaptability
- Systems-to-circuits modeling and optimization: 4X to 5X power reduction
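For a sense of the link budget behind the 140 GHz, 100 m channel measurements listed above, the free-space path loss can be computed from the Friis relation, FSPL = 20·log10(4πdf/c). This is a free-space sketch only: it ignores atmospheric absorption, blockage, and reflections, all of which matter in a measured urban channel, and it is not the center's channel model.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


for d in (1.0, 10.0, 100.0):
    print(f"{d:6.1f} m at 140 GHz: {fspl_db(d, 140e9):6.1f} dB")
```

At 100 m the free-space loss alone is roughly 115 dB, which helps explain the emphasis on beamforming arrays and high-power, high-efficiency amplifiers elsewhere in the deck.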

#### ComSenTer Circuits Highlights:

- CMOS-based PA with the highest power & PAE at 140GHz (record-setting work)
- CMOS-based D-band phased array with 13Gb/s & integration into COTS system
- 32-element DBF ASIC + SERDES + ADC-DAC array + baseband
- GaN PA at 140GHz with >24dBm output, 10dB gain; 220GHz InP PA: 30% PAE

#### ComSenTer Devices Highlights:

- N-Polar GaN emergence as front-runner candidate for D-band with >42% PAE @94GHz
- Successful growth of diamond at reduced temperature: 400°C
- AlN/GaN transistors healthy and showing promise

#### ComSenTer Demonstration Highlights:

- Massive MIMO COTS + ASICs demo @140GHz
- 140GHz massive MIMO demo with Samsung

Additional information is posted on src.org under JUMP ComSenTer.

[Figure: timeline from 1980 to 2020 with era labels including "Digitize" and "Network Everything"]

SIA-SRC Webinar on the Collaboration towards Decadal Plan Goals: Advances and Challenges in Semiconductor Hardware, June 23, 2022

## Heterogeneity and Modularity: Systems Integration

Are we ready for a new wave in Semiconductor Technologies & Education?



Convergence of communications, sensing, and compute: End-to-end integration of heterogeneous materials/devices, technologies, circuits, and platforms



1. Optimized systems integration → platform-centric design focusing on functional density
   - High-frequency RF (III-V, SiGe, CMOS) + analog/mixed-signal + digital + configurable arrays with domain-specific compute + advanced memory architectures
2. Security + intelligence → design and integration from day 1: integral part of algorithms, architecture, ...
3. Optimized packaging + 2.5D/3D chiplet-based devices, circuits, & architectures → interface I/O power can be high
   - Need for 2.5D/3D and multi-die heterogeneous integration at nano, micro, macro levels for comms and sensing
   - Optimal partitioning and automated design methodologies/flows
4. Workforce development in emerging sub-THz comms/sensing/imaging with focus on diversity
   - Build technical leadership broadly across USA schools in devices, circuits, packaging, & systems: multi-disciplinary workforce → materials and devices + circuits and tapeout + software + systems/architecture classes

"Don't be encumbered by history. Go off and do something wonderful."






#### Advances and Challenges in Semiconductor Hardware

Madhavan Swaminathan, Georgia Tech



- Global semiconductor industry projected to become a trillion-dollar industry by 2030 (Source: McKinsey & Company)
  - 55 years to become a \$0.5 Trillion industry
  - 10 years to become a \$1.0 Trillion industry
  - Drivers & Challenges
    - Compute & Storage (Deep Learning: 300X "brute force" System Scale Out since 2012 leading to energy crisis)
    - Wireless (Single autonomous car expected to produce 4000 Gigabytes of data per day)
- Need Energy Efficient solutions (femto-Joules/bit) with Large Bandwidth Density (500 TBps/mm<sup>2</sup>) in the future
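The energy and bandwidth-density targets above are linked: power density is energy per bit times bits moved per second per unit area. The sketch below works through that arithmetic under two assumptions that the slide does not state: that "femto-Joules/bit" means about 1 fJ/bit, and that "TBps" means terabytes (not terabits) per second.

```python
BITS_PER_BYTE = 8


def power_density_w_per_mm2(energy_j_per_bit, bandwidth_bytes_per_s_per_mm2):
    """Power per unit area needed to sustain a given bandwidth density."""
    return energy_j_per_bit * bandwidth_bytes_per_s_per_mm2 * BITS_PER_BYTE


# Assumed readings of the targets (see the note above).
ENERGY_J_PER_BIT = 1e-15          # 1 fJ/bit
BANDWIDTH_B_PER_S_MM2 = 500e12    # 500 TB/s per mm^2

for energy in (ENERGY_J_PER_BIT, 1e-12):
    p = power_density_w_per_mm2(energy, BANDWIDTH_B_PER_S_MM2)
    print(f"{energy:.0e} J/bit -> {p:.3g} W/mm^2")
```

Under these assumptions, 1 fJ/bit corresponds to about 4 W/mm², while a picojoule-per-bit link at the same bandwidth density would need a thousand times more, which is why the energy target sits at the femtojoule level.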



With Moore's law slowing down, need **Heterogeneous Integration** platforms which combine both sequential Monolithic 3D Integration (M3D) and parallel polylithic chiplet integration to achieve 100X improvement in transistor, IO, and Bandwidth Densities.

https://www.inovex.de/de/blog/edge-computing-introduction/

Andrew Lohn and Micah Musser, "AI and Compute: How Much Longer Can Computing Power Drive Artificial Intelligence Progress?", CSET, 2022.


#### Recent Key Accomplishments: Antenna in Package for 6G (D-Band)



- 11 dB gain; 14-16 dBi gain (Kai-Qi Huang et al., ECTC 2021; Serhat Erdogan et al., IMS 2022)
- 100/200 µm TGV; SIW loss: 0.7 dB/mm (Mutee Rehman et al., IMS 2021)










## Progress Towards High-Performance THz and mm-Wave Transistors for Wireless Systems

#### Srabanti Chowdhury srabanti@stanford.edu



This work was supported in part by the Semiconductor Research Corporation (SRC) and DARPA.



## The Challenge





The center's application goals require <u>high transmit power</u> and <u>low receiver noise figure</u> beyond the state of the art.

The improved device-level performance needed is addressed through application-specific THz transistors

## JUMP GaN & InP Transistors for 100+ GHz systems



#### Silicon

- Baseband processing at all frequencies
- RF sections @ 140, 200 GHz
- PAs, LNAs in short-range 140, 210 GHz links

#### GaN

- High-power amplifiers in long-range 140, 210 GHz links (possibly 340 GHz?), with integrated diamond cooling → *Mishra, Xing, Jena, Chowdhury*

#### InP MOS-HEMT

- Low-noise amplifiers in long-range 140, 210 GHz links
- Low-noise amplifiers @ 290, 650 GHz → Rodwell

#### InP HBT

- Medium-power amplifiers in long-range 140, 210 GHz links
- Power amplifiers @ 290, 650 GHz
- RF sections @ 290, 650 GHz → Rodwell (devices), Buckwalter (IC in foundry process)



[Figure: spatially multiplexed base station; $4 \times 4$ subarray MIMO array; propagation range]

- MIMO hub, 140 GHz: F = 8 dB, P<sub>avg</sub> = 21 dBm, P<sub>1dB</sub> ≅ 25 dBm
- Point-to-point MIMO, 210 GHz: F = 6 dB, P<sub>avg</sub> = 16 dBm, P<sub>1dB</sub> ≅ 21 dBm
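The transmitter targets above are given in dBm; converting them to milliwatts makes the per-element power levels easier to compare. The helper below is a small conversion sketch using the standard definition P(dBm) = 10·log10(P / 1 mW). The dictionary layout and names are illustrative, and the difference between P<sub>1dB</sub> and P<sub>avg</sub> is labelled here as back-off, a term the slide itself does not use.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


# Targets taken from the two link types listed above.
TARGETS_DBM = {
    "MIMO hub, 140 GHz": {"p_avg": 21.0, "p_1db": 25.0},
    "Point-to-point MIMO, 210 GHz": {"p_avg": 16.0, "p_1db": 21.0},
}

for name, t in TARGETS_DBM.items():
    backoff_db = t["p_1db"] - t["p_avg"]
    print(f"{name}: P_avg = {dbm_to_mw(t['p_avg']):.1f} mW, "
          f"P_1dB = {dbm_to_mw(t['p_1db']):.1f} mW, back-off = {backoff_db:.0f} dB")
```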



### The bottleneck





- To achieve over x W operation in the mm-wave domain (94 GHz, 240 GHz and 300 GHz) with a PAE of 20-25%.
- The device should be able to transfer heat over 3-4x W to deliver x W without losing performance.

 $\rightarrow$ Leverage the thermal conductivity of diamond without hurting the GaN channel mobility.





#### Development of Polycrystalline Diamond for Thermal Management





## Role of Industry-Academia partnership

- SRC enabled a platform where industry and academic researchers can collaborate, observe, develop and nurture technologies
- The "energy" felt during our ComSenTer reviews was exceptional



- Collaboration without boundaries helped set up successful seed ideas/projects (DARPA)

> The strength of these relationships will drive future innovation, workforce development, and the supply of talent needed to achieve ambitious yet practical goals. We <u>need</u> to remove any roadblocks that threaten such collaboration and progress.

#### Collaboration towards Decadal Plan Goals: Advances and Challenges in Semiconductor Hardware

Vijaykrishnan Narayanan, Pennsylvania State University





## **Evolution from Compute-to-Memory Centric Systems**

4 million billion bytes of data to image a single black hole



2019: The time is now for memory centric designs



## Latency-Storage Tradeoff



Efficient algorithms, Hardware Acceleration, High density memory/storage, Compute near memory/storage, 3D Integration



## **Enabling Data-Centric Systems**



Source: DOE 3DFeM



Source: Tajana Simunic


## **Technology-System Interactions**



## **Compute-Memory-Storage Hierarchy**

![](_page_37_Figure_1.jpeg)

#### **BFree Architecture: Sub-array with Reduced Access LUT rows and Compute Engine (BCE)**

![](_page_38_Figure_1.jpeg)

BCE is a 3-stage pipelined in-order core, placed at the sub-array level. BCE is connected to the timer and decoder ports of the sub-array without perturbing the custom-built sub-arrays.

## **Near Storage Processing**

![](_page_39_Figure_1.jpeg)

Great potential, but many more challenges

- Programming Ease
- Scalability
- Security
- Endurance
- Power/Thermal

## **CROSS-LAYER DESIGN FROM DEVICES TO APPLICATIONS**

X. Sharon Hu, University of Notre Dame

**Faculty collaborators:** Michael Niemier (Notre Dame), Suman Datta (Notre Dame), Mohsen Imani (UCI), Kai Ni (RIT), Thomas Kampfe (Fraunhofer IPMS-CNT, Germany)

**Students at Notre Dame:** Arman Kazem, Ann Franchesca Laguna, Liu Liu, Mohammad Mehdi Sharifi

![](_page_40_Picture_3.jpeg)

![](_page_40_Picture_4.jpeg)

![](_page_40_Picture_5.jpeg)

#### **CROSS-LAYER DESIGN: FEFET-BASED COMPUTE-IN-MEMORY FABRICS**

![](_page_41_Figure_1.jpeg)

### **VALUE PROPOSITION OF MULTI-BIT FEFETS?**

**NEED CROSS-LAYER ANALYSIS TO ANSWER!**

![](_page_42_Figure_2.jpeg)

## **DEVICE-ARCHITECTURE-ALGORITHM DSEs**

![](_page_43_Figure_1.jpeg)