

Juergen Jaeger, Director, Product Management

September 2015



## It's All About The Software

#### Getting to chip tape-out ... is not the biggest problem anymore!

- Software dominates development cost and schedules
- Software is a necessity to successfully sell silicon
- Delay of software delivery delays time to revenue for semiconductor providers









## What SW/FW Developers Care About

#### Ease of use

- Network resource
- "one click" download, configure, run
- Reset, re-run

#### Productive

- Backdoor memory access
- Start/stop clock
- High performance

#### (SW) flow integration

- JTAG
- UART
- Design-specific interfaces (e.g. PCIe, Ethernet, etc.)





## Time to Prototype

Key to success



## Prototyping User Challenges

- FPGA-based prototype bring-up can be both challenging and time consuming
- Unique capabilities in the following areas are addressing these challenges:
  1. Design import and compilation
  2. Memory support and modeling
  3. Clock handling
  4. Multi-FPGA partitioning
  5. Functional validation and debug (bring-up)
  6. Runtime debug
  7. Flow integration





## Protium Compile Flow



## Traditional Clocking Approach



## Advanced Clocking

#### Complete clock tree transformation

- Automatic conversion of gated and multiplexed clocks
- Eliminates FPGA and board clocking restrictions
  - Supports an unlimited number of design clocks
- Converts latches and tri-states
- Removes FPGA hold-time violations
- Reduces complexity of clock trees, which speeds up FPGA place and route
  - Faster P&R times and better quality of results

#### Benefits:

- No hold-time violations in user clock domains
- Removes any FPGA-specific clock limitations
- Improves FPGA timing closure
- Accelerates FPGA P&R times

#### Protium is "cycle-based"

- Protium updates each net in the design once per cycle of a conceptual clock called FCLK.
- FCLK is generated automatically by the compiler. Its frequency is determined by the compiler.
- The fastest design clock toggles once per FCLK cycle, so one full period takes two FCLK cycles and it runs at half the FCLK frequency.



#### FCLK and Step Clock

- In Protium hardware, FCLK is a conceptual clock, whereas the step clock physically exists.
- The step clock ideally runs at 150 MHz, but may be slower.
- In each compile, the compiler determines both the <u>step clock</u> frequency and the <u>step count</u>.
  - Step count is the number of step clock cycles per FCLK cycle
  - Typical step count is between 10 and 50
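
The relationship between step clock, step count, FCLK, and the fastest design clock can be worked through as simple arithmetic. The sketch below is illustrative only: the class and property names are hypothetical (they are not part of any Protium tool), and it assumes nothing beyond the two statements above, namely that one FCLK cycle spans `step_count` step clock cycles and that the fastest design clock runs at half the FCLK frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompileResult:
    """Hypothetical container for the two values the compiler determines."""
    step_clock_mhz: float  # physical step clock frequency (ideally 150 MHz)
    step_count: int        # step clock cycles per FCLK cycle

    @property
    def fclk_mhz(self) -> float:
        # One FCLK cycle takes step_count step clock cycles.
        return self.step_clock_mhz / self.step_count

    @property
    def fastest_design_clock_mhz(self) -> float:
        # The fastest design clock toggles once per FCLK cycle,
        # so its frequency is half the FCLK frequency.
        return self.fclk_mhz / 2


if __name__ == "__main__":
    # Ends of the typical step count range at the ideal step clock.
    for step_count in (10, 50):
        result = CompileResult(step_clock_mhz=150.0, step_count=step_count)
        print(
            f"step count {step_count}: "
            f"FCLK = {result.fclk_mhz:.2f} MHz, "
            f"fastest design clock = {result.fastest_design_clock_mhz:.2f} MHz"
        )
```

Under these assumptions, a 150 MHz step clock with a step count of 10 gives a 15 MHz FCLK and a 7.5 MHz fastest design clock; a step count of 50 gives 3 MHz and 1.5 MHz respectively. Actual values are determined per compile and are design-dependent.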





## Adjustable Performance



#### Fully automatic: 3-10MHz\*

- Clock-tree transformation
- ASIC memory mapping
- Partitioning
- FPGA P&R

#### Manual guidance: 10-20MHz\*

- Partitioning input
- Directly connected bulk memories
- FPGA P&R options and constraints
- Logic replication
- Clock-tree simplification

#### Black-boxing: >100MHz\*

- Single FPGA scope
- FPGA-specific optimization
- Direct clock mapping
- Directly connected bulk memories
- FPGA P&R options and constraints

<sup>\*</sup> Actual performance is design-dependent



## Unique Control & Debug Capabilities

#### Runtime

- Start/stop clock capability (run "N" cycles)
- Memory (backdoor) upload and download
- Monitor signal
  - Real-time monitoring of predefined (at compile time) signals
- Force/release signal
  - Forces signals predefined at compile time to "0" or "1" during runtime

#### Probes

- Runtime data capture of predefined signals for offline waveform viewing
- Waveforms across partitions
  - Provides a design-centric view rather than an FPGA-centric view



## Comprehensive Memory Support

The conversion and implementation of memories is one of the most challenging and time-consuming steps in the bring-up of an FPGA-based prototype system, often taking many weeks to complete.

| Type | Size | Palladium MMP | Upload/download | Performance | Comments |
|------|------|---------------|-----------------|-------------|----------|
| FPGA-internal | ~50 Mbits/FPGA | Yes | Yes | Full design speed | Fully automatic compile<br>Upload/download limited to 32 memory blocks max. per FPGA |
| DCMC (direct connected memory card) | x GBytes (depending on memories used) | No | No | Full design speed | Design change may be required, depending on memory type<br>App notes available<br>Currently supported: DDR3, DDR4, SRAM, SD-FLASH |
| FCMC (full-custom memory card) | custom | No | No | Full design speed | Full custom development |
| XSRAM (automated small external memory) | 144 Mbytes per memory card | some | Yes | Full design speed | Fully automatic compile<br>Extends 'FPGA-internal' memory modeling capabilities to a 144M external SRAM<br>Useful for SPI-flash and other small memories (e.g. boot ROM) |
| XDRAM (automated bulk memory) | 8 GBytes per XDRAM card | DDR family models | Yes | >4.5 MHz | Semi-automatic compile<br>Leverages XDRAM hardware (daughter card)<br>Support for DDR3/4, LPDDR3/4<br>Additional protocols may be added on customer demand and technical feasibility |

The memory compile capabilities in the Protium<sup>™</sup> platform are comprehensive and easy to use:

- Smaller memories are compiled fully automatically into FPGA-internal memory resources
- For larger, off-FPGA memories, the Protium platform offers several solutions; which one to use depends on specific requirements and objectives
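
Choosing among these options amounts to filtering the table by requirement. The sketch below is only an illustration of that idea: the `MemoryOption` type, its field names, and the `candidates` helper are hypothetical and not part of any Protium tool, and the boolean values are transcribed from the table columns (with `fully_automatic` set only where the Comments column states "Fully automatic compile").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOption:
    name: str
    upload_download: bool    # "Upload/download" column is "Yes"
    full_design_speed: bool  # "Performance" column reads "Full design speed"
    fully_automatic: bool    # Comments column states "Fully automatic compile"


# Transcribed from the memory support table; field names are hypothetical.
MEMORY_OPTIONS = [
    MemoryOption("FPGA-internal", True, True, True),
    MemoryOption("DCMC", False, True, False),
    MemoryOption("FCMC", False, True, False),
    MemoryOption("XSRAM", True, True, True),
    MemoryOption("XDRAM", True, False, False),
]


def candidates(
    need_upload_download: bool = False,
    need_full_design_speed: bool = False,
    need_fully_automatic: bool = False,
) -> list[str]:
    """Return the names of options that satisfy every requested requirement."""
    return [
        option.name
        for option in MEMORY_OPTIONS
        if (option.upload_download or not need_upload_download)
        and (option.full_design_speed or not need_full_design_speed)
        and (option.fully_automatic or not need_fully_automatic)
    ]


if __name__ == "__main__":
    # Example: backdoor upload/download is required, speed is negotiable.
    print(candidates(need_upload_download=True))
```

A real selection would also weigh capacity and the memory protocol the design expects, which this sketch deliberately leaves out because the table gives no fixed size for DCMC or FCMC.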



## Protium Rapid Prototyping Platform

- Gate capacity: up to 100M
- Adjustable performance: full automation to user tuned
- Fast compile time
- Fast time-to-prototype
- Highly productive implementation flow
  - Automated memory compilation
  - Terminal timing controllability
  - Fully integrated FPGA place-and-route tool
- Automatic, emulation-like clock tree transformation
  - Unlimited number of user clocks
  - Gated-clock, latch, internal tri-state conversion
  - Elimination of any hold-time violations









## Tensilica Fusion

#### Single-precision FPU

- Floating Point instructions issued concurrently with 64-bit load/store
- Speeds S/W porting
- AVS (Audio/Voice/Speech)
  - SW compatibility with HiFi 3 Audio DSP
  - Access to 125+ HiFi Audio/Voice software packages
- Quad 16x16 MAC
  - Accelerates communications standards like BLE/Wi-Fi
  - Accelerates voice algorithm performance
- BLE/Wi-Fi AES-128
  - Encryption acceleration for wireless
- Baseband bit operations
  - Accelerates performance of bit operations for implementation of Baseband MAC/PHY





## Debugging on Hardware via JTAG



| Debug Host                                                                                                                     | Target Host                                                          | Debug Probe                                                                                                                                | Target hardware                                                                                                         |
|--------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| Windows or Linux host running: <ul> <li>Xtensa Xplorer with integrated xt-gdb</li> <li>Command-line xt-gdb</li> </ul> | Windows or Linux host running: <ul> <li>XOCD</li> <li>Debug probe driver software</li> </ul> | <ul> <li>Flyswatter</li> <li>Catapult</li> <li>J-Link</li> <li>DStream</li> <li>Wiggler</li> <li>USB2 Daemon</li> <li>and more</li> </ul> | <ul> <li>Protium / Palladium</li> <li>Emulation board</li> <li>Customer target hardware with OCD</li> </ul> |



## Protium HW setup





## Debugging





## Thank You














© 2015 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, Palladium, and SpeedBridge are registered trademarks and Protium is a trademark of Cadence Design Systems, Inc. in the United States and other countries. All other trademarks are the property of their respective owners and are not affiliated with Cadence.