Friday, July 24, 2009

NPGA - A Totally Cool Fusion Mutant

The previous article outlined the concept of the embedded FPGA (eFPGA), placed it in the context of the CPU, and identified the Stretch S6000 as probably the first processor with an eFPGA block.

This time we'll take a quick look at another hybrid: the Cswitch 'Configurable Switch Array' (the company is unfortunately defunct at the moment and looking for a buyer).

This mutant can be viewed as a Network Processor (NP) with eFPGA. Or it may just as well be classified as an FPGA with a deadly dose of hardware mega-blocks aimed at packet processing and networking applications.

Since it's got a sizable genome from both NP and FPGA, let's call it NPGA! So, the Cswitch twisted personality includes:

1) Fixed, high-speed (2 GHz), top-level interconnect in the form of any-to-any cross-connect switch. Not an FPGA thing. Found in some NPs.

2) Flexible, highly-configurable, multi-standard I/O that covers the gamut of single-ended, differential and terminated options. This I/O sports built-in flops (SDR and DDR), FIFOs, DLLs and ECC logic (!!!). Not found in NP. Taken from FPGA, but enriched and expanded.

3) An assortment of configurable, global and local clocks, with PLLs, dividers, skew management and clock gating elements. Again, not found in NP, typical of FPGA, except that these clocks can entertain 2GHz (!)

4) One Configurable Packet Processor engine. It consists of a programmable Packet Parser, a Reconfigurable Arithmetic Unit and a Reconfigurable(!) CAM block. Not found in FPGA. Typical of NP, with the caveat that Cswitch's packet engine is more configurable.

5) Handful of fast (1Gbps/pin), multi-standard controllers for external memories (DDR1/2, QDR1/2, RLDRAM1/2). Typical of NP. Also found in FPGA, but Cswitch wins on speed, variety and number of available memory controllers.

6) A few dozen SerDes with MACs, including 10GE and PCIe. An NP normally does not have this many SerDes, while a high-end FPGA may have a comparable number of them, but far fewer MAC blocks, if any at all.

7) Thousands of Programmable Logic Blocks (at 500MHz) with hundreds of both fine- and coarse-grained, single- and dual-port Memory Blocks at an incredible 1GHz (!), some with a built-in FIFO controller. Typical of FPGA. Not found at all in NP. An FPGA normally possesses more logic and memory blocks than this, but cannot match the Cswitch speeds.

8) Cswitch design flow is hardware/FPGA-centric, with RTL and synthesis acting as the main implementation vehicle and C compiler/debugger for the Packet Processor engine (NP flow) standing on the side. The compiled C binary is merged with main flow via an HDL wrapper.
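
To make item 4 a bit more concrete, here is a minimal behavioural sketch of the parse-then-CAM-lookup pattern that packet engines of this kind are built around. It is plain Python, not Cswitch code: the function and class names, the choice of Ethernet/IPv4 fields, and the idea of keying the CAM on destination IP are all assumptions made for illustration, and the real Cswitch parser and CAM formats are not described in this post.

```python
import struct

# Hypothetical behavioural model; none of these names come from Cswitch.
ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14
MIN_IPV4_HEADER_LEN = 20


def parse_ipv4_over_ethernet(frame: bytes) -> dict:
    """Extract a few header fields, as a programmable parser stage might."""
    if len(frame) < ETH_HEADER_LEN + MIN_IPV4_HEADER_LEN:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    _dst_mac, _src_mac, ethertype = struct.unpack(
        "!6s6sH", frame[:ETH_HEADER_LEN]
    )
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    ip = frame[ETH_HEADER_LEN:]
    if ip[0] >> 4 != 4:
        raise ValueError("IP version field is not 4")
    return {
        "protocol": ip[9],                               # IPv4 protocol byte
        "src_ip": ".".join(str(b) for b in ip[12:16]),   # source address
        "dst_ip": ".".join(str(b) for b in ip[16:20]),   # destination address
    }


class ExactMatchCam:
    """Software stand-in for an exact-match CAM: key in, result (or miss) out."""

    def __init__(self, entries: dict):
        self._entries = dict(entries)

    def lookup(self, key):
        # A hardware CAM compares the key against all entries in parallel;
        # a dict lookup gives the same answer, just not the same timing.
        return self._entries.get(key)


def build_test_frame(src_ip: bytes, dst_ip: bytes) -> bytes:
    """Build a minimal Ethernet + IPv4 header (no payload, checksum left 0)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(MIN_IPV4_HEADER_LEN)
    ip[0] = 0x45          # version 4, header length 5 words
    ip[9] = 17            # protocol: UDP
    ip[12:16] = src_ip
    ip[16:20] = dst_ip
    return eth + bytes(ip)
```

A caller would build or receive a frame, run it through `parse_ipv4_over_ethernet`, and feed one of the extracted fields (say `dst_ip`) to `ExactMatchCam.lookup` to get an output port or a miss. In silicon the interesting part is that both steps run at line rate; the sketch only shows the data flow.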

This NPGA guy from Cswitch is quite a character, isn't it? Especially knowing that it is carved out of aging 90nm stock.

Stay tuned for another recent FPGA-ish mutant with similarly strong personality...


Tuesday, July 21, 2009

eFPGA - Another Fusion in the Works!

We touched on the AMD-pioneered fusion of GPU and CPU in the previous article and wondered whether desktop processors would also embrace some kind of programmability at the gate level, to complement the healthy C-to-Hardware movement in the compiler space and allow for dynamic, application-driven processor customization.

Well, this new fusion, 'Gate Fusion', is reaching critical mass!

It is the French side of the Atlantic that accelerated the particles this time: Menta (www.menta.fr) developed embedded FPGA (eFPGA) cores in the form of soft IP. Chip companies can buy a Menta eFPGA core of the desired size and integrate it into their design to gain the ability to add or change logic in the field. Important to note is the soft IP nature of the Menta eFPGA, which opens it to wider foundry access and offers portability across process technologies! Another French company, M2000, was also innovating in the eFPGA space (until it recently became Abound Logic, changed focus, and moved to the US).

Simply put, eFPGA is about embedding some uncommitted field-programmable gates in otherwise ASSP silicon, such as a general-purpose processor. The main difference between an FPGA-with-embedded-CPU and a CPU-with-eFPGA is in the ratio of the two: while the former is primarily an FPGA (with some CPU), the latter is primarily a CPU (with some FPGA). Moreover, just like embedded Flash or embedded DRAM, eFPGA cores can be integrated in any kind of ASSP, not only a CPU, and for reasons more mundane than acceleration through instruction set extension (think 'bug deflection' and 'future-proofing' here).

The Tensilica Xtensa product line has been riding the 'custom processor' wave for quite some time, offering fairly automated acceleration of its instruction set. The catch with Tensilica, however, is that the application code has to be written and analyzed before the processor is manufactured. The bottlenecks are hence identified a priori and addressed in pre-production, that is, in hard gates. ARC Configurable Cores fall into the same category.

The problem in either case is that, should the application change enough, the hard-wired accelerator could easily become less of an asset and more of a burden. That's where the eFPGA comes to the rescue, as its customer-visible gates are soft, that is, field-reprogrammable.

The Stretch S6000 family is an interesting marriage of the best of both worlds and probably the first CPU with an embedded FPGA. 'Stretch' is marketed under the catchy 'Software Configurable Processor' banner. It contains a VLIW Xtensa processing core with the Tensilica 'Instruction Set Extension Fabric', further augmented by a Menta eFPGA-based 'Programmable Accelerator'. Higher-end versions of 'Stretch' also contain a 'Processor Array' interface, which allows connecting multiple 'Stretch' units into some kind of networking topology, for a scalability stretch.

It seems to us that the pure-play FPGA is a thing of the past. As the programmable imperative gains momentum, standard parts are being enriched with eFPGA cores, while true FPGAs are doped with heavy doses of complex hard IP and are morphing into eFPGA-like parts.

This Fusion both consolidates and diversifies. Technologies that are traditionally thought of as distinct and unique are now fused on a common die, which isn't merely integration and density scaling. Yet the variety of target applications, coupled with the acute energy crisis and the cry for doing the most with the least, increases the diversity of the 'fused' chips by calling for different ratios of the packaged content.

This Fusion is an unstoppable chain reaction, occurring at multiple levels. We are yet to touch on its mixed-signal and RF aspects...

Jasmin Ibrahimovic's VisualCV

Friday, May 8, 2009

PSCs and Streamcomputing without FPGA

The most seductive FPGA application is in the realm of reconfigurable supercomputing.

Ages ago, in April 2005, Cray Inc. unveiled their XD1 supercomputer that utilized an array of Xilinx Virtex-II Pro FPGAs for CPU acceleration. The XD1 came with an SDK that allowed programmers to move time-critical sections of 'C' code to dedicated computing hardware within the FPGA (check Starbridge Viva, ImpulseC, Mentor Catapult-C, Altera C2H, Celoxica Handel-C, Cadence C2Silicon, free http://www.c-to-verilog.com /*thanks Nadav*/, Forte Cynthesizer, Nallatech Dime-C, Mitrion-C, Synfora PICO for C-to-Hardware translation tools). This ability to create and extend the CPU instruction set on demand, per application and per need, is the basis of Reconfigurable Computing. Stream computing is then a natural extension, as parallelism is an inherent property of FPGA-based computing engines.
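
Why does stream computing fall out so naturally? Because the kernels worth offloading tend to touch each data element independently, so hardware is free to lay down as many parallel copies as it has room for. Here is a rough sketch of that property in plain Python. The SAXPY kernel, the function names and the 'lanes' parameter are all assumptions chosen for illustration; this is not the API of the XD1 SDK or of any tool listed above.

```python
# Hypothetical illustration of a stream kernel; names are made up for this post.

def saxpy_kernel(a: float, x: float, y: float) -> float:
    """Per-element kernel: one output depends on one pair of inputs only."""
    return a * x + y


def run_sequential(a: float, xs: list, ys: list) -> list:
    """What a plain CPU loop does: one element after another."""
    return [saxpy_kernel(a, x, y) for x, y in zip(xs, ys)]


def run_in_lanes(a: float, xs: list, ys: list, lanes: int = 4) -> list:
    """Model of a parallel engine: each lane owns an interleaved slice.

    The lanes are simulated one after another here, but since no lane reads
    another lane's output, real hardware could run them all at once.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    out = [0.0] * len(xs)
    for lane in range(lanes):
        # Lane k handles elements k, k + lanes, k + 2 * lanes, ...
        for i in range(lane, len(xs), lanes):
            out[i] = saxpy_kernel(a, xs[i], ys[i])
    return out
```

Both runners produce the same list for the same inputs, whatever the lane count. That independence is exactly what lets a C-to-Hardware tool (or a GPU stream processor) replicate the kernel instead of serializing it.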

Then, in July 2006, AMD got hold of ATI and announced plans to design 'a new kind of processor': the Fusion processor, which would integrate a GPU and use it for computing tasks other than 3D video rendering, shading and display-related math and data movement.

The first step, however, was to modify the hard-coded, inaccessible data paths and parallel compute engines in the deep core of a GPU and turn them into flexible, multi-purpose, fully exposed stream processors that a C programmer has direct control of through a special SDK.

While AMD is now very close to presenting the world with its GPU-fused-to-CPU silicon, GPU-based supercomputing has already arrived! It comes in the form of a PCIe accelerator card (AMD FireStream, Nvidia Tesla) and an SDK.

Amazingly, these cards are designed for ordinary PCs and can be bought at retail for under $1000! This makes Cray-like supercomputing available for much, much less than millions of dollars, on the desktop, outside the big air-conditioned, water-cooled room! Supercomputers are becoming very, very personal: Personal Super Computers, or PSCs.

Will this fusion of technologies steal the story from the FPGAs?


Or, will the fusion continue and add some FPGA-like gate-level programmability to CPUs, in the same way as the FPUs had been and GPUs are being absorbed?


Tuesday, May 5, 2009

FPGA Outside the Big4 Wheelers

The Big4 (Xilinx, Altera, Lattice, Actel) are the first, and often the only, names that come to mind when FPGAs are mentioned.

But as more and more of the design space comes within FPGA reach and the "Programmable Imperative" becomes pervasive, the alternatives to the Big4 are gaining ground.

Here are the most credible ones:
  • And the list goes on ...


Thursday, April 30, 2009

Xilinx is Out with V6, S6 and ISE11

Platform Computing is the theme:
The problem and solution landscape has grown so much in size and complexity that Xilinx is now bundling its tools into kits for different application domains (Logic Design, DSP Algorithm Development, Embedded Systems) to help designers pick the right tool for the task.

Immersive Computing and the Programmable Imperative
are the vision, and where FPGAs are headed.

Here is how Xilinx CEO, Moshe Gavrielov, explains it: http://bit.ly/9arlF

