mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-06-24 02:14:17 +00:00
more business notes
parent b1000d866e
commit d29b16e5b4
1 changed file with 14 additions and 0 deletions
fpga/README
@@ -15,6 +15,10 @@ Small Board (Arty A7 100T)
* 4x4x4 matmul = 64 mults, perhaps 8x8x8 matmul = 512 mults
* 6.4 GFLOPS @ 50 MHz

* Forward/backward pass of ResNet-50, EfficientNet-B2, and BERT-large in the simulator
* Train MNIST models on the real hardware
* After we've trained MNIST here, buy the big board and a Linux computer for home

Big Board (Alveo U250)
=====
* Support DMA over PCI-E. 16 GB/s
@@ -24,6 +28,12 @@ Big Board (Alveo U250)
* 16x16x16 matmul = 4096 mults, perhaps 32x32x32 matmul = 32768 mults
* 4 TFLOPS @ 500 MHz

* Bring up in one Z840 with one card
* Train (with tinygrad) ResNet-50, EfficientNet-B2, and BERT-large
* Now we buy a machine with 8x cards
* Write 8x multicard training, place on https://mlcommons.org/en/training-normal-07/
* Now it's funding/kickstarter time, based on our MLPerf results on the Alveos and Cherry Two sim
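
The GFLOPS/TFLOPS figures quoted for the two boards can be reproduced from the multiplier counts and clock rates given above. This is a rough sketch, not part of the original notes: it assumes each multiplier completes one multiply-accumulate per clock and that a multiply-accumulate counts as 2 FLOPs, which is the factor that makes the quoted numbers come out. The helper names `mults` and `peak_flops` are made up for illustration.

```python
# Peak-throughput arithmetic for an NxNxN matmul unit.
# ASSUMPTION (not stated in the README): every multiplier does one
# multiply-accumulate per clock cycle, counted as 2 FLOPs.

FLOPS_PER_MAC = 2  # one multiply + one add


def mults(n: int) -> int:
    """Number of multiplies in an n x n x n matmul."""
    return n ** 3


def peak_flops(n: int, clock_hz: float) -> float:
    """Peak FLOPS if all n^3 multipliers are busy every cycle."""
    return mults(n) * FLOPS_PER_MAC * clock_hz


if __name__ == "__main__":
    # (board, matmul size, clock in Hz) taken from the notes above
    configs = [
        ("Arty A7 100T", 4, 50e6),
        ("Arty A7 100T", 8, 50e6),
        ("Alveo U250", 16, 500e6),
        ("Alveo U250", 32, 500e6),
    ]
    for board, n, clock in configs:
        gflops = peak_flops(n, clock) / 1e9
        print(f"{board}: {n}x{n}x{n} = {mults(n)} mults, {gflops:g} GFLOPS")
```

Under that assumption, 64 mults at 50 MHz gives 6.4 GFLOPS and 4096 mults at 500 MHz gives 4.096 TFLOPS, matching the rounded "4 TFLOPS" in the notes. These are peak numbers; they say nothing about whether DMA can keep the multipliers fed.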

Cherry Two (12nm tapeout)
=====
* Support DMA over PCI-E. 16 GB/s
@@ -34,6 +44,10 @@ Cherry Two (12nm tapeout)
* Target 75W, even if underclocked. One slot, no external power.
* This card should be on par with a 3090 and sell for $1000

* Write PyTorch port to support same training while waiting for tapeout
* If we are here, we are winning the AI chip market
* Tile the core and go to a smaller process node

Cherry Three (5nm tapeout)
=====
* Support DMA over PCI-E 4.0. 32 GB/s