After reading the entire analysis, I'm left wondering: which observations in this analysis, if any, actually apply to CUDA?
Otherwise OpenCL is very good as well, with the added benefit of running on all GPUs.
This is an ironic comment - OpenCL uses the same compiler as CUDA on NVIDIA and HIP on AMD.
https://resources.nvidia.com/en-us-blackwell-architecture
Blackwell uses the TSMC 4NP process. It has two layers. A very back of the envelope estimate:
750 mm^2 / (208/2 * 10^9 transistors) ≈ 7211 nm^2 per transistor
i.e. roughly 85 nm x 85 nm
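For anyone who wants to redo the arithmetic, here's a minimal sketch of that estimate (the 750 mm^2 per die and 208 billion total transistors are the figures assumed above; everything else is just unit conversion):

    # Back-of-the-envelope area per transistor, using the figures from the comment above.
    die_area_mm2 = 750.0            # assumed area of one die, in mm^2
    transistors_total = 208e9       # total transistor count, split across two dies
    transistors_per_die = transistors_total / 2

    NM2_PER_MM2 = 1e12              # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2
    area_per_transistor_nm2 = die_area_mm2 * NM2_PER_MM2 / transistors_per_die
    pitch_nm = area_per_transistor_nm2 ** 0.5

    print(f"~{area_per_transistor_nm2:.0f} nm^2 per transistor, "
          f"~{pitch_nm:.0f} nm x {pitch_nm:.0f} nm")
    # -> ~7212 nm^2 per transistor, ~85 nm x 85 nm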
NB: process feature size does not equal transistor size. Process feature size doesn't even equal process feature size.
Where did you get that from? Pretty sure it's a single planar set of transistors. Those transistors are manufactured using multiple layers of mask.
FinFET transistors are described as 3D or non-planar, but crucially this isn't allowing transistor-on-transistor stacking; you've just got the gate structure of the FinFET poking out above the plane of the rest of the transistors.
Silicon-on-silicon die stacking is a possibility, but it limits how much power you can dissipate, and GPUs run very hot, so it's not an option for them.
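To put a rough number on "very hot", here's a quick power-density sketch, assuming a ~575 W board power and the ~750 mm^2 die area mentioned upthread (both are illustrative assumptions); stacking two equally hot dies roughly doubles the heat that has to escape through the same footprint:

    # Rough power-density estimate for why die-on-die stacking is thermally awkward for GPUs.
    # The 575 W and 750 mm^2 figures are illustrative assumptions, not measured values.
    board_power_w = 575.0
    die_area_mm2 = 750.0

    single_die = board_power_w / die_area_mm2                 # ~0.77 W/mm^2
    two_stacked = 2 * board_power_w / die_area_mm2            # same footprint, ~1.5 W/mm^2

    print(f"single die: {single_die:.2f} W/mm^2, two stacked dies: {two_stacked:.2f} W/mm^2")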
I'd say advanced users or skilled staff.
20+ years ago, the Athlon XP for example had a small bare CPU die in the middle of the package and four round spacers in the corners for proper heatsink installation, even though that die wouldn't clock down and would go up in flames if the cooler was removed during operation.
Nowadays, with a safer CPU that monitors its own temperature, one has to take the risk of removing the heatspreader and replacing it with "special" direct-die cooling, which yields either a bit more performance, 15-20 degrees lower temperatures, or a smaller or quieter cooler. One is free to choose.
Sure, even an advanced user must take more care working around the naked die. But the technology to make this safer than before could also have matured.
However, you can't always pack the transistors as densely as you would like, because you can't fit the wiring for them in the metal layers above at the same density.
Plus there are various 'design rules' that constrain how things get placed. These are needed to ensure manufacturing is successful and achieves good yield. An important set of rules are the 'antenna rules', which require the insertion of antenna diodes (using silicon and thus reducing transistor density) to prevent circuitry being destroyed during manufacturing: https://www.zerotoasiccourse.com/terminology/antenna-report/
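For a concrete sense of what an antenna check does, here's a toy sketch: the tool compares the area of metal connected to a gate against the gate area itself, and if the ratio exceeds the process limit, an antenna diode has to be inserted (the 400:1 limit and the geometry below are made-up illustrative numbers, not real process rules):

    # Toy antenna-rule check: ratio of charge-collecting metal area to connected gate area.
    # The limit and geometry are illustrative only; real design rules are process-specific.
    MAX_ANTENNA_RATIO = 400.0

    def needs_antenna_diode(metal_area_um2: float, gate_area_um2: float) -> bool:
        """True if the metal connected to a gate is too large relative to that gate."""
        return metal_area_um2 / gate_area_um2 > MAX_ANTENNA_RATIO

    # A long routing wire (5000 um x 0.1 um) driving a tiny gate (0.05 um x 0.03 um):
    print(needs_antenna_diode(5000 * 0.1, 0.05 * 0.03))  # True -> insert a diode, costing silicon area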
On another note, I am waiting for Nvidia's entry into the CPU market. At some point down the line I expect the CPU will be less important (relatively speaking), and Nvidia could afford to throw a CPU into the system as a bonus. Especially when we are expecting the ARM X930 to rival Apple's M4 in terms of IPC. CPU design has become somewhat of a commodity.
I don't have really solid evidence, just semi-anecdotal/semi-reliable internet posts:
Eg. https://www.tomshardware.com/tech-industry/more-than-251-mil...
Nvidia as a whole has been fairly anti-consumer recently with pricing, so I wouldn't be banking on them for a great CPU option. Weirdly, Intel is in the position where they have to prove themselves, so hopefully they'll give us some great products in the next 2-5 years, if they survive (think the old lead-up-to-Ryzen era for AMD).
[1] there are now 5090-branded cards that use the same chip as the 5080
If they’re swimming in the AI cash and the consumer GPU segment isn’t that important (https://www.visualcapitalist.com/nvidia-revenue-by-product-l...) then why on earth couldn’t they do less price gouging?
It feels a bit like the Intel Core Ultra desktop CPU launch where the prices were the critical factor that doomed an otherwise pretty okay product. At least Intel's excuse is that they’re closer to going under than before, even if their GPUs were pretty fairly priced anyways.
It’s almost like everyone complains about their prices and the fact that they’re releasing 8 GB cards… and then still go and give them money anyways.
Haven't they already started doing this with Grace and GB10?
- https://www.nvidia.com/en-us/data-center/grace-cpu/
- https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...
GB10, when it ships, might be more interesting, since it'll go into systems that need to support use cases other than merely feeding ML workloads to a big GPU. But it sounds like the CPU chiplet, at least, was more or less outsourced to MediaTek.
It seems there is a huge market for inference.
Less programmable, but more throughput/power efficiency?
I also wonder the same. It'd make sense to sell two categories of chips:
- Traditional GPUs like Blackwell that can do anything and have backwards compatibility.
- Less programmable and more ASIC-like inference chips like Google's TPUs.
The inference market is going to be multiple times bigger than training soon.