Hot Chips 2020 Live Blog: Alibaba Xuantie-910 RISC-V CPU (3:00pm PT)
by Dr. Ian Cutress on August 17, 2020 6:00 PM EST- Posted in
- CPUs
- Live Blog
- RISC-V
- Alibaba
- Hot Chips 32
- Xuantie-910
06:04PM EDT - This is the first talk on edge computing
06:05PM EDT - Xuantie-910 of Alibaba
06:06PM EDT - Innovating Cloud and Edge Computing by RISC-V
06:06PM EDT - Xuantie refers to a heavy sword from Chinese folklore made of iron
06:07PM EDT - T-Head Semiconductor - a young Alibaba organization specializing in circuit design for next-gen compute in various areas, with a strong commitment to open source
06:08PM EDT - RISC-V is very attractive for the IoT era
06:08PM EDT - Extensibility and modularity allow for customization for domain-specific workloads
06:09PM EDT - RISC-V Mainline platform in Linux, fully supported in AlibabaOS
06:09PM EDT - Xuantie's goal is to contribute to the open source community
06:09PM EDT - AI Vector Engine
06:10PM EDT - Similar in performance to Arm A73
06:10PM EDT - Xuantie-902 (M0+ like) with hardware TEE up to Xuantie-910
06:10PM EDT - 903, 907, 908 coming
06:11PM EDT - 4 cores per cluster in 910
06:11PM EDT - HMP cluster
06:11PM EDT - Each core supports 32-64 KB L1 D and 32-64 KB L1 I
06:11PM EDT - Each single core is 3-decode 8-issue OoO
06:11PM EDT - Hybrid branch predictor
06:11PM EDT - vector engine
06:12PM EDT - One of the first commercial processors to use RISC-V vector extension proposals
06:12PM EDT - Performance on CoreMark is 7.1 per MHz. This workload runs entirely out of cache (full cache hits only)
06:13PM EDT - Highest performance RISC-V on market now
06:13PM EDT - SiFive has the U84 processor, which might be higher performance, but no details as of yet
06:13PM EDT - waiting for info to become available
06:13PM EDT - X910 supports RISC-V 0.7.1 Vector Extension
06:13PM EDT - FP16-FP64, INT8-INT64
06:14PM EDT - MMU, CLINT, PPC
06:14PM EDT - Supports unaligned memory data access
06:14PM EDT - Supports custom extensions
06:14PM EDT - RISC-V Turbo extensions
06:15PM EDT - bit operations, memory access, core sync
06:15PM EDT - Can be disabled to be completely compatible with RISC-V
06:15PM EDT - but Alibaba toolchain can use the new instructions
06:16PM EDT - Two vector pipes, 1 ALU/MUL, 1 ALU/DIV, 1 Branch, 1 dual-issue Load/Store unit
06:16PM EDT - 128-bit instruction fetch unit
06:16PM EDT - can fetch 8 instructions at once
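A quick sanity check on those two fetch numbers. This is my own arithmetic, not from the slide: 8 instructions in a 128-bit fetch only works out if they are 16-bit compressed (RVC) encodings, which is an assumption on my part. The function name is just for illustration.

```python
# Sketch: how many instructions fit in one fetch, for a given encoding size.
FETCH_WIDTH_BITS = 128  # fetch width quoted in the talk


def instructions_per_fetch(fetch_bits: int, insn_bits: int) -> int:
    """Whole instructions that fit in a single fetch of fetch_bits."""
    return fetch_bits // insn_bits


# Assumption: 16-bit compressed (RVC) vs. standard 32-bit RISC-V encodings.
print("16-bit encodings:", instructions_per_fetch(FETCH_WIDTH_BITS, 16))
print("32-bit encodings:", instructions_per_fetch(FETCH_WIDTH_BITS, 32))
```

With standard 32-bit encodings the same fetch would hold half as many instructions.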
06:17PM EDT - Hybrid multi-mode branch prediction
06:17PM EDT - Cache Way prediction
06:17PM EDT - Loop accelerator
06:18PM EDT - Can do one load and one store in parallel
06:18PM EDT - 3-cycle load-to-use
06:19PM EDT - Multi-mode, multi-stream prefetcher, billed as unique for RISC-V - it pattern matches on memory accesses and backfills the L1/L2 caches
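No details were given on how the pattern matching works. As a rough illustration of the general idea only, here is a toy single-stream stride detector. It is a generic textbook-style technique, not T-Head's design, and the class name, the `degree` parameter, and the two-in-a-row trigger rule are all my own assumptions.

```python
from typing import List, Optional


class StridePrefetcher:
    """Toy single-stream stride detector (illustrative, not the X910 design)."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree  # hypothetical: how many strides ahead to prefetch
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None

    def access(self, addr: int) -> List[int]:
        """Record a demand access and return the addresses to prefetch."""
        prefetches: List[int] = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Pattern match: the same non-zero stride seen twice in a row.
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


pf = StridePrefetcher(degree=2)
for address in (0, 64, 128, 192):
    print(address, "->", pf.access(address))
```

A multi-stream version would keep one such tracker per detected stream, but how the X910 does that was not covered in the talk.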
06:19PM EDT - 4 cores per cluster, up to 4 clusters
06:20PM EDT - All Clusters shares L2, up to 8MB
06:20PM EDT - Two 128-bit Vector ALU ops/cycle
06:21PM EDT - More than 300 GFLOPs FP16 per cluster (32 FLOPs/core/cycle x 2.5 GHz x 4-cores)
06:21PM EDT - FP32 perf is 0.5x FP16
06:21PM EDT - So 150 GFLOP of FP32 per cluster - up to 600 GFLOP of FP32 in a 4-cluster design
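To show where these throughput numbers come from, here is the arithmetic as a small sketch. The 128-bit width, two vector pipes, 2.5 GHz, 4 cores, and 4 clusters are from the talk. Counting a fused multiply-add as 2 FLOPs is my assumption, and it is what makes the lane math land on the quoted 32 FLOPs/core/cycle.

```python
# Sketch: peak vector throughput from the figures quoted in the talk.
VECTOR_BITS = 128        # width of each vector ALU op
VECTOR_PIPES = 2         # two vector ops per cycle
FLOPS_PER_FMA = 2        # assumption: one fused multiply-add = 2 FLOPs
CLOCK_GHZ = 2.5
CORES_PER_CLUSTER = 4
MAX_CLUSTERS = 4


def flops_per_core_cycle(element_bits: int) -> int:
    """Peak FLOPs per core per cycle for a given element width."""
    lanes = VECTOR_BITS // element_bits
    return lanes * VECTOR_PIPES * FLOPS_PER_FMA


def cluster_gflops(element_bits: int) -> float:
    """Peak GFLOPS for one 4-core cluster."""
    return flops_per_core_cycle(element_bits) * CLOCK_GHZ * CORES_PER_CLUSTER


for name, bits in (("FP16", 16), ("FP32", 32)):
    per_cluster = cluster_gflops(bits)
    print(name, flops_per_core_cycle(bits), "FLOPs/core/cycle,",
          per_cluster, "GFLOPS/cluster,",
          per_cluster * MAX_CLUSTERS, "GFLOPS at 4 clusters")
```

The exact product is 320 GFLOPS of FP16 per cluster, so the 150 and 600 GFLOPS FP32 figures are round numbers derived from "more than 300"; the unrounded values are 160 and 640.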
06:22PM EDT - Also integrated IDE with profiling for Xuantie-910
06:22PM EDT - Compiler has been co-optimized for the hardware improvements
06:22PM EDT - Compared to Arm A73
06:23PM EDT - A73 CPU is from Huawei Kirin 970
06:23PM EDT - Xuantie is configured to same L1 cache sizes
06:24PM EDT - 'on par in this config'
06:24PM EDT - The benchmarks don't mean that Xuantie-910 is as polished as the A73 - it's still new, and needs more collaboration
06:25PM EDT - Here's an AI workload
06:25PM EDT - on an FPGA simulation of X910
06:25PM EDT - Here's a floor plan
06:26PM EDT - TSMC 12FF
06:26PM EDT - FPGA X910 already deployed in Alibaba cloud
06:27PM EDT - FPGA runs at 200 MHz
06:27PM EDT - July 2020: 28 HPC version at 1.6 GHz, 0.3 mW/MHz
06:27PM EDT - September, 12nm FinFET due
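For a sense of scale, the 0.3 mW/MHz figure can be turned into an absolute number at the quoted clock. This is a back-of-the-envelope sketch that assumes power scales linearly with frequency. The talk did not say what the figure covers (a single core, with or without caches), so treat the result as rough.

```python
# Sketch: dynamic power implied by a mW/MHz figure at a given clock.
POWER_MW_PER_MHZ = 0.3   # from the talk
CLOCK_MHZ = 1600         # 1.6 GHz, from the talk


def power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Power in mW, assuming a purely linear frequency scaling."""
    return mw_per_mhz * clock_mhz


estimate = power_mw(POWER_MW_PER_MHZ, CLOCK_MHZ)
print(f"about {estimate:.0f} mW ({estimate / 1000:.2f} W) at {CLOCK_MHZ} MHz")
```

That works out to roughly half a watt at 1.6 GHz under these assumptions.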
06:28PM EDT - Help external customers with X910 with Wujian SoC platform
06:30PM EDT - Now for Q&A
06:32PM EDT - Q: What applications are you using it for?
06:33PM EDT - A: It's a full chip - a high-end core for embedded SoCs
06:34PM EDT - Q: Source code? A: We are actively working on open source procedures. It's not straightforward for a high performance core - legal required. We are talking to open source companies to find the best way to do this. Also repository management and such. Once it is available, we will let you know!
06:34PM EDT - Q: Plans to support RVV 1.0? A: 0.7.1 for now - when we designed, it was still at that level. We are following and working on that, yes.
06:36PM EDT - That's a wrap. My next live blog will be NVIDIA A100 at 5pm PT.
6 Comments
eSyr - Monday, August 17, 2020
Suddenly, C-SKY.
watersb - Tuesday, August 18, 2020
Pandemic distancing borders on the surreal right now, kids returning to school while staying at home.
Strange times, but as always I get a lot out of your Hot Chips coverage, even though you are unlikely to be able to insert any wafers into your mouth this year.
Thanks!
name99 - Wednesday, August 19, 2020
What is "RISC-V Turbo"? The web shows nothing useful.
MetalPenguin - Wednesday, August 19, 2020
That is the name that Alibaba gave to their own custom instruction extension. The slide around "6:14PM EDT" kind of gives a brief overview. They basically added custom instructions to accelerate certain functions.
name99 - Wednesday, August 19, 2020
OK, so point is it's an Alibaba extension, not a "standard" RISC-V extension.
Oh RISC-V, you and your crazy extensions. It's hard to imagine this will play out well long term...
So do we validate these as good choices (ie they should have been in the base spec!) if other people copy them? Or are they (C) Alibaba so one step on the way to Balkanization?
green.holden - Tuesday, September 20, 2022
300 GFLOPS of FP16 and 150 of FP32.
That's got to be enough for a basic GPU.