02:16PM EDT - Princeton is presenting its own solution for in-memory compute this year at Hot Chips.

02:31PM EDT - Energy efficiency is critical, but programmability is often complex

02:32PM EDT - Compute is often only 10% of the total instruction energy

02:32PM EDT - Compute is getting faster, but programmability hasn't improved. We've hit a memory wall

02:33PM EDT - This focus is on embedded SRAM

02:33PM EDT - Models can be large, but those with up to 50M parameters are on the smaller side

02:34PM EDT - Data movement is fundamental. It cannot be eliminated, but it can be amortized

02:34PM EDT - End up with reuse of data and specialized memory-compute integrated architectures, like TPU with systolic arrays
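
As a rough illustration of the amortization point, the sketch below counts multiply-accumulates (MACs) per word moved for a matrix multiply. It assumes an idealized best case in which every operand and result word crosses the memory boundary exactly once. The function name and the example shapes are hypothetical and not taken from the talk.

```python
def macs_per_word(m: int, k: int, n: int) -> float:
    """Idealized data reuse for C[m, n] = A[m, k] @ B[k, n].

    Assumption: each word of A, B and C is moved exactly once,
    which is a best case that real hardware rarely reaches.
    """
    macs = m * k * n
    words_moved = m * k + k * n + m * n
    return macs / words_moved


if __name__ == "__main__":
    # Matrix-vector product (n = 1): each weight in A is used only once.
    print(f"matrix-vector: {macs_per_word(1024, 1024, 1):.2f} MACs/word")
    # Batched inputs (n = 256): each weight in A is reused 256 times.
    print(f"matrix-matrix: {macs_per_word(1024, 1024, 256):.2f} MACs/word")
```

The more work done per word fetched, the smaller the share of energy spent on movement, which is the motivation for keeping data in place and reusing it.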

02:35PM EDT - In-Memory computing does something similar but more aggressive

02:35PM EDT - IMC = in memory computing

02:36PM EDT - At a fundamental level, IMC trades voltage SNR for better energy efficiency and throughput

02:37PM EDT - Solution is to use analog circuits. Problem is that transistors have non-linear properties

02:37PM EDT - A number of IMC designs have been produced in academia, with some test chips

02:38PM EDT - One issue with IMC is that, despite 10x energy efficiency, memory density is lower

02:38PM EDT - In order to reduce non-linearity and variation of analog circuits, need advanced process technologies with tighter tolerances

02:39PM EDT - Move to charge-domain computation based on capacitors. End up with 8T bit-cell
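
A minimal, idealized model of what charge-domain accumulation could look like. It assumes each bit-cell places a 1-bit product on its own equal-valued capacitor, and that the capacitors in a column are then shorted together so charge sharing gives a voltage proportional to the number of ones. The AND operation, the `vdd` value and the function name are assumptions for illustration only. The talk's actual circuit is not reproduced here.

```python
def column_voltage(weight_bits, input_bits, vdd=1.0):
    """Ideal charge-sharing readout for one column of bit-cells.

    Assumptions (not from the talk): each cell computes AND of its
    stored weight bit and the input bit, charges an identical
    capacitor to vdd * product, and all capacitors are then shorted.
    """
    if len(weight_bits) != len(input_bits):
        raise ValueError("weight and input vectors must be the same length")
    products = [w & x for w, x in zip(weight_bits, input_bits)]
    # Total charge = C * vdd * (number of ones); total capacitance = N * C.
    # The shared voltage is therefore vdd * ones / N, independent of C.
    return vdd * sum(products) / len(products)


if __name__ == "__main__":
    # Two of the four products are 1, so the column settles at half of vdd.
    print(column_voltage([1, 1, 0, 1], [1, 0, 1, 1]))
```

Because the result depends on a ratio of capacitors rather than on transistor currents, a model like this sidesteps the transistor non-linearity mentioned earlier in the talk.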

02:41PM EDT - Can measure image recognition in micro-joules per image

02:41PM EDT - Most ML compute is GEMM, where IMC can help

02:42PM EDT - 590 KB IMC with SiFive CPU sample chip

02:42PM EDT - Compute-In-Memory Unit (CIMU)

02:42PM EDT - 32-bit external architecture built into standard memory interface

02:42PM EDT - low power 8-bit ADC

02:43PM EDT - bit scalability from 1-8 bits

02:44PM EDT - 8-bit ADC helps with energy overhead

02:44PM EDT - saves energy/area vs 16-bit
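
One common way to get multi-bit precision out of 1-bit hardware is bit-plane decomposition. The sketch below illustrates it under stated assumptions: unsigned integers, an ideal readout with no ADC quantization, and hypothetical function names. It is not necessarily the scheme used on this chip.

```python
def bit_plane(values, b):
    """Return bit b (0 = least significant) of each unsigned integer."""
    return [(v >> b) & 1 for v in values]


def bit_scalable_matvec(weights, x, w_bits, x_bits):
    """Matrix-vector product built only from 1-bit dot products.

    Every (weight bit i, input bit j) pair yields a binary dot product
    (AND, then count the ones), which is weighted by 2**(i + j).
    """
    result = [0] * len(weights)
    for r, row in enumerate(weights):
        for i in range(w_bits):
            w_plane = bit_plane(row, i)
            for j in range(x_bits):
                x_plane = bit_plane(x, j)
                ones = sum(w & a for w, a in zip(w_plane, x_plane))
                result[r] += ones << (i + j)
    return result


if __name__ == "__main__":
    W = [[3, 1, 2], [0, 2, 3]]  # 2-bit weights
    x = [1, 2, 3]               # 2-bit inputs
    direct = [sum(w * a for w, a in zip(row, x)) for row in W]
    assert bit_scalable_matvec(W, x, 2, 2) == direct
    print(direct)
```

In this model the number of 1-bit operations grows with the product of the two bit widths, so lower precision costs less, which matches the much lower energy reported for 1b than for 4b later in the talk.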

02:47PM EDT - OK I'm lost on this talk. It's very academic

02:48PM EDT - Test chip built on 65nm, 8.5 mm²

02:48PM EDT - 1b efficiency was 400 TOPS/W

02:49PM EDT - energy efficiency scales like digital, while maintaining analog precision

02:50PM EDT - 23 images/sec at 4b, 176 images/sec at 1b

02:50PM EDT - Developed a prototype kit

02:51PM EDT - 4b activations and weights: 92.4% accurate, 105.2 microjoules total, 23 images/sec

02:52PM EDT - 1b activations and weights: 89.3% accurate, 5.31 microjoules total, 176 images/sec
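
A quick back-of-the-envelope check on the two operating points. It assumes the quoted energy totals are per image and that the chip runs continuously at the quoted frame rate. Both assumptions are mine, and the helper name is hypothetical.

```python
def average_power_mw(energy_uj_per_image, images_per_sec):
    """Average power in milliwatts: (microjoules/image * images/sec) / 1000."""
    return energy_uj_per_image * images_per_sec / 1000.0


if __name__ == "__main__":
    # Figures as quoted in the talk for the 4b and 1b configurations.
    p4 = average_power_mw(105.2, 23)
    p1 = average_power_mw(5.31, 176)
    print(f"4b: {p4:.2f} mW, 1b: {p1:.2f} mW")
    print(f"energy per image ratio (4b/1b): {105.2 / 5.31:.1f}x")
    print(f"throughput ratio (1b/4b): {176 / 23:.1f}x")
```

Under those assumptions, dropping from 4b to 1b costs about three points of accuracy while cutting energy per image by roughly 20x and raising throughput by roughly 7.7x.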

02:52PM EDT - Software SDK

02:53PM EDT - Libraries are available with CPU-fallback

02:54PM EDT - Q&A Time

02:54PM EDT - No Qs.

Next talk is Intel Optane!

3 Comments

  • ballsystemlord - Tuesday, August 20, 2019 - link

    Finally, it's my chance to be the first to reply! :-)
  • abufrejoval - Wednesday, August 21, 2019 - link

    I am confused by the combination of "SRAM" and "capacitance based" computing.
    SRAM IMHO means 4-6 transistors per bit and is about the worst density, while here I thought they were using (decaying) capacitances in DRAM bit trenches to store (and compute on?) NN data in "analog" form. While that sounds crazy, density would be great.

    But I guess the target context is about the smart shirt button, which infers that vis-à-vis your boss it should really be closed and now tries to inform you about that via your on-body-network.
  • Bulat Ziganshin - Thursday, August 22, 2019 - link

    I think they mean SRAM as in static RAM, which doesn't need to be recharged like DRAM
