EMC^2: EMC2 - Energy Efficient Machine Learning and Cognitive Computing

Sunday, June 16, 2019 Room: Hyatt Shoreline A Half Day (Morning Session)

In the eleventh edition of the EMC2 workshop, we plan to facilitate conversation about the sustainability of large-scale AI computing systems being developed to meet the ever-increasing demands of generative AI. This involves discussions spanning multiple interrelated areas. First, we continue to serve as a leading forum for discussing the energy efficiency of GenAI workloads, which directly impacts the overall viability and economic value of AI technology. Second, we reassess the scaling laws of AI given the prevalence of agentic, multi-modal, and reasoning-based models, in conjunction with novel techniques such as highly sparse expert architectures and disaggregated computation. Finally, we discuss sustainable and high-performance computing paradigms for efficient datacenters and hybrid computing models that can cater to the exponential growth in model sizes, application areas, and user base. This will allow us to explore ideas for building the hardware, software, systems, and scaling infrastructure, as well as the model architectures, that make AI technology even more prevalent and accessible.

The goal of this workshop is to provide a forum for researchers and industry experts who are exploring novel ideas, tools, and techniques to improve the energy efficiency of MLLMs as they are practised today and as they will evolve in the next decade. We envision that only through close collaboration between industry and academia will we be able to address the difficult challenges and opportunities of reducing the carbon footprint of AI and its uses. We have tailored our program to best serve the participants in a fully digital setting. Our forum facilitates active exchange of ideas through:

Keynotes, invited talks and discussion panels by leading researchers from industry and academia
Peer-reviewed papers on the latest solutions, including works-in-progress seeking directed feedback from experts
Independent publication of proceedings through IEEE CPS

We invite full-length papers describing original, cutting-edge, and even work-in-progress research projects on efficient machine learning. Suggested topics for papers include, but are not limited to, the ones listed below:

Neural network architectures for resource constrained applications.
Efficient hardware designs to implement neural networks including sparsity, locality, and systolic designs.
Power and performance efficient memory architectures suited for neural networks.
Network reduction techniques – approximation, quantization, reduced precision, pruning, distillation, and reconfiguration.
Exploring interplay of precision, performance, power, and energy through benchmarks, workloads, and characterization.
Performance potential, limit studies, bottleneck analysis, profiling, and synthesis of workloads.
Explorations and architectures aimed at promoting sustainable computing.
Simulation and emulation techniques, frameworks, tools, and platforms for machine learning.
Optimizations to improve performance of training techniques including on-device and large-scale learning.
Load balancing and efficient task distribution, communication and computation overlapping for optimal performance.
Verification, validation, determinism, robustness, bias, safety, and privacy challenges in AI systems.
Efficient deployment strategies for edge and distributed environments.
Model compression and optimization techniques that preserve reasoning and problem-solving capabilities.
Architectures and frameworks for multi-agent systems and retrieval-augmented generation (RAG) pipelines.
Systems-level approaches for scaling future foundation models (e.g., Llama 4, GPT-5 and beyond).

08:00 - 08:10

Welcome

Introduction and Opening Remarks

08:10 - 09:00

Invited Talk

Presentation

Mixed-signal Techniques for Embedded Machine Learning Systems

Boris Murmann, Stanford University

Over the past decade, machine learning algorithms have been deployed in many cloud-centric applications. However, as the application space continues to grow, various algorithms are now being embedded “closer to the sensor,” eliminating the latency, privacy and energy penalties associated with cloud access. In this talk, I will review mixed-signal circuit techniques that can improve the efficiency of moderate-complexity, low-power inference algorithms. Specific examples include analog feature extraction for image and audio processing, mixed-signal compute circuits for convolutional neural networks, as well as compute-in-memory using resistive RAM.

Boris Murmann is a Professor of Electrical Engineering at Stanford University. He joined Stanford in 2004 after completing his Ph.D. degree in electrical engineering at the University of California, Berkeley in 2003. Since 2004, he has worked as a consultant with numerous Silicon Valley companies. Dr. Murmann’s research interests are in mixed-signal integrated circuit design, with special emphasis on sensor interfaces, data converters and custom circuits for embedded machine learning. He is a Fellow of the IEEE.

09:00 - 09:50

Invited Talk

Presentation

Balancing Efficiency and Flexibility for DNN Acceleration

Vivienne Sze, MIT

There has been a significant amount of research on the topic of efficient processing of DNNs, from the design of efficient DNN algorithms to the design of efficient DNN accelerators. The wide range of techniques used for efficient DNN algorithm design has resulted in a more diverse set of DNNs; this creates a new challenge for the DNN accelerators, as they now need to be sufficiently flexible to support a wide range of DNN workloads efficiently. However, many of the existing DNN accelerators rely on certain properties of the DNN which cannot be guaranteed (e.g., fixed weight sparsity, large number of channels, large batch size). In this talk, we will briefly discuss recent techniques that have been used to design efficient DNN algorithms and important properties to consider when applying these techniques. We will then present a systematic approach called Eyexam to identify the sources of inefficiencies in DNN accelerator designs for different DNN workloads. Finally, we will introduce a flexible accelerator called Eyeriss v2 that is computationally efficient across a wide range of diverse DNNs.

Vivienne Sze is an Associate Professor at MIT in the Electrical Engineering and Computer Science Department. Her research interests include energy-aware signal processing algorithms, and low-power circuit and system design for portable multimedia applications, including computer vision, deep learning, autonomous navigation, and video processing/coding. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI, where she designed low-power algorithms and architectures for video coding. She also represented TI in the JCT-VC committee of the ITU-T and ISO/IEC standards bodies during the development of High Efficiency Video Coding (HEVC), which received a Primetime Engineering Emmy Award. She is a co-editor of the book entitled “High Efficiency Video Coding (HEVC): Algorithms and Architectures” (Springer, 2014).

Prof. Sze received the B.A.Sc. degree from the University of Toronto in 2004, and the S.M. and Ph.D. degrees from MIT in 2006 and 2010, respectively. In 2011, she received the Jin-Au Kong Outstanding Doctoral Thesis Prize in Electrical Engineering at MIT. She is a recipient of the 2018 Facebook Faculty Award, the 2018 & 2017 Qualcomm Faculty Award, the 2018 & 2016 Google Faculty Research Award, the 2016 AFOSR Young Investigator Research Program (YIP) Award, the 2016 3M Non-Tenured Faculty Award, the 2014 DARPA Young Faculty Award, the 2007 DAC/ISSCC Student Design Contest Award, and a co-recipient of the 2017 CICC Outstanding Invited Paper Award, the 2016 IEEE Micro Top Picks Award and the 2008 A-SSCC Outstanding Design Award.

For more information about research in the Energy-Efficient Multimedia Systems Group at MIT visit: http://www.rle.mit.edu/eems/

09:50 - 10:00

Break

Short Break

10:00 - 10:50

Invited Talk

Presentation

Hardware Efficiency Aware Neural Architecture Search

Song Han, MIT

Efficient deep learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the algorithm creates a much larger design space: it’s not only about designing the hardware but also about how to change the algorithm to best fit the hardware. Human engineers can hardly exhaust the design space by heuristics; doing so is labor-intensive and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing small and fast models (ProxylessNAS), auto channel pruning (AMC), and auto mixed-precision quantization (HAQ). We demonstrate that such learning-based, automated design achieves better performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200× compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.
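As a rough illustration of the channel pruning mentioned in the abstract above, the sketch below keeps the channels with the largest L1 norm in a single layer. It is not the AMC method: the abstract describes choosing the design automatically, whereas here the keep ratio is fixed by hand, and the function name `prune_channels`, the list-of-lists weight layout, and the L1-norm ranking are all assumptions made for this example.

```python
def prune_channels(weights, keep_ratio):
    """Keep the channels with the largest L1 norm (hypothetical helper).

    weights:    list of channels, each a list of floats (assumed layout)
    keep_ratio: fraction of channels to keep, chosen by hand in this sketch
    Returns the kept channel indices (in original order) and their weights.
    """
    n_keep = max(1, round(len(weights) * keep_ratio))  # always keep at least one
    # Rank channel indices by L1 norm, largest first.
    ranked = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])  # restore original channel order
    return kept, [weights[i] for i in kept]


if __name__ == "__main__":
    layer = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-3.0, 0.5]]
    kept, pruned = prune_channels(layer, keep_ratio=0.5)
    print("kept channels:", kept)
    print("pruned weights:", pruned)
```

An automated approach of the kind the abstract describes would search over the per-layer keep ratio instead of taking it as a fixed argument.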

Song Han is an assistant professor at MIT EECS. Dr. Han received the Ph.D. degree in Electrical Engineering from Stanford advised by Prof. Bill Dally. Dr. Han’s research focuses on efficient deep learning computing. He proposed “Deep Compression” and “EIE Accelerator” that impacted the industry. His work received the best paper award in ICLR’16 and FPGA’17. He is the co-founder and chief scientist of DeePhi Tech (a leading efficient deep learning solution provider), which was acquired by Xilinx in 2018.

10:50 - 11:40

Invited Talk

Presentation

What Can In-memory Computing Deliver, and What Are the Barriers?

Naveen Verma, Princeton University

Inference based on deep-learning models is being employed pervasively in applications today. In many such applications, state-of-the-art models can easily push the platforms to their limits of performance and energy efficiency. To address this, digital acceleration has been widely exploited. However, deep-learning computations exhibit critical attributes that limit the gains achievable by traditional digital acceleration. In particular, computations are dominated by high-dimensionality matrix-vector multiplications (MVMs), where the precision requirements of elements have been decreasing (from FP32 a few years ago, to INT8/4/2 now and in the future). In this scenario, in-memory computing (IMC) offers distinct advantages, which have been demonstrated through recent prototypes leading to roughly 10x higher energy efficiency and area-normalized throughput, compared to optimized digital accelerators. This arises from the structural alignment of dense 2D memory arrays with the dataflow of MVMs. While digital spatial architectures (e.g., systolic arrays) also exploit such alignment, IMC can do so more aggressively, minimizing data movement and amortizing compute into highly-efficient, highly-parallel analog operations. However, IMC also raises critical challenges at each level (need for analog compute at circuit level, need for high-bandwidth hardware infrastructure at architectural level, constrained configurability/virtualization at the software-mapping level). Recent research advances have shown remarkable promise in addressing many of these challenges, making IMC more of a reality than ever. These advances, their potential implications, and key questions remaining will be reviewed.
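To make the reduced-precision MVM mentioned in the abstract above concrete, here is a minimal software sketch of a matrix-vector multiply carried out on INT8-range integers. It says nothing about how an in-memory computing array performs the operation in hardware. The symmetric per-tensor scaling scheme and the names `quantize_symmetric`, `quantized_mvm`, and `float_mvm` are assumptions chosen for this example, not something the talk specifies.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1               # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def quantized_mvm(matrix, vector, num_bits=8):
    """Matrix-vector multiply on quantized integers, rescaled to floats."""
    cols = len(matrix[0])
    flat = [w for row in matrix for w in row]
    q_flat, w_scale = quantize_symmetric(flat, num_bits)
    q_matrix = [q_flat[r * cols:(r + 1) * cols] for r in range(len(matrix))]
    q_vector, x_scale = quantize_symmetric(vector, num_bits)
    # Accumulate in integers, then apply the two scales once per output.
    return [
        sum(w * x for w, x in zip(row, q_vector)) * w_scale * x_scale
        for row in q_matrix
    ]


def float_mvm(matrix, vector):
    """Full-precision reference for comparison."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    m = [[0.5, -1.0], [0.25, 0.75]]
    v = [1.0, -0.5]
    print("float:", float_mvm(m, v))
    print("int8 :", quantized_mvm(m, v))
```

Lowering `num_bits` to 4 or 2 in this sketch shows how the approximation error grows as precision drops, which is the trade-off the abstract refers to.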

Naveen Verma received the B.A.Sc. degree in Electrical and Computer Engineering from UBC, Vancouver, Canada in 2003, and the M.S. and Ph.D. degrees in Electrical Engineering from MIT in 2005 and 2009, respectively. Since July 2009 he has been a faculty member at Princeton University. His research focuses on advanced sensing systems, exploring how systems for learning, inference, and action planning can be enhanced by algorithms that exploit new sensing and computing technologies. This includes research on large-area, flexible sensors, energy-efficient statistical-computing architectures and circuits, and machine-learning and statistical-signal-processing algorithms. Prof. Verma has served as a Distinguished Lecturer of the IEEE Solid-State Circuits Society, and currently serves on the technical program committees for ISSCC, VLSI Symp., DATE, and IEEE Signal-Processing Society (DISPS).

11:40 - 12:30

Invited Talk

Presentation

Speeding up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures

Rogerio Feris, IBM Research

Very deep convolutional neural networks have shown remarkable success in many computer vision tasks, yet their computational expense limits their impact in domains where fast inference is essential, particularly in delay-sensitive and real-time scenarios such as autonomous driving, robotic navigation, or user-interactive applications on mobile devices. In this talk, I will describe two complementary approaches for speeding up deep neural networks. The first approach, called BlockDrop, learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy. The second approach, called Big-Little Net, relies on a multi-branch network architecture that has different computational complexities for different branches, with feature fusion at multiple scales. The model surpasses state-of-the-art CNN acceleration approaches by a large margin in accuracy and FLOPs reduction. Finally, I will conclude the talk by describing ongoing efforts at IBM for energy-efficient deep learning.
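The idea of executing only some layers at inference time can be sketched with a toy residual network in which a skipped block reduces to its identity shortcut. This is an illustration under stated assumptions, not the BlockDrop implementation: the abstract says the choice of layers is learned, while here the keep/skip mask is supplied by hand, and the function `run_with_block_skipping` and the scalar "blocks" are hypothetical.

```python
def run_with_block_skipping(x, blocks, keep_mask):
    """Forward pass through residual blocks, skipping those masked out.

    x:         input value (a scalar here, to keep the sketch small)
    blocks:    list of callables, each the residual branch of one block
    keep_mask: list of booleans, True = execute the block (hand-set here)
    Returns the output and the number of blocks actually executed.
    """
    executed = 0
    for block, keep in zip(blocks, keep_mask):
        if keep:
            x = x + block(x)  # identity shortcut plus residual branch
            executed += 1
        # A skipped block costs nothing: the shortcut passes x through unchanged.
    return x, executed


if __name__ == "__main__":
    # Hypothetical residual branches standing in for convolutional blocks.
    blocks = [lambda v: 2 * v, lambda v: 1.0, lambda v: 0.5 * v]
    full, n_full = run_with_block_skipping(1.0, blocks, [True, True, True])
    fast, n_fast = run_with_block_skipping(1.0, blocks, [True, False, True])
    print("all blocks :", full, "executed", n_full)
    print("one skipped:", fast, "executed", n_fast)
```

The residual structure is what makes skipping well defined: dropping a block leaves the tensor shape unchanged, so later blocks can still run.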

Rogerio Schmidt Feris is the head of computer vision and multimedia research at IBM T.J. Watson Research Center. He joined IBM in 2006 after receiving a Ph.D. from the University of California, Santa Barbara. He has also worked as an Affiliate Associate Professor at the University of Washington and as an Adjunct Associate Professor at Columbia University. He has authored over 100 technical papers and has over 40 issued patents in the areas of computer vision, multimedia, and machine learning. Rogerio is a principal investigator of several projects within the MIT-IBM Watson AI Lab, and leads the IBM-MIT-Purdue team as part of the IARPA DIVA program. His work has not only been published in top AI conferences (NeurIPS, CVPR, ICLR, ICCV, ECCV, SIGGRAPH), but has also been integrated into multiple IBM products, including Watson Visual Recognition, Watson Media, and Intelligent Video Analytics. He led the development of an attribute-based people search system used by many police departments around the world, as well as a system to produce auto-curated highlights for the US Open, Wimbledon, and Masters tournaments, which were seen by millions of people. Rogerio’s work has been covered by the New York Times, ABC News, CBS 60 Minutes, and many other media outlets. He currently serves as an Associate Editor of TPAMI, has served as a Program Chair of WACV 2017, and as an Area Chair of top AI conferences, such as NeurIPS, CVPR, and ICCV. Rogerio is an IBM Master Inventor, has received an IBM Outstanding Innovation Award, and was part of the team that recently achieved top results in highly competitive benchmarks such as the KITTI and TrecVid evaluations. In addition to working on core research, he had a one-year assignment at IBM Global Technology Services as a senior software engineer to help productize the IBM Smart Surveillance System.

Energy Efficient Machine Learning and Cognitive Computing

3rd Edition

Co-located with CVPR 2019 in Long Beach, CA

Workshop Objective

Call for Papers

Topics for the Workshop

We will follow the same formatting guidelines and duplicate submission policies as ASPLOS.

08:00 - 08:10

Welcome

Introduction and Opening Remarks

08:10 - 09:00

Invited Talk

Mixed-signal Techniques for Embedded Machine Learning Systems

Boris Murmann, Stanford University

09:00 - 09:50

Invited Talk

Balancing Efficiency and Flexibility for DNN Acceleration

Vivienne Sze, MIT

09:50 - 10:00

Break

Short Break

10:00 - 10:50

Invited Talk

Hardware Efficiency Aware Neural Architecture Search

Song Han, MIT

10:50 - 11:40

Invited Talk

What Can In-memory Computing Deliver, and What Are the Barriers?

Naveen Verma, Princeton University

11:40 - 12:30

Invited Talk

Speeding up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures

Rogerio Feris, IBM Research

12:30 - 12:35

Close

Closing Remarks
