The 1st EMC2 - Energy Efficient Training and Inference of Transformer Based Models

Co-located with the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018)

Sunday, March 25, 2018
Williamsburg, VA
Room: Allegheny Room A
Half Day (Afternoon Session)

Workshop Objective

Transformers are the foundation of large deep learning language models. Recent successes of Transformer-based models in image classification and action prediction indicate their wide applicability. This workshop focuses on leading ideas built on Transformer models, such as PaLM from Google. We will learn the key observations about model performance, inference optimizations, and the power consumption of both mixed-precision inference and training.

Call for Papers

The goal of this Workshop is to provide a forum for researchers and industry experts who are exploring novel ideas, tools, and techniques to improve the energy efficiency of machine learning and deep learning as it is practiced today and as it will evolve over the next decade. We envision that only through close collaboration between industry and academia will we be able to address the difficult challenges and opportunities of reducing the carbon footprint of AI and its uses. We have tailored our program to best serve the participants in a fully digital setting. Our forum facilitates active exchange of ideas through:

  • Keynotes, invited talks and discussion panels by leading researchers from industry and academia
  • Peer-reviewed papers on the latest solutions, including works-in-progress seeking directed feedback from experts
  • Independent publication of proceedings through IEEE CPS

We invite full-length papers describing original, cutting-edge, and even work-in-progress research projects about efficient machine learning. Suggested topics for papers include, but are not limited to, the ones listed on this page. The proceedings from previous instances have been published through the prestigious IEEE Conference Publishing Services (CPS) and are available to the community via IEEE Xplore. In each instance, IEEE conducted an independent assessment of the papers for quality.

Topics for the Workshop

  • Neural network architectures for resource constrained applications
  • Efficient hardware designs to implement neural networks including sparsity, locality, and systolic designs
  • Power and performance efficient memory architectures suited for neural networks
  • Network reduction techniques – approximation, quantization, reduced precision, pruning, distillation, and reconfiguration
  • Exploring interplay of precision, performance, power, and energy through benchmarks, workloads, and characterization
  • Simulation and emulation techniques, frameworks, tools, and platforms for machine learning
  • Optimizations to improve performance of training techniques including on-device and large-scale learning
  • Load balancing and efficient task distribution, communication and computation overlapping for optimal performance
  • Verification, validation, determinism, robustness, bias, safety, and privacy challenges in AI systems

13:30 - 14:30

Safety and Security at the Heart of Autonomous Driving

Kamal Khouri, NXP Semiconductors

The automotive industry is undergoing a revolution with connected, autonomous and electric vehicles and the benefits they can bring to the public. Drivers enjoying their daily commute, fewer road fatalities and less pollution are all possible thanks to new technologies. Car makers need to offer these features while making sure vehicles are safe and secure. In the coming years, there will be various levels of automation until we have fully autonomous vehicles. To achieve any level of automation, cars need to connect to other vehicles, connect to the infrastructure, sense the environment through sensors such as cameras and radar, and then make maneuvering decisions based on all these inputs. Artificial intelligence is and will be deployed heavily to accomplish many of the tasks of autonomous driving. Perception and decision-making based on artificial intelligence introduce an entirely new set of challenges for car makers: ensuring there are no security compromises and proving that the decisions being made are functionally, behaviorally and environmentally safe. The challenge can be described in a simple question: “If a machine learning based car system is accurate 99% of the time, are you willing to ride in this car knowing that it will be wrong 1% of the time? What is the consequence of that incorrect decision?” Deep expertise and research in the safety and security aspects of AI are needed to ensure future mass deployment and success in the area of autonomous driving.

Kamal Khouri is General Manager & Vice President of Automotive Microcontrollers and Processors for the ADAS Product Line at NXP Semiconductors. Kamal holds a BS in Electrical Engineering from Bucknell University and a Master's and a Ph.D. in Computer Engineering from Princeton University. He has over 17 years of semiconductor industry experience, with multiple patents and over 25 publications. He started his career at Motorola SPS and later Freescale, working in various roles in engineering and product management within the compute and networking divisions. Kamal was also Director of Products for various businesses at AMD, ranging from embedded to gaming and semi-custom products. At NXP, his team is now defining the future of autonomous vehicles and the processing power they need to make them a reality.

14:30 - 15:50
Paper Session #1
Paper Presentation

A High Efficiency Accelerator for Deep Neural Networks

Aliasger Zaidy, Andre Xian Ming Chang, Vinayak Gokhale and Eugenio Culurciello
FWDNXT and Google

Paper Presentation

A Case for Dynamic Activation Quantization in CNNs

Karl Taht, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit

Seyed Hamed Fatemi Langroudi, Tej N. Pandit and Dhireesha Kudithipudi
Rochester Institute of Technology

Paper Presentation

A Quantization-Friendly Separable Convolution for MobileNets

Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen and Mickey Aleksic

16:00 - 17:00
Invited Talk

Challenges and Solutions for Embedding Vision AI

Charles Qi, Tensilica/Cadence

Recently, computer vision and neural network based AI technology have seen explosive demand in embedded systems such as robots, drones, and autonomous vehicles. Due to cost and power constraints, it remains quite challenging to achieve satisfactory performance while maintaining power efficiency and scalability for embedded vision AI. This presentation first analyzes the technical challenges of embedding vision AI from the perspectives of algorithm complexity, computation and memory BW demands, and the constraints of the power consumption profile. The analysis shows that modern neural networks for vision AI contain complex topologies and diversified computation steps. These neural networks are often part of a large embedded vision processing pipeline, intermixed with conventional vision algorithms. As a result, a vision AI implementation demands several TOPS of computation performance and tens of GB memory BW. Subsequently, the architecture of the Tensilica Vision AI DSP processor technology is presented with three distinctive advantages. First, the optimized instruction sets of the Vision P6 and Vision C5 DSPs are explained as examples of achieving instruction-level computation efficiency and performance. Second, unique processor architecture features achieve SoC-level data processing efficiency and scalability, leading to a high-performance vision AI sub-system. Third, the fully automated AI optimization framework, software libraries and tools provide a practical performance tuning methodology and rapid turn-around time for embedded vision AI system design. In conclusion, the presentation offers considerations for future research and development to bring embedded vision AI to the next performance level.

Charles Qi is a system solutions architect in Cadence’s IPG System and Software team, responsible for providing vision system solutions based on the Cadence® Tensilica Vision DSP technology and a broad interface IP portfolio. At the system level, his primary focus is image sensing, computer vision and deep learning hardware and software for high-performance automotive vision ADAS SoCs. He is currently also an active member of the internal architecture team for high-performance neural network acceleration hardware IPs.

Prior to joining Cadence, Charles held various technical positions at Intel, Broadcom and several high-tech startups.

17:00 - 18:00
Paper Session #2
Paper Presentation

Efficient Compiler Code Generation for Deep Learning Snowflake Co-processor

Andre Xian Ming Chang, Aliasger Zaidy and Eugenio Culurciello

Paper Presentation

Moving CNN Accelerator Computations Closer to Data

Sumanth Gudaparthi, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Event Prediction in Processors Using Deep Temporal Models

Tharindu Mathew, Aswin Raghavan and Sek Chai
SRI International

18:00 - 18:45
Invited Talk

Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments

Grigori Fursin, dividiti and cTuning Foundation

Co-designing efficient machine learning based systems across the whole application/hardware/software stack to trade off speed, accuracy, energy and costs is becoming extremely complex and time consuming. Researchers often struggle to evaluate and compare different published works across rapidly evolving software frameworks, heterogeneous hardware platforms, compilers, libraries, algorithms, data sets, models, and environments. I will present our community effort to develop an open co-design tournament platform with an online public scoreboard based on the Collective Knowledge (CK) workflow framework. It gradually incorporates best research practices while providing a common way for multidisciplinary researchers to optimize and compare the quality vs. efficiency Pareto optimality of various workloads on diverse and complete hardware/software systems. All the winning solutions will be made available to the community as portable and customizable “plug&play” components with a common API to accelerate research and innovation!

I will then discuss how our open competition and collaboration can help to achieve energy efficiency for cognitive workloads based on energy-efficient submissions from the 1st ReQuEST tournament co-located with ASPLOS’18.
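
As a rough illustration of the quality vs. efficiency Pareto comparison mentioned in the abstract above, the sketch below filters a set of submissions down to those that no other submission beats on both accuracy and energy. It is a minimal sketch under stated assumptions and is not the ReQuEST or Collective Knowledge API: the `Submission` type, the two metrics chosen, and all numbers are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple


class Submission(NamedTuple):
    """Hypothetical record for one submitted system (not a CK data structure)."""
    name: str
    accuracy: float   # higher is better
    energy_j: float   # energy per inference in joules, lower is better


def dominates(a: Submission, b: Submission) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_j <= b.energy_j
    strictly_better = a.accuracy > b.accuracy or a.energy_j < b.energy_j
    return no_worse and strictly_better


def pareto_front(subs: List[Submission]) -> List[Submission]:
    """Keep only the submissions that no other submission dominates."""
    return [s for s in subs if not any(dominates(o, s) for o in subs)]


# Made-up numbers for illustration only; not results from any tournament.
EXAMPLE = [
    Submission("A", 0.70, 2.0),
    Submission("B", 0.75, 3.5),
    Submission("C", 0.68, 2.5),  # less accurate and more costly than A
    Submission("D", 0.80, 6.0),
]

front_names = [s.name for s in pareto_front(EXAMPLE)]
print(front_names)
```

In this toy data set, A, B and D each represent a different trade-off, while C is dropped because A is both more accurate and cheaper. A real scoreboard would likely track more than two objectives (for example speed and cost as well), which the same dominance test extends to by comparing every axis.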

Grigori Fursin is the CTO of dividiti and the Chief Scientist of the non-profit cTuning foundation. He is developing an open Collective Knowledge platform to crowdsource multi-objective optimization and co-design of deep learning and other emerging workloads across the whole SW/HW/model stack. Before co-founding dividiti in 2015, he was the head of workload optimization group at Intel Exascale Lab and a senior tenured scientist at INRIA. Grigori has an interdisciplinary background in computer engineering, physics, electronics and machine learning with a PhD in Computer Science from the University of Edinburgh (2004). He is a recipient of a personal INRIA fellowship for “making an outstanding contribution to research” in 2012 and the ACM CGO “test of time” award in 2017.