The 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing

Co-located with the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018)

Sunday, March 25, 2018
Williamsburg, VA
Room: Allegheny Room A
Half Day (Afternoon Session)

Workshop Objective

As artificial intelligence and other forms of cognitive computing continue to proliferate into new domains, many forums for dialogue and knowledge sharing have emerged. The primary focus of this workshop is the exploration of energy efficient techniques and architectures for cognitive computing and machine learning, particularly for applications and systems running at the edge. In such resource-constrained environments, performance alone is never sufficient; system designers must carefully balance performance against power, energy, and area (the overall PPA metric).

The goal of this workshop is to provide a forum for researchers who are exploring novel ideas in the field of energy efficient machine learning and artificial intelligence for a variety of applications. We also hope to provide a solid platform for forging relationships and exchanging ideas between industry and academia through discussions and active collaborations.

Call for Papers

A new wave of intelligent computing, driven by recent advances in machine learning and cognitive algorithms coupled with process technology and new design methodologies, has the potential to usher in unprecedented disruption in the way conventional computing solutions are designed and deployed. These new and innovative approaches often provide an attractive and efficient alternative not only in terms of performance but also power, energy, and area. This disruption is easily visible across the whole spectrum of computing systems -- ranging from low-end mobile devices to large-scale data centers and servers.

A key class of these intelligent solutions provides real-time, on-device cognition at the edge to enable many novel applications including vision and image processing, language translation, autonomous driving, malware detection, and gesture recognition. Naturally, these applications have diverse requirements for performance, energy, reliability, accuracy, and security that demand a holistic approach to designing the hardware, software, and intelligence algorithms to achieve the best power, performance, and area (PPA).

Topics for the Workshop

  • Architectures for the edge: IoT, automotive, and mobile
  • Approximation, quantization and reduced precision computing
  • Hardware/software techniques for sparsity
  • Neural network architectures for resource constrained devices
  • Neural network pruning, tuning and automatic architecture search
  • Novel memory architectures for machine learning
  • Communication/computation scheduling for better performance and energy
  • Load balancing and efficient task distribution techniques
  • Exploring the interplay between precision, performance, power and energy
  • Exploration of new and efficient applications for machine learning
  • Characterization of machine learning benchmarks and workloads
  • Performance profiling and synthesis of workloads
  • Simulation and emulation techniques, frameworks and platforms for machine learning
  • Power, performance and area (PPA) based comparison of neural networks
  • Verification, validation and determinism in neural networks
  • Efficient on-device learning techniques
  • Security, safety and privacy challenges and building secure AI systems
13:30 - 14:30

Safety and Security at the Heart of Autonomous Driving

Kamal Khouri, NXP Semiconductors

The automotive industry is undergoing a revolution with connected, autonomous and electric vehicles and the benefits they can bring to the public. Drivers enjoying their daily commute, fewer road fatalities and less pollution are all possible thanks to new technologies. Car makers need to offer these features but at the same time make sure vehicles are safe and secure. In the coming years, there will be various levels of automation until we have fully autonomous vehicles. To achieve any level of automation, cars need to connect to other vehicles, connect to the infrastructure, sense the environment through various sensors such as cameras and radar, and then make maneuvering decisions based on all these inputs. Artificial intelligence is and will be deployed heavily to accomplish many of the tasks of autonomous driving. Perception and decision-making based on artificial intelligence introduce an entirely new set of challenges for car makers, who must ensure there are no security compromises and prove that the decisions being made are functionally, behaviorally and environmentally safe. The challenge can be described in a simple question: “If a machine learning based car system is accurate 99% of the time, are you willing to ride this car knowing that it will be wrong 1% of the time? What is the consequence of that incorrect decision?” Deep expertise and research in the safety and security aspects of AI are needed to ensure future mass deployment and success in the area of autonomous driving.

Kamal Khouri is General Manager & Vice-president of Automotive Microcontrollers and Processors for the ADAS Product Line at NXP Semiconductors. Kamal holds a BS in Electrical Engineering from Bucknell University and a Master's and Ph.D. in Computer Engineering from Princeton University. He has over 17 years of semiconductor industry experience with multiple patents and over 25 publications. He started his career at Motorola SPS and later Freescale, working in various roles in engineering and product management within the compute and networking divisions. Kamal was also Director of Products for various businesses at AMD, ranging from embedded to gaming and semi-custom products. At NXP, his team is now defining the future of autonomous vehicles and the processing power needed to make them a reality.

14:30 - 15:50
Paper Session #1
Paper Presentation

A High Efficiency Accelerator for Deep Neural Networks

Aliasger Zaidy, Andre Xian Ming Chang, Vinayak Gokhale and Eugenio Culurciello
FWDNXT and Google

Paper Presentation

A Case for Dynamic Activation Quantization in CNNs

Karl Taht, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit

Seyed Hamed Fatemi Langroudi, Tej N. Pandit and Dhireesha Kudithipudi
Rochester Institute of Technology

Paper Presentation

A Quantization-Friendly Separable Convolution for MobileNets

Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen and Mickey Aleksic

16:00 - 17:00
Invited Talk

Challenges and Solutions for Embedding Vision AI

Charles Qi, Tensilica/Cadence

Recently, computer vision and neural network based AI technology have seen explosive demand in embedded systems such as robots, drones and autonomous vehicles. Due to cost and power constraints, it remains quite challenging to achieve satisfactory performance while maintaining power efficiency and scalability for embedded vision AI. This presentation first analyzes the technical challenges of embedding vision AI from the perspectives of algorithm complexity, computation and memory BW demands, and constraints of the power consumption profile. The analysis shows that modern neural networks for vision AI have complex topologies and diverse computation steps. These neural networks are often part of a large embedded vision processing pipeline, intermixed with conventional vision algorithms. As a result, the vision AI implementation demands several TOPS of computation performance and tens of GB of memory BW. Subsequently, the architecture of the Tensilica Vision AI DSP processor technology is presented with three distinctive advantages: The optimized instruction sets of the Vision P6 and Vision C5 DSPs are explained as examples of achieving instruction level computation efficiency and performance. This is coupled with unique processor architecture features for achieving SoC level data processing efficiency and scalability that lead to a high-performance vision AI sub-system. The fully automated AI optimization framework, software libraries and tools provide a practical performance tuning methodology and rapid turn-around time for embedded vision AI system design. In conclusion, the presentation offers considerations for future research and development to bring embedded vision AI to the next performance level.

Charles Qi is a system solutions architect in Cadence’s IPG System and Software team, responsible for providing vision system solutions based on the Cadence® Tensilica Vision DSP technology and a broad interface IP portfolio. At the system level, his primary focus is image sensing, computer vision and deep learning hardware and software for high-performance automotive vision ADAS SoCs. He is also currently an active member of the internal architecture team for high-performance neural network acceleration hardware IP.

Prior to joining Cadence, Charles held various technical positions at Intel, Broadcom and several high-tech startups.

17:00 - 18:00
Paper Session #2
Paper Presentation

Efficient Compiler Code Generation for Deep Learning Snowflake Co-processor

Andre Xian Ming Chang, Aliasger Zaidy and Eugenio Culurciello

Paper Presentation

Moving CNN Accelerator Computations Closer to Data

Sumanth Gudaparthi, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Event Prediction in Processors Using Deep Temporal Models

Tharindu Mathew, Aswin Raghavan and Sek Chai
SRI International

18:00 - 18:45
Invited Talk

Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments

Grigori Fursin, dividiti and cTuning Foundation

Co-designing efficient machine learning based systems across the whole application/hardware/software stack to trade off speed, accuracy, energy and costs is becoming extremely complex and time consuming. Researchers often struggle to evaluate and compare different published works across rapidly evolving software frameworks, heterogeneous hardware platforms, compilers, libraries, algorithms, data sets, models, and environments. I will present our community effort to develop an open co-design tournament platform with an online public scoreboard based on the Collective Knowledge (CK) workflow framework. It gradually incorporates best research practices while providing a common way for multidisciplinary researchers to optimize and compare the quality vs. efficiency Pareto optimality of various workloads on diverse and complete hardware/software systems. All the winning solutions will be made available to the community as portable and customizable “plug&play” components with a common API to accelerate research and innovation!

I will then discuss how our open competition and collaboration can help to achieve energy efficiency for cognitive workloads based on energy-efficient submissions from the 1st ReQuEST tournament co-located with ASPLOS’18.

Grigori Fursin is the CTO of dividiti and the Chief Scientist of the non-profit cTuning foundation. He is developing an open Collective Knowledge platform to crowdsource multi-objective optimization and co-design of deep learning and other emerging workloads across the whole SW/HW/model stack. Before co-founding dividiti in 2015, he was head of the workload optimization group at the Intel Exascale Lab and a senior tenured scientist at INRIA. Grigori has an interdisciplinary background in computer engineering, physics, electronics and machine learning with a PhD in Computer Science from the University of Edinburgh (2004). He is a recipient of a personal INRIA fellowship for “making an outstanding contribution to research” in 2012 and the ACM CGO “test of time” award in 2017.