The 6th Workshop on Energy Efficient Machine Learning and Cognitive Computing
Workshop Objective
As artificial intelligence and other forms of cognitive computing continue to proliferate into new domains, many forums for dialogue and knowledge sharing have emerged. In the proposed workshop, the primary focus is on the exploration of energy efficient techniques and architectures for cognitive computing and machine learning, particularly for applications and systems running at the edge. For such resource constrained environments, performance alone is never sufficient, requiring system designers to carefully balance performance with power, energy, and area (overall PPA metric).
The goal of this workshop is to provide a forum for researchers exploring novel ideas in energy efficient machine learning and artificial intelligence for a variety of applications. We also hope to provide a solid platform for forging relationships and exchanging ideas between industry and academia through discussions and active collaborations.
Call for Papers
A new wave of intelligent computing, driven by recent advances in machine learning and cognitive algorithms coupled with advances in process technology and new design methodologies, has the potential to usher in unprecedented disruption in the way conventional computing solutions are designed and deployed. These new and innovative approaches often provide an attractive and efficient alternative not only in terms of performance but also in terms of power, energy, and area. This disruption is visible across the whole spectrum of computing systems, ranging from low-end mobile devices to large-scale data centers and servers.
A key class of these intelligent solutions provides real-time, on-device cognition at the edge, enabling many novel applications including vision and image processing, language translation, autonomous driving, malware detection, and gesture recognition. These applications have diverse requirements for performance, energy, reliability, accuracy, and security, which demand a holistic approach to designing the hardware, software, and intelligence algorithms to achieve the best power, performance, and area (PPA).
Topics for the Workshop
- Architectures for the edge: IoT, automotive, and mobile
- Approximation, quantization, and reduced-precision computing
- Hardware/software techniques for sparsity
- Neural network architectures for resource constrained devices
- Neural network pruning, tuning, and automatic architecture search
- Novel memory architectures for machine learning
- Communication/computation scheduling for better performance and energy
- Load balancing and efficient task distribution techniques
- Exploring the interplay between precision, performance, power and energy
- Exploration of new and efficient applications for machine learning
- Characterization of machine learning benchmarks and workloads
- Performance profiling and synthesis of workloads
- Simulation and emulation techniques, frameworks and platforms for machine learning
- Power, performance and area (PPA) based comparison of neural networks
- Verification, validation and determinism in neural networks
- Efficient on-device learning techniques
- Security, safety and privacy challenges and building secure AI systems
09:00 - 10:00
Energy Efficient Machine Learning on Encrypted Data: Hardware to the Rescue
Machine learning on encrypted data is a yet-to-be-addressed challenge. Several recent key advances across different layers of the system, from cryptography and mathematics to logic synthesis and hardware, are paving the way for energy-efficient realization of privacy-preserving computing for certain target applications. This keynote highlights the crucial role of hardware and advances in computing architecture in supporting recent progress in the field. I outline the main technologies and mixed computing models. I particularly center my talk on recent progress in the synthesis of Garbled Circuits, which provides a leap in the scalable realization of energy efficient machine learning on encrypted data. I explore how hardware could pave the way for navigating complex parameter selection and for scalable future mixed-protocol solutions. I conclude by briefly discussing the challenges and opportunities moving forward.
Farinaz Koushanfar received her Ph.D. in electrical engineering and computer science as well as an M.A. in Statistics from the University of California, Berkeley, in 2005. From 2006 to 2015, she was a faculty member at Rice University, where she served as assistant, associate, and full professor of electrical and computer engineering. Her primary research interests are domain-specific computing, embedded systems, secure computing, protection of hardware, embedded and IoT systems, and design automation, in particular the automation of emerging data-driven learning and massive data analytics algorithms. At UC San Diego, she plans to continue her work on the next generation of efficient and secure data-driven computing and embedded/IoT devices and systems.
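To make the term "garbled circuit" concrete, here is a minimal sketch of Yao's textbook construction for a single AND gate. It assumes SHA-256 as the key-derivation function and uses trial decryption with an all-zero validity tag; the function names are hypothetical. This is not the synthesis flow discussed in the talk, and real protocols add oblivious transfer (to hand the evaluator its input labels) plus optimizations such as point-and-permute, free-XOR, and half-gates.

```python
import hashlib
import os
import random

LABEL_BYTES = 16  # assumed label size: 128-bit random wire labels


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    # Derive a 32-byte one-time pad from the two input wire labels (assumption: SHA-256).
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and():
    """Garble one AND gate.

    Returns (labels, table), where labels[wire][bit] is the secret label for
    that bit on wire 'a', 'b', or 'out', and table holds four shuffled ciphertexts.
    """
    labels = {w: (os.urandom(LABEL_BYTES), os.urandom(LABEL_BYTES)) for w in ("a", "b", "out")}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            # Plaintext = output label followed by zero bytes that act as a validity tag.
            plaintext = labels["out"][bit_a & bit_b] + bytes(LABEL_BYTES)
            table.append(_xor(_pad(labels["a"][bit_a], labels["b"][bit_b]), plaintext))
    random.SystemRandom().shuffle(table)  # hide which row corresponds to which inputs
    return labels, table


def evaluate_and(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator holds one label per input wire and learns only the output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        plaintext = _xor(pad, row)
        if plaintext[LABEL_BYTES:] == bytes(LABEL_BYTES):  # only the matching row decrypts to the tag
            return plaintext[:LABEL_BYTES]
    raise ValueError("no row decrypted correctly")


labels, table = garble_and()
for a in (0, 1):
    for b in (0, 1):
        out = evaluate_and(table, labels["a"][a], labels["b"][b])
        print(a, b, labels["out"].index(out))  # the decoded output bit equals a AND b
```

The evaluator never sees plaintext bits, only random labels; only the garbler knows which output label means 0 or 1. Hardware-oriented work targets exactly this kind of per-gate hashing and table handling, which dominates cost when a whole neural network is expressed as a Boolean circuit.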
10:00 - 10:40
DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
In the past decade, deep neural networks (DNNs) have shown state-of-the-art performance on a wide range of complex machine learning tasks. Many of these results have been achieved by growing the size of DNNs, creating a demand for their efficient compression and transmission. This talk will present DeepCABAC, a universal compression algorithm for DNNs that, through adaptive, context-based rate modeling, allows optimal quantization and coding of neural network parameters. It compresses state-of-the-art DNNs down to 1.5% of their original size with no accuracy loss, and it has been selected as the basic compression technology for the emerging MPEG-7 Part 17 standard on DNN compression.
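DeepCABAC builds on CABAC, the context-adaptive binary arithmetic coder used in H.264/AVC and HEVC. The sketch below does not implement DeepCABAC; it only illustrates the idea of adaptive, context-based rate modeling by estimating the ideal code length of uniformly quantized weights. The binarization (a significance flag conditioned on whether the previous weight was nonzero, a sign flag, and a unary magnitude) and the count-based probability models are assumptions chosen for brevity, and DeepCABAC's rate-distortion optimized quantization is omitted.

```python
import math


class AdaptiveBit:
    """Adaptive binary probability model based on Laplace-smoothed counts (assumption)."""

    def __init__(self):
        self.counts = [1, 1]

    def cost(self, bit: int) -> float:
        # Ideal arithmetic-coding cost in bits: -log2 p(bit)
        return -math.log2(self.counts[bit] / sum(self.counts))

    def update(self, bit: int) -> None:
        self.counts[bit] += 1


def quantize(weights, step):
    """Uniform scalar quantization to integer levels."""
    return [round(w / step) for w in weights]


def estimate_bits(levels, max_ctx=8):
    """Estimate coded size (bits) of integer levels with context-adaptive binary models."""
    sig = [AdaptiveBit(), AdaptiveBit()]  # context: previous level was zero / nonzero
    sign = AdaptiveBit()
    mag = [AdaptiveBit() for _ in range(max_ctx)]  # one model per unary position (capped)
    total, prev_nonzero = 0.0, 0
    for q in levels:
        nonzero = int(q != 0)
        total += sig[prev_nonzero].cost(nonzero)
        sig[prev_nonzero].update(nonzero)
        if nonzero:
            s = int(q < 0)
            total += sign.cost(s)
            sign.update(s)
            # Unary code for |q| - 1: that many 1-bits followed by a terminating 0-bit.
            for i in range(abs(q) - 1):
                model = mag[min(i, max_ctx - 1)]
                total += model.cost(1)
                model.update(1)
            model = mag[min(abs(q) - 1, max_ctx - 1)]
            total += model.cost(0)
            model.update(0)
        prev_nonzero = nonzero
    return total
```

Because the models adapt, long runs of zeros (common after pruning or coarse quantization) become almost free: a tensor of 1000 zero levels costs about ten bits in this model, while a tensor of varied levels costs several bits per weight.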
Wojciech Samek founded the Machine Learning Group at Fraunhofer Heinrich Hertz Institute in 2014 and has headed it since. He studied computer science at Humboldt University of Berlin, Heriot-Watt University and University of Edinburgh from 2004 to 2010 and received the Dr. rer. nat. degree with distinction (summa cum laude) from the Technical University of Berlin in 2014. In 2009 he was a visiting researcher at NASA Ames Research Center, Mountain View, CA, and in 2012 and 2013 he had several short-term research stays at ATR International, Kyoto, Japan. He was awarded scholarships from the European Union’s Erasmus Mundus programme, the Studienstiftung des deutschen Volkes and the DFG Research Training Group GRK 1589/1. He is a PI at the Berlin Institute for the Foundation of Learning and Data (BIFOLD), a member of the European Lab for Learning and Intelligent Systems (ELLIS) and associated faculty at the DFG graduate school BIOQIC. Furthermore, he is an editorial board member of Digital Signal Processing, PLOS ONE and IEEE TNNLS and an elected member of the IEEE MLSP Technical Committee. He has organized special sessions, workshops and tutorials at top-tier machine learning conferences (NIPS, ICML, CVPR, ICASSP, MICCAI), has received multiple best paper awards, and has authored more than 100 journal and conference papers, predominantly in the areas of deep learning, interpretable machine learning, neural network compression and federated learning.
10:40 - 11:20
Efficient Machine Learning via Data Summarization
Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it depends on exceptionally large and expensive computational resources and incurs a substantial cost due to its significant energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data often contains highly imbalanced classes and noisy labels. In such cases, training on the entire dataset does not result in a high-quality model.
In this talk, I will argue that we can address the above limitations by developing techniques that identify and extract representative subsets from massive datasets. Training on representative subsets not only reduces the substantial costs of learning from big data but also improves the accuracy and robustness of the resulting models against noisy labels. I will present two key aspects of achieving this goal: (1) extracting representative data points by summarizing massive datasets; and (2) developing efficient optimization methods to learn from the extracted summaries. I will discuss how we can develop theoretically rigorous techniques that provide strong guarantees for the quality of the extracted summaries and for the quality and robustness of the learned models against noisy labels. I will also show applications of these techniques to several problems, including summarizing massive image collections, online video summarization, and speeding up the training of machine learning models.
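One common way to extract a representative subset is to greedily maximize a submodular "facility location" objective, F(S) = Σ_i max_{j∈S} sim(i, j), which rewards subsets that are similar to every point in the dataset. The sketch below assumes Euclidean features and the similarity sim = (largest pairwise distance − distance); whether this matches the exact techniques in the talk is an assumption, and the function name is hypothetical.

```python
import numpy as np


def greedy_facility_location(features: np.ndarray, k: int) -> list[int]:
    """Select k representative rows by greedily maximizing facility location."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sim = dists.max() - dists  # non-negative similarity (assumption)
    n = features.shape[0]
    selected: list[int] = []
    coverage = np.zeros(n)  # best similarity of each point to the current summary
    for _ in range(min(k, n)):
        # Marginal gain of candidate j: sum_i max(0, sim[i, j] - coverage[i])
        gains = np.maximum(sim - coverage[:, None], 0.0).sum(axis=0)
        gains[selected] = -np.inf  # never pick the same point twice
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return selected
```

For two well-separated clusters and k = 2, the greedy rule picks one point from each cluster, because a second point from an already covered cluster adds almost no gain. Greedy selection is the standard choice here since it carries a (1 − 1/e) approximation guarantee for monotone submodular objectives; the O(n²) similarity matrix is the obvious bottleneck for truly massive datasets.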
Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at UCLA. Her research focuses on developing new methods that enable efficient machine learning from massive datasets. More specifically, she is interested in designing techniques that gain insights from the underlying data structure by utilizing complex and higher-order interactions between data points. The extracted information can be used to efficiently explore and robustly learn from datasets that are too large to be handled by traditional approaches. Her methods have immediate application to high-impact problems where massive data volumes prohibit efficient learning and inference, such as huge image collections, recommender systems, Web and social services, video, and other large data streams. Before joining UCLA, she was a postdoctoral research fellow in Computer Science at Stanford University, working with Jure Leskovec. She received her Ph.D. in Computer Science from ETH Zurich, advised by Andreas Krause. She received an ETH medal for Outstanding Doctoral Thesis and was selected as a Rising Star in EECS by MIT.
11:20 - 12:00
Modular Neural Networks for Low-Power Image Classification on Embedded Devices
Embedded devices are generally small, battery-powered computers with limited hardware resources. Running Deep Neural Networks (DNNs) on these devices is difficult because DNNs perform millions of operations and consume significant amounts of energy. Prior research has shown that a considerable fraction of a DNN’s memory accesses and computations is redundant when performing tasks such as image classification. To reduce this redundancy, and thereby the energy consumption of DNNs, we introduce the Modular Neural Network-Tree (MNN-Tree) architecture. Instead of using one large DNN as the classifier, this architecture uses multiple smaller DNNs (called modules) to progressively classify images into groups of categories based on a novel visual similarity metric. Once a module selects a group of categories, another module distinguishes among the similar categories within that group. This process repeats over multiple modules until a single category remains. Because the computation needed to distinguish dissimilar groups is avoided, redundant operations, memory accesses, and energy are reduced. Experimental results on several image datasets show that, compared with existing DNN architectures on two embedded systems (Raspberry Pi 3 and Raspberry Pi Zero), the proposed solution reduces memory requirements by 50%-99%, inference time by 55%-95%, energy consumption by 52%-94%, and the number of operations by 15%-99%.
Dr. Yung-Hsiang Lu is a professor in the School of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana, USA. He is the inaugural director of Purdue’s John Martinson Entrepreneurship Center and a Distinguished Scientist of the ACM. He received his Ph.D. from Stanford University and his B.S. from National Taiwan University. Of his book “Intermediate C Programming” (CRC Press), Computing Reviews said, “If you land on a desert island that has a Linux computer, this is the one book to have with you.”
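The progressive routing described above can be sketched as a walk down a tree in which each internal node runs one module and only the modules on a single root-to-leaf path execute. In this minimal sketch the modules are hypothetical stand-in callables (simple thresholds rather than small DNNs), and the visual-similarity grouping that builds the tree is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical MNN-Tree-style hierarchy.

    Internal nodes hold a `module` that maps an input to a child index;
    leaves hold a `category` name.
    """
    module: Optional[Callable[[Sequence[float]], int]] = None
    children: list["Node"] = field(default_factory=list)
    category: Optional[str] = None


def classify(root: Node, x: Sequence[float]) -> tuple[str, int]:
    """Route x from the root to a leaf; return (category, number of modules run)."""
    node, modules_run = root, 0
    while node.category is None:
        node = node.children[node.module(x)]
        modules_run += 1
    return node.category, modules_run


def threshold_module(feature: int) -> Callable[[Sequence[float]], int]:
    # Stand-in for a small DNN: pick child 0 or 1 by thresholding one feature.
    return lambda x: 0 if x[feature] < 0.5 else 1


animals = Node(module=threshold_module(1), children=[Node(category="cat"), Node(category="dog")])
vehicles = Node(module=threshold_module(1), children=[Node(category="car"), Node(category="truck")])
root = Node(module=threshold_module(0), children=[animals, vehicles])

print(classify(root, [0.9, 0.2]))  # ('car', 2)
```

With four categories, each input runs only two of the three modules; the "animals" module never executes for a vehicle image, which is the source of the saved operations and memory accesses.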
13:00 - 13:40
Techniques for Efficient Inference with Deep Networks
Raghu Krishnamoorthi, Facebook
Efficient inference is a problem of great practical interest for both on-device AI and server-side applications. In this talk, I will discuss quantization for efficient inference and practical approaches to getting the best performance and accuracy when you deploy a model for inference. I will conclude the talk by touching upon sparsity, which can provide further performance improvements on top of quantization.
Raghuraman Krishnamoorthi is a software engineer on the PyTorch team at Facebook, where he leads the effort to optimize deep networks for inference, with a focus on quantization. Prior to this, he was part of the TensorFlow team at Google, working on quantization for mobile inference as part of TensorFlow Lite. From 2001 to 2017, Raghu was at Qualcomm Research, working on several generations of wireless technologies. His work experience also includes computer vision for AR, ultra-low-power always-on vision, hardware/software co-design for inference on mobile platforms, and modem development. He is an inventor on more than 90 issued and filed patents. Raghu has a master's degree in EE from the University of Illinois Urbana-Champaign and a bachelor's degree from the Indian Institute of Technology, Madras.
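A widely used starting point for quantized inference is per-tensor affine (asymmetric) int8 quantization, where a real value x maps to q = clamp(round(x / scale) + zero_point, −128, 127). The sketch below implements that mapping in NumPy; it does not use the PyTorch quantization API, and the function names are illustrative assumptions.

```python
import numpy as np


def quant_params(x_min: float, x_max: float, qmin: int = -128, qmax: int = 127):
    """Compute per-tensor affine quantization parameters for int8."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid division by zero for all-zero tensors
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x: np.ndarray, scale: float, zero_point: int, qmin: int = -128, qmax: int = 127):
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)


def dequantize(q: np.ndarray, scale: float, zero_point: int):
    return (q.astype(np.float32) - zero_point) * scale
```

For values inside the calibrated range, the round-trip error is at most scale/2, and 0.0 maps back exactly to 0.0, which matters for zero padding and ReLU outputs. Choosing the range (min/max versus percentile or MSE-based calibration) is where most of the practical accuracy trade-offs come in.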
13:40 - 14:20
Designing Nanosecond Inference Engines for the Particle Collider
Claudionor N. Coelho Jr. (Palo Alto Networks), Thea Aarrestad (CERN), Vladimir Loncar (CERN), Maurizio Pierini (CERN), Adrian Alan Pol (CERN), Sioni Summers (CERN)
While the quest for more accurate solutions is pushing deep learning research toward larger and more complex algorithms, edge devices with hard real-time constraints demand very efficient inference engines, for example with reduced model size, latency, and energy consumption. In this talk, we introduce a novel method for designing heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip. Our technique, called AutoQKeras, combines AutoML and QKeras to jointly perform layer hyperparameter selection and quantization optimization. Users can select among several optimization strategies, such as global optimization of network hyperparameters and quantizers, or splitting the optimization problem into smaller search problems to cope with search complexity. We have applied this design technique to the event selection procedure in proton-proton collisions at the CERN Large Hadron Collider, where resources are strictly limited and a latency of O(1) μs is required. When implemented on FPGA hardware, the resulting designs achieve nanosecond inference and reduce resource consumption by a factor of 50.
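To illustrate what "heterogeneously quantized" means, the sketch below assigns a different bit width to each layer by greedily lowering bits wherever that saves the most storage for the least added quantization error, subject to an error budget. This is not AutoQKeras or the QKeras API: the symmetric quantizer, the weight-MSE error proxy (standing in for task accuracy), and the weights × bits cost model (standing in for energy or FPGA resources) are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def quantize_fixed(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantizer over the tensor's max magnitude (assumption)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale if scale > 0 else w.copy()


def layer_error(w: np.ndarray, bits: int) -> float:
    return float(np.mean((w - quantize_fixed(w, bits)) ** 2))


def storage_cost(layers, bits) -> int:
    # Hypothetical hardware cost proxy: total weight bits.
    return sum(w.size * b for w, b in zip(layers, bits))


def greedy_bit_allocation(layers, error_budget, max_bits=8, min_bits=2):
    """Lower per-layer bit widths while the summed quantization MSE stays within budget."""
    bits = [max_bits] * len(layers)
    while True:
        current = sum(layer_error(w, b) for w, b in zip(layers, bits))
        best, best_ratio = None, np.inf
        for i, w in enumerate(layers):
            if bits[i] <= min_bits:
                continue
            trial = bits.copy()
            trial[i] -= 1
            new_error = sum(layer_error(w2, b2) for w2, b2 in zip(layers, trial))
            if new_error > error_budget:
                continue
            # Added error per bit of storage saved (dropping one bit saves w.size bits).
            ratio = (new_error - current) / w.size
            if ratio < best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return bits
        bits[best] -= 1
```

Large layers tend to be squeezed first because each dropped bit saves more storage, which is the intuition behind mixing precisions instead of using one global bit width. A real flow would score candidates by validation accuracy after quantization-aware training and by a measured hardware cost model.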
14:20 - 15:00
Efficient Deep Learning At Scale
Although hardware acceleration for neural networks has been studied extensively, hardware development still falls far behind the scaling of DNN models at the software level, and the efficient deployment of DNN models has emerged as a major challenge. For example, the massive number of parameters and high computational demand make it challenging to deploy state-of-the-art DNNs on resource-constrained devices. Compared to inference, training a DNN is much more complex and far more computation- and communication-intensive. A common practice is to distribute training across multiple nodes or heterogeneous accelerators, where the balance between data processing and data exchange remains critical. We believe that software/hardware co-design is necessary for efficient deep learning. This talk will present our latest explorations in DNN model compression, architecture search, distributed learning, and the corresponding optimizations at the hardware level.
Hai “Helen” Li is Clare Boothe Luce Professor and Associate Chair for Operations of the Department of Electrical and Computer Engineering at Duke University. She received her B.S. and M.S. from Tsinghua University and her Ph.D. from Purdue University. At Duke, she co-directs the Duke University Center for Computational Evolutionary Intelligence and the NSF IUCRC for Alternative Sustainable and Intelligent Computing (ASIC). Her research interests include machine learning acceleration and security, neuromorphic circuits and systems for brain-inspired computing, conventional and emerging memory design and architecture, and software and hardware co-design. She received the NSF CAREER Award, the DARPA Young Faculty Award, the TUM-IAS Hans Fischer Fellowship from Germany, the ELATE Fellowship, eight best paper awards, and nine additional best paper nominations. Dr. Li is a fellow of IEEE and a distinguished member of ACM. For more information, please see her webpage at http://cei.pratt.duke.edu/.
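The "distribute training across multiple nodes" pattern can be sketched as synchronous data-parallel SGD: each worker computes a gradient on its own data shard, the gradients are averaged, and every replica applies the same update. In this minimal sketch the workers are simulated in one process on a least-squares model, and the averaging line stands in for the all-reduce whose communication cost is the "data exchange" side of the balance described above. It is an illustration under those assumptions, not the system presented in the talk.

```python
import numpy as np


def grad_mse(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def data_parallel_step(w: np.ndarray, shards, lr: float) -> np.ndarray:
    """One synchronous data-parallel SGD step over simulated workers."""
    local_grads = [grad_mse(w, X, y) for X, y in shards]  # computed in parallel on real nodes
    avg_grad = np.mean(local_grads, axis=0)  # stands in for an all-reduce across nodes
    return w - lr * avg_grad


rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=64)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))  # 4 workers, 16 samples each

w_dp, w_full = np.zeros(3), np.zeros(3)
for _ in range(200):
    w_dp = data_parallel_step(w_dp, shards, lr=0.1)
    w_full = w_full - 0.1 * grad_mse(w_full, X, y)  # single-node full-batch reference
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the data-parallel run tracks single-node training exactly; every step, however, requires exchanging a full gradient vector, which is why communication scheduling and compression become first-order concerns at scale.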