The 1st EMC2 - Energy Efficient Machine Learning and Cognitive Computing

Co-located with the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018)

Sunday, March 25, 2018
Williamsburg, VA
Room: Allegheny Room A
Half Day (Afternoon Session)

Workshop Objective

Artificial intelligence (AI) continues to proliferate everyday life, aided by advances in automation, algorithms, and innovative hardware and software technologies. With the growing prominence of AI, the Multimodal Large Language Model (MLLM) has been rising as a new foundational model architecture that pairs a powerful Large Language Model (LLM) with efficient handling of multimodal tasks. MLLMs are able to achieve surprising capabilities across text and images, suggesting a path toward Artificial General Intelligence (AGI). With the advent of new frontiers in MLLM execution, we face new challenges across the software/hardware co-design ecosystem. There is a growing realization of the energy cost of developing and deploying MLLMs: training and inference for the most successful models have become exceedingly power-hungry, often dwarfing the energy needs of entire households for years. At the edge, applications that use these models for inference are ubiquitous in cell phones, appliances, smart sensors, vehicles, and even wildlife monitors, where efficiency is paramount for practical reasons.
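
To give a rough sense of the scale behind the "household" comparison, here is a back-of-envelope sketch; every figure in it (cluster power draw, run length, household consumption) is an illustrative assumption, not a measurement:

```python
# Back-of-envelope sketch of training energy vs. household consumption.
# All figures are illustrative assumptions, not measured values.

cluster_power_kw = 500.0         # assumed average draw of a large training cluster
training_days = 30               # assumed length of one training run
training_kwh = cluster_power_kw * 24 * training_days

household_kwh_per_year = 10_000  # rough annual consumption of one household

household_years = training_kwh / household_kwh_per_year
print(f"Training energy: {training_kwh:,.0f} kWh "
      f"(~{household_years:.0f} household-years of electricity)")
```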

Call for Papers

The goal of this workshop is to provide a forum for researchers and industry experts exploring novel ideas, tools, and techniques to improve the energy efficiency of MLLMs as they are used today and as they evolve over the next decade. We envision that only through close collaboration between industry and academia will we be able to address the difficult challenges and opportunities of reducing the carbon footprint of AI and its uses. We have tailored our program to best serve the participants in a fully digital setting. Our forum facilitates an active exchange of ideas through:

  • Keynotes, invited talks and discussion panels by leading researchers from industry and academia
  • Peer-reviewed papers on latest solutions including works-in-progress to seek directed feedback from experts
  • Independent publication of proceedings through IEEE CPS

We invite full-length papers describing original, cutting-edge, and even work-in-progress research projects on efficient machine learning. Suggested topics for papers include, but are not limited to, the ones listed on this page. The proceedings from previous instances have been published through the prestigious IEEE Conference Publishing Services (CPS) and are available to the community via IEEE Xplore. In each instance, IEEE conducted an independent assessment of the papers for quality.

Topics for the Workshop

  • Neural network architectures for resource constrained applications
  • Efficient hardware designs to implement neural networks including sparsity, locality, and systolic designs
  • Power and performance efficient memory architectures suited for neural networks
  • Network reduction techniques – approximation, quantization, reduced precision, pruning, distillation, and reconfiguration (see the short quantization sketch after this list)
  • Exploring interplay of precision, performance, power, and energy through benchmarks, workloads, and characterization
  • Simulation and emulation techniques, frameworks, tools, and platforms for machine learning
  • Optimizations to improve performance of training techniques including on-device and large-scale learning
  • Load balancing and efficient task distribution, communication and computation overlapping for optimal performance
  • Verification, validation, determinism, robustness, bias, safety, and privacy challenges in AI systems
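
To make one of the listed reduction techniques concrete, here is a minimal sketch of post-training symmetric int8 quantization; it illustrates the general idea only and does not correspond to any particular accelerator's or paper's scheme:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization of a weight tensor."""
    scale = np.abs(w).max() / 127.0                    # map max |w| to int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)         # stand-in weight matrix
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean absolute quantization error: {err:.5f}")  # small relative to |w| ~ 1
```
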
13:30 - 14:30
Keynote
Presentation

Safety and Security at the Heart of Autonomous Driving

Kamal Khouri, NXP Semiconductors

The automotive industry is undergoing a revolution with connected, autonomous, and electric vehicles and the benefits they can bring to the public. Drivers enjoying their daily commute, fewer road fatalities, and less pollution are all possible thanks to new technologies. Car makers need to offer these features while also making sure vehicles are safe and secure. In the coming years there will be various levels of automation until we have fully autonomous vehicles. To achieve any level of automation, cars need to connect to other vehicles, connect to the infrastructure, sense the environment through sensors such as cameras and radar, and then make maneuvering decisions based on all these inputs. Artificial intelligence is and will be deployed heavily to accomplish many of the tasks of autonomous driving. Perception and decision-making based on artificial intelligence introduce an entirely new set of challenges for car makers: ensuring there are no security compromises, and proving that the decisions being made are functionally, behaviorally, and environmentally safe. The challenge can be described in a simple question: “If a machine-learning-based car system is accurate 99% of the time, are you willing to ride in this car knowing that it will be wrong 1% of the time? What is the consequence of that incorrect decision?” Deep expertise and research in the safety and security aspects of AI are needed to ensure future mass deployment and success in the area of autonomous driving.
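
A worked version of the keynote's question makes the stakes vivid; the decision rate below is an assumed figure chosen only for illustration:

```python
# Worked version of the keynote's 99%-accuracy question.
# The decision rate is an assumption chosen only for illustration.

accuracy = 0.99
decisions_per_second = 10          # assumed perception/planning decision rate

errors_per_hour = (1 - accuracy) * decisions_per_second * 3600
print(f"{errors_per_hour:.0f} incorrect decisions per hour of driving")
# -> 360/hour: per-decision accuracy alone is not a safety argument; the
#    consequence and containment of each wrong decision must also be analyzed.
```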

Kamal Khouri is General Manager & Vice President of the ADAS Product Line within Automotive Microcontrollers and Processors at NXP Semiconductors. Kamal holds a BS in Electrical Engineering from Bucknell University and a Master's and Ph.D. in Computer Engineering from Princeton University. He has over 17 years of semiconductor industry experience, with multiple patents and over 25 publications. He started his career at Motorola SPS, and later Freescale, working in various engineering and product management roles within the compute and networking divisions. Kamal was also Director of Products for various businesses at AMD, ranging from embedded to gaming and semi-custom products. At NXP, his team is now defining the future of autonomous vehicles and the processing power needed to make them a reality.

14:30 - 15:50
Paper Session #1
Paper Presentation

A High Efficiency Accelerator for Deep Neural Networks

Aliasger Zaidy, Andre Xian Ming Chang, Vinayak Gokhale and Eugenio Culurciello
FWDNXT and Google

Paper Presentation

A Case for Dynamic Activation Quantization in CNNs

Karl Taht, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit

Seyed Hamed Fatemi Langroudi, Tej N. Pandit and Dhireesha Kudithipudi
Rochester Institute of Technology

Paper Presentation

A Quantization-Friendly Separable Convolution for MobileNets

Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen and Mickey Aleksic
Qualcomm

16:00 - 17:00
Invited Talk
Presentation

Challenges and Solutions for Embedding Vision AI

Charles Qi, Tensilica/Cadence

Recently, computer vision and neural-network-based AI technologies have seen explosive demand in embedded systems such as robots, drones, and autonomous vehicles. Due to cost and power constraints, it remains quite challenging to achieve satisfactory performance while maintaining power efficiency and scalability for embedded vision AI. This presentation first analyzes the technical challenges of embedded vision AI from the perspectives of algorithm complexity, computation and memory bandwidth demands, and the constraints of the power consumption profile. The analysis shows that modern neural networks for vision AI combine complex topologies with diverse computation steps. These networks are often part of a larger embedded vision processing pipeline, intermixed with conventional vision algorithms. As a result, a vision AI implementation can demand several TOPS of compute performance and tens of GB/s of memory bandwidth. The architecture of the Tensilica Vision AI DSP processor technology is then presented, with three distinctive advantages: (1) the optimized instruction sets of the Vision P6 and Vision C5 DSPs, as examples of achieving instruction-level computation efficiency and performance; (2) unique processor architecture features for SoC-level data processing efficiency and scalability, leading to a high-performance vision AI subsystem; and (3) a fully automated AI optimization framework, software libraries, and tools that provide a practical performance tuning methodology and rapid turnaround time for embedded vision AI system design. In conclusion, the presentation offers considerations for future research and development to bring embedded vision AI to the next performance level.
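
To give a feel for where such TOPS and bandwidth figures come from, the sketch below runs the arithmetic under assumed layer shapes, network costs, frame rates, and camera counts; none of the numbers are from the talk:

```python
# Rough arithmetic behind "several TOPS and tens of GB/s" for embedded vision AI.
# Layer shape, network cost, frame rate, and camera count are all assumptions.

def conv_macs(h, w, c_out, c_in, k):
    """Multiply-accumulates for one KxK conv layer over an HxW output."""
    return h * w * c_out * c_in * k * k

# one assumed 3x3 layer, 256 -> 256 channels on a 56x56 feature map
layer = conv_macs(56, 56, 256, 256, 3)                 # ~1.85e9 MACs
macs_per_frame = 5e9                                   # assumed full-network cost
fps, cameras = 30, 6                                   # assumed surround-view setup

tops = 2 * macs_per_frame * fps * cameras / 1e12       # 1 MAC = 2 ops
print(f"single layer: {layer/1e9:.2f} GMACs; sustained demand: {tops:.1f} TOPS")

bytes_per_frame = 100e6                                # assumed activation traffic
bw = bytes_per_frame * fps * cameras / 1e9
print(f"memory bandwidth if activations spill to DRAM: {bw:.0f} GB/s")
```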

Charles Qi is a system solutions architect in Cadence’s IPG System and Software team, responsible for providing vision system solutions based on the Cadence® Tensilica Vision DSP technology and a broad portfolio of interface IP. At the system level, his primary focus is image sensing, computer vision, and deep learning hardware and software for high-performance automotive vision ADAS SoCs. He is currently also an active member of the internal architecture team for high-performance neural network acceleration hardware IP.

Prior to joining Cadence, Charles held various technical positions in Intel, Broadcom and several high-tech startups.

17:00 - 18:00
Paper Session #2
Paper Presentation

Efficient Compiler Code Generation for Deep Learning Snowflake Co-processor

Andre Xian Ming Chang, Aliasger Zaidy and Eugenio Culurciello
FWDNXT

Paper Presentation

Moving CNN Accelerator Computations Closer to Data

Sumanth Gudaparthi, Surya Narayanan and Rajeev Balasubramonian
University of Utah

Paper Presentation

Event Prediction in Processors Using Deep Temporal Models

Tharindu Mathew, Aswin Raghavan and Sek Chai
SRI International

18:00 - 18:45
Invited Talk
Presentation

Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments

Grigori Fursin, dividiti and cTuning Foundation

Co-designing efficient machine-learning-based systems across the whole application/hardware/software stack to trade off speed, accuracy, energy, and cost is becoming extremely complex and time-consuming. Researchers often struggle to evaluate and compare published works across rapidly evolving software frameworks, heterogeneous hardware platforms, compilers, libraries, algorithms, data sets, models, and environments. I will present our community effort to develop an open co-design tournament platform with an online public scoreboard based on the Collective Knowledge workflow framework (CK). It gradually incorporates best research practices while providing a common way for multidisciplinary researchers to optimize and compare the quality-vs-efficiency Pareto optimality of various workloads on diverse and complete hardware/software systems. All winning solutions will be made available to the community as portable and customizable “plug&play” components with a common API to accelerate research and innovation!

I will then discuss how our open competition and collaboration can help to achieve energy efficiency for cognitive workloads based on energy-efficient submissions from the 1st ReQuEST tournament co-located with ASPLOS’18. Further details: http://cKnowledge.org/request
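
The underlying scoreboard idea, filtering submissions down to a quality-vs-efficiency Pareto frontier, can be sketched in a few lines; the submission data below is invented, and this is not the CK API:

```python
# Sketch of the quality-vs-efficiency Pareto filtering a co-design scoreboard
# performs. Submission names and numbers are invented; this is not the CK API.

submissions = [
    # (name, top-1 accuracy, energy per inference in mJ)
    ("solution-a", 0.76, 120.0),
    ("solution-b", 0.74, 60.0),
    ("solution-c", 0.71, 90.0),   # dominated by solution-b
    ("solution-d", 0.69, 35.0),
]

def pareto_front(entries):
    """Keep entries not strictly dominated in (higher accuracy, lower energy)."""
    front = []
    for name, acc, mj in entries:
        dominated = any(a >= acc and e <= mj and (a > acc or e < mj)
                        for _, a, e in entries)
        if not dominated:
            front.append((name, acc, mj))
    return front

for name, acc, mj in pareto_front(submissions):
    print(f"{name}: {acc:.2f} top-1 @ {mj:.0f} mJ")
```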

Grigori Fursin is the CTO of dividiti and the Chief Scientist of the non-profit cTuning Foundation. He is developing an open Collective Knowledge platform to crowdsource multi-objective optimization and co-design of deep learning and other emerging workloads across the whole SW/HW/model stack. Before co-founding dividiti in 2015, he was the head of the workload optimization group at the Intel Exascale Lab and a senior tenured scientist at INRIA. Grigori has an interdisciplinary background in computer engineering, physics, electronics, and machine learning, with a PhD in Computer Science from the University of Edinburgh (2004). He is a recipient of a personal INRIA fellowship for “making an outstanding contribution to research” in 2012 and the ACM CGO “test of time” award in 2017. Further info: http://fursin.net/research