Towards More Efficient Distributed Machine Learning Pipelines

The goal of the internship is to investigate which operations of a machine learning pipeline can be offloaded most efficiently to specialized hardware accelerators, and whether a common set of such operators can be captured and reused across different machine learning pipelines without requiring complete reprogramming of the accelerator. The project consists of:

  • Detailed study of various machine learning operators and feature engineering transformations to understand their computational complexity and space requirements
  • Benchmarking their performance and characterizing their (expected) behavior on GPUs, FPGAs and similar accelerators
  • Implementing a proof of concept “splitting” of a processing pipeline across multiple nodes in the network
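As a rough illustration of the first two steps, operators can be timed on the host before deciding what to offload. The sketch below (a hypothetical micro-benchmark, not part of the project deliverables) compares a memory-bound feature-engineering transform against a compute-bound ML operator using NumPy:

```python
import time
import numpy as np

def benchmark(fn, *args, repeats=5):
    """Return the best wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def one_hot(labels, num_classes):
    """A typical feature-engineering transform: one-hot encode integer labels."""
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out

rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=1_000_000)
x = rng.standard_normal((1024, 1024)).astype(np.float32)

# One-hot encoding is memory-bound; matrix multiply is compute-bound.
# Their ratio hints at which operator benefits more from an accelerator.
t_onehot = benchmark(one_hot, labels, 100)
t_matmul = benchmark(np.matmul, x, x)
print(f"one-hot: {t_onehot:.4f}s  matmul: {t_matmul:.4f}s")
```

The same harness can then be re-run with GPU- or FPGA-backed implementations of each operator to characterize speedups.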

The results of this project are expected to form the basis of further work on hardware-accelerated operations in distributed ML frameworks, and thereby of a likely publication in a top-tier conference.


You graduated from, or are currently enrolled in, a Bachelor’s or Master’s program in Computer Science, Electrical Engineering, Telecommunications or a related field, with experience in at least 3 of the following 4 areas:

  • C/C++ and Python programming
  • FPGA programming
  • Distributed Systems
  • Machine Learning Algorithms

More info: