
Stan Furrer

Sr. AI Engineer @ IBM
MSc. Data Science & Robotics @ ETHZ & EPFL

Background centered on AI, DataOps & MLOps.
Expertise in GenAI (Text, Image, Audio), Deep Learning, and Computer Vision.
Focus on software engineering practices and business value.


About Me

I am passionate about the foundations of AI (GenAI, ML, DL, CV) and its deployment and management in industrial settings.

Over the past three years, I have worked in both academic and industrial environments, advancing the frontiers of research
and integrating the latest findings in Generative AI, Deep Learning, and Machine Learning into impactful software and applications.

I see the modern practice of a Machine Learning software engineer as follows; technically, this is the loop that interests me most:

While(True) : Sensors -> Data Wrangling -> Machine Learning -> Real World
[PSEUDO CODE 1.] The synergy of my Bachelor's in Microengineering and my Master's in Data Science and Robotics
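
Expanded slightly, the same loop in Python (a sketch only; every component here is a placeholder, not a real system):

def run(sensors, wrangler, model, actuator):
    """The perception -> data -> learning -> action loop from Pseudo Code 1."""
    while True:
        raw = sensors.read()                 # acquire signals from the physical world
        features = wrangler.transform(raw)   # clean, align, and featurize the data
        decision = model.predict(features)   # apply the learned model
        actuator.act(decision)               # feed the decision back into the real world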

I am open to connecting on Machine Learning / Data Science / Software Engineering topics.

Skills

Programming: Proficient in Python, PySpark; good in C/C++, SQL, HTML, CSS
Machine Learning Libraries: PyTorch, PyTorch Lightning, TensorFlow, Keras, Scikit-Learn
Development Tools: Git & GitHub, bash/UNIX, Jira
Cloud and Deployment: AWS, Azure, Docker
Operating Systems: Linux, Windows
CPU/GPU Computing: SLURM, Kubernetes, DDP and multiprocessing

My Professional Experiences

Sr. AI Engineer

I focus on integrating the latest AI research into our products at IBM.
I primarily work on developing a GenAI agentic framework in multimodal settings (Text, Image, Audio), with optimized memory management, function calling (sketched below), and integration with various software and tools. In addition, I implement hardware-efficient fine-tuning techniques for LLMs to continuously enhance this framework.
In parallel, I work on creating audio-driven, human-like talking faces to automate interactions with IBM clients.
Extra Contribution
Contributing to IBM internal research initiatives in Deep Learning.
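
As a rough illustration of the function-calling pattern behind such an agentic framework (the tool registry and the llm client below are hypothetical placeholders, not IBM's actual stack):

import json

# Hypothetical tool registry: the model selects a tool by name and arguments.
TOOLS = {
    "search_docs": lambda query: f"top passages for: {query}",
    "send_email": lambda to, body: f"email sent to {to}",
}

def agent_step(llm, messages):
    """One agent turn: query the model, execute the requested tool,
    and append the result to the conversation memory."""
    reply = llm.chat(messages)     # placeholder client; assumed to return JSON
    call = json.loads(reply)       # e.g. {"tool": "search_docs", "args": {...}}
    result = TOOLS[call["tool"]](**call["args"])
    messages.append({"role": "tool", "content": result})
    return messages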

Feb 2024 - Today

Data Scientist

Primary Task: Reducing Risk in Hedge Fund Investment
I developed and scaled production ML/data pipelines in Python/PySpark to bolster hedge fund investment. I created and implemented production LLM frameworks (RAG, multi-agents) that guide over 50 collaborators in their daily investments. My work spans the full ML/data lifecycle: data modeling, large-scale ETL, ML development, inference, monitoring, and unit testing.

I also drove the MLOps culture through iterative improvements in monitoring, data analysis, and CI/CD processes, and served as Deputy Tech Lead and Agile Team Delegate, leading the CI/CD cycle and mentoring 10+ developers.
Extra Contribution
Enhanced Spark optimization and memory & resource management in our development and review processes.
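
A toy PySpark snippet in the spirit of those pipelines (paths, columns, and thresholds are invented for illustration):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("risk-features").getOrCreate()

# Hypothetical positions table: one row per (fund, asset, date) with an exposure.
positions = spark.read.parquet("s3://bucket/positions/")

# Aggregate exposure per fund and flag concentration risk above a threshold.
risk = (
    positions
    .groupBy("fund_id")
    .agg(F.sum("exposure").alias("total_exposure"))
    .withColumn("concentrated", F.col("total_exposure") > 1e8)
)
risk.write.mode("overwrite").parquet("s3://bucket/risk/")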

Feb 2022 - Feb 2024

Research Assistant in Applied Machine Learning | EPFL-LASA

Research project in Prof. Aude Billard's Learning Algorithms and Systems Laboratory (LASA) at EPFL. The goal was to train, test, and deploy a real-time multimodal algorithm based on sound and tactile sensors to enhance the grasp stability of industrial robots in high-speed settings.
A great opportunity to connect AI practice with real-world problems.
Phase 1: implementation in Python. Phase 2: implementation in C++ on ROS.
Challenges: optimizing inference time and making predictions robust to environmental changes.

Feb 2021 - June 2021 || AAAI AI-HRI 2021

Video | ArXiv | Code

Machine Learning Engineer Intern

I led a deep learning project for early tremor detection and diagnosis, reaching an F1-score of 0.91. The project leveraged keyboard and mouse usage data to anticipate various human diseases, with a focus on interpretability. Logitech prioritizes MLOps, so I adopted iterative improvements through constant feedback monitoring and data analysis. In parallel, I had the chance to collaborate on the development and deployment of other Deep Learning and Computer Vision research projects at Logitech.
Strong collaboration with the CHUV (Lausanne) and HUG (Geneva) hospitals.
Challenges: designing the score so that rare user behaviours do not skew the classification.
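
For reference, the F1-score quoted above is the harmonic mean of precision and recall; a minimal sanity check with scikit-learn on dummy labels (not the project's data):

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 1, 0, 1, 1]   # dummy ground truth (1 = tremor episode)
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]   # dummy model predictions

p, r = precision_score(y_true, y_pred), recall_score(y_true, y_pred)
print(f1_score(y_true, y_pred))      # equals 2 * p * r / (p + r)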

Sep 2020 - Feb 2021

Business Developer | SMATCH SA

For two years, I helped develop a sports connectivity platform in a dynamic start-up. I conducted data analysis on user interactions across the application to inform decisions, and contributed to growing the user base past 50,000 through strategic business and technical choices.
Our project secured funding for the venture in Lausanne, the Olympic Capital.
Challenges: the multidisciplinary nature of the tasks and the simplicity-versus-perfectionism trade-off.

Feb 2016 - Feb 2018

Website | Video

Teaching Assistant | College Arnold Reymond

Taught Maths, Physics, and Biology at the primary-school level at College Arnold Reymond.
Challenges: bringing clarity out of ambiguity.

Aug 2017 - Sep 2017

My Selection of Projects

Master Thesis : Robust Multimodal Contrastive Learning

Master thesis in Prof. Roger Wattenhofer's Distributed Computing Group at ETH Zurich. The goal was to create a novel vision-and-language multimodal self-supervised framework: robust multimodal contrastive learning. The idea was to reinforce the latent relation between modalities through their adversarial samples. The project offered fascinating insight into multimodal algorithms, transformer-based architectures, contrastive learning, and robustness optimization.
The project was in collaboration with Prof. Martin Jaggi from EPFL (Switzerland), Dr. Yunpu Ma from LMU (Germany), and Kangning from NYC.
Challenges: careful management of scaling, precision, and distributed computing.
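
A compact sketch of the symmetric contrastive (InfoNCE-style) objective at the core of such a framework; the adversarial-sample step is omitted and the embeddings are assumed to come from placeholder encoders:

import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matched image/text pairs are positives,
    every other pair in the batch serves as a negative."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature              # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)  # diagonal = positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2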

Feb 2021 - September 2021

Paper | Code

Course Project : Comparison Between Two Dimensionality Reduction Techniques

A comprehensive review and discussion of two dimensionality reduction techniques: LLE and its modified version. We analyze their stability under various data and hyperparameters, as well as their topology preservation and classification performance, with further comparison against t-SNE and UMAP.
Challenges: the foundations of geometric preservation on manifolds and the associated metrics.
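
Both techniques are available in scikit-learn; a minimal comparison on toy data, assuming the "modified version" refers to Modified LLE (method='modified'):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

for method in ("standard", "modified"):
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method=method)
    Z = lle.fit_transform(X)                   # 2-D embedding of the 3-D roll
    print(method, lle.reconstruction_error_)   # lower = better local fit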

Feb 2020 - Jul 2020

Paper | Code

Semester Project: Emotion Analysis on OpenSubtitles

In this paper, we present a data-driven approach to segmenting movie subtitles into a speaker-aligned dataset. We then fine-tune BERT to label the dialogues with emotions, and finally measure the performance of the resulting EmoBERT in a social-bot context.
Challenges: the data cleaning and preprocessing; first hands-on experience with transformer-based architectures.
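
A bare-bones version of the fine-tuning setup with Hugging Face Transformers (the emotion label set here is a placeholder, not the paper's):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

EMOTIONS = ["joy", "sadness", "anger", "fear", "neutral"]  # placeholder label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(EMOTIONS)
)

inputs = tokenizer("I can't believe we won!", return_tensors="pt")
logits = model(**inputs).logits            # one score per emotion class
print(EMOTIONS[logits.argmax(-1).item()])  # head is untrained here, so output is arbitrary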

Feb 2020 - Jul 2020

Paper | ArXiv | Code

Course Project : Meta-Learner LSTM for Few-Shot Learning

Ravi & Larochelle addressed the weakness of neural networks trained with gradient-based optimization on the few-shot learning problem with an LSTM-based meta-learner. Our paper expands the performance analysis of their algorithm.
Challenges: the translation from PyTorch to TensorFlow and the selection of experiments.

Feb 2020 - Jul 2020

Paper | Code

Course Project : Learning to play Pong with Deep Reinforcement Learning

In this project we taught an agent to play Pong from the PyGame learning environment. We used and compared two policy-gradient approaches to learn the task: Actor-Critic versus Advantage Actor-Critic (A2C).
Challenges: the grid search to select optimal hyperparameters.
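
The key difference between the two methods is the baseline: A2C subtracts a learned state-value estimate from the return, as in this sketch with placeholder tensors:

import torch

def a2c_losses(log_probs, values, returns):
    """Advantage actor-critic: the critic's value baseline reduces the
    variance of the policy-gradient estimate."""
    advantage = returns - values                      # A(s, a) = R - V(s)
    policy_loss = -(log_probs * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()              # critic regression target
    return policy_loss, value_loss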

May 2019 - Jun 2019

Code

Awards

Hackathon Microsoft x IBM | 1st / 250 teams - May 2024

  • Integrated a multi-agent framework directly into Teams calls to improve call efficiency and support data-driven decisions.
  • Agents leverage function calling (web scraping), RAG (internal DB), tools (email, Jira), and Whisper (audio-to-text for RAG).
  • Developed with AutoGen and various Azure services (Azure Docker, Functions, Power Automate, Copilot Studio).


Hackathon UBS | Finalist, Top 10 / 230 teams - Nov 2023

  • Built an LLM application for legal documents with multi-agent orchestration and RAG capability on Azure.
  • Production-ready and currently used across the bank.


Hackathon Databricks | 2nd / 130 teams - Sep 2023

  • Built a RAG-based Q&A LLM application: developed in Python, scaled in PySpark, and deployed with MLflow.
  • Processed the data, generated embeddings (SBERT), indexed vectors in ChromaDB, and retrieved and augmented the prompt for the LLM (see the sketch below).
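
A condensed version of that retrieval step (model name and documents are illustrative only):

import chromadb
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # an SBERT-family model
docs = ["Spark shuffles data between stages.", "MLflow tracks model runs."]

# Index the documents and their embeddings in an in-memory ChromaDB collection.
collection = chromadb.Client().create_collection("kb")
collection.add(ids=["d0", "d1"], documents=docs,
               embeddings=encoder.encode(docs).tolist())

# Retrieve the closest document and augment the prompt before calling the LLM.
question = "What does MLflow do?"
hits = collection.query(query_embeddings=encoder.encode([question]).tolist(),
                        n_results=1)
prompt = f"Context: {hits['documents'][0][0]}\n\nQuestion: {question}"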


Nominated project | 50 years event @ EPFL - Sep 2019

  • Engineered a flexible, biocompatible thin-film sensor in a cleanroom for vein temperature and blood-flow measurements.
  • Contributed to the C++ interface development for an ESP32 microcontroller to collect sensor data from the PCB.

My Belief about Artificial Intelligence

Nowadays, it is an amazing time to work in AI. LLMs and vision-language models (VLMs) serve as interfaces through which computers understand human language and the human world. By connecting these interfaces to the other components of an operating system, we enable interactions with tools (software), functions (Python, C++), internal databases (RAG), and memory management.

This vision is revolutionary and will radically change how we interact with computers, making our daily lives more productive. All the giant tech companies are racing toward this vision by launching products, conducting research, and initiating open-source projects.

I am excited to be part of this journey and look forward to contributing to this transformative path.

My Inspirations

François Chollet | Software Engineer @ Google
“The future of AI will be discrete as well as continuous”

Deep learning is limited: it suits continuous problems where the data is interpolative, lying on a learnable manifold that is densely sampled across its entire surface between the points where we need to make predictions.

For Francois Chollet, generalization itself is by far the most crucial feature of artificial intelligence.

The early history of AI focused on defining priors in data structures and algorithms and tended to leave out experience and learning.

The field of AI after the deep-learning revolution seems to have the opposite obsession. In recent years, there has been much emphasis on learning as much as possible from scratch.

The connection to Chollet’s ideas is that the deep learning era focuses on maximally exploiting experience in the form of training data.

In Chollet’s view, task-specific skills cannot tell us anything about an agent’s intelligence because it is possible to “buy arbitrary performance” by simply defining better priors or utilizing more experience. “Defining better priors or training from more experience reduces uncertainty and novelty and thus reduces the need for generalization.”

Takeaways for AI in Industry:
  • The role of the quantity and quality of data in the approximation of the underlying manifold.
  • The pros and cons of feature engineering with respect to generalization.
  • Task-specific training can produce shortcut solutions that reveal a mismatch with our intentions.
Prof. Yann LeCun | Chief AI Scientist @ Facebook
“[...] self-supervised learning (SSL) is one of the most promising ways to build [...] a form of common sense in AI systems.”

In recent years, the AI field has made massive strides in developing AI systems that learn from vast amounts of carefully labeled data. However, moving forward, it seems impossible to annotate the vast amounts of data with everything that we care about. Supervised learning is a bottleneck for allowing more intelligent generalist models to do various jobs and gain new abilities without extensive amounts of labeled data.

For Yann LeCun, a promising way to approximate such common sense is self-supervised learning (SSL). Self-supervised tasks act as a proxy strategy for learning representations of the data using pseudo-labels, which are created automatically from attributes found in the data. The outcome of the pretext task itself is usually discarded; instead, we keep the learned intermediate representation, with the hypothesis that it carries rich semantics and benefits a variety of useful downstream tasks.
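
A classic toy example of such automatically created pseudo-labels is rotation prediction, sketched here:

import torch

def rotation_pretext(images):
    """Create a pretext task: rotate each image by 0/90/180/270 degrees and
    use the rotation index as a free, automatically generated pseudo-label."""
    rotated, labels = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(-2, -1)))
        labels.append(torch.full((images.size(0),), k))
    return torch.cat(rotated), torch.cat(labels)

# The classifier head that predicts k is discarded after pretraining;
# only the learned intermediate representation is kept for downstream tasks.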

Takeaways for AI in Industry:
  • Self-supervised learning captures the inherent structure of the data with fewer priors than supervised methods.
  • It is the underlying concept of modern natural-language and multimodal architectures.
  • The fine-tuned solution is projected from a more general representation learned through self-supervision.
Prof. Andrew Ng | Co-founder of Google Brain
“The consistency of the data is paramount”

Data is everything in modern-day machine learning but is often neglected and not handled properly in AI projects. As a result, hundreds of hours are wasted on tuning a model built on low-quality data. That’s the main reason why the model's accuracy is significantly lower than expected - it has nothing to do with model tuning.

The Data-Centric Architecture treats data as a valuable and versatile asset instead of an expensive afterthought. Data-centricity significantly simplifies security, integration, portability, and analysis while delivering faster insights across the entire data value chain.

Andrew Ng’s idea is relatively simple: hold the model architecture fixed, assuming it is good enough, and instead iteratively improve the data quality.

Instead of asking, “What data do I need to train a useful model?”, the question should be: “What data do I need to measure and maintain the success of my ML application?”

Takeaways for AI in Industry:
  • Given how heavily modern AI architectures rely on large amounts of data, it is fundamental for industries to adopt a data-centric approach.
  • Managing data drift and dataset shift, ethics and bias, data security and privacy, and correctly tracking and interpreting the outcome of the AI pipeline are crucial steps for large-scale AI deployment.
Prof. Michael Bronstein | Head of Graph Learning @ Twitter
“Geometric Deep Learning aims to bring geometric unification to deep learning [...]”

Modern machine learning operates on large, high-quality datasets, which, together with appropriate computational resources, motivate the design of rich function spaces with the capacity to interpolate over the data points.

"Symmetry, as wide or narrow as you may define its meaning, is one idea by which man through the ages has tried to comprehend and create order, beauty, and perfection. (Hermann Weyl)

Since the early days, researchers have adapted neural networks to exploit the low-dimensional geometric structure arising from physical measurements, such as grids in images, sequences in time series, or positions and momenta in molecules, and their associated symmetries such as translation or rotation.

Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, and second, learning by local gradient-descent type methods, typically implemented as backpropagation.

Geometric Deep Learning unifies a broad class of ML problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks, but also provide a principled way to construct new types of problem-specific inductive biases.

Takeaways for AI in Industry:
  • It is fundamental to understand the foundations and limitations of deep learning architectures.