Talks and presentations

A theory of injection-based vulnerabilities in formal grammars

March 29, 2023

Talk, 2023 Annual Meeting of the WG "Formal Methods for Security", Roscoff, France

Many systems work by receiving instructions and processing them: e.g., a browser receives and then displays an HTML page and executes Javascript scripts, a database receives a query and then applies it to its data, an embedded system controlled through a protocol receives and then processes a message. When such instructions depend on user input, one generally constructs them with concatenation or insertion. It can lead to injection-based attacks: when the user input modifies the query’s intended semantics and leads to a security breach. Protections do exist but are not sufficient as they never tackle the origin of the problem: the language itself. We propose a new formal approach based on formal languages to assess risk, enhance static analysis, and enable new tools. This approach is general and can be applied to query, programming, and domain-specific languages as well as network protocols. Slides

The complexity of unsupervised learning of lexicographic preferences

March 17, 2023

Talk, Séminaire ANITI, IRIT, Toulouse, France

This work considers the task of learning users’ preferences on a combinatorial set of alternatives, as generally used by online configurators, for example. In many settings, only a set of selected alternatives during past interactions is available to the learner. Fargier et al. [2018] propose an approach to learn, in such a setting, a model of the users’ preferences that ranks previously chosen alternatives as high as possible; and an algorithm to learn, in this setting, a particular model of preferences: lexicographic preferences trees (LP-trees). In this paper, we study complexity-theoretical problems related to this approach. We give an upper bound on the sample complexity of learning an LP-tree, which is logarithmic in the number of attributes. We also prove that computing the LP tree that minimises the empirical risk can be done in polynomial time when restricted to the class of linear LP-trees. Slides

Anomaly detection and explanation in networks with machine learning

March 02, 2023

Talk, NICT seminar, Campus Cyber, Puteaux, France

This talk presents recent work on anomaly detection in network data and anomaly explanation. Our approach represents the network data with a security objects graph analyzed by an autoencoder. We introduce a new statistical explanation technique for reconstruction-based methods and compare it with SHAP. Finally, we use these explanations to analyze the dataset CICIDS2017 and check whether they match the expert’s expectations. Slides

Behavioral intrusion detection system based on machine learning

September 20, 2022

Talk, Supsec 3rd workshop: AI for supervision, Rennes, France

In this talk, I present the sec2graph approach, its performances, and its explanation mechanism. This mechanism helped us identify several flaws we identified in the labelling of the CICIDS2017 dataset and in the traffic capture, such as packet misorder, packet duplication and attack that were performed but not correctly labelled. Slides

The complexity of unsupervised learning of lexicographic preferences

July 23, 2022

Talk, M-PREF Workshop of IJCAI, Vienne, Autriche

This work considers the task of learning users’ preferences on a combinatorial set of alternatives, as generally used by online configurators, for example. In many settings, only a set of selected alternatives during past interactions is available to the learner. Fargier et al. [2018] propose an approach to learn, in such a setting, a model of the users’ preferences that ranks previously chosen alternatives as high as possible; and an algorithm to learn, in this setting, a particular model of preferences: lexicographic preferences trees (LP-trees). In this paper, we study complexity-theoretical problems related to this approach. We give an upper bound on the sample complexity of learning an LP-tree, which is logarithmic in the number of attributes. We also prove that computing the LP tree that minimises the empirical risk can be done in polynomial time when restricted to the class of linear LP-trees. Slides

Interactive configuration and recommendation in presence of constraints

May 20, 2022

Talk, Séminaire ANITI, IRIT, Toulouse, France

We present our work on the recommendation of values in interactive configuration, with no prior knowledge about the user, but given a list of products previously configured and bought by other users (“sale histories”). The basic idea is to recommend, for a given variable at a given step of the configuration process, a value that has been chosen by other users in a similar context, where the context is defined by the variables that have already been decided, and the values that the current user has chosen for these variables. This presentation details how we handle constraints about the configuration and highlights some experimental results. Slides

Une introduction aux méthodes d’IA explicables

March 29, 2022

Talk, "Papers, please" seminar, Rennes, France

Avec l’utilisation toujours croissantes des techniques d’IA, le besoin de vérification des prédictions se fait de plus en plus pressant. C’est l’objectif des méthodes d’explicabilité : permettre à un utilisateur de savoir pourquoi une décision a été prise par le système. Cette introduction fait un point sur les nombreuses familles de techniques disponibles. Slides

Machine learning et sécurité : entre menaces et opportunités

November 29, 2021

Talk, MeetUp LumenAI, Rennes, France

Cette présentation rappelle les concepts fondamentaux en sécurité et présente les applications des techniques de machine learning aux multiples problématiques liées à la sécurité. Le dernier axe abordé est celui de la sécurité des méthodes de machine learning, qui sont directement attaquées par de nombreuses techniques aux objectifs et aux moyens multiples. Séminaire présenté avec Ludovic Mé. Slides

La sécurité informatique à l’ère de l’intelligence artificielle

October 13, 2021

Talk, Séminaires du département informatique, Rennes, France

Avec la numérisation de nos vies, les systèmes informatiques ont pris une place prépondérante dans notre société et sont naturellement devenus la cible d’attaquants, du “script kiddie” au groupe organisé à l’objectif politique. Depuis quelques années, l’intelligence artificielle s’est invitée à la fête : qu’elle permette de détecter automatiquement des attaques ou qu’elle soit subrepticement manipulée dans les voitures autonomes, elle amène son lot de problèmes et de solutions. Ce séminaire a pour objectif de présenter ces deux domaines, leurs enjeux et leurs interactions, et de mettre en avant les pistes de recherche que la communauté scientifique privilégie pour lutter contre ces menaces. Slides

Machine Learning 101

September 24, 2021

Talk, Hands-on Machine Learning for Security seminar, Rennes, France

Machine learning is applied successfully in various domains, including cybersecurity, where it has been used for intrusion detection, malware analysis, and attack comprehension, for example. Therefore, many cybersecurity researchers seek to catch up and introduce such techniques in their research. This presentation aims to provide the basics of machine learning for cybersecurity researchers. The presentation is available on Youtube. Slides

A formal study of injection-based vulnerabilities and some tools it will enable

February 19, 2021

Talk, SoSySec seminars at IRISA, Rennes, France

Many systems work by receiving instructions and processing them: e.g., a browser receives and then displays an HTML page and executes Javascript scripts, a database receives a query and then applies it to its data, an embedded system controlled through a protocol receives and then processes a message. When such instructions depend on user input, one generally constructs them with concatenation or insertion. It can lead to injection-based attacks: when the user input modifies the query’s intended semantics and leads to a security breach. Protections do exist but are not sufficient as they never tackle the origin of the problem: the language itself. We propose a new formal approach based on formal languages to assess risk, enhance static analysis, and enable new tools. This approach is general and can be applied to query, programming, and domain-specific languages as well as network protocols. We are setting up an ANR project to go into this subject in more depth. The presentation, in French, is available on Youtube. Slides

Un système agnostique de détection d’intrusion radio pour protéger l’Internet des objets

January 22, 2020

Talk, Nouvelles Avancées en Sécurité des Systèmes d'Information, INSA-Toulouse, Toulouse, France

L’expansion de l’Internet des objets (IoT) entraîne l’apparition de maisons intelligentes, d’usines intelligentes et même de villes intelligentes. Bien que ces objets améliorent la qualité de vie de ses utilisateurs et offrent de nouvelles opportunités économiques, ils sont aussi un important vecteur d’attaques (le botnet Mirai étant sûrement l’exemple le plus connu). Pour protéger ces environnements, des systèmes de détection d’intrusion (IDS) sont développés. Ces IDS rencontrent des problématiques uniques à l’IoT, telles que l’évolution rapide des technologies et des protocoles ou encore leur réseau décentralisé. Pour surmonter ces problèmes, nous proposons un IDS qui surveille de larges bandes de fréquences au niveau de la couche physique sans faire d’hypothèses sur les protocoles ou les technologies présentes. De plus, notre solution propose pour chaque attaque détectée un diagnostic triple : temporel (les dates exactes de l’anomalie détectée), fréquentiel (la fréquence principale de l’anomalie) et spatial (la position estimée de l’origine de l’anomalie). Nous avons expérimenté notre méthode avec une expérimentation grandeur nature: notre système a pu efficacement détecter et diagnostiquer les attaques lancées sur les bandes 400-500 MHz et 800-900 MHz, deux bandes qui ne sont pas couvertes par les solutions traditionnelles. Slides

Interactive configuration with constraints consistency and recommendation

June 12, 2018

Talk, Panorama des recherches dans le domaine automobile, LAAS-IRIT-Laplace, Toulouse, France

We present our work on the recommendation of values in interactive configuration, with no prior knowledge about the user, but given a list of products previously configured and bought by other users (“sale histories”). The basic idea is to recommend, for a given variable at a given step of the configuration process, a value that has been chosen by other users in a similar context, where the context is defined by the variables that have already been decided, and the values that the current user has chosen for these variables. This presentation details how we handle constraints about the configuration and highlights some experimental results. Slides

Learning Lexicographic Preference Trees from Positive Examples

February 07, 2018

Talk, AAAI’18 Technical Track, New Orleans, USA

We consider the task of learning the preferences of users on a combinatorial set of alternatives, as it can be the case for example with online configurators. In many settings, what is available to the learner is a set of positive examples of alternatives that have been selected during past interactions. We propose to learn a model of the users’ preferences that ranks previously chosen alternatives as high as possible. Here, we study the particular task of learning conditional lexicographic preferences. We present an algorithm to learn several classes of lexicographic preference trees, prove convergence properties of the algorithm, and experiment on both synthetic data and on a real-world bench in the domain of recommendation in interactive configuration. Slides