Learning Structured Neural Representations for Visual Reasoning Tasks

Dean's Office - Faculty of Informatics

Date: 4 November 2020 / 17:00 - 18:30

You are cordially invited to attend the PhD Dissertation Defense of Sjoerd van Steenkiste on Wednesday, November 4th, 2020 at 17:00.
Please note that given the updated Covid-19 restrictions, the Dissertation Defense will be held online.

You can join here

Abstract:
Deep neural networks learn representations of data to facilitate problem-solving in their respective domains. However, they struggle to acquire a structured representation based on more symbolic entities, which are commonly understood as core abstractions central to human capacity for generalization. This dissertation studies this issue for visual reasoning tasks. Inspired by how humans solve these tasks, we propose to learn structured neural representations that distinguish objects: abstract visual building blocks that can separately be composed and reasoned with. We investigate the limitations of current deep neural networks at effectively discovering, representing, and relating these more symbolic entities, and present several improvements. To address the problem of discovering and representing objects, we propose two novel approaches. In one case, we formalize this problem as a pixel-level clustering problem and formulate a neural differentiable clustering algorithm that solves it. We demonstrate how, unlike standard representation learning techniques, it can be trained to learn about objects in an unsupervised manner and acquire corresponding representations that can be treated as symbols for reasoning. In the other case, we adopt a purely generative approach and demonstrate how a neural network equipped with the right inductive bias can learn about objects in the process of synthesizing images, even in complex visual settings. Concerning the problem of relating symbolic entities with neural networks, we investigate how object representations can help facilitate building structured models for common-sense physical reasoning that generalize more systematically. We extend our previous representation learning approach to facilitate model building in this way and demonstrate how it can learn about general relations between objects to reason about their (future) physical interactions. 
Finally, we investigate the utility of a representational format that isolates independent sources of information for encoding the features of individual objects. We conduct a large-scale study of such 'disentangled' representations that includes various methods and metrics on two new abstract visual reasoning tasks. Our results indicate that better disentanglement enables quicker learning using fewer samples.

Dissertation Committee:
- Prof. Jürgen Schmidhuber, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Cesare Alippi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Natasha Sharygina, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Leslie Kaelbling, MIT, USA (External Member)
- Prof. Michael Mozer, Google Brain & University of Colorado, Boulder, USA (External Member)
- Prof. Bernhard Schölkopf, MPI, Germany (External Member)
