Tutorials

1

[Virtual]

Psychological, Cognitive and Linguistic BERTology: An Idiomatic Multiword Expression Perspective

Harish Tayyar Madabushi, Carlos Ramisch, Marco Idiart, Aline Villavicencio

The success of BERT and similar pre-trained language models (PLMs) has led to what might be described as an existential crisis for certain aspects of Natural Language Processing: PLMs can now do better than other models on numerous tasks in multiple evaluation scenarios and are argued to surpass human performance on some benchmarks (Wang et al., 2018; Sun et al., 2020; Hassan et al., 2018). In addition, PLMs also seem to have access to a variety of linguistic information as diverse as parse trees (Hewitt and Manning, 2019), entity types, relations, semantic roles (Tenney et al., 2019a), and constructional information (Tayyar Madabushi et al., 2020). Does this mean that there is no longer a need to tap into the decades of progress made in traditional NLP and related fields, including corpus and cognitive linguistics?

In short, can deep(er) models replace linguistically motivated (layered) models and systematic engineering as we work towards high-level symbolic artificial intelligence systems? This tutorial will explore these questions through the lens of a linguistically and cognitively important phenomenon that PLMs do not (yet) handle very well: Idiomatic Multiword Expressions (MWEs) (Yu and Ettinger, 2020; Garcia et al., 2021; Tayyar Madabushi et al., 2021).

2

[Hybrid]

Information Theory in Linguistics: Methods and Applications

Ryan Cotterell, Richard Futrell, Kyle Mahowald, Clara Meister, Tiago Pimentel, Adina Williams, and Aryaman Arora

Since Shannon originally proposed his mathematical theory of communication in the middle of the 20th century, information theory has been an important way of viewing and investigating problems at the interfaces between linguistics, cognitive science, and computation. With the upsurge in applying machine learning approaches to linguistic questions, information-theoretic methods are becoming an ever more important tool in the linguist's toolbox. This cutting-edge tutorial, which draws on the work of many different researchers, emphasizes interdisciplinary connections between the fields of linguistics and natural language processing. We plan to do this by reviewing the mathematical basis of information theory. We then show how it can be fruitfully applied to several linguistic applications, ranging from semantics, typology, morphology, and phonotactics to the interface between cognitive science and linguistics. We then discuss recent research---spanning fields from psycholinguistics to machine learning---that has made progress in the analysis of natural language using these techniques. Throughout the tutorial, we will provide hands-on exercises that allow you to put theory into practice in linguistic applications.
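As a taste of the quantities such hands-on exercises tend to work with, here is a minimal sketch (not drawn from the tutorial's materials; the toy corpus and function names are illustrative) of two core information-theoretic measures: the entropy of a unigram distribution and the surprisal of a single token.

```python
import math
from collections import Counter

def entropy(tokens):
    """Shannon entropy (in bits) of the unigram distribution over tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def surprisal(token, tokens):
    """Surprisal -log2 p(token) under the unigram distribution."""
    counts = Counter(tokens)
    return -math.log2(counts[token] / sum(counts.values()))

corpus = "the cat sat on the mat".split()
print(entropy(corpus))           # ≈ 2.25 bits
print(surprisal("the", corpus))  # "the" has p = 1/3, so ≈ 1.58 bits
```

In psycholinguistic applications, surprisal computed under a (far more powerful) language model rather than a unigram count table is a standard predictor of human reading times.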

3

[Virtual]

Uncertainty Estimation for Natural Language Processing

Adam Fisch, Robin Jia, Tal Schuster

Accurate estimates of uncertainty are important for many difficult or sensitive prediction tasks in natural language processing (NLP). Though large-scale pre-trained models have vastly improved the accuracy of applied machine learning models throughout the field, there still are many instances in which they fail. The ability to precisely quantify uncertainty while handling the challenging scenarios that modern models can face when deployed in the real world is critical for reliable, consequential decision-making. This tutorial is intended for academic researchers and industry practitioners alike, and provides a comprehensive introduction to uncertainty estimation for NLP problems---from fundamentals in probability calibration, Bayesian inference, and confidence set (or interval) construction, to applied topics in modern out-of-distribution detection and selective inference.
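Probability calibration, one of the fundamentals listed above, can be illustrated with expected calibration error (ECE): bin predictions by confidence, then average the gap between confidence and accuracy within each bin. A minimal sketch (illustrative, not the tutorial's code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: |accuracy - mean confidence| per equal-width confidence bin,
    weighted by the fraction of predictions falling in that bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# A well-calibrated model: 90% confidence, right 9 times out of 10.
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))  # 0.0
# An overconfident model: 100% confidence, right only half the time.
print(expected_calibration_error([1.0] * 4, [1, 1, 0, 0]))    # 0.5
```

A model can be highly accurate yet badly calibrated (or vice versa), which is why calibration is measured separately from accuracy.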

4

[Hybrid]

Knowledge Graph Embeddings for NLP: From Theory to Practice

Luca Costabello, Adrianna Janik, Eda Bayram, Sumit Pai

Knowledge graph embeddings are supervised learning models that learn vector representations of nodes and edges of labeled, directed multi-graphs. We describe their design rationale, and explain why they are receiving growing attention within the graph representation learning and the broader NLP communities. We highlight their limitations, open research directions, and real-world use cases. Besides a theoretical overview, we also provide a hands-on session, where we show how to use such models in practice.

5

[Virtual]

NS4NLP: Neuro-Symbolic Modeling for NLP

Dan Roth, Yejin Choi, Vivek Srikumar, Dan Goldwasser, Maria L. Pacheco and Sean Welleck

The goal of neuro-symbolic methods is to combine symbolic representations and neural networks and benefit from the complementary strengths of the two paradigms. These ideas have a long history in AI and are currently experiencing a resurgence of interest in several AI communities, including NLP. This tutorial targets researchers and practitioners interested in applying and advancing the use of these methods for natural language processing problems. A major goal of this tutorial is to provide a framework for analyzing the different modeling and algorithmic choices when combining symbolic and neural models for knowledge representation, learning, and reasoning.

6

[Virtual]

Analysing Human Communication: Sociopragmatic and Pragmalinguistic Models of Im/politeness and Verbal Aggression

Ritesh Kumar, Daniel Kadar and Juliane House

A large body of NLP research has pointed to the need for systems that are socially aware and culturally sensitive. Two broad areas of such work are:

  • Development of socially-aware dialog systems or assistants, including both generic systems like SARA and goal-oriented systems.
  • Development of systems for automatic recognition of hateful and abusive speech online.
Given the huge interest of the computational linguistics community in these areas of research, it might be of some significance for the community to become aware of the advances made within the fields of sociopragmatics and pragmalinguistics of im/politeness and aggression. In this introductory tutorial, we will trace the historical progression in the understanding of im/politeness and aggression within sociopragmatic and pragmalinguistic approaches. We will then present a new interdisciplinary approach for conducting a large-scale corpus-based analysis of socially and culturally sensitive pragmatic phenomena such as im/politeness and aggression. These theoretical approaches and frameworks may be incorporated by the computational linguistics community both for building socially-aware dialog systems and for detecting online hate.

7

[Hybrid]

Spatial Language Understanding: Representation, Reasoning, and Grounding

Parisa Kordjamshidi, James Pustejovsky, Sine Moens

In this tutorial, we give an overview of cutting-edge research on spatial language understanding and its applications. This includes how spatial semantics is represented, the existing datasets and annotations, and the connections between information extraction models, qualitative reasoning based on spatial language, and end-to-end deep learning models. We review recent transformer-based language models used for spatial language comprehension and the challenges they face. We clarify the role of spatial language in grounding language in the visual world, as well as related applications in navigation and wayfinding agents, human-machine interaction, and dialogue systems.

COLING 2022: The 29th International Conference on Computational Linguistics