SLATE'23

Monday, 26th June | Tuesday, 27th June | Keynotes

26th June

All papers have 20 minutes for presentation, plus 5 minutes for Q&A.

Papers/Events	Authors	Time
Welcome Session		09:30
Session I	(Chair: António Teixeira)
Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations	David Rodrigues, António Lopes and Fernando Batista	09:45
Querying Relational Databases with Speech-Recognition driven by Contextual Knowledge	Dietmar Seipel, Benjamin Förster, Magnus Liebl, Marcel Waleska and Salvador Abreu	10:10
OCRticle - a structure-aware OCR application	Sofia Santos and José João Almeida	10:35
Coffee Break		11:00
Session II	(Chair: Mário Pinto)
Romaria de Nª Srª d'Agonia: Building a Digital Repository and a Virtual Museum	Sara Cristina Freitas Queirós Queirós, Cristiana Esteves Araujo and Pedro Manuel Rangel Henriques	11:20
Towards a universal and interoperable scientific data model	João Oliveira, Diogo Gomes, Francisca Santana and Filipe Portela	11:45
Integrating Gamified Educational Escape Rooms in Learning Management Systems	Ricardo Queirós, Carla Pinto, Mário Cruz and Daniela Mascarenhas	12:10
Lunch		12:35
Session III	(Chair: Ricardo Queirós)
Narrative Extraction from Semantic Graphs	Daniil Lystopadskyi, André Santos and José Paulo Leal	14:30
A Framework for Fostering Easier Access to Enriched Textual Information	Gabriel Silva, Mário Rodrigues, António Teixeira and Marlene Amorim	14:55
Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications	Simone Wills, Yu Bai, Cristian Tejedor García, Catia Cucchiarini and Helmer Strik	15:20
Coffee Break		15:45
Keynote	(Chair: Alberto Simões)
From similarity to paraphrase detection: Some NLP techniques	Gerardo Sierra	16:00

27th June

Papers/Events	Authors	Time
Session IV	(Chair: Fernando Batista)
Question Answering over Linked Data with GPT-3	Bruno Faria, Dylan Perdigão and Hugo Gonçalo Oliveira	09:30
A pseudonymization prototype for Hungarian	Attila Novák and Borbála Novák	09:55
Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese	Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos and Hugo Amaro	10:20
Coffee Break		10:45
Session V	(Chair: José Paulo Leal)
Hierarchical Data-Flow Graphs	José Pereira, Vitor Vieira and Alberto Simões	11:00
Characterization and Identification of Programming Languages	Júlio Alves, Alvaro Costa Neto, Maria João Varanda Pereira and Pedro Rangel Henriques	11:25
Type Annotation for SAST	Marco Pereira, Alberto Simões and Pedro Henriques	11:50
Large language models: compilers for the 4th. generation of programming languages?	Francisco Marcondes, José João Almeida and Paulo Novais	12:15
Lunch		12:40

Dinner		19:30

Gerardo Sierra

Engineering Institute, Universidad Nacional Autónoma de México (UNAM)

Abstract

In the field of Natural Language Processing (NLP), the task of identifying textual similarity, particularly paraphrase detection, presents challenges in various applications like plagiarism detection, question answering, textual entailment, summarization, and evaluating automatic machine translation, among others. To tackle this, numerous NLP techniques have been developed, including vector space models (based on terms), text alignment (based on linguistic knowledge), n-gram overlapping (based on strings), machine learning algorithms, and deep learning architectures.

Most of the datasets used for detecting and quantifying semantic textual similarity rely on pairs of texts treated as feature vectors, with each feature representing a score corresponding to a specific type of similarity. However, paraphrases can take different forms beyond sentence pairs, leading to a wide range of variations. Examples include the mixing or splitting of sentences, or even the deletion of certain elements while combining others.

As a result, paraphrase detection models need to consider the analysis of datasets that encompass more complex forms of paraphrasing. Additionally, it becomes necessary to account for other levels of linguistic analysis, such as discursive or stylometric analysis.

Short Bio

Gerardo Sierra is a Researcher and Head of the Language Engineering Group at Universidad Nacional Autónoma de México (UNAM). His work focuses on research and development in corpus linguistics and computational lexicography. Regarding the former, he has published the book "Introduction to corpus linguistics", which serves as a reference in the linguistics and language technology communities. He is the researcher who has made the most corpora available on the Internet in Mexico, using his own technology, which includes the GECO corpus manager. Among these corpora are the Corpus of Sexualities in Mexico, the RST Spanish Treebank and the Parallel Corpus of Mexican Languages. In computational lexicography, his work on onomasiological dictionaries, terminological extraction systems and definitional contexts is recognized worldwide.
