|Session I||(Chair: António Teixeira)|
|Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations||David Rodrigues, António Lopes and Fernando Batista||09:45|
|Querying Relational Databases with Speech-Recognition driven by Contextual Knowledge||Dietmar Seipel, Benjamin Förster, Magnus Liebl, Marcel Waleska and Salvador Abreu||10:10|
|OCRticle - a structure-aware OCR application||Sofia Santos and José João Almeida||10:35|
|Session II||(Chair: Mário Pinto)|
|Romaria de Nª Srª d'Agonia: Building a Digital Repository and a Virtual Museum||Sara Cristina Freitas Queirós, Cristiana Esteves Araujo and Pedro Manuel Rangel Henriques||11:20|
|Towards a universal and interoperable scientific data model||João Oliveira, Diogo Gomes, Francisca Santana and Filipe Portela||11:45|
|Integrating Gamified Educational Escape Rooms in Learning Management Systems||Ricardo Queirós, Carla Pinto, Mário Cruz and Daniela Mascarenhas||12:10|
|Session III||(Chair: Ricardo Queirós)|
|Narrative Extraction from Semantic Graphs||Daniil Lystopadskyi, André Santos and José Paulo Leal||14:30|
|A Framework for Fostering Easier Access to Enriched Textual Information||Gabriel Silva, Mário Rodrigues, António Teixeira and Marlene Amorim||14:55|
|Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications||Simone Wills, Yu Bai, Cristian Tejedor García, Catia Cucchiarini and Helmer Strik||15:20|
|Keynote||(Chair: Alberto Simões)|
|From similarity to paraphrase detection: Some NLP techniques||Gerardo Sierra||16:00|
|Session IV||(Chair: Fernando Batista)|
|Question Answering over Linked Data with GPT-3||Bruno Faria, Dylan Perdigão and Hugo Gonçalo Oliveira||09:30|
|A pseudonymization prototype for Hungarian||Attila Novák and Borbála Novák||09:55|
|Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese||Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos and Hugo Amaro||10:20|
|Session V||(Chair: José Paulo Leal)|
|Hierarchical Data-Flow Graphs||José Pereira, Vitor Vieira and Alberto Simões||11:00|
|Characterization and Identification of Programming Languages||Júlio Alves, Alvaro Costa Neto, Maria João Varanda Pereira and Pedro Rangel Henriques||11:25|
|Type Annotation for SAST||Marco Pereira, Alberto Simões and Pedro Henriques||11:50|
|Large language models: compilers for the 4th. generation of programming languages?||Francisco Marcondes, José João Almeida and Paulo Novais||12:15|
In the field of Natural Language Processing (NLP), the task of identifying textual similarity, particularly paraphrase detection, presents challenges in various applications like plagiarism detection, question answering, textual entailment, summarization, and evaluating automatic machine translation, among others. To tackle this, numerous NLP techniques have been developed, including vector space models (based on terms), text alignment (based on linguistic knowledge), n-gram overlapping (based on strings), machine learning algorithms, and deep learning architectures.
Most of the datasets used for detecting and quantifying semantic textual similarity rely on pairs of texts treated as feature vectors, with each feature representing a score corresponding to a specific type of similarity. However, paraphrases can take different forms beyond sentence pairs, leading to a wide range of variations. Examples include the mixing or splitting of sentences, or even the deletion of certain elements while combining others.
As a result, paraphrase detection models need to consider the analysis of datasets that encompass more complex forms of paraphrasing. Additionally, it becomes necessary to account for other levels of linguistic analysis, such as discursive or stylometric analysis.
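As a rough illustration of the representation mentioned in the abstract, where a pair of texts becomes a feature vector and each feature is one kind of similarity score, the sketch below computes three simple scores: character n-gram overlap, word overlap, and cosine similarity over term counts. It is not the speaker's method; the function names and the choice of features are assumptions made for this example only.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a whitespace-normalised, lowercased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(text_a, text_b):
    """Cosine similarity between raw term-count vectors (a basic vector space model)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def similarity_features(text_a, text_b):
    """Hypothetical feature vector for a text pair: one score per similarity type."""
    return [
        jaccard(char_ngrams(text_a), char_ngrams(text_b)),  # string-based
        jaccard(set(text_a.lower().split()), set(text_b.lower().split())),  # word overlap
        cosine(text_a, text_b),  # term-based vector space
    ]


if __name__ == "__main__":
    print(similarity_features("The cat sat on the mat", "A cat was sitting on the mat"))
```

A vector like this only describes one sentence pair. The more complex cases the abstract raises, such as split or merged sentences, would need features computed over larger or differently aligned units.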
Gerardo Sierra is a researcher and Head of the Language Engineering Group at Universidad Nacional Autónoma de México (UNAM). His work focuses on research and development in corpus linguistics and computational lexicography. Regarding the former, he has published the book "Introduction to corpus linguistics", which is a reference work in the linguistics and language technology community. He is the researcher who has made the most corpora available online in Mexico, using in-house technology that includes the GECO corpus manager. Among them are the Corpus of Sexualities in Mexico, the RST Spanish Treebank and the Parallel Corpus of Mexican Languages. In computational lexicography, his work on onomasiological dictionaries, terminology extraction systems and definitional contexts is recognized worldwide.