Papers/Events | Authors | Time |
---|
I will present a systematic exploration of strategies for pretraining generative Large Language Models (LLMs) within the Galician-Portuguese diasystem. The study investigates the impact of combining versus separating linguistic varieties during continued pretraining, the trade-offs between large-scale noisy data and smaller high-quality corpora, and the potential gains from incorporating instruction-based data during the training phase instead of in post-training (e.g., instruction tuning). In sum, I will offer some guidance on how to improve an LLM, taking into account factors such as the quality and size of the corpus, the language varieties used, and the ability to understand instructions.
Pablo Gamallo defended his Linguistics thesis in 1998 at the Université Blaise Pascal (France), and since 2004 he has been working at the University of Santiago de Compostela (Spain), first as a Ramón y Cajal research fellow and now as Full Professor. He was a promoter and founding partner of Cilenis, a language technology spin-off of the University of Santiago de Compostela. He is a member of the Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS). His main scientific interests are Natural Language Processing and Information Extraction. At present, he is the Coordinator of the European project HYBRIDS, an MSC Doctoral Network with nine European Beneficiaries, and is one of the Principal Investigators of Proxecto NÓS, an ambitious project aimed at building linguistic resources (corpora, datasets, and language models) for the Galician language.
After four decades in the study of (formal) languages, the time has come to pause, reflect on the insights gained, and weave an ontology that elegantly connects the myriad facets of this rich domain, laying the foundation for a deeper characterization. In this talk, I invite you to explore topics such as knowledge representations, language types and their subclasses, language affinities, blended language paradigms, problem-solving, programming, reasoning frameworks, computational thinking, grammars, quality, language processing, interpretation, compilation, and static analysis, among others. Through this journey, I will adopt a structured lens, presenting concise slices of the ontology. My aim is not to reach a singular destination but to meander thoughtfully through these interwoven themes, embracing the exploration itself as the reward.
Pedro Rangel Henriques holds a PhD in Formal Languages and Attribute Grammars from the University of Minho (UM), where he serves as a Full Professor in the Informatics Department. A dedicated researcher at the Algoritmi Research Center and a member of LASI, he leads the Language Processing Group. His teaching spans a diverse array of Computer Science courses, including Programming Languages and Paradigms, Compilers, Language and Grammar Engineering, Markup Languages for Document Annotation, Ontologies, and Introduction to Informatics. With an extensive supervisory record, Pedro has guided 19 PhD dissertations, over 100 master’s theses, and more than 100 undergraduate projects. His mentorship focuses on areas such as language and document processing, code analysis, program visualization and comprehension, computational thinking, ontologies, natural language processing, data mining, and data cleaning. A prolific scholar, he has co-authored one book, contributed over 15 book chapters, published more than 35 journal articles and 100 conference papers, and participated in 28 R&D projects, advancing the frontiers of language processing and related fields.
This talk presents an overview of the current state of the art in agentic artificial intelligence systems. It discusses recent advancements in the field, highlighting how these systems can be developed and applied to different domains. Special attention is given to the current challenges and limitations in developing trustworthy agentic AI systems, namely systems with the following characteristics: explainability, transparency, privacy, security, accountability, fairness, inclusiveness, reliability, and ethics.
Paulo Quaresma is Vice-Rector for Research, Innovation and Internationalization at the University of Évora and a Full Professor in the Department of Informatics at the same university. In 2021 he was a Member of the Board of Directors of FCT – Portuguese Foundation for Science and Technology. At the University of Évora, he was Vice-Rector for Research and Development from 2014 to 2018 and Director of the School of Science and Technology from 2009 to 2013. He holds a PhD in Informatics from Universidade Nova de Lisboa, specializing in Artificial Intelligence and Natural Language Processing. He has led or coordinated several research projects funded by Portuguese and European entities, and has published more than 100 scientific articles in international journals and conferences.