[Tutorial] D. Croce & C.D. Hromei: Large Language Models and How to Instruction Tune Them (in a Sustainable Way)
2 December, 9:00 AM - 12:30 PM

Event Details

In recent years, the evolution of Large Language Models (LLMs) has marked a profound transformation in computational linguistics and computer science. This tutorial aims to provide participants with a comprehensive understanding of the state-of-the-art LLMs introduced in the literature. We will focus in particular on the progression that enabled models such as GPT and LLaMA to be adapted to language inference tasks and to be instruction fine-tuned, eventually leading to the development of models such as ChatGPT, Alpaca, and Vicuna.
A notable challenge with LLMs has been their computational cost, which can make them seem infeasible for common usage. To (partially) address this issue, the tutorial will delve into techniques such as quantization and, more prominently, low-rank adaptation (LoRA). These techniques make it possible to fine-tune an LLM with 7 or 13 billion parameters on a standard 16 GB GPU. Their application has enabled the adaptation of foundational LLMs, pre-trained on large-scale text collections, to a myriad of tasks, leading to a remarkable proliferation of models being released on the web daily.
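As a rough illustration (not the tutorial's actual code), the core idea behind LoRA can be sketched in plain NumPy: the pretrained weight matrix stays frozen, and only a low-rank update B·A is trained, shrinking the number of trainable parameters by orders of magnitude. The dimensions below are assumptions chosen to resemble a LLaMA-sized projection layer.

```python
import numpy as np

d, k, r = 4096, 4096, 8  # hypothetical layer size and LoRA rank (assumptions)

# Frozen pretrained weight: never updated during fine-tuning.
W = np.random.randn(d, k).astype(np.float32)

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially behaves exactly like the pretrained one.
A = np.random.randn(r, k).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)

def lora_forward(x):
    # y = x W^T + x (B A)^T  -- gradients would flow only into A and B.
    return x @ W.T + x @ (B @ A).T

full_params = W.size            # 16.7M for this single matrix
lora_params = A.size + B.size   # 65K trainable parameters instead
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

Training only A and B (optionally on top of a quantized frozen model, as in QLoRA-style setups) is what brings multi-billion-parameter fine-tuning within reach of a single 16 GB GPU.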
Building on this, we will explore the development of a unified architecture that participated, under the name ExtremITA, in all tasks of EVALITA 2023. By effectively combining prompt engineering and sustainable learning techniques, this monolithic architecture based on LLaMA tackled twenty-two complex semantic processing tasks in the Italian language, across varied semantic dimensions, including Affect Detection, Authorship Analysis, Computational Ethics, Named Entity Recognition, Information Extraction, and Discourse Coherence.
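To give a flavor of how a single monolithic model can serve many tasks, a hypothetical instruction-style prompt template is sketched below. The field names and layout are assumptions for illustration, not ExtremITA's actual format: the key idea is that the task identity is encoded in the prompt itself, so one model can route among all twenty-two tasks.

```python
def build_prompt(task: str, text: str) -> str:
    """Wrap an input text in a task-tagged instruction prompt
    (hypothetical template, shown for illustration only)."""
    return f"### Task: {task}\n### Input: {text}\n### Output:"

# One model, many tasks: only the task tag changes.
p1 = build_prompt("affect_detection", "Che bella giornata!")
p2 = build_prompt("named_entity_recognition", "Dante nacque a Firenze.")
print(p1)
```

At inference time the model completes the text after "### Output:", and a lightweight parser maps that completion back to the task-specific label space.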
Lastly, attendees will gain insights into training a state-of-the-art foundational model (LLaMA2) on data from the competition, replicating the ExtremITA approach. This will equip them with the knowledge to develop a monolithic model capable of addressing all the tasks, and to extend it beyond them.