LLM Intro
1. What is a large language model (LLM)?
Large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes.
2. What are LLMs trained for?
For solving common language problems, such as
- text classification
- question answering
- document summarization
- text generation
The models can then be tailored to solve specific problems in different fields using relatively small field-specific datasets, such as
- retail
- finance
- entertainment
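The single-model, many-tasks idea can be sketched with prompt templates: the same pre-trained model handles classification, question answering, or summarization depending only on how the input is phrased. The `build_prompt` helper and template text below are illustrative names, not part of any real library:

```python
# Hypothetical sketch: one pre-trained model, many tasks via prompting.
# The templates and `build_prompt` below are illustrative, not a real API.

TASK_TEMPLATES = {
    "classification": "Classify the sentiment of this text as positive or negative:\n{text}",
    "qa": "Answer the question based on the context.\nContext: {context}\nQuestion: {question}",
    "summarization": "Summarize the following document in one sentence:\n{text}",
}

def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task` with the given fields."""
    return TASK_TEMPLATES[task].format(**fields)

# The same model would receive whichever prompt matches the task at hand.
print(build_prompt("classification", text="The movie was wonderful."))
print(build_prompt("qa", context="LLMs are pre-trained.", question="What are LLMs?"))
```

In a real system each prompt string would be sent to the model's prediction endpoint; only the prompt changes per task, not the model.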
3. What are the features of LLMs?
- Large
  - large training dataset
  - large number of parameters
- General purpose: a single model is sufficient for common problems because
  - human languages share commonalities regardless of the specific task
  - resource restrictions mean only certain organizations can afford to train such models on huge datasets
- Pre-trained and fine-tuned
  - pre-trained generally, then fine-tuned for specific aims with smaller datasets
4. What are the benefits of using LLMs?
- A single model can be used for different tasks (dream come true)
- The fine-tuning process requires minimal field data when tailoring them to specific problems (few-shot or even zero-shot scenarios)
- Performance keeps improving as more data and parameters are added
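The few-shot benefit can be made concrete: instead of retraining, a handful of labeled examples are placed directly in the prompt. The `few_shot_prompt` function below is a made-up illustration of that pattern:

```python
# Illustrative sketch of few-shot prompting: labeled examples go into
# the prompt itself, so no fine-tuning run is needed. The function name
# and prompt wording are invented for this example.

def few_shot_prompt(examples, query):
    lines = ["Classify the sentiment as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [("I loved it.", "positive"), ("Terrible service.", "negative")]
print(few_shot_prompt(examples, "What a great day!"))
```

A zero-shot prompt would simply omit the example pairs and keep only the instruction and the query.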
5. Examples
- PaLM: Pathways Language Model
- Dense decoder-only transformer model
- 540 billion parameters
- Leverages the new Pathways system, which enabled Google to efficiently train a single model across multiple TPU v4 Pods
- A new AI architecture that can handle many tasks at once, learn new tasks quickly, and reflect a better understanding of the world
- The system enables PaLM to orchestrate distributed computation for accelerators
6. How does LLM work?
An LLM is a transformer model, which includes
- encoder: encodes the input sequence into a representation and passes it to the decoder
- decoder: learns from the representation and decodes it for the relevant task
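At the heart of both encoder and decoder is the attention computation: each output position is a weighted mix of value vectors, with weights given by a softmax over scaled query-key dot products. A minimal pure-Python sketch, using tiny hand-written matrices rather than a real model:

```python
import math

# Toy sketch of scaled dot-product attention, the core transformer
# operation: output = softmax(Q·Kᵀ / sqrt(d)) · V.
# The 2x2 matrices below are arbitrary values chosen for illustration.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # each output row is a convex combination of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Because the weights sum to 1, each output vector stays inside the range spanned by the value vectors; a query simply leans toward the values whose keys it matches best.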
7. Traditional programming vs Neural Networks
- Traditional programming: hard-code the rules that define a dog
- Neural networks: show the model labeled pictures of dogs, ask "is this a dog?", and it predicts whether it is
- Generative language models (LaMDA, PaLM, GPT): a user can simply ask the model "what is a dog?" and it generates its own text as an answer
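The contrast between hand-written rules and learned behaviour can be shown with a contrived toy: a rule function whose logic is typed in by a programmer, versus a predictor whose answers come entirely from labeled examples (a 1-nearest-neighbour stand-in for an actual neural network; all features and thresholds below are invented):

```python
# Contrived contrast, for illustration only.

def is_dog_rules(animal):
    # Traditional programming: the rules are written by hand.
    return bool(animal["barks"]) and 1 <= animal["weight_kg"] <= 90

def is_dog_learned(examples, animal):
    # Learned behaviour: the answer comes from labeled data.
    # (1-nearest-neighbour used here as a simple stand-in for a network.)
    def dist(a, b):
        return (abs(a["weight_kg"] - b["weight_kg"])
                + abs(a["barks"] - b["barks"]))
    nearest = min(examples, key=lambda ex: dist(ex[0], animal))
    return nearest[1]

examples = [({"weight_kg": 20, "barks": 1}, True),   # a dog
            ({"weight_kg": 4, "barks": 0}, False)]   # a cat
query = {"weight_kg": 25, "barks": 1}
print(is_dog_rules(query), is_dog_learned(examples, query))
```

Changing what the rule-based version accepts means editing code; changing what the learned version accepts means supplying different examples.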
8. LLM vs Traditional model development
| | LLM development | Traditional ML development |
|---|---|---|
| Think about | Prompt design | Minimizing a loss function |
| ML expertise needed | No | Yes |
| Compute time and hardware | No | Yes |
| Training examples | No | Yes |
| Training a model | No | Yes |
9. What are prompt design and prompt engineering?
Both aim to create prompts that are clear, precise, and informative. However, there are key differences.
| | Prompt Design | Prompt Engineering |
|---|---|---|
| Definition | The process of creating tailored instructions and context passed to a language model to achieve a desired task | The practice of developing and optimizing prompts to efficiently use language models for a variety of applications |
| Scope | Specific | Generalized |
| Scenario | Essential for any specific task | Needed when a high degree of accuracy or performance is required |
10. What are the different kinds of LLMs?
| | Generic / Base | Instruction Tuned | Dialog Tuned |
|---|---|---|---|
| What does the model do? | Predicts the next word (token) based on the language in the training data (like autocomplete in search) | Follows instructions: predicts a response to the instructions given in the input; often trained with RLHF (reinforcement learning from human feedback) | Holds a dialog by predicting the next response; a further specialization of instruction tuning, expected to operate in the context of a longer back-and-forth conversation, and typically works better with natural question-like phrasings |
| Examples | Predicting the next word | Sentiment analysis | Chatbot |
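The generic/base behaviour, predicting the next word from patterns in the training data, can be caricatured with a bigram model over a tiny corpus. This is only an autocomplete-style toy, not a real LLM:

```python
from collections import Counter, defaultdict

# Toy "generic" language model: predict the next word from bigram
# counts over a tiny hand-picked corpus (the autocomplete behaviour
# described above, vastly simplified).

def train_bigrams(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    followers = counts.get(word.lower())
    if not followers:
        return None  # word never seen in training data
    return followers.most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → "cat", the most frequent follower
```

A real model predicts over tokens with a learned neural network rather than raw counts, but the interface is the same: given context, emit the most likely continuation.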
11. Chain of thought reasoning
- Models are better at getting the right answer when they first output text that explains the reason for the answer
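A chain-of-thought prompt puts a worked example with visible reasoning steps before the new question, nudging the model to reason before answering. The sketch below adapts a commonly cited arithmetic demonstration; the function name and wording are illustrative:

```python
# Sketch of a chain-of-thought prompt: the few-shot example shows its
# reasoning steps before the answer, so the model tends to do the same.
# The worked example is adapted from widely circulated chain-of-thought
# demonstrations; the exact wording here is illustrative.

COT_EXAMPLE = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def chain_of_thought_prompt(question):
    # "Let's think step by step." invites the model to explain first.
    return COT_EXAMPLE + f"Q: {question}\nA: Let's think step by step."

print(chain_of_thought_prompt(
    "A train travels 60 km in 1.5 hours; what is its speed?"))
```

Without the worked example and the step-by-step cue, the model is more likely to jump straight to a (possibly wrong) answer.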
12. Why is tuning needed?
- A model that can do everything has practical limitations
13. Task specific models
| | Language | Vision |
|---|---|---|
| Extraction | Syntax analysis | Object detector |
| Classification | Entity analysis | Occupancy analytics |