LLM Intro
1. What is a large language model (LLM)?
Large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes.
2. What are LLMs trained for?
For solving common language problems, such as
- text classification
- question answering
- document summarization
- text generation
The models can then be tailored to solve specific problems in different fields using relatively small field-specific datasets, such as
- retail
- finance
- entertainment
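The single-model, many-tasks idea can be sketched with prompt templates: the same pre-trained model handles classification, question answering, or summarization depending only on how the input is phrased. The `build_prompt` helper and template text below are illustrative names, not part of any real library:

```python
# Hypothetical sketch: one pre-trained model, many tasks via prompting.
# The templates and `build_prompt` below are illustrative, not a real API.

TASK_TEMPLATES = {
    "classification": "Classify the sentiment of this text as positive or negative:\n{text}",
    "qa": "Answer the question based on the context.\nContext: {context}\nQuestion: {question}",
    "summarization": "Summarize the following document in one sentence:\n{text}",
}

def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task` with the given fields."""
    return TASK_TEMPLATES[task].format(**fields)

# The same model would receive whichever prompt matches the task at hand.
print(build_prompt("classification", text="The movie was wonderful."))
print(build_prompt("qa", context="LLMs are pre-trained.", question="What are LLMs?"))
```

In a real system each prompt string would be sent to the model's prediction endpoint; only the prompt changes per task, not the model.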
3. What are the features of LLMs?
- Large
  - large training dataset
  - large number of parameters
- General purpose: a single model is sufficient for common problems because
  - human languages share commonalities regardless of the specific task
  - resource restrictions mean only certain organizations can afford to train such models on huge datasets
- Pre-trained and fine-tuned
  - pre-trained generally, then fine-tuned for specific aims with smaller datasets
4. What are the benefits of using LLMs?
- A single model can be used for different tasks (dream come true)
- The fine-tuning process requires minimal field data when tailoring them to specific problems (few-shot or even zero-shot scenarios)
- Performance keeps improving as more data and parameters are added
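The few-shot benefit can be made concrete: instead of retraining, a handful of labeled examples are placed directly in the prompt. The `few_shot_prompt` function below is a made-up illustration of that pattern:

```python
# Illustrative sketch of few-shot prompting: labeled examples go into
# the prompt itself, so no fine-tuning run is needed. The function name
# and prompt wording are invented for this example.

def few_shot_prompt(examples, query):
    lines = ["Classify the sentiment as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [("I loved it.", "positive"), ("Terrible service.", "negative")]
print(few_shot_prompt(examples, "What a great day!"))
```

A zero-shot prompt would simply omit the example pairs and keep only the instruction and the query.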
5. Examples
- PaLM: Pathways Language Model
- Dense decoder-only transformer model
- 540 billion parameters
- Leverages the new Pathways system, which enabled Google to efficiently train a single model across multiple TPU v4 Pods
- A new AI architecture that can handle many tasks at once, learn new tasks quickly, and reflect a better understanding of the world
- The system enables PaLM to orchestrate distributed computation for accelerators
6. How does LLM work?
An LLM is a transformer model, which includes
- encoder: encodes the input sequence into a representation and passes it to the decoder
- decoder: learns from the representation and decodes it for the relevant task
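At the heart of both encoder and decoder is the attention computation: each output position is a weighted mix of value vectors, with weights given by a softmax over scaled query-key dot products. A minimal pure-Python sketch, using tiny hand-written matrices rather than a real model:

```python
import math

# Toy sketch of scaled dot-product attention, the core transformer
# operation: output = softmax(Q·Kᵀ / sqrt(d)) · V.
# The 2x2 matrices below are arbitrary values chosen for illustration.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # each output row is a convex combination of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Because the weights sum to 1, each output vector stays inside the range spanned by the value vectors; a query simply leans toward the values whose keys it matches best.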
7. Traditional programming vs Neural Networks
- Traditional programming: hard-code the rules that define a dog
- Neural networks: show the model labeled pictures of dogs, ask "is this a dog?", and it predicts whether it is
- Generative language models (LaMDA, PaLM, GPT): a user can simply ask the model "what is a dog?" and it generates its own text as an answer
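The contrast between hand-written rules and learned behaviour can be shown with a contrived toy: a rule function whose logic is typed in by a programmer, versus a predictor whose answers come entirely from labeled examples (a 1-nearest-neighbour stand-in for an actual neural network; all features and thresholds below are invented):

```python
# Contrived contrast, for illustration only.

def is_dog_rules(animal):
    # Traditional programming: the rules are written by hand.
    return bool(animal["barks"]) and 1 <= animal["weight_kg"] <= 90

def is_dog_learned(examples, animal):
    # Learned behaviour: the answer comes from labeled data.
    # (1-nearest-neighbour used here as a simple stand-in for a network.)
    def dist(a, b):
        return (abs(a["weight_kg"] - b["weight_kg"])
                + abs(a["barks"] - b["barks"]))
    nearest = min(examples, key=lambda ex: dist(ex[0], animal))
    return nearest[1]

examples = [({"weight_kg": 20, "barks": 1}, True),   # a dog
            ({"weight_kg": 4, "barks": 0}, False)]   # a cat
query = {"weight_kg": 25, "barks": 1}
print(is_dog_rules(query), is_dog_learned(examples, query))
```

Changing what the rule-based version accepts means editing code; changing what the learned version accepts means supplying different examples.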
8. LLM vs Traditional model development
| | LLM development | Traditional ML development |
|---|---|---|
| Think about | Prompt design | Minimizing a loss function |
| ML expertise needed | No | Yes |
| Compute time and hardware | No | Yes |
| Training examples | No | Yes |
| Training a model | No | Yes |
9. What are prompt design and prompt engineering?
Both aim to create prompts that are clear, precise, and informative. However, there are key differences.
| | Prompt Design | Prompt Engineering |
|---|---|---|
| Definition | The process of creating tailored instructions and context passed to a language model to achieve a desired task | The practice of developing and optimizing prompts to efficiently use language models for a variety of applications |
| Scope | Specific | Generalized |
| Scenario | Essential for any specific task | Needed when a high degree of accuracy or performance is required |
10. What are the different kinds of LLMs?
| | Generic / Base | Instruction Tuned | Dialog Tuned |
|---|---|---|---|
| What does the model do? | Predicts the next word (token) based on the language in the training data (like autocomplete in search) | Follows instructions: predicts a response to the instructions given in the input; often trained with RLHF (reinforcement learning from human feedback) | Holds a dialog by predicting the next response; a further specialization of instruction tuning, expected to operate in the context of a longer back-and-forth conversation, and typically works better with natural question-like phrasings |
| Examples | Predicting the next word | Sentiment analysis | Chatbot |
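The generic/base behaviour, predicting the next word from patterns in the training data, can be caricatured with a bigram model over a tiny corpus. This is only an autocomplete-style toy, not a real LLM:

```python
from collections import Counter, defaultdict

# Toy "generic" language model: predict the next word from bigram
# counts over a tiny hand-picked corpus (the autocomplete behaviour
# described above, vastly simplified).

def train_bigrams(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    followers = counts.get(word.lower())
    if not followers:
        return None  # word never seen in training data
    return followers.most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → "cat", the most frequent follower
```

A real model predicts over tokens with a learned neural network rather than raw counts, but the interface is the same: given context, emit the most likely continuation.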
11. Chain of thought reasoning
- Models are better at getting the right answer when they first output text that explains the reason for the answer
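A chain-of-thought prompt puts a worked example with visible reasoning steps before the new question, nudging the model to reason before answering. The sketch below adapts a commonly cited arithmetic demonstration; the function name and wording are illustrative:

```python
# Sketch of a chain-of-thought prompt: the few-shot example shows its
# reasoning steps before the answer, so the model tends to do the same.
# The worked example is adapted from widely circulated chain-of-thought
# demonstrations; the exact wording here is illustrative.

COT_EXAMPLE = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def chain_of_thought_prompt(question):
    # "Let's think step by step." invites the model to explain first.
    return COT_EXAMPLE + f"Q: {question}\nA: Let's think step by step."

print(chain_of_thought_prompt(
    "A train travels 60 km in 1.5 hours; what is its speed?"))
```

Without the worked example and the step-by-step cue, the model is more likely to jump straight to a (possibly wrong) answer.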
12. Why is tuning needed?
- A model that can do everything has practical limitations
13. Task specific models
| | Language | Vision |
|---|---|---|
| Extraction | Syntax analysis | Object detector |
| Classification | Entity analysis | Occupancy analytics |