LLM Intro


1. What is a large language model (LLM)?

A large, general-purpose language model that can be pre-trained and then fine-tuned for specific purposes.

2. What are LLMs trained for?

For solving common language problems, such as

  • text classification
  • question answering
  • document summarization
  • text generation

The models can then be tailored to solve specific problems in different fields using relatively small field-specific datasets, for example in

  • retail
  • finance
  • entertainment

3. What are the features of LLMs?

  • Large
    • large training datasets
    • large number of parameters
  • General purpose: a single model is sufficient to solve common problems because
    • human languages share commonalities regardless of the specific task
    • resource restrictions mean only certain organizations can train such models on huge datasets
  • Pre-trained and fine-tuned
    • pre-trained for general purposes, then fine-tuned for specific aims with smaller datasets

4. What are the benefits of using LLMs?

  • A single model can be used for different tasks (a dream come true); see the sketch after this list
  • Fine-tuning requires minimal field data when tailoring the model to a specific problem (few-shot or even zero-shot scenarios)
  • Performance keeps improving as more data and parameters are added
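
A minimal sketch of the "one model, many tasks" idea. The `llm` helper below is a hypothetical stand-in for any hosted text-generation API; only the prompts change between tasks.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted large language model."""
    return "<model output>"

# The same model handles different tasks purely through prompting:
classification = llm("Classify this review as positive or negative: "
                     "'The battery life is fantastic.'")
summary = llm("Summarize in one sentence: <long document here>")
answer  = llm("Q: What is the capital of France?\nA:")
```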

5. Examples

  • PaLM (Pathways Language Model)
    • a dense, decoder-only Transformer model
    • 540 billion parameters
    • leverages the new Pathways system, which enabled Google to efficiently train a single model across multiple TPU v4 Pods
    • Pathways is a new AI architecture that can handle many tasks at once, learn new tasks quickly, and reflect a better understanding of the world
    • the system enables PaLM to orchestrate distributed computation across accelerators

6. How does LLM work?

An LLM is a Transformer model (sketched in code below), which includes

  • encoder: encodes the input sequence into a representation and passes it to the decoder
  • decoder: takes that representation and decodes it to produce output for the relevant task

(Some LLMs, such as the decoder-only PaLM above, use only one of the two stacks.)
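
A toy-scale sketch of the encoder-decoder structure, assuming PyTorch; the dimensions are illustrative and orders of magnitude smaller than a real LLM's.

```python
import torch
import torch.nn as nn

d_model = 32                       # embedding size (toy-sized)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

src = torch.rand(10, 1, d_model)   # input sequence: 10 tokens, already embedded
tgt = torch.rand(7, 1, d_model)    # target sequence decoded so far: 7 tokens

# The encoder turns src into a representation; the decoder attends to that
# representation while producing output for the target sequence.
out = model(src, tgt)
print(out.shape)                   # torch.Size([7, 1, 32])
```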

7. Traditional programming vs Neural Networks

  • Traditional programming: hard-code the rules that define a dog
  • Neural networks: show the model pictures of dogs (and other animals), ask "is this a dog?", and it learns to predict the answer
  • Generative language models (LaMDA, PaLM, GPT): users can generate their own content; ask the model "what is a dog?" and it produces an answer from everything it has learned (contrast sketched below)
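
A minimal contrast in code. The rule-based function and the single-neuron "network" trained by plain gradient descent on made-up (legs, barks) features are both invented for illustration.

```python
import math

# Traditional programming: the rules about a dog are written by hand.
def is_dog_rules(legs: int, barks: bool) -> bool:
    return legs == 4 and barks     # brittle: each edge case needs another rule

# Neural-network style: show labelled examples and let a single neuron learn.
X = [(4, 1), (4, 1), (4, 0), (2, 0)]   # (legs, barks): two dogs, a cat, a bird
y = [1, 1, 0, 0]                       # 1 = dog

w1 = w2 = b = 0.0
for _ in range(2000):                  # gradient descent on logistic loss
    for (legs, barks), target in zip(X, y):
        p = 1 / (1 + math.exp(-(w1 * legs + w2 * barks + b)))
        g = p - target                 # error signal
        w1 -= 0.1 * g * legs
        w2 -= 0.1 * g * barks
        b  -= 0.1 * g

def is_dog_learned(legs: int, barks: int) -> bool:
    return w1 * legs + w2 * barks + b > 0

print(is_dog_rules(4, True), is_dog_learned(4, 1))   # True True
```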

8. LLM vs Traditional model development

                           LLM               Traditional
  Think about              Prompt design     Minimizing a loss function
  ML expertise needed      No                Yes
  Compute time/hardware    No                Yes
  Training examples        No                Yes
  Training a model         No                Yes

9. What are prompt design and prompt engineering?

Both aim to create prompts that are clear, precise, and informative; however, there are key differences.

  Prompt design
    • Definition: the process of creating tailored instructions and context passed to a language model to achieve a desired task
    • Scope: specific to a single task
    • Scenario: essential for any specific task
  Prompt engineering
    • Definition: the practice of developing and optimizing prompts to use language models efficiently across a variety of applications
    • Scope: generalized
    • Scenario: needed when a high degree of accuracy or performance is required (see the example prompts below)
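
A rough illustration of the difference; both prompts are invented for this note.

```python
# Prompt design: tailored instructions and context for one specific task.
designed = (
    "You are given a customer support email.\n"
    "Email: 'My order #123 arrived broken.'\n"
    "Task: extract the order number."
)

# Prompt engineering: the prompt is iterated on and optimized, e.g. by
# pinning down the output format and adding a worked example, to raise
# accuracy across many inputs.
engineered = (
    'Extract the order number as JSON, e.g. {"order": "555"}.\n'
    'Email: "Order #555 never shipped."  -> {"order": "555"}\n'
    'Email: "My order #123 arrived broken." ->'
)
```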

10. What are the different kinds of LLMs?

  Generic (base) LLM
    • What it does: predicts the next word (token) based on the language in the training data (like autocomplete in search)
    • Example: next-word prediction
  Instruction-tuned LLM
    • What it does: follows instructions; predicts a response to the instruction given in the input; often trained with RLHF (reinforcement learning from human feedback)
    • Example: sentiment analysis
  Dialog-tuned LLM
    • What it does: holds a dialog by predicting the next response (a specialization of instruction tuning)
    • Example: a further specialization of instruction tuning, expected in the context of a longer back-and-forth conversation; typically works better with natural, question-like phrasings (chatbots)

(Example inputs for all three kinds are sketched below.)
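
Illustrative inputs for the three kinds, using the same kind of hypothetical `llm` helper as in section 4; the prompts are invented.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model."""
    return "<model output>"

# Generic/base LM: plain text in, the most likely continuation out.
llm("The cat sat on the")                                       # -> "mat ..."

# Instruction-tuned LM: the input is an instruction to follow.
llm("Classify the sentiment of this review: 'I loved every minute.'")

# Dialog-tuned LM: the input is framed as a turn in a conversation.
llm("User: Hi! Can you explain what an LLM is?\nAssistant:")
```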

11. Chain of thought reasoning

  • Models are better at getting the right answer when they first output text that explains the reasoning behind the answer (example below)
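
A classic illustration, adapted from the chain-of-thought prompting literature: the worked answer in the first Q/A pair nudges the model to explain its reasoning before answering the second question.

```python
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"  # the model now tends to reason step by step (23 - 20 + 6 = 9)
)
```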

12. Why is tuning needed?

  • A model that can do everything has practical limitations; tuning adapts a pre-trained model to a specific task or domain (see the sketch below)
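
A minimal fine-tuning sketch, assuming the Hugging Face transformers library and PyTorch; the model name, the two-example "finance" dataset, and the hyperparameters are all illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)       # weights start out pre-trained

# A tiny field-specific dataset (here: finance sentiment) stands in for
# the "relatively small field dataset" mentioned in section 2.
texts = ["Shares rallied after strong earnings.",
         "The stock plunged on weak guidance."]
labels = torch.tensor([1, 0])                # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                           # a few fine-tuning steps
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```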

13. Task specific models

  • Language: extraction, syntax analysis, classification, entity analysis
  • Vision: object detector, occupancy analytics
