
Question
Consider the following plots related to scaling laws, taken from the scaling laws paper (left) and the GPT-3 paper (right).
[Figure: Left panel (scaling laws paper): test loss vs. tokens processed, one curve per model size; line color indicates the number of parameters, ranging from roughly 10^3 to 10^9. Right panel (GPT-3 paper): test loss vs. compute in PetaFLOP/s-days, showing the power-law frontier L = 2.57 · C^(-0.048) and the annotation "Compute-efficient training stops far short of convergence."]
Which options are true?
• In both sets of plots, the first batch of the largest model uses more compute than the entire training run of the smallest model.
• In the right plot, the loss lower bound is L = 2.57 · C^(-0.048). If the 2.57 were replaced by 3, the slope of the line representing L would change (see the worked log-form after the options).
• Some of the models in these plots are showing signs of overfitting.
• The left plot shows that larger models require fewer training tokens to reach the same performance as smaller models.
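The slope claim turns on the log-log form of the power law. A minimal worked derivation, assuming only the frontier L = 2.57 · C^(-0.048) quoted above:

```latex
% Worked log-form of the quoted power-law frontier.
% Starting from L = 2.57 * C^{-0.048} and taking logarithms:
\[
  L = 2.57 \, C^{-0.048}
  \;\Longrightarrow\;
  \log L = \log 2.57 \;-\; 0.048 \,\log C
\]
% On log-log axes this is a straight line with slope -0.048 (set by the
% exponent) and intercept \log 2.57 (set by the coefficient). Replacing
% 2.57 with 3 gives \log L = \log 3 - 0.048 \log C: same slope, shifted
% intercept.
```

Because the exponent alone fixes the slope, swapping the coefficient 2.57 for 3 shifts the line vertically in log-log space but leaves it parallel to the original.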