Math ability surpasses ChatGPT and this 70B open-source model is blowing up: fine-tuning AI with AI, from an all-Chinese team at Microsoft

Source: "Qubit" (ID: QbitAI), Author: Feng Se

Fine-tune the LLaMA model with AI-generated instructions, and its math ability can surpass ChatGPT:

Microsoft's latest open-source model, WizardMath, is here.

As tests on the GSM8K dataset show, WizardMath's math ability outright beats that of many large models, including ChatGPT, Claude Instant 1, and PaLM 2-540B.

And crucially, it does so with only 70 billion parameters, far fewer than the latter three.

Hugging Face hosts three playable online versions (with 7B, 13B, and 70B parameters respectively), and you can throw all sorts of math problems at them to try.

For example, solve the following quartic polynomial equation:

Or a simple calculus problem:

Or the derivation of a slightly modified Lagrange equation:

It gets them all right (and the responses don't take long, either).
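To give a flavor of these demo problems (the article's original screenshots are not reproduced here, so the equation below is a hypothetical stand-in of the same kind):

```latex
% Hypothetical quartic of the kind shown in the demos, with its solution.
% Substituting u = x^2 turns the quartic into a factorable quadratic.
\[
  x^4 - 5x^2 + 4 = 0
  \;\Longrightarrow\;
  (x^2 - 1)(x^2 - 4) = 0
  \;\Longrightarrow\;
  x \in \{-2,\, -1,\, 1,\, 2\}.
\]
```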

One netizen told the authors:

"The results are really amazing. Thank you for your contribution to open-source LLMs."

The code, reproduction steps, and paper are all open-sourced or online, and the GitHub repo picked up 4.8k stars within just a few days.

So, how exactly does WizardMath do it?

Enhance large model capabilities with AI-generated instructions

OpenAI's large models (InstructGPT, GPT-4, and so on) owe part of their great success at performing a wide variety of complex tasks to fine-tuning on open-domain instruction data generated by real human users.

However, not everyone has access to instruction datasets the way that company does.

For one thing, the entire annotation process is extremely expensive and time-consuming; for another, it is hard for humans to produce a sufficient proportion of high-difficulty instructions.

Developing an automatic, relatively low-cost way to produce open-domain instructions at scale has therefore become key to instruction-tuned language models.

Here, the authors name their method Evol-Instruct.

It is a new way of using AI in place of humans to automatically generate open-domain instructions covering a range of difficulty levels.

Specifically, Evol-Instruct is split into an Instruction Evolver and an Instruction Eliminator.

The Instruction Evolver can upgrade a simple instruction into a more complex one, or create an entirely new instruction, via one of two paths: in-depth evolving (the blue path in the paper's diagram) or in-breadth evolving (the red path).

Which path is taken? It is simply chosen at random.

In-depth evolving performs its "evolution" through five types of operations, sketched as prompt templates below:

Adding constraints, deepening, concretizing, increasing reasoning steps, and complicating the input.
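As a rough illustration, here is a minimal Python sketch of how these evolving operations could be expressed as prompt templates. The template wording is paraphrased from the operation names above, not taken from the paper's actual prompts, and `build_evolve_prompt` is a hypothetical helper:

```python
# Minimal sketch of the Instruction Evolver's prompt construction.
# Template wording is paraphrased, not the paper's actual prompts.
import random

DEPTH_OPERATIONS = {
    "add_constraints": "Add one more constraint or requirement to the instruction below.",
    "deepening": "If the instruction contains a question, increase its depth and breadth.",
    "concretizing": "Replace general concepts in the instruction with more specific ones.",
    "increase_reasoning": "Rewrite the instruction so it explicitly requires multi-step reasoning.",
    "complicate_input": "Attach a more complex input (e.g. a table or a code snippet) to the instruction.",
}

BREADTH_OPERATION = (
    "Draw inspiration from the instruction below to create a brand-new "
    "instruction in the same domain, but rarer and equally difficult."
)

def build_evolve_prompt(instruction: str) -> str:
    """Randomly pick in-depth or in-breadth evolving and build the prompt."""
    if random.random() < 0.5:
        operation = random.choice(list(DEPTH_OPERATIONS.values()))
    else:
        operation = BREADTH_OPERATION
    return f"{operation}\n\n#Instruction#:\n{instruction}\n\n#New Instruction#:"
```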

Since all the instructions are produced by AI, mistakes are inevitable, so the Instruction Eliminator filters out failed instructions.
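What counts as a "failed" instruction is spelled out in the paper; the filter below is only an illustrative approximation using a few plausible heuristics (the exact criteria are an assumption here):

```python
def is_failed_evolution(original: str, evolved: str) -> bool:
    """Illustrative failure checks in the spirit of the Instruction Eliminator.

    The paper's actual criteria differ; these heuristics only approximate them.
    """
    refusal_markers = ("sorry", "as an ai", "i cannot")
    if evolved.strip().lower() == original.strip().lower():
        return True  # no new information compared with the parent instruction
    if any(marker in evolved.lower() for marker in refusal_markers):
        return True  # the generator refused instead of evolving
    if len(evolved.split()) < 3:
        return True  # degenerate output, e.g. only punctuation or stop words
    return False
```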

As a concrete example, the method can start from "1+1=?" and, through the steps above, automatically generate a sizable batch of new instructions.

By repeating this generation process, enough instructions are eventually collected; merging them and shuffling them at random yields an instruction set whose difficulty levels are uniformly distributed, ready for fine-tuning the base model.
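Putting the pieces together, the overall generation loop might look like the sketch below, reusing the two hypothetical helpers above; `ask_llm` is a placeholder for whatever chat-completion call drives the evolution:

```python
import random

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (see the API sketch below)."""
    raise NotImplementedError

def evol_instruct(seed_instructions: list[str], rounds: int = 4) -> list[str]:
    """Evolve a seed set for several rounds, then pool and shuffle the result."""
    pool = list(seed_instructions)
    frontier = list(seed_instructions)
    for _ in range(rounds):
        survivors = []
        for instruction in frontier:
            evolved = ask_llm(build_evolve_prompt(instruction))
            if not is_failed_evolution(instruction, evolved):
                survivors.append(evolved)
        pool.extend(survivors)
        frontier = survivors  # next round evolves this round's output
    random.shuffle(pool)  # mix the difficulty levels before fine-tuning
    return pool
```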

Here, the authors take Alpaca's training data (generated from just 175 human-written seed instructions) as the initial dataset, then run four evolution epochs through ChatGPT's API, ultimately obtaining 250,000 instructions.
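Since the article says ChatGPT's API drove these evolution rounds, wiring up the `ask_llm` placeholder from the sketch above might look like this; the model name and sampling parameters are illustrative assumptions, not taken from the paper:

```python
# One possible ask_llm implementation using the OpenAI Python SDK (v1+).
# Model choice and sampling parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response.choices[0].message.content.strip()
```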

For a fair comparison with Vicuna's 70k samples of real user data (ShareGPT), the authors drew an equal number of samples from the 250,000 instructions, trained a LLaMA 7B model on them, and obtained WizardLM, whose performance turned out to be significantly better than Vicuna's.

(Alpaca is Stanford's fine-tune of LLaMA-7B; Vicuna is UC Berkeley's fine-tune of LLaMA-13B.)

In addition, under more complex test instructions, human evaluators preferred WizardLM's output to ChatGPT's, suggesting that the method can significantly improve an LLM's ability to handle complex instructions.

Building on this, the authors used Evol-Instruct to generate a large batch of instructions in the mathematics domain and fine-tuned LLaMA on them to obtain WizardMath.

Its results are as shown at the beginning: measured on the GSM8K dataset, its math ability surpasses many large models including ChatGPT, Claude Instant 1, and PaLM 2-540B, ranking fifth, behind only GPT-4, Claude 1.3, Claude 2.0, and the 540-billion-parameter Flan-PaLM 2.

By the same recipe, the authors also produced WizardCoder, which specializes in coding and outperforms Claude and Bard (see the links at the end of the article for details).

Team introduction

The paper has nine authors, all of them Chinese.

Three of them are co-first authors:

Can Xu, a senior application scientist in the S+D NLP group at Microsoft's Software Technology Center Asia (STCA), who previously worked on chatbot systems in the Microsoft XiaoIce research group and at Microsoft Research Asia;

Qingfeng Sun, a Microsoft Research scientist working on natural language processing and information retrieval, who is adept at building efficient search systems and has contributed core deep models to Microsoft Bing and Office 365;

Kai Zheng, a Microsoft Research scientist working on natural language processing, search, and recommendation ranking, who has likewise contributed core deep models to Microsoft Bing and Office 365.

The corresponding author is Jiang Daxin, a Microsoft global partner and vice president and former chief scientist at Microsoft Research Asia. He worked at Microsoft for more than 16 years and led natural language understanding for Microsoft's Bing search engine and the Cortana intelligent assistant. It has been reported that he has since left the company to found a large-model startup.

Another author, Jiazhan Feng, is a Peking University student who co-authored the paper during an internship at Microsoft.

Project home page:

Paper address:
