Are the AI Agents that Silicon Valley's big names keep talking about really that hot?

Author|Li Han Zhu Yue

Editor|Chestnuts

Source: Jiazi Guangnian

Image source: Generated by Unbounded AI tool

After the great success of ChatGPT, OpenAI has already moved on to its next goal: AI Agents (intelligent agents).

"If a paper proposes a different training method, people inside OpenAI will scoff, thinking it is something we already tried and left behind. But when a new AI Agents paper comes out, we discuss it very seriously and excitedly. **Ordinary people, entrepreneurs, and geeks have an advantage over companies like OpenAI in building AI Agents.**" said Andrej Karpathy, co-founder of OpenAI and former director of AI at Tesla.

Karpathy's public remarks have added a lot of heat to AI Agents, but he is not alone in this judgment.

As early as March, AutoGPT won 74,000 stars on GitHub and quickly became the fastest-growing open source project in history. BabyAGI and AgentGPT, released soon after, sprang up like mushrooms after rain, with agents ordering pizza, organizing mailboxes, creating blogs, and even throwing a Valentine's Day party...

More and more AI Agents are appearing in various scenes of people's lives, and the craze is rapidly spreading from Silicon Valley.

Self-executing and operating independently, AI Agents carry high expectations from technologists, who regard them as "a productivity tool that changes society." Some even regard them as "the beginning of the era of artificial general intelligence (AGI)."

But the enthusiasm cannot hide the existing problems.

"A large model is the prerequisite for AI Agents. Only with a good enough hardware foundation can we develop AI Agents." Dai Yusen, managing partner of ZhenFund, told "Jiazi Guangnian".

Strictly speaking, ChatGPT is the only product on the market with a "qualified" large-model base. Constrained by the capabilities of the available models, China still lacks the soil in which AI Agents can grow.

The future is bright, but the reality is cruel. Technology research and development and venture capital are both in full swing. No one knows when the dividend period of AI Agents will really arrive on the wave of large models. But what is certain is that change has quietly begun.

1. AI Agents: "digital assistants" that help you do things

Rather than treating AI Agents as an upgraded version of ChatGPT, it is more appropriate to regard them as "digital assistants" for humans.

An agent not only tells you "how to do it" but also "helps you do it." Acting as an intermediary, an AI Agent takes the human's place and interacts repeatedly with large language models (LLMs) such as GPT. Given only a goal, it can simulate intelligent behavior: it creates tasks autonomously, re-prioritizes the task list, executes the task at the top of the list, and loops until the goal is achieved.
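
The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names (`run_agent`, `fake_execute`, `fake_create_tasks`, `fake_prioritize`) are hypothetical, and the three stub functions stand in for what would be LLM calls in a real agent.

```python
from collections import deque
from typing import Callable, Deque, List


def run_agent(
    goal: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], List[str]],
    prioritize: Callable[[str, List[str]], List[str]],
    max_steps: int = 10,
) -> List[str]:
    """Run the top task, create follow-up tasks, re-prioritize, and repeat."""
    tasks: Deque[str] = deque([f"Make a plan for: {goal}"])
    results: List[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        if not tasks:
            break  # nothing left to do: treat the goal as reached
        task = tasks.popleft()  # execute the task at the top of the list
        result = execute(goal, task)
        results.append(result)
        new_tasks = create_tasks(goal, task, result)  # autonomous task creation
        tasks = deque(prioritize(goal, list(tasks) + new_tasks))
    return results


# Hypothetical stubs; in a real agent each would prompt an LLM.
def fake_execute(goal: str, task: str) -> str:
    return f"done: {task}"


def fake_create_tasks(goal: str, task: str, result: str) -> List[str]:
    if task.startswith("Make a plan"):
        return ["place the order", "find a pizzeria"]
    return []


def fake_prioritize(goal: str, tasks: List[str]) -> List[str]:
    return sorted(tasks)  # placeholder ordering, not a real priority model


log = run_agent("order a pizza", fake_execute, fake_create_tasks, fake_prioritize)
print(log)
```

The `max_steps` cap is a design choice for the sketch: without some budget, an agent that keeps generating new tasks would never stop.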

Unlike traditional artificial intelligence, AI Agents can operate independently without human control. **By accessing APIs, AI Agents can even browse the web, use applications, read and write files, pay with credit cards, and more.**

**Simply put, you only need to give it a goal, and the AI Agent can do the rest. For example, the AI agent developed by HyperWrite can automatically order pizza for you by controlling the Chrome browser.**

Source: HyperWrite CEO Matt Shumer Twitter account

This vision is a staple of science fiction films, but in artificial intelligence research it has been pursued for nearly half a century.

As early as the 1980s, computer scientists began to explore how to develop intelligent software that could interact like a human. However, limited by data and computing power, AI Agents lacked the practical conditions they needed.

Joon Park, a Ph.D. student in computer science at Stanford University, once said in an interview: "We have been working in that direction, but all the methods in the past few decades have not even come close to what we are now achieving with LLMs... That's why we forgot about that vision. But when LLMs came along, we realized that there was an opportunity."

The large language model is the core brain of AI Agents. By decomposing complex tasks, it breaks complex user requirements down into achievable steps.

On the one hand, large models are trained on Internet data that contains a large amount of human behavior, which supplies the key ingredients for building believable AI Agents.

On the other hand, with their considerable knowledge capacity, large models show emergent in-context learning and reasoning abilities. By building a chain of thought that lets the model think and decide continuously, AI Agents can analyze complex problems and break them down into simple, detailed sub-tasks.

At the same time, the LLM's use of language as a medium has also changed the form of front-end interaction. Wen Yongteng, head of the AI application track and vice president of investment at BV Baidu Ventures, told "Jiazi Guangnian": "BV Baidu Ventures began to pay attention to the development of AI Agents very early. Through research and judgment, we believe that the original graphical user interface (GUI) may transform into a language user interface (Language UI), and the front-end application of AI Agents will exist in all front-end forms that may interact with humans."

Task decomposition alone is still far from intelligence. LLM-driven AI Agents cannot do without three key components:

  • **Planning:** Decompose large tasks into smaller, manageable sub-goals; reflect on and refine past behavior by analyzing and summarizing it, improving the agent's intelligence and adaptability and the quality of the final result.
  • **Memory:** Short-term memory corresponds to in-context learning; long-term memory is the ability to retain and recall unlimited information over long periods, generally achieved through external storage and fast retrieval.
  • **Tool use:** The agent learns to call external APIs to obtain additional information that is missing from the model weights.

Overview of an LLM-driven AI Agent, image source: Lilian Weng's personal blog

With the cooperation of the three components, AI Agents can not only think like a human, but also act like a human.

Humans engaged in complex tasks usually reason between steps. AI Agents do something similar with the ReAct framework (short for "Reason + Act," a prompting approach that interleaves reasoning steps with actions), which tightly couples the reasoning ability of large models with action decisions so that the language model can plan and act logically on the basis of its knowledge.
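
To make the interleaving of reasoning and action concrete, here is a minimal sketch of a thought/action/observation loop. It is an illustration, not the published implementation: the `Action: name[argument]` text format, the `react` function, the `calculate` tool, and the scripted stand-in for the model are all assumptions made for this example.

```python
import re
from typing import Callable, Dict

# Assumed text format for an action line: "Action: tool_name[argument]"
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 5,
) -> str:
    """Alternate model reasoning with tool calls until the model finishes."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(transcript)  # model emits a Thought and an Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # no action requested: stop rather than guess
        name, arg = match.groups()
        if name == "finish":
            return arg  # the model's final answer
        if name in tools:
            observation = tools[name](arg)
        else:
            observation = f"unknown tool: {name}"
        # Feed the result back so the next thought can build on it.
        transcript += f"Observation: {observation}\n"
    return ""


def calculate(expr: str) -> str:
    """Toy tool: multiplies two integers written as 'a * b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real model, scripted for this example."""
    if "Observation:" not in transcript:
        return "Thought: I need the total price.\nAction: calculate[2 * 12]"
    last_obs = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the total.\nAction: finish[{last_obs}]"


answer = react("What do two 12-dollar pizzas cost?", scripted_llm, {"calculate": calculate})
print(answer)
```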

The Reflexion framework provides AI Agents with dynamic memory and self-reflection capabilities. It reinforces language agents through verbal feedback rather than weight updates, allowing an agent to improve on past action decisions and correct past mistakes so that its performance keeps improving.
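
A self-reflection loop of the kind described above might look like the following sketch. It only illustrates the idea of learning from language feedback without touching model weights; the function names and the three stubs (`stub_actor`, `stub_evaluate`, `stub_reflect`) are hypothetical and replace what would be LLM and evaluator calls.

```python
from typing import Callable, List, Tuple


def reflect_and_retry(
    task: str,
    actor: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    max_trials: int = 3,
) -> Tuple[str, int, List[str]]:
    """Retry a task, carrying verbal lessons forward instead of updating weights."""
    reflections: List[str] = []  # the agent's dynamic memory of lessons
    attempt = ""
    for trial in range(1, max_trials + 1):
        attempt = actor(task, reflections)  # past lessons are part of the input
        success, feedback = evaluate(attempt)
        if success:
            return attempt, trial, reflections
        # Turn the failure signal into a natural-language lesson.
        reflections.append(reflect(task, attempt, feedback))
    return attempt, max_trials, reflections


# Hypothetical stubs standing in for model and evaluator calls.
def stub_actor(task: str, reflections: List[str]) -> str:
    if reflections:
        return "order pizza, deliver to the saved address"
    return "order pizza"


def stub_evaluate(attempt: str) -> Tuple[bool, str]:
    if "address" in attempt:
        return True, "ok"
    return False, "no delivery address was given"


def stub_reflect(task: str, attempt: str, feedback: str) -> str:
    return f"The attempt '{attempt}' failed because {feedback}; fix that next time."


final, trials, notes = reflect_and_retry(
    "order a pizza", stub_actor, stub_evaluate, stub_reflect
)
print(final, trials, notes)
```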

In the process of information acquisition, storage, retention, and retrieval, AI Agents also try to imitate the composition of human memory and build an efficient memory system.

Mirroring human memory, AI Agents map sensory memory, short-term memory, and long-term memory to learned embedding representations of raw inputs (such as text and images), in-context learning, and an external vector store, respectively. Tasks and results are saved in the memory module; when information is recalled, the stored content is fed back into the dialogue with the user, creating a tighter context.
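
The long-term memory part can be illustrated with a toy vector store. This sketch makes strong simplifying assumptions: a bag-of-words count vector stands in for a learned embedding, and the names `embed`, `cosine`, and `MemoryStore` are invented for the example; a real agent would use an embedding model and a dedicated vector database.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """External long-term memory: store text, recall by similarity."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = MemoryStore()
memory.add("user prefers thin crust pizza")
memory.add("meeting with Bob moved to Friday")
memory.add("blog draft saved to drafts folder")

recalled = memory.recall("which pizza crust does the user like")
# Recalled memories are placed back into the prompt (the short-term context).
prompt = "Relevant memory: " + "; ".join(recalled) + "\nUser: order my usual"
print(prompt)
```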

One of the most distinctive human traits is the use and creation of tools. By being equipped with external tools and using APIs to call various interfaces, AI Agents can simulate human use of tools to complete more complex tasks.
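
One common way to wire up tool use is to let the model emit a structured request that the surrounding program dispatches to a registered function. The sketch below assumes a JSON request with `name` and `arguments` fields; that format, the `tool` decorator, and both example tools are hypothetical, and `get_weather` returns canned text instead of calling a real API.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # canned reply standing in for a real API call


@tool
def add(a: float, b: float) -> str:
    return str(a + b)


def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool request and run the matching tool."""
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["name"]]
        return fn(**call.get("arguments", {}))
    except (json.JSONDecodeError, KeyError, TypeError) as err:
        # Return the error as text so the model can see it and try again.
        return f"tool error: {err!r}"


weather = dispatch('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
total = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
missing = dispatch('{"name": "fly"}')
print(weather, total, missing)
```

Returning errors as plain text rather than raising is a deliberate choice in this sketch: the result of every call, including a failed one, goes back to the model as an observation.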

Although the technology is not yet fully mature and issues such as data management and long-term memory are still being worked out, the ability of AI Agents to execute autonomously, optimize iteratively, and "free up human hands" makes their popularity all but inevitable.

2. Taking over from LLMs, AI Agents become the next AI hotspot

The birth of ChatGPT made it possible for AI to hold multi-turn conversations with humans and provide information and suggestions. The arrival of copilots then let AI produce the first draft of work for humans: GitHub Copilot, Microsoft 365 Copilot, and Midjourney have become people's "smart copilots" in programming, office work, and image generation.

Tell an AI to do a task and it does that one task: write copy, answer a question, or generate a photo whose authenticity is hard for the human eye to judge. Meanwhile, people often still need to provide specific, clear prompts for every step the AI takes.

At this stage, AI is like an intern who has just arrived, has no experience, and needs hands-on teaching. But what if you want a good employee who follows instructions, solves difficulties in execution on their own, and tries not to trouble others?

In March and April, Camel, AutoGPT, BabyAGI, the "Westworld town," and other AI Agents burst onto the scene together, which seemed to show people such a possibility.

Significant Gravitas open-sourced AutoGPT in March; within two months of its release it had received 130,000 stars on GitHub, making it the fastest-growing open source project in history.

Westworld town created by Stanford University

Image source: Paper "Generative Agents: Interactive Simulacra of Human Behavior"

Andrej Karpathy once said on Twitter: "The next frontier of prompt engineering is AutoGPTs." To date, AutoGPT has earned more than 140,000 stars on the code hosting platform GitHub, ranking 25th in history.

OpenAI co-founder and CEO Sam Altman has stated on several occasions that the era of building huge AI models is over and that agents are the challenge ahead.

In an article introducing autonomous agents, Matt Schlicht, co-founder and CEO of Octane AI (a data marketing platform provider), collected the views of more than a hundred people from industry, academia, and investment circles. Experts from large companies and AI start-ups such as Meta, Nvidia, and Stability AI, as well as Stanford CS faculty members and AI investors, including people from Hugging Face, mostly expressed high expectations for the potential of AI Agents; some even called them "raw AGI."

Taking over from big models, AI Agents seem to be becoming the next big thing in AI.

But at the same time, there is no shortage of dissenting voices.

Turing Award winner Yoshua Bengio wrote in his blog post "How Rogue AIs may Arise," published in May this year, that humans may control the overall tasks and goals of AI Agents, but this does not mean they can control the subtasks and subgoals that the agents derive using their own intelligence. Unless research on AI alignment makes a breakthrough, human beings will have no strong safety guarantee.

Agents emerging en masse, big names both cheering and doubting: the wave of AI Agents has arrived fast and hot.

However, AI Agents is not a new term in the circle of artificial intelligence.

AlphaGo, the Go AI that DeepMind unveiled in 2016, is in fact a kind of AI Agent. Similar examples are OpenAI's Dota 2 AI, first demonstrated in 2017 and later developed into OpenAI Five, and AlphaStar, which DeepMind announced in 2019 for playing StarCraft II.

The industry trend at that time was to train and improve AI Agents through reinforcement learning, mainly in game scenarios and especially in adversarial games with clear winners and losers. Whether that approach could achieve generality in the real world remained an open question.

In the following years, OpenAI turned to large language models, and the GPT series launched one after another. Large models became the track that technology companies rushed into. It is also the development of large models that has given AI Agents the opportunity to break through their bottleneck and develop anew.

Compared with being limited to game scenarios a few years ago, what can AI Agents achieve on the basis of large models? Wen Yongteng, head of the AI application track and vice president of investment at BV Baidu Ventures, told "Jiazi Guangnian": "What we have seen is not only technological progress that greatly enhances AI's ability to understand user intentions, collect information, and perform tasks. More importantly, AI Agents are fully capable of reconstructing the future application ecosystem."

Shortly after AutoGPT launched, many users began building automated personal assistants with it. For example, Udit Goenka, the founder and CEO of FirstSales.io, posted that he used AutoGPT to build a prospecting engine that searches for companies that received seed-round investment last year and compiles their details into a list.

Yew Jin Lim, a software engineer at Google, said he used AutoGPT to create an email assistant that sends task details to AI Agents via email.

Dai Yusen, managing partner of ZhenFund, told "Jiazi Guangnian": "Agents are a direction that can really improve productivity greatly, because as long as people still do the work themselves, they are always limited."

"AI Agents will become a productivity tool in daily life and work," Matt Schlicht wrote. "From managing social media accounts and investing in the market to publishing the best children's books, AI Agents will exist in every industry and for every task that can be imagined." For example, aomni is an AI agent that can search the Internet for information on any topic and completes the user's goals one by one by creating a list.

In addition to productivity needs, Inflection AI's personal AI Agent Pi provides another possible application direction.

Unlike ChatGPT and Claude, which are positioned as general-purpose AI, Pi focuses on high emotional intelligence, emotional companionship, and emotional value. Pi also remembers its past conversations with users. Besides taking part in and assisting with people's work and life, it learns to connect with users the way friends and family do. At present, Inflection AI has received more than 1.5 billion US dollars in investment, surpassing Anthropic and second only to OpenAI.

**3. Will AI Agents be the next trend?**

"Building a kind of JARVIS" is the latest profile line on Andrej Karpathy's Twitter account. JARVIS is the artificial intelligence assistant of the Marvel superhero Iron Man; it can think independently and helps its owner handle all kinds of affairs and process all kinds of information.

Karpathy's profile line also signals that the starting gun for the AI Agents track has been fired.

The Information reported that Sam Altman privately told some developers in May that OpenAI hoped to make ChatGPT a personal work assistant. A person familiar with the matter said that **OpenAI has been looking at how to use chatbots to create autonomous AI Agents, and related functions are likely to be deployed in the ChatGPT assistant.**

Coincidentally, Meta also sees an opportunity for AI Agents.

Back in April, Zuckerberg told investors that Meta saw an "opportunity to introduce AI agents to billions of people in a useful and meaningful way," but he did not name specific applications at the time.

And at an all-hands meeting with employees in June, Zuckerberg announced a series of technologies in various stages of development, one of which would bring AI Agents with different personalities and abilities to help or entertain, initially primarily for Messenger and WhatsApp.

**In China, AI Agents-related products have also appeared one after another.**

**At WAIC in early July, Alibaba Cloud released its first agent, ModelScopeGPT, for the developer community, and it will launch a series of agents in the future to cover various application scenarios.**

**Huawei is also involved in this field, but it focuses more on embodied AI, that is, the combination of large models and robots.**

In addition to big manufacturers, AI Agents is also an opportunity for entrepreneurs. OpenAI co-founder Karpathy specifically mentioned in his previous speech: "Ordinary people, entrepreneurs and geeks have more advantages than companies like OpenAI in building AI Agents."

Wen Yongteng, head of the AI application track and vice president of investment at BV Baidu Ventures, said that the BV team is also currently optimistic about the opportunities for start-ups in the field of AI Agents.

"The future application ecosystem will be diversified, rather than dominated by a single giant. The emergence of AI Agents has brought an opportunity for a paradigm shift, and many traditional applications face the possibility of being disrupted. In this process, start-ups have many opportunities to open up new fields. For each specific task, AI Agents have a lot of room for optimization, including the construction of specific algorithms and services, user data, and product design. Start-ups can build a differentiated advantage."

"In addition, the current ecosystem of AI Agents is not yet clearly defined, which provides favorable development opportunities for start-ups, because they do not need to compete under established rules. From this perspective, start-ups and large companies are standing on the same starting line, and start-ups are more flexible and can adjust their products quickly."

Relying on the knowledge accumulated over the years in the field of artificial intelligence, BV Baidu Ventures does not believe that model companies will monopolize opportunities at the application layer. Because for the underlying model companies, the significance of building an ecology is far greater than monopolizing an application. If the underlying model companies adopt an exclusive strategy to gain a competitive advantage in the application layer, it may cause harm to their own ecology. The underlying model companies may build strong AI Agents in one or two areas they focus on, but they don't necessarily have to compete with startups in all areas.

**The ecosystem is not yet settled, the rules of the arena are not yet written, and everyone is back on the same starting line.**

But it is undeniable that so far, apart from many demonstrations, AI Agents have not yet appeared as real products.

Dai Yusen, managing partner of ZhenFund, compared the degree of cooperation between AI and humans to the different stages of autonomous driving, with AI Agents corresponding to the L4 stage. And just like L4, AI Agents are easy to imagine and demonstrate but difficult to realize. Real applications of AI Agents still lie in an uncertain future.

The degree of cooperation between AI and humans compared to the different stages of autonomous driving. Image source: Dai Yusen's Jike account @yusen

Dai Yusen emphasized that in order to realize usable AI Agents, the ability of large models needs to improve greatly. Even for OpenAI, the front-runner, there is still a lot of room for improvement in latency and performance.

"If you use a steam engine as an analogy, steam is produced only when water is heated to 100 degrees. If the intelligence of AI Agents has not reached a certain level, it is like water heated only to 50 degrees: even though a lot of energy has been spent, no steam is produced, and the output is still zero."

The starting gun for the AI Agents track has been fired, but this is definitely not a sprint lasting a few months. It is a marathon destined to last several years, or even a decade.
