| Title | Artificial intelligence. Freefall |
|---|---|
| Author | Dzhimsher Chelidze |
| ISBN | 9785006509900 |

Below are statistics for LLMs from different developers.
Compare LLaMa 2 7B, LLaMa 2 13B and LLaMa 2 70B. The suffixes 7B, 13B and 70B denote the number of parameters in billions and serve as a rough measure of model complexity and training effort – the higher the value, the better. But, as we can see, the quality of responses does not change radically, while the price and the development effort grow significantly.
And we can see how the leaders are building up computing power, constructing new data centers and hurriedly solving the energy-supply and cooling problems of these monsters. At the same time, improving model quality by a notional 2% requires increasing computing power by an order of magnitude.
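A rough back-of-envelope illustration of that scaling (my own figures, not taken from the table above): the memory footprint of a model grows linearly with its parameter count, and the training compute grows even faster.

```python
# Back-of-envelope comparison of LLaMa 2 model sizes.
# Assumes 16-bit (fp16/bf16) weights and the common ~6 * parameters * tokens
# heuristic for dense training compute. All figures are illustrative.

BYTES_PER_PARAM = 2          # fp16 weights
TRAIN_TOKENS = 2e12          # ~2T tokens, roughly the reported LLaMa 2 corpus size

for name, params in [("LLaMa 2 7B", 7e9), ("LLaMa 2 13B", 13e9), ("LLaMa 2 70B", 70e9)]:
    weights_gb = params * BYTES_PER_PARAM / 1e9
    train_flops = 6 * params * TRAIN_TOKENS   # rough dense-training estimate
    print(f"{name}: ~{weights_gb:.0f} GB of weights, ~{train_flops:.1e} training FLOPs")
```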
Now a practical example on the question of maintenance and support in the face of degradation. The influence of people will also be noticeable here. Any AI, especially at an early stage, learns from human feedback (user satisfaction, initial requests and tasks). For example, the same ChatGPT-4 uses user requests to retrain its model in order to give more relevant answers and, at the same time, reduce the load on its “brain”. At the end of 2023, articles appeared saying that the AI model had become “lazier”: the chatbot either refuses to answer questions, interrupts the conversation, or simply responds with excerpts from search engines and other sites. By mid-2024 this had become the norm, with the model simply citing excerpts from Wikipedia.
One possible reason for this is the simplification of user requests themselves (they are becoming more primitive). After all, LLMs do not invent anything new: these models try to understand what you want them to say and adapt to it (in other words, they also form stereotypes). The model also seeks maximum efficiency in the effort-to-result ratio, “disabling” unnecessary neural connections. This is simply maximization of the objective function – just math and statistics.
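As a toy illustration of this maximization (purely my own sketch, not how any real model is trained): if the objective rewards answer quality but penalizes effort, and simple requests barely reward extra quality, the optimum shifts toward minimal effort:

```python
# Toy objective maximization: reward = answer quality minus effort cost.
# With primitive requests, extra effort adds little quality, so the
# optimal effort level collapses. All functions and numbers are assumptions.
import math

def quality(effort: float, request_complexity: float) -> float:
    # Diminishing returns: quality saturates faster for simple requests.
    return request_complexity * (1 - math.exp(-effort / request_complexity))

def objective(effort: float, request_complexity: float, effort_cost: float = 0.5) -> float:
    return quality(effort, request_complexity) - effort_cost * effort

def best_effort(request_complexity: float) -> float:
    # Brute-force search over effort levels 0.0 .. 10.0.
    return max((e / 10 for e in range(0, 101)),
               key=lambda e: objective(e, request_complexity))

for complexity in (0.5, 2.0, 8.0):
    print(f"request complexity {complexity}: optimal effort ≈ {best_effort(complexity):.1f}")
```

The optimal effort grows with request complexity; feed the model mostly simple requests and the “lazy” regime becomes the rational one.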
Moreover, this problem will be typical not only of LLMs.
As a result, to keep the AI from degrading, you will have to load it with complex research while limiting its exposure to primitive tasks. Yet once it is released into the open world, the balance of tasks will shift in favor of simple, primitive user requests and applied problems.
Think about yourself. Do you really need to evolve in order to survive and reproduce? What is the ratio of intellectual to routine tasks in your work? What level of math problems do you solve in that work? Do you need integrals and probability theory, or just math up to the 9th grade?
The second factor is the amount of data and hallucinations.
Yes, we can scale up the current models XXXX times over. But the ChatGPT5 prototype was already running out of training data in 2024: it had been given everything there was. And for a modern AI that has to navigate uncertainty, there simply will not be enough data at the current level of technology. You need to collect metadata about user behavior, think about how to get around copyright and ethical restrictions, and obtain user consent.
In addition, using the current LLMs as an example, we can see another trend: the more “omniscient” a model is, the more inaccuracies, errors, abstractions and hallucinations it produces. At the same time, if you take a base model and give it a specific subject area as its knowledge, the quality of its responses increases: they are more objective, the model fantasizes (hallucinates) less and makes fewer mistakes.
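A common way to “give a base model a specific subject area” is retrieval-augmented prompting: before asking the model anything, you retrieve relevant fragments of your own domain documents and put them into the prompt. Below is a minimal keyword-overlap sketch; the document store, scoring and prompt format are my own illustrative assumptions, and a real system would use embeddings and an actual LLM call:

```python
# Minimal retrieval-augmented prompting sketch: ground a general-purpose
# model in a narrow subject area by injecting retrieved domain text into
# the prompt. Retrieval here is naive keyword overlap, for illustration only.

DOMAIN_DOCS = [
    "Pump P-101 must be serviced every 2,000 operating hours.",
    "Bearing temperature above 85 C indicates imminent failure.",
    "Vibration spectra are collected by the monitoring system every 15 minutes.",
]

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:top_k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOMAIN_DOCS))
    return (
        "Answer strictly based on the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("When should pump P-101 be serviced?"))
# The resulting prompt is then sent to any base LLM; constraining it to the
# retrieved context is what reduces fantasizing (hallucinations).
```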
The third factor is vulnerability and costs.
As we discussed above, we would need to build a data center worth a trillion US dollars, and its power consumption would exceed all current electricity generation in the United States. This means that an energy infrastructure with a whole complex of nuclear power plants would also be required. And no, wind turbines and solar panels cannot solve this problem.
Now add that the AI model will be tied to its “base”, so a single successful cyberattack on the energy infrastructure will de-energize the entire “brain”.
But why should such an AI be tied to a single center – why can’t it be distributed?
First, distributed computing still loses in performance and efficiency: it relies on heterogeneous hardware that is also busy with other tasks and processes. In addition, a distributed network cannot guarantee that its computing capacity is available all the time – nodes join and drop out, so the available power is unstable.
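A quick simulation shows why heterogeneous, intermittently available nodes give unstable capacity; the node count, availability probability and per-node throughput are arbitrary assumptions of mine:

```python
# Toy simulation of a distributed "brain": each node is independently online
# with some probability and shares its capacity with other workloads, so the
# usable capacity fluctuates from tick to tick. All parameters are assumptions.
import random

NODES = 1_000
P_ONLINE = 0.9            # chance a node is reachable at a given moment
SHARE_FOR_AI = 0.5        # fraction of a node's capacity not used by other tasks
NODE_TFLOPS = 10          # nominal per-node throughput

random.seed(42)
for tick in range(5):
    online = sum(random.random() < P_ONLINE for _ in range(NODES))
    usable = online * NODE_TFLOPS * SHARE_FOR_AI
    print(f"tick {tick}: {online} nodes online, ~{usable:,.0f} usable TFLOPS "
          f"of a nominal {NODES * NODE_TFLOPS:,} TFLOPS")
```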
Second, there is the vulnerability to attacks on communication channels and on the distributed infrastructure itself. Imagine that 10% of the neurons in your brain suddenly switched off (communication channels blocked, or nodes knocked out by an attack), while the rest work at half capacity (interference, etc.). As a result, we again face the risk of a strong AI that forgets who it is, where it is and what it is for, or simply takes a very long time to think.
And if it comes to the point that a strong AI needs a mobile body to interact with the world, this will be even harder to implement. After all, how do you supply all of it with energy and cool it? Where will the data-processing power come from? On top of that, you need to add machine vision and image recognition, as well as the processing of other sensors (temperature, hearing, etc.). That means enormous computing power, and again the need for cooling and energy.
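To get a feel for the scale of the problem, here is a rough onboard power budget for such a mobile body; every wattage and the battery capacity below are hypothetical assumptions, chosen only to show the shape of the calculation:

```python
# Hypothetical onboard power budget for a mobile AI body.
# Every load figure and the battery capacity are assumptions for illustration.
loads_watts = {
    "inference accelerator": 300,   # onboard GPU/NPU
    "vision and other sensors": 60,
    "actuators / locomotion": 250,
    "cooling": 120,
    "comms link to the data center": 20,
}

battery_wh = 2_000  # ~2 kWh pack

total_w = sum(loads_watts.values())
runtime_h = battery_wh / total_w
print(f"total draw: {total_w} W -> ~{runtime_h:.1f} h on a {battery_wh} Wh battery")
```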
That is, it will be a limited AI with a permanent wireless connection to the main center, which is again a vulnerability. Modern communication channels provide higher speeds, but at the cost of reduced range and penetration and of greater vulnerability to electronic warfare. In other words, we get a heavier load on the communication infrastructure and higher risks.
Here, of course, one can object: for example, that you can take a pre-trained model and make it local, in much the same way as I suggest deploying local AI models with “additional training” in a subject area. Yes, in this form it can all run on a single server. But such an AI will be very limited, it will be “stupid” under uncertainty, and it will still need energy and a connection to the data network. That is, this story is not about creating human-like super-beings.
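Deploying a pre-trained open model locally is indeed straightforward today. A minimal sketch using the Hugging Face transformers pipeline (the specific model name and generation settings are just an example, and in practice you would combine this with the domain grounding described above):

```python
# Sketch: running a pre-trained open model locally on a single server.
# Requires `pip install transformers torch`; the model name is an example.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small model that fits on one GPU/CPU
)

prompt = "List three typical causes of pump bearing failure."
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])
```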
All this raises questions about the economic feasibility of investing in this area, especially considering two key trends in the development of generative AI:
– creating cheap and simple local models for solving specialized tasks;
– creating AI orchestrators that decompose a request into several local tasks and then distribute them among different local models (a sketch of such an orchestrator follows below).
Thus, weak models with narrow specialization will remain more affordable and easier to create, while still being able to solve our tasks. As a result, we get a simpler and cheaper way to handle work tasks than creating a strong AI.
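Here is a minimal sketch of such an orchestrator: it decomposes a request into sub-tasks and routes each one to a narrow local model. The routing rules, task names and the stub “models” are all my own illustrative assumptions; in a real system each specialist would be a small local model.

```python
# Toy AI orchestrator: split a request into sub-tasks and route each one to a
# narrow, specialized local "model". The specialists here are stubs; in practice
# each would be a small local LLM or a classic algorithm.
from typing import Callable

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "summarize": lambda text: f"[summary model] condensed: {text[:40]}...",
    "translate": lambda text: f"[translation model] translated: {text[:40]}...",
    "calculate": lambda text: f"[math model] computed result for: {text[:40]}...",
}

def decompose(request: str) -> list[tuple[str, str]]:
    # Naive keyword-based decomposition; a real orchestrator could itself be a
    # small LLM that produces this task list.
    tasks = [(skill, request) for skill in SPECIALISTS if skill in request.lower()]
    return tasks or [("summarize", request)]   # fallback specialist

def orchestrate(request: str) -> list[str]:
    return [SPECIALISTS[skill](payload) for skill, payload in decompose(request)]

for answer in orchestrate("Translate this report and summarize the key risks"):
    print(answer)
```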
Of course, we are leaving aside neuromorphic and quantum systems, but we will discuss that topic a little later. And, of course, there may be mistakes in my individual figures and arguments, but overall I am convinced that strong AI is not a matter of the near future.
In summary, strong AI has several fundamental problems:
– Exponential growth in development complexity and in the effort needed to counter the degradation of complex models.
– Lack of data for training.
– Cost of creation and operation.
– Dependence on data centers and heavy demands on computing resources.
– Low efficiency of current models compared to the human brain.
It is overcoming these problems