Closed-Source ChatGPT Will Not Work
Advice for Enterprise AI Leaders
There is no doubt that the release of ChatGPT has captured the popular imagination. It has demonstrated possibilities of generative artificial intelligence that seem like science fiction.
“Any sufficiently advanced technology is indistinguishable from magic.” — Arthur C. Clarke
ChatGPT's clever use of a familiar, chat-like front-end has accelerated its adoption, making it a household name. However, is there anything unique or proprietary about ChatGPT?
In this article, I am not concerned with the much-debated 'shortcomings' of ChatGPT, such as hallucinations, data privacy, and AI trust/transparency; in my view, all of these are solvable engineering issues. I am concerned with helping AI leaders decide whether they should invest in ChatGPT or wait for other options.
Let us explore.
The Roots of Competitive Advantage
In machine learning (ML) engineering, there are only three possible areas for creating competitive differentiation: unique processing capabilities (e.g. NVIDIA), ML model design (rare breakthroughs like the Transformer, provided they are not open-sourced), and foundational training data. Let's look at ChatGPT through these three lenses.
Processing Power
Indeed, ChatGPT's creator OpenAI has benefited from $12B of investment from a big technology partner, which has given OpenAI access to highly constrained GPU resources.
It is estimated that training ChatGPT requires a super-cluster of 10,000 NVIDIA V100 GPUs.
It requires not just a massive GPU count, but also specialized training infrastructure that today only hyperscale cloud vendors can provide. For example, GPT-3 (ChatGPT is v3.5) has roughly 175 billion parameters; at about 4 bytes per parameter, the weights alone need some 700GB of memory, far more than any single GPU, or even a typical multi-GPU server, can hold. Overcoming this requires special techniques such as model-parallel training and a lot of software management tooling. However, this too is not a sustainable advantage. As we will argue later, there are ways around the deep investments and technology resources of a single large company. Money is usually the most fungible of all resources, so the advantage of deep pockets and available GPUs is unlikely to be long-lived.
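To make that arithmetic concrete, here is a back-of-the-envelope sketch. The numbers are round, illustrative assumptions, not OpenAI's actual configuration:

```python
import math

# Back-of-the-envelope memory math for GPT-3-scale training (illustrative only).
N_PARAMS = 175e9        # GPT-3 parameter count
BYTES_PER_PARAM = 4     # fp32 weights; gradients and optimizer state add several times more
GPU_MEMORY_GB = 32      # a single NVIDIA V100 (32GB variant)

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9     # ~700 GB for the weights alone
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)  # GPUs needed just to hold the weights

print(f"weights: ~{weights_gb:.0f} GB, minimum V100s to shard them: {min_gpus}")
# ~700 GB and ~22 V100s for the raw weights; once gradients and optimizer state
# are included, real model-parallel training needs far more, hence 10,000-GPU clusters.
```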
Technology history demonstrates that necessity is the mother of invention. Many big trends are already emerging, such as:
(1) New GPU clouds: NVIDIA and many other niche cloud providers are working overtime to make such processing power available to anyone who can afford to pay.
(2) Crowd-funded training and inference infrastructure: open community platforms can invest upfront and amortize their costs across hundreds of users. The highest cost of running these systems is in serving inference, and it is far easier to optimize inference costs by pooling a large variety of customers and payloads than it is for a single walled-garden customer (see the sketch after this list).
(3) Infrastructure innovation for dramatic cost reduction: the current cost of training ChatGPT is estimated at around $5 million. In silicon engineering, a 10X improvement in 6 to 9 months is not far-fetched. At half a million dollars per training cycle, large language models will be well within reach of everyday start-ups.
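Here is that sketch: a toy model of why shared inference wins economically. Every number (cluster cost, capacity, utilization rates) is a made-up assumption for illustration:

```python
# Toy model of inference economics: a GPU cluster costs roughly the same per hour
# whether busy or idle, so unit cost is driven by utilization. All numbers below
# are assumptions for illustration only.

CLUSTER_COST_PER_HOUR = 400.0      # assumed hourly cost of an inference cluster
PEAK_REQUESTS_PER_HOUR = 100_000   # assumed capacity at 100% utilization

def cost_per_request(utilization: float) -> float:
    """Hourly cluster cost spread over the requests actually served."""
    return CLUSTER_COST_PER_HOUR / (PEAK_REQUESTS_PER_HOUR * utilization)

single_tenant = cost_per_request(0.10)  # one walled-garden customer: spiky traffic, ~10% busy
multi_tenant = cost_per_request(0.70)   # pooled customers overlap peaks: ~70% busy

print(f"single-tenant ${single_tenant:.4f}/req vs shared ${multi_tenant:.4f}/req")
# Pooling workloads cuts unit cost ~7x here, with no hardware innovation at all.
```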
Model Design
It is well established that the core concepts behind large foundational models like ChatGPT are based on open-source ideas such as the Transformer and deep neural network design. While these skills are still new, none of them are proprietary, and they are already being taught at universities and even on YouTube. Pre-built large language models and open-source libraries are freely available through community platforms like HuggingFace, Replicate, and others. Until recently, even OpenAI was truly "open" and published most of its research.
Indeed, there are some unique aspects of ML model optimization that ChatGPT's builders must have innovated on. However, none of them are unique enough to provide long-term competitive barriers. At the end of the day, any large language model is primarily a gigantic matrix of additions and multiplications, following open-source patterns (algorithms) plus a lot of optimization hacks. By closing access to ChatGPT, OpenAI is betting that people will prefer to rent its monopoly power instead of finding ways around it. Sounds similar to the early days of the Internet and the positioning of Internet Explorer? Perhaps.
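To see how un-secret that core really is, here is a minimal sketch of scaled dot-product attention, the open-source building block at the heart of every Transformer. This is a toy NumPy illustration, not OpenAI's implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """The core Transformer operation: matrix multiplies plus a softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of the values

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```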
Technology history also demonstrates that closed, in-house insights are short-lived. The following outcomes are already happening, as has always been the case in Silicon Valley:
1. Reverse engineering with the power of crowds and venture capital. The crowds need little more than a proof of existence like ChatGPT, the threat of a monopoly, and the promise of a prize as big as the new Internet. Evidence shows massive pivots happening across start-up accelerators and venture funds to support this outcome.
2. Basic human ambition. It is no surprise that most of the open-source competitors to ChatGPT have been founded and funded by former insiders at OpenAI, Google DeepMind, and the like. Worse, these start-ups are created by innovators who have gained a new insight or identified a gap that can create a bigger play. Closed-source monopolies create their own biggest enemies.
Training Data
The training data used for ChatGPT is more than 90% open source, and hence freely available on the Internet. Indeed, OpenAI has invested a significant amount of money in formatting this data into the form its model training requires (e.g. prompt/response pairs for instruction tuning). However, there is nothing proprietary about the data itself.
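As an illustration of what that formatting work looks like, here is a sketch of the kind of prompt/response records used for instruction tuning. The field names and file layout are my assumptions, not OpenAI's actual schema:

```python
import json

# Hypothetical instruction-tuning records: open web text reformatted into
# prompt/response pairs. The schema is illustrative, not OpenAI's actual format.
records = [
    {"prompt": "Summarize: The V100 is a data-center GPU with 32GB of memory...",
     "response": "The V100 is NVIDIA's 32GB data-center GPU."},
    {"prompt": "Translate to French: machine learning",
     "response": "apprentissage automatique"},
]

# Training pipelines typically consume one JSON object per line (JSONL).
with open("instruction_tuning.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```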
1. Over time, it will be impossible to compete with crowd-sourced training data and open-source benchmarks, as has been established since the early days of machine learning by large public catalogs of training datasets.
2. Industry groups will collaborate to build open-access datasets that they share only with other open-source partners. For example, see BMW's effort in releasing an industrial training dataset.
A bet against open-source training data is too big for any one enterprise to sustain over time.
The Lesser-Known Fact About Training Loss
Scaling-law research published in 2022 (most famously DeepMind's "Chinchilla" paper, building on OpenAI's own earlier scaling work) observed that beyond a point, which has already been passed, the quality of a foundational ML model benefits more from increased quantity and quality of data than from brute processing power. The well-known "training loss" curves essentially say that multiplying processing power several-fold buys very little unless the amount of training data grows by roughly ten times as well.
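For reference, here is a sketch using the loss formula the Chinchilla paper fit, L(N, D) = E + A/N^alpha + B/D^beta, plugged with its approximate published coefficients. The comparison below is illustrative, not a rigorous reproduction:

```python
# Rough illustration of the scaling law fit in DeepMind's 2022 "Chinchilla" paper
# (Hoffmann et al.): training loss L as a function of parameter count N and
# training tokens D. Coefficients are the paper's approximate published fits.

def training_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted constants
    alpha, beta = 0.34, 0.28       # fitted scaling exponents
    return E + A / n_params**alpha + B / n_tokens**beta

baseline = training_loss(175e9, 300e9)         # GPT-3 scale: 175B params, 300B tokens
more_params = training_loss(3 * 175e9, 300e9)  # 3x the model size, same data
more_data = training_loss(175e9, 10 * 300e9)   # same model size, 10x the data

print(f"baseline {baseline:.3f} | 3x params {more_params:.3f} | 10x data {more_data:.3f}")
# 3x the parameters barely moves the loss; 10x the data moves it several times more.
```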
This leads to the startling conclusion that in the near future, the value of increased processing power will be dwarfed by that of training data. This alone should make anyone pause before betting against open-source and collaborative efforts to build large training datasets, and before investing too much in closed, in-house solutions.
What About the First-Mover Advantage?
Of course, there are sizeable benefits to ChatGPT being an early mover backed by deep investment. If OpenAI executes well, it can capture a large share of the early-adopter market. However, remember that most applications of generative AI are likely to live in SaaS products, which end-users can easily replace when they feel no stickiness in the solution. Perhaps, if OpenAI learns from history, it may yet change its closed-source positioning.
Conclusion
All in all, my advice to large enterprises and business leaders is to be cautious about investing too much in any closed-source foundational model solution. There is just too much recent history arguing otherwise.
So how does one build competitive advantage in this emerging artificial intelligence landscape? Watch for my next blog.