Generative AI Enterprise Adoption :
Real Challenges No One Is Talking About
(and some solutions) — Part 1 of 4
Deepam Mishra | www.tbicorp.com | linkedin.com/in/deepammishra
Background
Most of us have now had a chance to play with ChatGPT, Midjourney, DALL-E, and similar tools, and to be amazed by their capabilities. Most business leaders I speak to are also keen to exploit the significant automation potential of Generative AI (“GenAI”).
There is a lot of general discussion and commentary on the ‘challenges’ of GenAI, but most of it is somewhat academic or pertains to consumer-facing aspects of the technology. In my opinion, much of it is the result of the sheer surprise with which GenAI has caught humankind. As Carl Sagan observed, any such big technical leap always looks like magic.
However, for me, the pertinent discussion is how businesses can start using this powerful sledgehammer RIGHT NOW while avoiding harmful side effects, and which use cases they should focus on first.
Every week I find myself talking to top business leaders who are unsure of how, where, and whether to begin. So today I will tackle some of the concerns and myths surrounding the adoption of Generative AI in enterprise applications.
Enterprise Adoption Challenges
In my opinion, there are 4 real concerns and challenges that are causing enterprises to pause. Business leaders who gain a firm understanding of these can start taking advantage of GenAI right away and realize sizeable competitive advantages. In this 4-part series, I will address the following:
1. Generative AI gives factually wrong or non-referenceable responses (e.g. hallucinations)
2. “We have no control over data privacy and security”
3. “Generative AI is too expensive for routine tasks”
4. “Generative AI systems cannot be trusted or may harm users (black box)”
Concern #1: Wrong Answers
“Almost 80% of ChatGPT’s responses are factually wrong. My business case needs 100% accuracy,” said the CTO of a large bank.
True or False?
While this mirrors the real experience of many ChatGPT users, in my opinion such generalized statements suggest limited exposure and a surface-level understanding of how GenAI base models work. As we will see, they do not represent best practice for using GenAI in enterprises.
The Two Faces of Gen-AI
Put simply, most people conflate two important yet distinct innovations offered by GenAI base models, and hence mix up their evaluations of these two different aspects:
(1) Language Understanding and Language Generation — it is perhaps beyond debate that today’s foundational models (FMs) have a ‘smart-human’-like understanding of language and communication. These FMs can easily ‘understand’ complex queries, which points to significant advances in the state of the art of Natural Language Understanding (NLU). Similarly, these FMs can generate language responses that are natural and engaging, unlike the terse, bot-like, limited responses of yore. Today’s FMs not only incorporate advanced language skills and grammar, but can also lace their responses with humor, emotion, and style. I am sure you have seen ChatGPT spout meaningful poetry on suggested topics, and know what I am implying here. Overall, these FMs provide a vastly superior communication experience compared to erstwhile chatbots. The language and communication proficiency of FMs is one of their core strengths.
(2) Data Search and Retrieval — the ability of FMs to retrieve information based on language understanding is a wholly different aspect and can be separated from language skills. This ‘skill’ requires that an LLM (or an FM) not only understand the essence of a query but also match related text from its training corpus. While this may sound simple, it is a very complex process, akin to the entire technology behind internet search engines like Google.
To do a good job at search and retrieval, the LLM has to know what the best ‘match’ is for a given query, which is not an easy task. Finding a good match may require a deeper understanding of aspects such as:
· Search domain — e.g. a mathematical query, or a query on a specific industry term like financial arbitrage, may require a nuanced understanding of the topic. The answer cannot simply be a search-and-fetch exercise.
· Unpacking query intent — sometimes the query itself may be confusing or layered, and LLMs may interpret it in multiple ways. For example, if someone asks a forecasting system to predict when the ‘weather will get better’, the intent may mean different things depending on location, personal preferences, etc.
· Lack of specific or direct answers — sometimes answers are simply not available, or may require multi-stage queries. This may be beyond the capability of a regularly trained LLM.
· Prioritizing and/or combining multiple answers — a conventional Google-style search often produces a ranked list of results, whereas GenAI systems are usually asked to provide a single, summarized response, which is harder.
Overall, this second set of FM capabilities relates to in-context search and result ranking, and is wholly distinct from synthesizing language.
So How To Think About GenAI Accuracy
Users should assess the abilities of Generative AI models against these two capabilities separately. GenAI’s superpowers can and should be used for language understanding and human communication; this is where current tools such as IVRs and chatbots offer a very underwhelming and inefficient experience.
As a special-case example for enterprise search, developers can delegate the task of information search and retrieval to conventional methods and/or to medium-sized foundational models with custom training, reserving the generative model for communication. Doing so enables businesses to design overall systems with better control and accuracy. In this example, the ring-fenced use of GenAI (for communication only) minimizes concerns about wrong answers.
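To make this concrete, here is a minimal sketch of the ring-fenced pattern, assuming a generic search backend and a generic LLM client. The helper names (search_document_index, call_llm) are hypothetical placeholders rather than any specific vendor API; the point is only that the facts come from a controlled retrieval step while the FM handles the wording.

```python
# Hypothetical sketch: conventional retrieval supplies the facts,
# and the foundational model only phrases the answer.

def search_document_index(query: str, top_k: int = 3) -> list[str]:
    """Placeholder for a conventional enterprise search step
    (keyword or vector index) over a curated, trusted corpus."""
    raise NotImplementedError("Plug in your existing search backend.")

def call_llm(prompt: str) -> str:
    """Placeholder wrapper around whichever foundational model you use."""
    raise NotImplementedError("Plug in your model provider's client.")

def answer_question(query: str) -> str:
    passages = search_document_index(query)
    if not passages:
        # Fail closed instead of letting the model guess.
        return "No supporting documents were found for this question."

    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    # The FM is used only for language generation; the facts come from
    # the retrieved passages, not from the model's parametric memory.
    return call_llm(prompt)
```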
This is not the only design pattern for reducing wrong answers. Better-engineered prompts, fine-tuned base models, post-search filtering policies, and other methods can also help.
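As one illustration of a post-search filtering policy, the sketch below drops retrieved passages that fall below a relevance threshold or come from non-approved sources before they ever reach the model. The Passage structure, source names, and threshold are assumptions made purely for illustration.

```python
# Hypothetical post-search filter: only high-confidence passages from
# vetted sources are allowed into the prompt.

from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str       # e.g. "hr-policy-handbook"
    relevance: float  # score from the search backend, 0.0 to 1.0

APPROVED_SOURCES = {"hr-policy-handbook", "product-faq", "legal-terms"}
MIN_RELEVANCE = 0.75

def filter_passages(passages: list[Passage]) -> list[Passage]:
    """Keep only passages that clear both the source and relevance bars."""
    return [
        p for p in passages
        if p.source in APPROVED_SOURCES and p.relevance >= MIN_RELEVANCE
    ]
```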
Key Takeaway
As our understanding of FMs improves, we will find better ways of controlling their performance. However, even today there are proven and reliable design patterns for using FMs in serious enterprise use cases. For example, using separate FM-based systems for data search-and-retrieval and for data communication (or “generation”) can reduce overall concerns.
Next up: “If I use ChatGPT for business, I will have no control over my data privacy and security”
© 2023 Deepam Mishra. All rights reserved.