An Agile Methodology and a Roadmap to Deploy Responsible AI Products
This post comes from Saeed Elnaj, RELI Group’s Chief Information Officer. Saeed is an accomplished technology leader with 30 years of experience in transforming business strategies into digital products and solutions. Known for his expertise in Generative AI, cloud technologies and digital transformation, Saeed’s career includes notable roles such as Senior Customer Success Manager at Amazon Web Services, CIO at HealthKey Technologies, and Vice President of Information Technology and CIO at the National Council on Aging. He has successfully led major cloud migrations and CRM projects, and contributed to the Human Genome Project.
It is estimated that by 2025, 750 million apps will be built using large language models (LLMs), and that 50% of digital work processes will be automated using current technologies. GenAI could prove transformative in many business processes and across the value stream for the private sector and the federal government alike. With the final weeks of 2024 upon us, we are approaching the end of the experimentation phase with this technology, and 2025 will prove to be the year of transformative projects in production. The time is now to identify, evaluate, prioritize, plan and budget for the use cases you intend to implement.
Examples of GenAI in Action
We already see successful GenAI projects deployed into production and generating business value. Some of the most successful generative AI projects in the private sector come from Morgan Stanley, GitHub, JPMorgan Chase, Coca-Cola and Kroger, demonstrating AI’s potential to enhance efficiency, improve decision-making and deliver business value at scale. Morgan Stanley’s Knowledge Assistant, built on GPT-4, helps advisors rapidly access insights and reduces time spent searching through complex research, while GitHub Copilot assists developers by automating repetitive coding tasks, boosting productivity and quality. Coca-Cola leverages generative AI to create unique marketing content at scale, while Kroger uses AI-driven recommendations to enhance customer shopping experiences.
Various federal agencies have also successfully deployed generative AI projects that highlight significant advancements in service delivery. The IRS is using generative AI to analyze tax returns for patterns indicative of tax evasion among high-income earners, which is crucial for addressing the estimated $688 billion tax gap. Other agencies employing generative AI include the National Institutes of Health, which has applied it to biomedical data analysis, improving the speed and accuracy of findings.
Building a Responsible AI Framework
There is a tendency within the federal government to be more cautious when implementing new technologies, especially ones that introduce risks. This is clearly a prudent approach, and as the Transportation Security Administration’s CIO stated, “What we’re trying to do is take the crawl, walk, run approach. Let’s start with a very safe, generative AI use case that we can do in a controlled environment, and then — as we get confidence in the testing information to say that the performance of those systems is within our guardrails and guidelines — then we can look at how we can try and expose those to maybe passengers or industry stakeholders as tools that they can start using.”
Given the potential transformative power of GenAI and the risks it carries, the question is how to deploy projects at scale, creating the so-called “conveyor belt” that can repeatedly, predictably and safely deliver projects into production. Such deployments need to operate within specific business, legal and ethical guardrails, achieve high accuracy and generate business value.
A Roadmap for Deploying GenAI Use Cases
So how do we go about identifying, evaluating, prioritizing, developing, testing, deploying and operating such use cases? A critical set of AI/ML/GenAI techniques is emerging, such as MLOps, FMOps and LLMOps, to manage the development, testing and deployment of GenAI projects. However, these techniques do not cover the upstream stages of the use case development lifecycle (UCDLC), which start with evaluating, selecting and prioritizing use cases, then extend to collecting functional and non-functional requirements, and the many other phases, steps, processes and techniques essential to take a use case from inception to production.
Steps in the GenAI Use Case Development Lifecycle
Agile methodologies and techniques are foundational for deploying GenAI use cases. However, they need to be extended to cover the GenAI-specific aspects of the UCDLC. The GenAI UCDLC needs to include, among other things, the following phases, steps and techniques:
- Use Case Selection and Evaluation: This is a critical step to identify, select and evaluate the use cases that can move forward to the next stage. In a previous article, I wrote about a process that helps in selecting use cases; a simple scoring sketch illustrating these criteria appears after this list. The process includes the following basic criteria:
- Deliver business value
- Include data sets that can enhance LLM accuracy and reduce hallucinations
- Lower risks
- Lower complexity
- Have available resources to implement the use case
- Data Readiness Assessment: During this stage, assess the necessary data, its availability and labeling, and compliance with data security and privacy policies.
- Risk Analysis: Risk analysis for GenAI use cases is different: specific risks such as bias, toxicity, misuse, IP rights, data privacy and legal compliance must be examined in ways that non-GenAI projects do not require.
- Requirements Management: Collecting, confirming and managing requirements for GenAI use cases has much in common with general agile practice. However, additional requirements need to be collected and managed, such as guardrails for toxicity, misuse and hallucination, and what is and is not acceptable. One non-functional requirement that is critical to the success of GenAI projects is defining accuracy. Because the technology is unlikely to be 100% accurate in generating the desired output, the critical questions become: what accuracy is acceptable, how is it measured, and how will it be continuously monitored and improved? (A minimal accuracy-evaluation sketch appears after this list.) The user journey and usability requirements are another critical aspect of defining the use case. GenAI and its underlying LLMs require prompt engineering to provide the LLM with the specific context of the use case and increase the accuracy of the output. A properly designed UI/UX and user journey can add another layer of context to the LLM through pre-developed end-user prompts. A new GenAI-specific UI/UX discipline will emerge to support these designs.
- Model Selection: There are over 700,000 LLMs available in the Hugging Face repository alone, and some 79 reputable LLMs have been evaluated by the Stanford Center for Research on Foundation Models’ Holistic Evaluation of Language Models (HELM). But how do you select the model that is most relevant to the selected use case? This is a daunting task by itself, so either the methodology needs a specific selection technique or the GenAI governance framework needs to guide or limit which LLMs can be considered for a use case. In addition, I’ve worked on projects where more than one LLM was used: one for synthetic data generation, another for implementing the use case, and a third for automated testing. Where more than one LLM would add value, a selection effort is required to decide which LLM should be used for which function, when, and at what cost (see the model-registry sketch after this list).
- Design Approach: There are many aspects to GenAI use case design. First, the design should allow for LLM swapping when the desired outcomes are not achieved, with clear criteria for when to switch to another LLM and how to go about doing so (a minimal abstraction sketch appears after this list). Currently, chatbots and “chatting” with the data are the dominant UI for most GenAI implementations, but other UI options are possible and should be considered. How about literally “talking” to your data using conversational AI devices such as Alexa, Google Home or Siri? Many of us do this daily in our private lives; why not design business solutions the same way? Or how about gesturing at your data using AR/VR to visualize LLM-generated output and interact with it in an unprecedented way?
- Solution Testing: Large language model operations (LLMOps) and foundation model operations (FMOps) will be required to automate testing of the use case, including generating test data. The methodology also needs to answer the question of when to inject a “human in the loop” to ensure high-quality output and proper testing (see the testing sketch after this list).
- Operating the Solution: Operating GenAI solutions is different from deploying traditional solutions into production; GenAI use case operations even differ from traditional AI/ML operations. The nature of LLMs, and the rapid pace at which the industry is evolving, means continuously evaluating accuracy, cost, toxicity, misuse and other guardrails. This requires a set of KPIs that are continuously used to measure the value and performance of the solution, specifically comparing cost and accuracy against other LLMs (a simple KPI-review sketch appears after this list). In addition, the UI/UX needs to be monitored and evaluated to determine how to improve it based on user feedback and interactions, possibly shifting from one UI/UX paradigm to another, such as from a chatbot to conversational, gesture-based or combined interfaces.
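To make several of these steps more concrete, the short sketches that follow translate them into code. Starting with use case selection and evaluation: this is a minimal Python sketch, and the criteria weights, candidate names and 1–5 scores are illustrative assumptions rather than part of any prescribed methodology; the point is simply that scoring each candidate against the criteria above produces a ranked backlog.

```python
from dataclasses import dataclass

# Illustrative weights for the selection criteria listed above (assumed values).
WEIGHTS = {
    "business_value": 0.35,
    "data_fit": 0.25,            # data sets that improve accuracy / reduce hallucinations
    "low_risk": 0.15,
    "low_complexity": 0.15,
    "resource_availability": 0.10,
}

@dataclass
class UseCase:
    name: str
    scores: dict  # criterion -> 1 (weak) to 5 (strong), assigned by the evaluation team

    def weighted_score(self) -> float:
        return sum(WEIGHTS[c] * self.scores.get(c, 0) for c in WEIGHTS)

candidates = [
    UseCase("Contact-center call summarization",
            {"business_value": 5, "data_fit": 4, "low_risk": 3,
             "low_complexity": 4, "resource_availability": 4}),
    UseCase("Automated policy drafting",
            {"business_value": 4, "data_fit": 2, "low_risk": 2,
             "low_complexity": 2, "resource_availability": 3}),
]

# Rank candidates; only the top of the list moves on to data readiness assessment.
for uc in sorted(candidates, key=UseCase.weighted_score, reverse=True):
    print(f"{uc.name}: {uc.weighted_score():.2f}")
```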
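For requirements management, defining acceptable accuracy can be expressed as a small evaluation harness. This is a minimal sketch assuming a hypothetical generate() call to the selected LLM and a grade() function standing in for an automated judge or human reviewer; the 90% threshold is an assumed requirement, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    reference: str  # expected answer or rubric agreed with the business owner

def accuracy(cases: list[EvalCase],
             generate: Callable[[str], str],
             grade: Callable[[str, str], bool]) -> float:
    """Fraction of evaluation cases whose generated output is judged acceptable."""
    passed = sum(grade(generate(c.prompt), c.reference) for c in cases)
    return passed / len(cases)

# Hypothetical stand-ins; in practice these call the selected LLM and a grader
# (exact match, an LLM-as-judge, or a human reviewer).
def generate(prompt: str) -> str:
    return "stub answer"

def grade(output: str, reference: str) -> bool:
    return output.strip().lower() == reference.strip().lower()

ACCEPTABLE_ACCURACY = 0.90  # assumed non-functional requirement for the use case

cases = [EvalCase("What is the filing deadline?", "stub answer")]
score = accuracy(cases, generate, grade)
print(f"accuracy={score:.2%}, meets requirement: {score >= ACCEPTABLE_ACCURACY}")
```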
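For model selection, when more than one LLM is used in a project, a simple registry that maps each function to an approved model and its cost keeps the choice explicit and auditable. The model identifiers, prices and rationales below are placeholders, not endorsements.

```python
from dataclasses import dataclass

@dataclass
class ModelChoice:
    model_id: str            # placeholder identifiers, not endorsements
    cost_per_1k_tokens: float
    rationale: str

# One possible assignment for a project that uses three models, as described above.
MODEL_REGISTRY = {
    "synthetic_data_generation": ModelChoice("provider-a/large", 0.010,
        "strongest at producing varied, realistic records"),
    "use_case_inference": ModelChoice("provider-b/medium", 0.002,
        "best accuracy/cost trade-off on the evaluation set"),
    "automated_testing": ModelChoice("provider-c/small", 0.0005,
        "cheap judge model for regression runs"),
}

def model_for(function: str) -> ModelChoice:
    """Look up the approved model for a function; fail fast if it is not governed."""
    if function not in MODEL_REGISTRY:
        raise KeyError(f"No approved model registered for '{function}'")
    return MODEL_REGISTRY[function]

print(model_for("use_case_inference").model_id)
```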
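For the design approach, allowing LLM swapping usually means putting an abstraction between the business logic and the model so a backend can be replaced without touching the application. The sketch below shows one way to do this with a small protocol and stubbed, interchangeable clients; the class and method names are assumptions for illustration only.

```python
from typing import Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

# Stubbed backends; real implementations would wrap the vendors' SDKs.
class ModelA:
    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"

class ModelB:
    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"

class Assistant:
    """Business logic depends only on the LLMClient protocol, so the model can be swapped."""
    def __init__(self, llm: LLMClient):
        self.llm = llm

    def answer(self, question: str) -> str:
        # A pre-developed prompt template adds use-case context for the LLM.
        return self.llm.complete(f"You are a benefits assistant. Question: {question}")

assistant = Assistant(ModelA())
print(assistant.answer("How do I renew?"))

# If accuracy or cost KPIs fall outside the guardrails, swap only the backend:
assistant = Assistant(ModelB())
```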
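For solution testing, automated testing with a human-in-the-loop checkpoint can be sketched as a pipeline that generates test prompts, scores the model's answers, and routes low-confidence results to a reviewer. Everything here is stubbed for illustration; the review threshold and the generators are assumptions.

```python
import random

def generate_test_prompts(n: int) -> list[str]:
    # Stand-in for LLM-generated synthetic test data.
    topics = ["eligibility", "deadlines", "appeals"]
    return [f"Explain the {random.choice(topics)} rules." for _ in range(n)]

def model_answer(prompt: str) -> str:
    return "stub answer"             # stand-in for the use-case LLM

def automated_score(answer: str) -> float:
    return random.uniform(0.0, 1.0)  # stand-in for an automated judge

REVIEW_THRESHOLD = 0.8  # assumed: below this, a human reviews the output

for prompt in generate_test_prompts(5):
    answer = model_answer(prompt)
    score = automated_score(answer)
    route = "auto-pass" if score >= REVIEW_THRESHOLD else "human review"
    print(f"{prompt} -> score={score:.2f} -> {route}")
```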
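Finally, for operating the solution, tracking KPIs can start as simply as recording accuracy, cost and guardrail metrics per period and comparing them against agreed targets. The metric names and target values below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class KPISnapshot:
    period: str
    accuracy: float          # from the ongoing evaluation harness
    cost_per_request: float  # averaged model + infrastructure cost
    toxicity_rate: float     # share of responses flagged by guardrail filters

# Illustrative targets agreed with the business owner (assumed values).
TARGETS = {"accuracy": 0.90, "cost_per_request": 0.05, "toxicity_rate": 0.001}

def review(snapshot: KPISnapshot) -> list[str]:
    """Return the KPIs that are out of bounds and should trigger an action
    (prompt tuning, an LLM swap, or a UI/UX change)."""
    issues = []
    if snapshot.accuracy < TARGETS["accuracy"]:
        issues.append("accuracy below target")
    if snapshot.cost_per_request > TARGETS["cost_per_request"]:
        issues.append("cost above target")
    if snapshot.toxicity_rate > TARGETS["toxicity_rate"]:
        issues.append("toxicity above target")
    return issues

print(review(KPISnapshot("2025-01", accuracy=0.87, cost_per_request=0.04, toxicity_rate=0.0)))
```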
RELI Group’s agile GenAI methodology addresses these steps to help identify, evaluate, select, prototype, test, deploy and operate GenAI for virtually any use case. Contact us today to launch your next project!