In the year since OpenAI released ChatGPT, interest in generative AI has exploded. Apps powered by large language models (LLMs) are now at the forefront of how enterprises think about productivity and efficiency, and the tools and frameworks for building generative AI apps have expanded significantly. However, concerns remain about the accuracy of generated output, and developers must quickly learn to address this and other issues in order to build powerful and reliable apps.
Below is some advice on improving the accuracy of your LLM app, along with considerations for choosing the right LLM. Each of these issues is complex in its own right, so we cannot address them exhaustively, but these pointers should give you a starting place for further exploration.
Streamlit, a free, open-source framework for rapidly building and sharing machine learning and data science web apps, recently passed 21,000 LLM-powered apps built by more than 13,000 developers on the Streamlit Community Cloud. We have also published a report analyzing over 100 of these LLM apps. The report offers insight into the tools and techniques developers are using to build apps, and it informs much of the advice below.
For example, vector search tools are effective at improving contextual recommendations for LLM-powered apps, but our research shows that only a minority of developers currently use vector capabilities. As it turns out, this presents great opportunities for the future.
As more developers harness the power of generative AI for app development, apps across a variety of categories and industries will begin to include AI-based search in addition to conversational and assisted experiences. Below are four tips to help developers build better LLM-powered apps that can bring true disruption to your organization.
Employ agents and orchestration for smarter apps
Orchestration frameworks like LangChain and LlamaIndex can help you complement your model with additional tools or agents that enhance the functionality of your LLM-based apps. In this context, think of an agent as a plugin system that allows you to incorporate additional functionality expressed in natural language into your app.
These agents can be combined to manage and optimize LLM features such as improving AI inference, addressing bias, and integrating external data sources. An agent can also give the app a way to reflect on whether the LLM is in error and what steps are required to successfully complete the task.
As an analogy, consider how a developer creates an API that provides specific functionality, along with the documentation that describes it. The API is expressed as code, while the documentation is written in natural language. An agent works in a similar way, except that its documentation is provided for the benefit of the LLM rather than other developers. The LLM looks at the task at hand, reads each agent's documentation, and determines whether that agent can help complete the task.
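The analogy above can be sketched in a few lines of Python. This is a minimal, framework-free illustration of the idea behind tools in orchestration frameworks like LangChain: each function carries a natural-language description, and a selector picks the tool whose description best matches the task. The tool names, the inventory data, and the word-overlap `pick_tool` heuristic are all hypothetical stand-ins; in a real app, the LLM itself would do the reasoning that `pick_tool` fakes here.

```python
# Minimal sketch of agent/tool selection. A crude word-overlap score
# stands in for the LLM's reasoning over each tool's documentation.

def check_inventory(item: str) -> str:
    """Look up the current stock level for an item (hypothetical data)."""
    stock = {"widgets": 42, "gears": 7}
    return f"{item}: {stock.get(item, 0)} in stock"

def translate_text(text: str) -> str:
    """Placeholder for a real translation call."""
    return f"[French translation of: {text}]"

# Each tool is paired with natural-language documentation, just like an API.
TOOLS = {
    "check_inventory": (check_inventory, "Look up stock levels in the inventory database."),
    "translate_text": (translate_text, "Translate English text into French."),
}

def pick_tool(task: str):
    """Stand-in for the LLM reading each tool's documentation and
    deciding which tool, if any, can help complete the task."""
    task_words = {w.strip(".,?!") for w in task.lower().split()}
    best_name, best_overlap = None, 0
    for name, (_, description) in TOOLS.items():
        doc_words = {w.strip(".,?!") for w in description.lower().split()}
        overlap = len(task_words & doc_words)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name
```

The point of the sketch is the shape, not the matching logic: functionality is described in natural language, and the model (not hand-written routing code) decides when to invoke it.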
These agents also make LLM apps more robust by providing a way for apps to reflect on and correct their mistakes. For example, let’s say your LLM app writes SQL code to perform a task such as checking inventory levels in a database, but an error occurs in the code. In a standard “simple” LLM app, the error is a dead end.
However, if your app has an agent that executes SQL, it can inspect the error, use that agent to determine what to do differently, and fix the mistake. The fix may be as simple as a small syntax change, but without an agent, the LLM has no way to reason about its errors.
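The reflection loop described above can be sketched with Python's built-in `sqlite3` module. Here `fix_sql` is a hypothetical stand-in for prompting the LLM with the failed query and the error message; for illustration, it hard-codes a single known repair (a misspelled table name). The schema and data are invented for the example.

```python
# Sketch of an agent-style reflection loop: execute generated SQL and,
# on failure, feed the error back for a corrected attempt.
import sqlite3

def fix_sql(query: str, error: str) -> str:
    # A real app would ask the LLM to repair the query given `error`;
    # here we hard-code one fix so the example is self-contained.
    return query.replace("inventry", "inventory")

def run_with_reflection(conn, query: str, max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return conn.execute(query).fetchall()
        except sqlite3.Error as err:
            query = fix_sql(query, str(err))  # reflect and retry
    raise RuntimeError("could not repair the query")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widgets', 42)")

# The first attempt misspells the table name; the loop recovers.
rows = run_with_reflection(conn, "SELECT qty FROM inventry WHERE item = 'widgets'")
```

In a "simple" LLM app the first `sqlite3.Error` would be a dead end; the loop is what turns the error message into input for a second attempt.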
Combat Hallucinations with Vector Magic and RAG
Your LLM may not have access to all the information needed to complete the desired task. You can insert additional information at prompt time, but most LLMs place limits on the size of these prompts. To work around those limits, the LLM may need to query external databases using vectors, a technique called retrieval-augmented generation (RAG).
To understand what RAG can do for LLM apps, it helps to think about the three different levels of LLM apps.
- Level 1: The app can use existing knowledge within the LLM to generate results.
- Level 2: Your app requires additional information that can be inserted at prompt time. This is straightforward as long as you stay within the prompt's size limits.
- Level 3: LLMs require access to external information sources, such as databases, to complete their tasks.
Level 3 introduces RAG. External databases are typically indexed semantically using vectors. That’s why we’ve been hearing a lot about vector databases and vector search tools lately.
Apps with vector databases and vector search enable fast, contextual queries by indexing large unstructured datasets (including text, images, video, or audio). This is very effective for building faster and more powerful contextual recommendations. However, vector tools are not yet widely used: Streamlit's research found that only 20% of AI-powered apps use some form of vector technology.
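The Level 3 flow can be sketched end to end in a few lines: embed the documents, retrieve the most similar one to the query, and insert it into the prompt. To keep this runnable without external services, it uses a toy bag-of-words embedding and brute-force cosine similarity; a real app would use a learned embedding model and a vector database, and the documents here are invented for the example.

```python
# Toy RAG sketch: vector retrieval feeding context into a prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word counts as a sparse vector.
    return Counter(w.strip(".,?!") for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "Widget stock is replenished every Monday.",
    "Refund requests are handled within five business days.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # stand-in for a vector DB

def retrieve(query: str) -> str:
    qvec = embed(query)
    return max(INDEX, key=lambda pair: cosine(qvec, pair[1]))[0]

def build_prompt(query: str) -> str:
    # The retrieved document becomes extra context in the LLM prompt,
    # keeping the prompt small regardless of how large DOCS grows.
    return f"Context: {retrieve(query)}\n\nQuestion: {query}"
```

The key property is that the prompt stays bounded: only the retrieved snippet, not the whole corpus, is sent to the LLM.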
Chatbots offer users a powerful way to narrow down their queries
Although chatbots have brought generative AI to the mainstream, there is some skepticism about whether chatbots will become an effective interface in the future. Some argue that chatbots give users too much freedom and don’t give them enough context about how to use an LLM app. Some people are reluctant because of past failures. Clippy was a disaster, so why should chatbots succeed today?
Obviously, whether a chatbot is the right interface depends in part on the intended use of the app. But chatbots have at least one very useful characteristic that you should not overlook: they provide a simple, intuitive way for users to add context and refine answers through a fluid, human-like interface.
To understand why this is so powerful, think about search engines. There is typically no way for users to narrow down a search engine query. If the results are slightly off, there's no way to tell the search engine to, for example, "remove answers about X and try again" or "give more weight to Y." That kind of refinement is exactly what a chatbot brings to an LLM app.
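What makes this refinement work is simply that each follow-up message is appended to a running history, so the model sees the full conversation rather than a fresh, contextless query. A minimal sketch, where `call_llm` is a hypothetical stand-in for a real chat-completion API:

```python
# Sketch of conversational refinement via accumulated chat history.

def call_llm(history):
    # A real implementation would send `history` to a chat model;
    # here we just report how much context the model received.
    return f"(answer based on {len(history)} messages of context)"

history = []

def send(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)  # the model sees every prior turn
    history.append({"role": "assistant", "content": reply})
    return reply

send("Find vendors for industrial widgets")
# The refinement arrives with the original question still in context.
refined = send("Remove answers about X and give more weight to Y")
```

In a Streamlit app, the same pattern falls out naturally from keeping the message list in session state and rendering it with the chat elements.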
The study found that 28% of generative AI apps built with Streamlit were chatbots, while the remaining 72% generally did not allow conversational refinement. However, weekly usage of these chatbots has grown to nearly 40%, while usage of non-chatbot apps has declined, so chatbots may well be the preferred interface for end users. The report includes examples of apps with different modes of accepting text input, so you can take a look and see what's possible.
Consider alternatives to GPT, including open source LLMs
The basic GPT model is still the most well-known LLM and is very capable, but more options have emerged over the past year that may be better suited for your app. Factors to consider include the breadth of knowledge required for the LLM, the size of the LLM, training needs and budget, and whether it matters whether the LLM is open source or proprietary. As with many things in the technology industry, there are trade-offs.
If you're building a generative AI app for internal use, you may need to train the LLM on internal data. For most companies, sharing sensitive data with a public LLM is off-limits for security reasons, so many run an LLM within their existing cloud security perimeter. This often leads them to smaller LLMs, such as those from AI21 or Reka.
Also, very large LLMs tend to have higher latency and are usually more expensive to run due to the computing resources they require. If your app performs relatively simple tasks such as translating text or summarizing documents, a small LLM may work well and significantly reduce usage and operational costs.
There may also be reasons to prefer open source LLMs, such as Meta's Llama, over proprietary LLMs from providers like OpenAI, Anthropic, or Cohere, which typically do not expose the source code, training data, weights, or other model details. Open source LLMs require self-hosting or inference through a hosting provider, but their source code and other model details are more readily available.
Get started with generative AI today
Generative AI is still a rapidly emerging field, but the tools and technology are evolving quickly, and there are many options for getting started today. Developers who seize this opportunity can provide significant value to their organizations by making AI apps a regular part of daily business operations and tasks. As generative AI continues to reshape roles and responsibilities across organizations, developers who lean into LLM-powered apps and build expertise will come out ahead. The advice above should help you get started on the right track.