Every Allocator Should Ask These Questions Before Hiring a Manager Using Large Language Models

Illustration by II

Asset owners must not be fooled by the hype.

Since the release of OpenAI’s ChatGPT in late 2022, we’ve seen companies in every sector trip over themselves to incorporate generative artificial intelligence into their operations.

Not to be left out, investment managers too are crowing about their adoption of generative AI, typically in the form of large language models (LLMs).

Generative AI and LLMs are distinct but related concepts. Generative AI models produce content beyond textual data, such as images and music; LLMs are a subset of generative AI specializing in understanding language patterns to make accurate predictions and generate text. This article will focus on LLMs.

At this point, most managers simply use the technology to improve operational efficiency and reduce costs in areas outside of investments, including customer service, client communications, document processing, and compliance. However, a few are using LLMs to help identify patterns, trends, and sentiment that may be relevant to investment decisions. LLMs’ ability to rapidly analyze large volumes of textual data from multiple sources, such as financial reports, news articles, and social media, helps with event prediction, risk management, portfolio construction, and trade execution, among other things.

Given managers’ growing affection for LLMs, I expect some will soon attempt to incorporate the models directly into the process they use to select investments. In turn, allocators will need to assess a manager’s choice of technology and use cases as part of their due diligence.

Allocators should make sure that their vetting process goes beyond managers’ demonstrations of slick user interfaces and general claims of success. Integrating LLMs into an investment process is a complex and expensive project with considerable investment and business risks and ethical considerations.

As a complement to my previous article on how allocators can vet a manager’s use of AI, I offer the following insights to help allocators understand and evaluate firms’ use of LLMs in their investment processes.

Why Use LLMs in the First Place?

Not every manager needs to use an LLM. A good rule of thumb: if you can solve the problem without AI, so much the better. Allocators should begin by asking a manager to explain the rationale behind its decision to use an LLM. Who made the decision, and how did they come to it? Was the decision-making process documented? How do these choices support the firm’s strategic plan? Who oversaw the development, evaluation, and implementation of this project, and does that person or team have the qualifications to undertake such a role and sufficient resources to continue developing and evaluating the project’s efficacy?

Allocators also should ask about the expected use cases, the benefits the manager believes the LLM can provide, and the implementation timeline for projects in development.

In addition, does the firm have a sufficient budget to cover the initial and ongoing costs of using an LLM, including hiring and retaining specialized data science and engineering talent, compute power, and data?

Governance is a critical business issue. As a result, allocators need to determine whether the firm has an AI policy document clearly defining the practices and processes around using the LLM and, more specifically, whether it has controls in place to protect its data. It’s not enough to hear about these controls. Allocators need to ask to see the documentation for themselves.

When Regulations Can’t Keep Up With the Technology

The pace of technological development and deployment in AI outpaces the ability of regulators and policymakers to draft frameworks that support innovation and adoption while ensuring responsible use. As a result, the regulatory and policy frameworks surrounding AI, and specifically generative AI, are evolving. Proposed regulations and policies so far generally indicate an incomplete understanding of the technologies and their use cases. The demand that LLMs be interpretable is one example.

As regulated entities, managers must already adhere to a complicated set of domestic and international regulations. Using LLMs amplifies this burden. Allocators should ask what procedures and documentation are in place to ensure ongoing compliance with regulations.

Once satisfied that the firm has made a sound business decision, committed the necessary resources to execute the decision, and established a governance structure to manage and mitigate possible business risks, allocators should focus on technical matters related directly to the LLM itself.

Inside LLMs

Managers generally access LLMs in one of four ways.

  1. Use an existing LLM like ChatGPT or one bought from another third party.
  2. Build a proprietary LLM from scratch.
  3. Fine-tune a pretrained open-source LLM for their specific purposes, either independently or with a vendor.
  4. Enrich LLM prompts.

Allocators should be familiar with these choices and their respective benefits and challenges.

Using an Open-Access LLM

The most common, accessible, and cost-effective choice is for a manager to simply use an open-access LLM like ChatGPT. A recent Alternative Investment Management Association hedge fund survey supports this view.

While this choice allows for rapid adoption, its utility is limited. General tools have a limited understanding of specialized information and expose managers to business risks such as uncertainty about continued access and pricing. Open-access LLMs also carry potential technical issues, including misinformation and data theft, and share the same risks as LLMs purchased from a third party like Anthropic. When a manager relies on a vendor, allocators also need to consider model quality, data privacy, the scope and duration of the contract, concentration risk, and the possibility that the third party could exit the business of providing the service at some point.

Ask the manager how it uses this model, who is permitted to use it, and what policies the firm has in place to protect its data.

Building a Proprietary LLM

While a de novo approach allows a manager to address domain- and enterprise-specific issues and increase the likelihood of accurate outputs, building an LLM is an incredibly expensive, time-consuming project that requires extensive in-house data science talent, high-volume data sets, and costly compute cycles. (Even with Microsoft’s substantial financial and technical support, it took OpenAI years to build GPT models.) Yet sufficient time and resources do not guarantee success. Ask the manager to provide details about the project’s budget and timeline, data sources, the size of the training set, operational infrastructure, integration plans, the metrics used to determine if the model is production-ready, and the development team, including their CVs.

This is the most time-intensive and expensive option, making it suitable only for large, well-resourced managers. (BloombergGPT is an example of such an LLM.)

Fine-tuning a Pretrained LLM

In this approach, a manager applies one of the growing number of off-the-shelf, pretrained LLMs to its proprietary content and fine-tunes the many parameters that describe content patterns.

Fine-tuning an LLM is the process of further training the pretrained model on a specific data set to improve its performance on a particular task or domain. When an LLM is fine-tuned, it adjusts its internal parameters based on the new data, allowing it to specialize in the target domain or task.
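The mechanics can be made concrete with a toy sketch. The model below is a tiny numerical stand-in for an LLM, not a real one: "pretrained" weights are adjusted by further gradient descent on a small domain-specific data set, which is the essence of fine-tuning. All names and data here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came from pretraining on a broad corpus.
pretrained_weights = rng.normal(size=4)

# Small domain-specific data set: features mapped to binary labels.
X = rng.normal(size=(32, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (X @ true_w > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(w, X, y, lr=0.1, steps=200):
    """Adjust the model's internal parameters on the new data."""
    w = w.copy()
    for _ in range(steps):
        preds = sigmoid(X @ w)
        grad = X.T @ (preds - y) / len(y)  # gradient of the log loss
        w -= lr * grad
    return w

tuned_weights = fine_tune(pretrained_weights, X, y)

# The tuned model should fit the domain data better than the pretrained one.
acc_before = np.mean((sigmoid(X @ pretrained_weights) > 0.5) == y)
acc_after = np.mean((sigmoid(X @ tuned_weights) > 0.5) == y)
```

A production fine-tune works the same way in spirit, but over billions of parameters, with specialized frameworks and far more data, which is why the data science expertise and compute costs discussed below still matter.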

Fine-tuning could improve performance if done correctly: “Fine-tuning LLMs in the finance domain can enhance domain-specific language understanding and contextual comprehension, resulting in improved performance in finance-related tasks and generating more accurate and tailored outputs,” according to a paper on LLM applications in finance by professors at Cornell University.

While fine-tuning requires smaller data volumes than building an LLM from scratch, it still requires extensive data science expertise and expensive compute power. So again, ask the manager to provide details on the development team, development process, and budget.

Enriching LLM Prompts

A manager may improve the performance of an existing LLM by enriching the prompts it uses to query the model. This process is known as retrieval-augmented generation (RAG). RAG retrieves relevant material from an external source and adds it to the prompt before it is sent to the model, allowing the LLM to generate more accurate answers to user queries.
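A minimal sketch of the retrieve-then-augment step, under stated assumptions: real RAG systems rank documents with vector embeddings, while this sketch uses simple word overlap, and the final `llm(prompt)` call is a hypothetical client, shown only in a comment.

```python
def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Rank documents by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_augmented_prompt(query: str, documents: list) -> str:
    """Enrich the user's prompt with retrieved context before querying the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

docs = [
    "Fund A reported a 12% net return in Q4.",
    "The compliance manual was updated in March.",
    "Fund A's Q4 letter cites strong credit positioning.",
]
prompt = build_augmented_prompt("What did Fund A return in Q4?", docs)
# The augmented prompt would then go to the model: answer = llm(prompt)
```

The design choice here is what makes RAG attractive: the underlying LLM is untouched, and all the domain knowledge lives in the retrieval layer, which is cheaper to build and update than the model itself.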

This is an attractive option, but “there is still a considerable journey ahead to effectively apply RAG to LLMs,” writes Gary Marcus, a leading expert on AI. In addition, a recent academic paper noted, “While building an initial RAG application is easy, making it robust is non-trivial, requiring domain knowledge expertise … and many design choices to optimize different components of the system.”

Regardless of a manager’s approach, allocators should ask the following questions about the model itself:

  • What is the model’s architecture?
  • How big is the model, and is it scalable?
  • How was it trained (e.g., zero-shot, few-shot, fine-tuning)?
  • How is the model served?
  • What is the training set?
  • How does its performance compare with LLM benchmarks? (Evaluating the performance of an LLM across various tasks, including natural language processing, general knowledge/common sense, problem-solving and advanced reasoning, and coding, and comparing it with its peers is critical to determine a model’s effectiveness, reliability, and production readiness. This article presents an accessible overview of benchmarking, while this source offers a technical explanation.)
  • Who was responsible for the final sign-off?
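The benchmarking question in the list above reduces, at its simplest, to scoring a model's answers against a labeled evaluation set. The sketch below assumes a hypothetical `model_answer` stand-in for a call to the model under evaluation; real harnesses run thousands of questions across many task categories.

```python
def model_answer(question: str) -> str:
    # Placeholder for a call to the LLM being evaluated; these canned
    # answers (one deliberately wrong) are purely illustrative.
    canned = {
        "What is 2 + 2?": "4",
        "Capital of France?": "Paris",
        "Largest planet?": "Saturn",  # deliberately wrong
    }
    return canned.get(question, "")

# Tiny labeled evaluation set: (question, reference answer) pairs.
benchmark = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet?", "Jupiter"),
]

def score(benchmark) -> float:
    """Fraction of benchmark questions the model answers correctly."""
    correct = sum(model_answer(q) == a for q, a in benchmark)
    return correct / len(benchmark)

accuracy = score(benchmark)  # two of three answers match the references
```

An allocator need not run such a harness itself, but it should expect the manager to produce scores like this across recognized benchmarks, alongside the methodology used to compute them.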

Managers might hesitate to share many details about their models and benchmarking, but such hesitancy is unfounded, and transparency builds trust. Managers should strive to be as transparent as Bloomberg when it published extensive information on its LLM. One caveat is that firms need to protect sensitive information and comply with data privacy regulations.

Risks and Vulnerabilities

Regardless of the manager’s choice, all LLMs have technical vulnerabilities and limitations that can taint the model’s output and the firm’s reputation. Allocators should query managers about how they deal with technical vulnerabilities, including hallucinations, data poisoning, sleeper agents, faulty self-correction, reliability, model collapse, and extractable memorization.

LLMs can also be used to generate harmful or offensive content. Allocators should be sure to ask the manager to demonstrate that it has proper controls to restrict unauthorized or malicious use.

Additionally, using an LLM could expose a manager to serious legal and reputational issues, such as charges of plagiarism, copyright infringement, misuse of intellectual property, data privacy and security violations, and breaches of regulatory compliance.

Allocators need to understand the steps the manager is aware of and has taken to monitor and manage these and other technical risks.

As the Late Ron Popeil Used to Say, “But Wait, There’s More”

There is a growing body of research, called AI ethics, that investigates the numerous risks AI poses to people and societies. The number of papers on this topic submitted to the Conference on Neural Information Processing Systems (NeurIPS), the world’s most prestigious AI and machine learning conference, doubled in 2022.

Broadly defined, “AI ethics” refers to principles and guidelines developed to help ensure AI is only used in a fair and responsible manner. To prevent potential harm, managers must consider the positive and negative impacts of their use of LLMs and prioritize and manage them accordingly.

While AI ethics is a new topic for many managers, there is a nascent movement among a small group of mainly large European institutional investors that intend to use their position as shareholders to persuade technology businesses to commit to ethical AI. (The Financial Times reports that Legal & General Investment Management is working on a stewardship code for AI.)

Common LLM ethical topics include:

  • Social biases and discrimination.
  • Labor issues, including wage stagnation, workforce disruption, and workforce diversity.
  • Human rights. (In a quest to make ChatGPT less toxic, OpenAI used outsourced Kenyan laborers earning less than $2 per hour to remove toxic content from data sets, Time reports. Laborers were repeatedly required to read and label traumatic text, leaving some mentally scarred, with some calling the work “torture.”)
  • Environmental issues such as LLMs’ CO2 emissions (BLOOM’s training consumed enough energy to power the average American home for 41 years) and water usage. (One study found “GPT-3 needs to ‘drink’ a 500ml bottle of water for every 10 to 50 responses, depending on when and where it is deployed.”)


Research on LLMs is evolving at a dizzying pace. Academics and practitioners across the globe are continuously developing solutions to the challenges outlined above while uncovering new issues. (This paper provides a nice summary of the research dedicated to specific LLM-related topics; see especially Table 2S.) Regulators and policymakers regularly modify their frameworks to account for these new insights.

For managers to effectively incorporate LLMs into their businesses and strategies, it is critical they have processes in place to capture and evaluate these developments and modify their models and business policies accordingly. At this early stage in the LLM life cycle, these systems can quickly become unstable. On February 20, ChatGPT “went berserk” and started spitting out gibberish. And while OpenAI “fixed” the problem, as TechTalks founder Ben Dickson correctly notes in a recent post, “we have yet to learn much about the risks of building applications on top of LLMs, especially closed-source proprietary models.”

Allocators, therefore, should be less concerned about how these models might disrupt the investment management industry and instead build a methodology to assess the processes and procedures implemented by managers to ensure the ongoing utility and reliability of their LLMs.

Angelo Calvello, Ph.D., is co-founder of Rosetta Analytics, an investment manager that uses deep reinforcement learning to build and manage investment strategies for institutional investors.

Opinion pieces represent the views of their authors and do not necessarily reflect the views of Institutional Investor.