Measuring outcomes is harder than you think

I have always said to my team: “The one thing we must deliver to clients is improved outcomes. If our technology can’t increase the effectiveness of client marketing then we don’t have a business. Either it works or it doesn’t.”

With this context, it’s easy to see why we put so much emphasis on benchmarking and objective analysis of the transaction data we generate. The challenge has always been separating the outcomes achieved by the AI-defined email content from the outcomes achieved by the client without the assistance of AI. Let me explain.

The algorithms that our team has developed are built around specific use cases. For example, one use case is convincing a one-time donor to become a monthly donor. Our models need a minimum of engagement with client email before they can take a supporter record and put it into a cluster of similar supporters. Separate algorithms need transaction data to find the patterns within each cluster before they can define the most effective content to write.

If a supporter does not have enough recent engagement data (e.g. email clicks and page transactions) there is no point trying to place the record into a cluster or predict responsiveness to defined email content. This is why we can’t use a general benchmark using the entirety of the client data set to measure outcomes. Therefore:

Point 1. We can only build a benchmark (pre-launch) using the same data qualification rules to select records that our models use. Otherwise, we would unfairly penalize client data by including unresponsive records in the benchmark.
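To make Point 1 concrete, here is a minimal sketch of a qualified benchmark in Python. Everything in it is an assumption for illustration: the `Supporter` record, the 90-day window, and the three-event minimum are hypothetical and are not the qualification rules our models actually use.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Supporter:
    supporter_id: str
    engagement_dates: list[date]  # dates of email clicks and page transactions
    converted: bool               # e.g. became a monthly donor in the benchmark period


def is_qualified(s: Supporter, as_of: date,
                 window_days: int = 90, min_events: int = 3) -> bool:
    """Hypothetical rule: enough engagement events inside a recent window."""
    cutoff = as_of - timedelta(days=window_days)
    recent = [d for d in s.engagement_dates if cutoff <= d <= as_of]
    return len(recent) >= min_events


def benchmark_rate(supporters: list[Supporter], as_of: date, **rules):
    """Conversion rate over qualified records only; None if nobody qualifies."""
    qualified = [s for s in supporters if is_qualified(s, as_of, **rules)]
    if not qualified:
        return None
    return sum(s.converted for s in qualified) / len(qualified)


# Illustrative data only: two engaged supporters and one unresponsive record.
may = [date(2022, 5, 1), date(2022, 5, 10), date(2022, 5, 20)]
supporters = [
    Supporter("a", may, True),
    Supporter("b", may, False),
    Supporter("c", [date(2021, 1, 5)], False),
]
print(benchmark_rate(supporters, as_of=date(2022, 6, 1)))
```

With this made-up data, the unresponsive record is left out, so the benchmark covers two records instead of three. Measuring against the whole file would count a record the models would never have been asked to handle.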

Our models cluster, and re-cluster, supporters over time. The behaviour of a supporter in their first 60 days ‘on file’ is very different from the behaviour of a supporter who has been ‘on file’ for more than a year. So… our algorithms put records into a cluster and define content for 1-3 emails that get automatically sent in a short burst over a few days. There are typically 4 of these short bursts of AI-defined email over the course of 12 months. During the much longer periods between these bursts, supporters continue to receive the client’s own marketing content. Therefore:

Point 2. Direct attribution of a conversion (AI-defined email versus client email) is most objectively done by looking only at conversions that came from the email (trackable email ID included in the transaction record). In other words, which email content prompted the engagement and the conversion.
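A rough sketch of this kind of attribution in Python follows. The data shapes are assumptions made for the example: each delivered email is taken to have its own trackable ID, `email_source` is a hypothetical lookup from that ID to who defined the content, and a transaction carries the ID only when it came from an email.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    supporter_id: str
    email_id: Optional[str]  # trackable email ID, or None if not email-driven


def conversion_rate_by_source(email_source: dict[str, str],
                              transactions: list[Transaction]) -> dict[str, float]:
    """Share of delivered emails per source that led to a tracked conversion."""
    sent = Counter(email_source.values())
    # Count each email once, even if it produced several transactions.
    converted_ids = {t.email_id for t in transactions if t.email_id in email_source}
    converted = Counter(email_source[e] for e in converted_ids)
    return {source: converted[source] / n for source, n in sent.items()}


# Illustrative data only: two AI-defined emails and four client emails.
email_source = {
    "e1": "ai", "e2": "ai",
    "e3": "client", "e4": "client", "e5": "client", "e6": "client",
}
transactions = [
    Transaction("a", "e1"),
    Transaction("a", "e1"),   # second gift from the same email, counted once
    Transaction("b", None),   # no email ID, so not attributed to either source
    Transaction("c", "e3"),
]
print(conversion_rate_by_source(email_source, transactions))
```

Transactions without an email ID are deliberately ignored. They may be real conversions, but there is no objective way to say which content prompted them.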

Coming to the end now… This is why we have settled on two metrics to measure outcomes: a static benchmark of qualified records pre-launch, and an email conversion rate that compares conversions from AI-defined email content with conversions from client content sent to the same supporters over 12 months. In our pilot clients thus far, the outcomes for both metrics, and for all clients, have demonstrated that the AI-defined content is producing substantially better engagement and conversions.


Can ChatGPT write non-profit email copy?

Let me immediately re-phrase the question: ‘Can ChatGPT, on its own, write email content that you would have written?’ The answer to this question is unequivocally ‘no’.

Let me re-phrase the question again: Can any LLM (large language model) write email content that you would have written if the LLM also learned from your own email content library? The answer to this question is ‘yes’.

ChatGPT ignited a broad public appreciation (shock?) of the potential of large language models, and AI in the broadest sense, to transform basic economic models. Why? ChatGPT became an accessible tool that anyone could use to test the ability of AI to write academic essays, marketing copy, legal agreements, poetry, and everything in between.

ChatGPT produced some remarkable results, and everyone took notice. It is a tangible representation of the power of Machine Learning to disrupt everything we know.

Let me start again. Can ChatGPT write your non-profit’s email copy? What is the acid test to evaluate whether the copy generated by ChatGPT approximates the copy you would have written? Does your non-profit have a literal voice that is reflected in the way that you write?

LLMs learn from enormous amounts of data to formulate the content you ask them to produce. Any LLM will generate content based on the instructions you give it (prompts). But if you ask an LLM to write an email on a specific topic, with a certain number of words, with a specific purpose, does it really know how you would write that email to your supporters? No.

What if your instruction to the LLM included examples of emails, along with descriptive metadata, to help it produce content that reflects your organization’s voice? Game-changing. 
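As a simple illustration of that idea, the sketch below assembles a prompt that carries example emails and their metadata ahead of the writing task. The `ExampleEmail` structure, the metadata keys, and the prompt wording are all hypothetical, and the sketch stops short of calling any particular LLM.

```python
from dataclasses import dataclass


@dataclass
class ExampleEmail:
    subject: str
    body: str
    metadata: dict[str, str]  # descriptive tags, e.g. audience and purpose


def build_prompt(task: str, examples: list[ExampleEmail]) -> str:
    """Put the organization's own emails in front of the writing task."""
    parts = ["Write an email in the same voice as the examples below.", ""]
    for i, ex in enumerate(examples, start=1):
        meta = ", ".join(f"{k}: {v}" for k, v in sorted(ex.metadata.items()))
        parts += [f"Example {i} ({meta})", f"Subject: {ex.subject}", ex.body, ""]
    parts.append("Task: " + task)
    return "\n".join(parts)


# Illustrative example only; a real library would hold many past emails.
examples = [
    ExampleEmail(
        subject="You made this possible",
        body="Because of you, a young person has a safe place to sleep tonight.",
        metadata={"purpose": "monthly giving ask", "audience": "one-time donors"},
    )
]
prompt = build_prompt(
    "Invite a one-time donor to become a monthly donor in under 150 words.",
    examples,
)
print(prompt)
```

The resulting text can be handed to whichever LLM you use. The point is that the model sees how you actually write before it is asked to write for you.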

Our team at Accessible Intelligence is working to combine what we capture about each client’s writing voice with the power of large language models to produce powerful AI-generated marketing content.


Waiting on AI: The Astonishing Gains of our Second-Gen Models

Our second-generation models launched in June of 2022. Apart from a sense of satisfaction that we had done some pioneering work, I was a bit annoyed. We would have to wait 10 months for ‘real world’ outcomes. Why? Our new models deliver timed emails in short bursts over a year to the same supporter.

Was it worth the wait? Oh yeah. When evaluating the performance data two months ago, I was pleasantly shocked at what the technology had delivered.

The pilot clients represented a range of non-profits in size and program type. Mercy Home for Boys & Girls provides a home and support for young people, Amnesty Canada advocates for human rights around the world, and Rainforest Action Network works to protect a critical ecosystem for the planet. The number of supporters that received the AI-defined content ranged from 2,000 records for one client to over 40,000 records for the largest.

For all three pilot clients, the outcomes showed significant gains for the emails with content defined by our Machine Learning models (the comparator was email created by the non-profit without any AI involvement and sent to the same supporters). AI improved the use case conversion rate by between 246% for one client and a staggering 4,583% for the top performer. These models continue to run and the results keep improving as more supporters receive the AI-defined content.
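For readers who want to see the arithmetic behind a percentage like this, here is a small Python sketch. It assumes the figure is a relative increase over the client email conversion rate, which is one common way to read it. The rates in the example are invented to show the calculation and are not client data.

```python
def relative_lift_pct(ai_rate: float, client_rate: float) -> float:
    """Percentage increase of the AI email conversion rate over the client rate."""
    if client_rate <= 0:
        raise ValueError("client_rate must be positive to compute a relative lift")
    return (ai_rate - client_rate) / client_rate * 100


# Hypothetical rates: 1.00% for client email and 3.46% for AI-defined email
# would correspond to a lift of about 246%.
print(round(relative_lift_pct(0.0346, 0.01)))
```

Read this way, a 246% improvement means the AI-defined emails converted at roughly three and a half times the rate of the client emails.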

My personal lesson learned: patience is a virtue.

Please visit our resources section to read each case study in detail.