How we implemented GPT-4 in our product

During our daily Zoom meetings (I’m in Italy, he’s in the States), he started mentioning AI more and more every day: “ChatGPT this”, “Sam Altman that”, “GPT4 there”, “Check out this startup here”. He was consumed by GPTfever. Together we watched as new “AI startups” propagated like fungi after a hard rainfall in an open field. And if I’m being honest, most of these “startups” looked more like half-assed OpenAI API wrappers IMO…

At the beginning, I thought that he was just getting caught up in the Twitter hype-cycle. But he was persistent… he wanted to find an actually valuable use case for AI in our niche: Financial Planning and Analysis (FP&A) for SMBs.

This is where I’m a bit ashamed of myself. I should have been happy that he wanted to add AI to our product: I would have had the perfect excuse to learn a new technology, play with AI for a bit and chase the current shiny object. Instead, I fought back.

There was definitely some pride at play here… cough maybe because I suggested adding AI 12 months ago but got rejected cough cough, but my main criticism was that I couldn’t justify a viable business case. I didn’t want to pivot our startup into one of those that chases the current trend (looking at you NFT-everything startups).

I told him he was getting caught in the hype and that GPT and LLMs weren’t a good fit for our problem space. In all of the demos, threads and videos that he sent me, no one was adding AI to FP&A… and if no one is attempting something, there must be a good reason for it, right? right??

Me: “Give me ONE good use case for AI, what would you use it for?”

John: “[…] I’m sick and tired of answering the same questions. I’m sick of holding their [customers] hand as they attempt to do what-if-analysis by themselves…I feel like I’m their fractional CFO helping them to interpret their financials…but I’m not their CFO, I’m not scalable…what if we could replace me with an AI version of me? Don’t tell me that shit wouldn’t be epic…”

Me: “That would be SICK…”

Now if there’s one thing you need to know about me, it’s that I only have one operating gear: fifth gear. For all you Americans who don’t know how to drive a manual, it means that I like to go fast, Ferrari fast 🤌. Especially when I have a chip on my shoulder…

So, I dove head first into the AI deep end with the goal of creating the first AI CFO that’s available 24/7. After countless YouTube videos, academic papers, blog articles and tutorials, I managed to pull together the first MVP for the new feature.

I remember thinking, “Wait, this thing is crazy. Could this actually replace the jobs of fractional CFOs and financial analysts around the globe…shit…I think it can”. Bold claim? 100%. Will I regret writing this? Maybe. Does it sound cool? Damn right it does…

The big problem with sailing uncharted waters

Although I was able to build an MVP relatively quickly, it doesn’t mean it was easy…and it doesn’t mean I’m done (more on this later).

Have you ever wondered why people refer to new tech as the “Bleeding Edge”? Because it’s painful… the Bleeding Edge means using technology that is new, unproven, and potentially unreliable: no standards, no best practices, no rules. Pretty stressful, but also incredibly rewarding when done right.

Yes, the OpenAI documentation is great, but there aren’t many examples of people who have actually implemented it in a real product. No one’s making videos about the problems they faced, publishing example GitHub repos or even starting Stack Overflow threads.

There were three main problems that I had to solve to make this work:

Passing tabular financial data (numbers) to an LLM (GPT) that’s used to ingest just text
Dealing with dynamic data (the data of the company changes at the very least once a day)
Cost management (tokens ain’t cheap)

Problem #1: How to pass context to GPT

Our customers are all unique…and I don’t mean it in the “yay, everyone’s beautiful and unique in their own special way” unique, I mean that their businesses are truly unique…we deal with SaaS, eCommerce, Chemical Manufacturers, Agencies, Marketplaces and everything in between. You name it, we’ve probably built a financial model for it. Every business has different assumptions, systems of record, and KPIs, and we need to pass ALL of that data as context to the OpenAI API.

What do I mean by context? Put simply, the context is what the AI should base its answers on. For example, if I ask it to plot my revenue growth, it needs to know the revenue for each month; it needs that context to be able to respond with the correct info.

Unfortunately, it’s not as easy as copy/pasting spreadsheets as context. There’s a hard limit on the number of tokens that you can pass (think of a token as around 4 characters), so we had to use embeddings…and man o man, there is a LOT of confusion and mysticism around embeddings and vector storage (which I want to discuss in a future post).
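To make the token limit concrete, here is a minimal sketch that uses the "around 4 characters per token" rule of thumb from above. The spreadsheet contents, the 8,192-token limit and the 500 tokens reserved for the answer are all assumptions for illustration, and a real tokenizer will give somewhat different counts.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a piece of text will use."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, token_limit: int, reserved_for_answer: int = 500) -> bool:
    """Check whether the text fits while leaving room for the model's reply."""
    return estimate_tokens(text) + reserved_for_answer <= token_limit


# Hypothetical spreadsheet export: 200 metrics x 36 months, one value per line.
rows = [
    f"metric_{m},2023-{(i % 12) + 1:02d},{1000 + i}"
    for m in range(200)
    for i in range(36)
]
sheet = "\n".join(rows)

print(estimate_tokens(sheet))                     # tens of thousands of tokens
print(fits_in_context(sheet, token_limit=8192))   # assumed limit; prints False
```

Even a modest model with a few hundred metrics blows past the limit, which is why the whole spreadsheet can't simply be pasted in as context.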

“Well Filippo, can you at least ELI5 embeddings for us?”

Think of embeddings as a numeric representation of a piece of text, to help computers (which are good with numbers) understand the meaning of words in a particular context.

Pieces of text with similar content or meaning will have similar number representations. Of course it’s not as easy as assigning a single number, which is why OpenAI uses a list of 1536 numbers (don’t worry, they generate those numbers, you don’t have to do anything).
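A tiny illustration of "similar meaning, similar numbers": the sketch below compares made-up 3-number vectors using cosine similarity, a common way to measure how close two embeddings are. The vectors and labels are invented for the example; real embeddings would come from the embedding API and have 1536 numbers each.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return a score near 1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for three pieces of text.
monthly_revenue = [0.9, 0.1, 0.2]
monthly_sales = [0.8, 0.2, 0.3]
office_party = [0.1, 0.9, 0.1]

print(cosine_similarity(monthly_revenue, monthly_sales))  # high: similar meaning
print(cosine_similarity(monthly_revenue, office_party))   # low: unrelated meaning
```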

Make sense? No? Great, let’s move on.

The main problem with embeddings was that every example or use case available on the internet shows embeddings being used on paragraphs of TEXT data, but we don’t have text…we have numbers…how the heck do you embed those?

I can’t give away alllll the secret sauce because (to my knowledge) we’re one of the first companies to crack the code…but we managed to find a way. If you subscribe to our YouTube channel or share this post I may be persuaded to give you some hints!

Problem #2: How to use embeddings with dynamic data

Once again…bleeding edge here…ALL of the examples on the internet for using embeddings are for static data: PDFs, web pages, Wikipedia articles, blog posts. But our data is real-time and it changes daily. With any business, things like customer counts, site visitors, churned subscribers, new revenue, new expenses, new orders…it’s all evolving every day and our AI CFO needs this data as new context. But we can’t recompute the embeddings (for the non-technical, you can think of this as feeding an updated set of context to the system) each and every day. The process takes too long and the user experience becomes trash. Imagine being scared of asking your CFO questions about your business for fear that he/she doesn’t have the latest data. Not a good experience.

Well, I managed to come up with a creative way to avoid having to recompute embeddings…and I’m serious, if you subscribe and share this article, I’ll give you some tips on how we achieved this.
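The actual technique isn't revealed here, so the following is NOT the method used in the product. It is a sketch of one generic pattern for this kind of problem: embed only the slow-changing descriptions of each metric, and read the fast-changing values at question time. Every name in it (`METRIC_DESCRIPTIONS`, `build_context`, `live_values`) is hypothetical, and a toy word-overlap score stands in for real embedding similarity.

```python
from typing import Callable

# Static part: short descriptions that rarely change, so they would be embedded once.
METRIC_DESCRIPTIONS = {
    "mrr": "monthly recurring revenue from active subscribers",
    "churned_subscribers": "subscribers who cancelled during the month",
    "cash_balance": "cash available in the bank accounts",
}


def similarity(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: share of overlapping words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def build_context(question: str, fetch_latest: Callable[[str], float], k: int = 1) -> str:
    """Pick the k most relevant metrics, then read their values at query time."""
    ranked = sorted(
        METRIC_DESCRIPTIONS,
        key=lambda name: similarity(question, METRIC_DESCRIPTIONS[name]),
        reverse=True,
    )
    return "\n".join(f"{name}: {fetch_latest(name)}" for name in ranked[:k])


# Hypothetical live data source; in a real product this would be a database query.
live_values = {"mrr": 42000.0, "churned_subscribers": 7.0, "cash_balance": 310000.0}
question = "how much cash do we have in the bank"

print(build_context(question, live_values.__getitem__))
live_values["cash_balance"] = 295000.0  # the data changes overnight...
print(build_context(question, live_values.__getitem__))  # ...and nothing is re-embedded
```

With this split, the descriptions only need re-embedding when a metric is added or renamed, not when its numbers move.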

Problem #3: How to not break the bank

As a long-time bootstrapper, cutting costs is of the utmost importance to me… Embeddings are not that expensive, but GPT-4 and vector databases are. Since we are trying to make a viable business here, not some “money burning-VC-funded-hype-train-vaporware-raise then sell” startup, solid profitable unit-economics are crucial.

This means being smart: calculating embeddings as few times as possible, storing them in our own database (performance is not an issue here since we don’t have tens of thousands of them) and passing only the bare minimum context to GPT.
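A minimal sketch of those three ideas, assuming a plain SQLite table instead of a vector database. `EmbeddingStore` and `fake_embed` are hypothetical names: `fake_embed` is a keyword counter standing in for a paid embedding API call, and the table layout is only for illustration.

```python
import hashlib
import json
import math
import sqlite3
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding API call: counts a few keywords."""
    words = text.lower().split()
    return [words.count(w) + 0.01 for w in ("revenue", "churn", "cash")]


class EmbeddingStore:
    """Keeps embeddings in an ordinary SQL table and computes each text only once."""

    def __init__(self, embed: Callable[[str], list[float]], path: str = ":memory:") -> None:
        self.embed = embed
        self.api_calls = 0  # how many times we would have paid for an embedding
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(hash TEXT PRIMARY KEY, text TEXT, vector TEXT)"
        )

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE hash = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: nothing to pay
        vector = self.embed(text)      # cache miss: one paid call
        self.api_calls += 1
        self.db.execute(
            "INSERT INTO embeddings VALUES (?, ?, ?)", (key, text, json.dumps(vector))
        )
        self.db.commit()
        return vector

    def top_k(self, query: str, k: int = 2) -> list[str]:
        """Brute-force search over every row; fine for a few thousand embeddings."""
        q = self.get(query)
        rows = self.db.execute(
            "SELECT text, vector FROM embeddings WHERE text != ?", (query,)
        ).fetchall()
        scored = sorted(((cosine(q, json.loads(v)), t) for t, v in rows), reverse=True)
        return [t for _, t in scored[:k]]  # only these go to GPT as context


store = EmbeddingStore(fake_embed)
for doc in ["monthly revenue by product", "churn by cohort", "cash in bank"]:
    store.get(doc)
store.get("monthly revenue by product")  # already stored, so no new call
print(store.api_calls)
print(store.top_k("what is my revenue", k=1))
```

Hashing the text means unchanged content is never embedded twice, and `top_k` keeps the prompt small by sending only the closest matches.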

The result

An image is worth a thousand words:

Screenshot from my first interaction with CFOChat, powered by GPT-4

I understand that this looks staged, but I can guarantee you that it’s not. The first time we seriously used it we were on a call: we asked a question about our bank balance, then a follow-up question, and the result just blew us away.

And this is not even the craziest interaction that we’ve had with our new virtual CFO. For instance, you can give it subscribers and lost subscribers and ask it to calculate your churn rate, or ask it to calculate your ARPA, or ROI, or even the Pearson Correlation Coefficient, and it just spits out the answer, like a beast.

Unfortunately, we had some issues while we were using GPT-3.5-turbo (the same model used by ChatGPT).

In short, GPT-3.5-turbo sucked at dates. Here’s an example conversation:

Me: What is my revenue?
CFOChat: As of April 2023, your revenue is X
Me: What was my revenue in February 2023?
CFOChat: Sorry, I only have data up to April 2023, so I can’t answer your question
Me: April is after February, so you have the data…
CFOChat: I’m sorry, you are right
Me: So…what was my revenue in Feb?
CFOChat: Sorry, I don’t have the data for February 2023, since the data only goes up to April 2023
Me: f*** you

As you can see, we are not really in danger of AI taking over the world. Luckily GPT-4 fixed this problem altogether (who would have thought that spending more money would fix more problems?).

I was genuinely surprised

Even though I’m the “new-tech guy”, I was blown away by the responses GPT-4 was able to deliver when we managed to pass context correctly. I thought that the answers would be incorrect, the math would be off, or the context completely missing… I mean, we are talking about finance and operations for companies, how can an AI understand that so well? I’ve always seen generative-AI used to generate “new” things, like blog posts, essays, marketing material, images, but never used to understand existing data, extract insights, and create forecasts.

I was wrong.

It performed better than anything I could have imagined. The quality, the insight, the math, everything was spot on. For instance, we once asked CFOChat to calculate the LTV, a metric that wasn’t in the provided data. It understood that in order to calculate it, it first needed churn rate and MRR, which also weren’t given in the data! So first, it calculated churn rate (with subscribers and lost subscribers), then calculated MRR (with subscribers and ARPA) and finally it calculated the LTV, all of this with perfect math and explaining the need for churn rate and MRR. Those were 2 hidden requests in the question that it managed to figure out and complete on its own. Honestly, pretty crazy.
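For readers who want to follow that chain of calculations, here is a small worked sketch. The formulas are common SaaS definitions and are assumptions here: the screenshots don't show exactly which LTV formula CFOChat used, and the input numbers are invented.

```python
def churn_rate(subscribers: float, lost_subscribers: float) -> float:
    """Share of subscribers lost in the period."""
    return lost_subscribers / subscribers


def mrr(subscribers: float, arpa: float) -> float:
    """Monthly recurring revenue = subscribers x average revenue per account."""
    return subscribers * arpa


def ltv(subscribers: float, lost_subscribers: float, arpa: float) -> float:
    """Lifetime value per account, assuming LTV = (MRR / subscribers) / churn rate."""
    churn = churn_rate(subscribers, lost_subscribers)  # hidden step 1
    revenue = mrr(subscribers, arpa)                   # hidden step 2
    return (revenue / subscribers) / churn


# Hypothetical inputs: 200 subscribers, 10 lost this month, ARPA of 50.
print(churn_rate(200, 10))
print(mrr(200, 50))
print(ltv(200, 10, 50))
```

With these inputs the churn rate is 5%, MRR is 10,000 and LTV comes out at roughly 1,000 per account.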

Just as crazy, or even more so, was when we asked it to calculate runway without giving it the burn rate. It figured the burn rate out on its own. We then asked it to include a new hire with a salary of 5k monthly: it recalculated the burn and gave us the new runway. But it doesn’t end here! We then asked if we should make the new hire. Now this is interesting… this is no longer just a question about existing data, but one that requires judgement. Our new CFO did not disappoint: in short, it said that by the numbers we could afford it, since the runway would still be several months long, but that the ultimate choice should be made by the CEO after considering the business value the new hire would add, and that the CFO shouldn’t be the ultimate decision maker. Perfect response!
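The arithmetic behind that exchange is simple enough to sketch. The cash, expense and revenue figures below are invented; only the 5k monthly salary comes from the story, and burn is assumed to mean expenses minus revenue.

```python
def burn_rate(monthly_expenses: float, monthly_revenue: float) -> float:
    """Net cash burned per month (positive means the company is losing cash)."""
    return monthly_expenses - monthly_revenue


def runway_months(cash: float, monthly_expenses: float, monthly_revenue: float) -> float:
    """Months until cash runs out; infinite if the company is not burning cash."""
    burn = burn_rate(monthly_expenses, monthly_revenue)
    if burn <= 0:
        return float("inf")
    return cash / burn


# Hypothetical company: 120k in the bank, 50k monthly expenses, 30k monthly revenue.
cash, expenses, revenue = 120_000, 50_000, 30_000
new_hire_salary = 5_000  # the 5k monthly hire from the story

before = runway_months(cash, expenses, revenue)
after = runway_months(cash, expenses + new_hire_salary, revenue)
print(f"runway before hire: {before:.1f} months")
print(f"runway after hire: {after:.1f} months")
```

In this made-up case the hire shortens the runway from 6.0 to 4.8 months, which is the kind of "by the numbers" input a CEO would then weigh against the value of the hire.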

Overdelivering

As I said at the beginning, I couldn’t stop here, so I had to do more. At this point CFOChat was amazing, but it lacked something: the communication was only one way. It was using data from our core product, but had no way to communicate back and interact with our platform. So this is exactly what I built.

CFOChat creating a chart based on company data

Now you can ask the CFOChat to calculate new metrics, surface existing ones, or discover insights and THEN ask it to put it in a chart, or maybe you want to save the formula for a new variable?

CFOChat creating a new variable based on company data

Oh, you prefer a table? Sure!

CFOChat creating a table based on company data

With two-way communication it really feels like the CFO is an integral part of the product and not just a feature slapped on.
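How the chat talks back to the platform isn't described here, so treat this as one possible shape for it rather than the real thing: the model is asked to reply with a small JSON "action", and the platform validates it before creating anything. The `Action` fields, the allowed kinds and the example reply are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_KINDS = {"chart", "table", "variable"}


@dataclass
class Action:
    kind: str      # one of ALLOWED_KINDS
    title: str
    payload: dict  # e.g. which series to plot or which formula to save


def parse_action(model_reply: str) -> Optional[Action]:
    """Return a validated Action, or None if the reply is an ordinary chat answer."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # plain text: show it to the user, execute nothing
    if not isinstance(data, dict) or data.get("kind") not in ALLOWED_KINDS:
        return None  # never act on anything outside the allow-list
    payload = data.get("payload", {})
    if not isinstance(payload, dict):
        return None
    return Action(data["kind"], str(data.get("title", "")), payload)


# Hypothetical model reply asking the platform to draw a chart.
reply = '{"kind": "chart", "title": "MRR by month", "payload": {"series": "mrr"}}'
print(parse_action(reply))
print(parse_action("Your churn rate is 5%."))  # None: just a normal answer
```

Validating against an allow-list keeps the model from triggering anything the platform didn't explicitly expose.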

The real secret sauce

This is of course amazing. I mean, it’s pretty undeniable. But the real power is not in the AI, GPT-4, embeddings, or the vector storage. Don’t get me wrong, all of those are incredible and make the whole process feel like magic, but the real power, the unsung hero, is the underlying system that provides up-to-date, categorized data as context for CFOChat to do its magic. You can theoretically replicate CFOChat with a PDF or CSV, but without the data being always up-to-date, automatically categorized and passed correctly to CFOChat, it’s just a pretty proof of concept, not a tool that helps founders and CEOs run their business… and that could potentially replace fractional CFOs and financial analysts.

The main benefits over the traditional workflow of working with consultants or fractional CFOs:

Speed: instead of sending an email to a fractional CFO, waiting for him to read it, then waiting for him to respond, you get the answer to your question in seconds
Completeness: both can make mistakes, but the AI always has up-to-date data while the fractional CFO might not be working with the latest numbers and might not even know it
Cost: fractional CFOs are crazy expensive, which makes them unavailable for most companies, while we can offer a product for a fraction of the cost. Instead of hundreds per hour, it’s hundreds per month.

Average price of a fractional CFO as of April 2023

So, to answer the question that you had from the very beginning: “is this just another AI startup?” No. We are not. Our platform works even without AI, but we use AI to make it easier to use and feel like magic.

What did I learn from all this?

I guess I need to write something to wrap this up, and I think the lesson learned was to listen to the stupid-crazy-unreasonable requests of your non-technical co-founder ahahaha. Sometimes…still gotta keep their egos in check!

Jokes aside, sometimes technical people (like me) are bound by our perception of what we think tech can do. We go as far as we know we can go, and even though I was consciously aware of this tendency, I still fell into the trap and limited what I thought was possible. It took a “tech-ignorant outsider” (sorry John if you’re reading this) to push me to try and find new ways to apply the cool tech that is generative AI to a new field like FP&A, and to finally create something even better than what I was imagining in the beginning.

He wanted an AI to respond to user messages; I created an AI that knows more about the company’s finances and data than its CEO.

To me that’s a win, or maybe this is just the classic problem of managers asking for A and developers creating D, ‘cause B and C were taken and A was stupid.