By Joyce Shen in Artificial Intelligence — Oct 26, 2023

Evaluating Gen AI Applications - Part I

When I worked in mergers & acquisitions way back, one of my tasks was generating project code names to help preserve the confidentiality of these highly sensitive projects. I remember that we went through all kinds of names, drawing inspiration from mountains, parks, birds, and cars. Tonight, for fun, I gave ChatGPT a prompt to help me come up with some names. I was curious.

"Generate fun but appropriate investment project code names around the theme of wine but not name of wine. Start with Project and give me one word names"

Here is what I received:

Project Grape
Project Vinify
Project Cellar
Project Terroir
Project Cork
Project Vino
Project Bubbly
Project Meritage
Project Terroir
Project Vinum
Project Magnum
Project Vinify
Project Decant
Project Cabernet
Project Terroir

What do you think? How would you evaluate the answers?

I asked ChatGPT to do a self-evaluation. 😁

"Evaluate using relevance"

Project Grape - Highly Relevant
Project Vinify - Highly Relevant
Project Cellar - Relevant
Project Terroir - Highly Relevant
Project Cork - Relevant
Project Vino - Highly Relevant
Project Bubbly - Relevant
Project Meritage - Relevant
Project Terroir - Highly Relevant
Project Vinum - Highly Relevant
Project Magnum - Relevant
Project Vinify - Highly Relevant
Project Decant - Relevant
Project Cabernet - Relevant
Project Terroir - Highly Relevant

Without being prompted, it also provided a justification for its evaluation:

The relevance of the code names to the wine theme varies. Some, like "Project Grape," "Project Vinify," "Project Terroir," "Project Vino," and "Project Vinum," are highly relevant as they directly relate to wine-related terms or concepts. Others, such as "Project Cellar," "Project Cork," "Project Bubbly," and "Project Meritage," are relevant but may not be as specific to wine. "Project Magnum" and "Project Decant" are somewhat relevant due to their connection to wine-related activities, and "Project Cabernet" is relevant because it's a wine varietal. Overall, the list contains a mix of highly relevant and relevant names to the wine theme.

One of the big challenges in generative AI is evaluation. For a fun, creative exercise like the one above, we don't need formal, rigorous evaluation metrics. I can pick whichever name I like in the moment, as long as it is appropriate and does not overlap with an existing code name.
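Even in an informal exercise like this, some criteria can be checked mechanically: the format the prompt asked for ("Project" plus one word), repeated names, and clashes with code names already in use. Below is a minimal sketch in Python, assuming a hypothetical registry of existing code names (`existing_code_names`, with made-up entries) and a hypothetical helper `check_names`. It flags the repeated names in ChatGPT's list, which its relevance self-evaluation did not mention. Whether a name is "appropriate" or "relevant" still needs human judgment.

```python
from collections import Counter

# The 15 names ChatGPT returned, copied from the list above.
generated = [
    "Project Grape", "Project Vinify", "Project Cellar", "Project Terroir",
    "Project Cork", "Project Vino", "Project Bubbly", "Project Meritage",
    "Project Terroir", "Project Vinum", "Project Magnum", "Project Vinify",
    "Project Decant", "Project Cabernet", "Project Terroir",
]

# Hypothetical registry of code names already in use (illustrative only).
existing_code_names = {"Project Magnum", "Project Everest"}


def check_names(names, existing):
    """Hypothetical helper: deterministic checks on generated code names."""
    # Format rule from the prompt: "Project" followed by exactly one word.
    bad_format = [n for n in names
                  if len(n.split()) != 2 or n.split()[0] != "Project"]
    # Names the model returned more than once, with their counts.
    counts = Counter(names)
    duplicates = {n: c for n, c in counts.items() if c > 1}
    # Unique names that collide with a code name already in use.
    collisions = sorted(set(names) & existing)
    # Unique names that pass both the format and collision checks.
    usable = sorted(n for n in counts
                    if n not in existing and n not in bad_format)
    return {
        "bad_format": bad_format,
        "duplicates": duplicates,
        "collisions": collisions,
        "usable": usable,
    }


report = check_names(generated, existing_code_names)
print(report["duplicates"])  # {'Project Vinify': 2, 'Project Terroir': 3}
print(report["collisions"])  # ['Project Magnum']
```

Checks like these are cheap and repeatable, so they can run on every generation before a person spends time on the subjective criteria.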

As organizations experiment with and implement generative AI applications that are even more non-deterministic and varied (and less explainable in how a particular instance of a "generation engine" works), we have an opportunity to rethink and improve how we evaluate generative AI applications going forward. I am excited to see the opportunities in this space.
