Bigger is not better in machine learning

I heard an interview today featuring Jonathan Frankle, the Chief Scientist of MosaicML. One of the things he said is that bigger is not better. He was, of course, talking about machine learning model size (e.g., number of parameters).

I think we can agree that bigger models may not be better. When it comes to model size, we generally have a good mental model and a set of metrics to work with. But what about the "better" part? That is much harder to define, and there are many more layers of this onion to peel back.

One way to define it is that the model produces superior results. Another is that the model is more explainable. Yet another is that the model is performant enough for the use case at a much lower cost. The list goes on and on. You can visualize this as a step function or a "build."
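
To make the "performant enough at a lower cost" definition concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Candidate` fields, the `cheapest_good_enough` helper, and all of the numbers are made up, not taken from the interview or from any real benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical model option for one specific use case."""
    name: str
    params_millions: float       # model size
    accuracy: float              # score on the use case's own evaluation set
    cost_per_1k_requests: float  # serving cost, in any consistent unit


def cheapest_good_enough(candidates: list[Candidate],
                         min_accuracy: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate that meets the accuracy bar, or None."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_per_1k_requests)


# Made-up numbers: each step up in size buys less accuracy and costs more.
candidates = [
    Candidate("small", 125, 0.81, 0.02),
    Candidate("medium", 1300, 0.88, 0.15),
    Candidate("large", 13000, 0.90, 1.10),
]

choice = cheapest_good_enough(candidates, min_accuracy=0.85)
print(choice.name if choice else "no candidate meets the bar")
```

With these illustrative numbers and a bar of 0.85, the sketch picks the medium model: the large one scores slightly higher but costs several times more. Change the bar or the cost figures and the answer changes, which is why "better" depends on the use case.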

In my view, bigger can be better for some use cases, but the returns diminish significantly over time: returns on time, financial investment, performance gain, and interoperability. On the other hand, most enterprise use cases need some minimum level of model complexity. So finding the right balance of size and performance for a specific use case is a key part of the "art of data science and AI."

Also in my view, there is a lot of work to be done, and a lot of opportunity, in unpacking this "better" notion, both for enterprises looking to establish and mature their AI strategy and program and for startups looking to build AI tools for enterprise customers.

Exciting times!

Subscribe to Joyce J. Shen
