Minimum Viable Data for Building AI Applications

Manoj Bapat
Dec 28, 2021

Updated: Jan 15, 2022

Main idea: The democratization of artificial intelligence by large cloud vendors such as Google and IBM, together with the early successes of AI-first startups and enterprises, is driving strong interest in adopting AI applications across multiple industries. Company- and use-case-specific data is a key dependency for training AI models that generate accurate and relevant predictions and recommendations. While training data availability is a smaller concern for companies like Amazon, which have access to large volumes of user and interaction data, it becomes a major stumbling block for AI initiatives at companies that are early in user adoption or lack mature data practices.

Why it matters: Inadequate training data can result in sub-par early results from AI applications that disappoint both end users and business decision makers. User disappointment at the outset can impede the flywheel of user engagement that is essential for continuously improving AI model performance. Similarly, dismal AI product performance can dampen the business enthusiasm and investments that are needed through the AI product development lifecycle.

Sizing the data need: The answer is the clichéd yet reasonable "it depends". It depends on the nature of the AI use case, such as image classification, natural language processing (NLP), or regression. It also depends on user expectations around prediction accuracy and on the risks of making incorrect predictions, which vary by domain: think of misdiagnosing a patient's disease versus recommending the wrong movie to a viewer.
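One common way to put a number on "it depends" is to plot a learning curve: train on progressively larger samples and watch where accuracy levels off. The sketch below is only an illustration under stated assumptions. The toy two-class dataset, the nearest-centroid classifier, and every function name (`make_dataset`, `fit_centroids`, `learning_curve`) are hypothetical stand-ins for your own data and model, not a method described in this post.

```python
import random
import statistics


def make_dataset(n, rng):
    """Toy data (an assumption): two overlapping 2-D Gaussian classes."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are always present
        center = 0.0 if label == 0 else 1.5
        features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
        data.append((features, label))
    return data


def fit_centroids(train):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(dim) for dim in zip(*points)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, test):
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def learning_curve(sizes, trials=30, seed=0):
    """Mean held-out accuracy for each training-set size in `sizes`."""
    rng = random.Random(seed)
    test = make_dataset(2000, rng)  # fixed held-out set shared by all sizes
    curve = {}
    for n in sizes:
        scores = [
            accuracy(fit_centroids(make_dataset(n, rng)), test)
            for _ in range(trials)
        ]
        curve[n] = statistics.fmean(scores)
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([4, 16, 64, 256]).items():
        print(f"{n:>4} training examples -> mean accuracy {acc:.3f}")
```

If the curve is still climbing at your largest sample, more data is likely to help; if it has flattened below the accuracy your domain requires, more data of the same kind may not be enough on its own.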

Industry and research trends: While the impressive recent breakthroughs in models trained on very large datasets, such as GPT-3, take up most of our attention, industry trends also point to a growing recognition of the need for small-data techniques such as few-shot learning and the generation of synthetic data to augment scarce existing data. For example, a new e-commerce website that lacks Amazon- or Etsy-scale data to power recommendations for its users can explore these techniques to overcome the early training data scarcity problem.
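As a rough illustration of synthetic data generation, the sketch below pads a small labeled dataset with noisy copies of its own rows. This is one of the simplest augmentation ideas and is shown purely as an assumption-laden example: the function name `augment_with_jitter`, the `noise_scale` parameter, and the sample rows are all hypothetical, and jittering only makes sense for numeric features where small perturbations do not change the label.

```python
import random


def augment_with_jitter(rows, copies=3, noise_scale=0.05, seed=0):
    """Return the original rows plus `copies` noisy variants of each row.

    rows: list of (features, label) pairs, where features is a list of floats.
    noise_scale: noise standard deviation relative to each value's magnitude
                 (an illustrative default, not a recommended setting).
    """
    rng = random.Random(seed)
    augmented = list(rows)  # keep the real examples first and unchanged
    for features, label in rows:
        for _ in range(copies):
            noisy = [
                # use 1.0 as the reference magnitude when the value is zero
                v + rng.gauss(0.0, noise_scale * (abs(v) or 1.0))
                for v in features
            ]
            augmented.append((noisy, label))  # the label is assumed unchanged
    return augmented


if __name__ == "__main__":
    # Hypothetical rows: (price, rating) features with a binary "purchased" label.
    small = [([19.99, 4.5], 1), ([5.00, 2.0], 0)]
    bigger = augment_with_jitter(small)
    print(f"{len(small)} real rows -> {len(bigger)} rows after augmentation")
```

Synthetic rows like these should supplement real examples rather than replace them, and any gain is worth checking on a held-out set of real data only.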

What you should do today to overcome the "small data" challenge:

Assess the training data needs specific to your AI use case
Explore existing as well as innovative approaches for augmenting training data
Actively manage the user experience and business expectations throughout the AI product lifecycle
