Background

Developing and maintaining artificial intelligence (AI) and machine learning (ML) systems takes time and costs money. From prototype to production, AI systems and ML models carry both visible and hidden costs. Understanding the real (or true) costs of machine learning technologies allows executives and decision-makers to make better strategic decisions.

There are many costs in developing and deploying machine learning models. Some costs are more visible, such as research costs, development costs, and production costs. Other costs are less visible, such as opportunity costs.

The real costs of ML systems include:

  1. Research costs
  2. Development costs
  3. Production costs
     1. Infrastructure costs (cloud compute, data storage)
     2. Maintenance costs
     3. Integration costs (data pipeline development, API development, documentation)
     4. Data costs (data labeling)
  4. Opportunity costs

Research and development costs are comparatively low. In the research stage, ML practitioners (researchers, engineers, developers, and data scientists) decide which algorithms, datasets, models, or methods to use. In the development stage, they build prototypes, label datasets, and launch proofs of concept that aim to validate the usefulness of ML models. In the research & development (R&D) stages, ML practitioners move fast: R&D projects are typically completed in days or weeks, depending on project complexity, the number of moving parts, and the availability of data.

Production costs are larger than R&D costs. In production, ML practitioners shift from development to operations. Production costs include infrastructure costs (cloud compute and data storage), maintenance costs, integration costs (data pipeline development, API development, and documentation), and data costs (data labeling). In the production stage, ML practitioners slow down: production projects are typically completed in weeks or months, depending on architectural choices, code complexity, and technology choices.

Opportunity costs of ML systems are larger than production costs. The opportunity cost of a machine learning technology is the potential gain lost by not pursuing alternative ML models. For organizations and companies, it comes down to the question: "Are we pursuing the right machine learning model?"

Problems

High Maintenance Costs of Machine Learning Systems

In the R&D stage, ML practitioners focus on development tasks. Shifting from prototype to production, ML practitioners accumulate technical debt.

The top three causes of technical debt are:

  1. Bad architectural choices
  2. Overly complex code
  3. Obsolete technology

Technical debt increases the maintenance costs of ML systems.

When shifting from prototype to production, development tasks are combined with operations tasks. In production, ML practitioners spend more time on operations than development. Developers spend 42.1% of their workweek on maintenance, according to Stripe.

Developers spend over 17 hours every week dealing with maintenance issues like debugging and refactoring, and about a quarter of that time is spent fixing bad code. That’s nearly $300B in lost productivity every year. It’s not how many software engineers a company has; it’s how their talent is being utilized.

The more time ML practitioners spend maintaining and managing machine learning models in production, the less time they have to develop new models that can impact business revenue and growth.

High Infrastructure Costs of Machine Learning Systems

In production, machine learning practitioners write more non-ML code than ML code, according to Google.

It may be surprising to the academic community to know that only a tiny fraction of the code in many machine learning systems is actually doing “machine learning”. When we recognize that a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, reimplementation rather than reuse of a clumsy API looks like a much better strategy.

Machine learning systems have a reputation for enormous computing costs, largely due to over-provisioning. Paying $20,000 per month for Amazon Elastic Compute Cloud (Amazon EC2) servers that run ML models in production is not uncommon. Infrastructure costs (cloud compute and data storage) quickly add up when developers rely on traditional server infrastructure such as Amazon EC2.
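To make the over-provisioning point concrete, here is a rough back-of-the-envelope sketch. The $20,000 monthly EC2 bill comes from the example above; the 15% average utilization is a purely hypothetical assumption used only for illustration.

    # Back-of-the-envelope estimate of spend on idle, over-provisioned capacity.
    monthly_ec2_bill = 20_000   # always-on EC2 fleet running ML models (figure from the text)
    avg_utilization = 0.15      # hypothetical average utilization of that fleet

    idle_spend = monthly_ec2_bill * (1 - avg_utilization)
    print(f"Estimated monthly spend on idle capacity: ${idle_spend:,.0f}")
    # -> Estimated monthly spend on idle capacity: $17,000

With pay-per-use serverless pricing, that idle capacity is simply not billed in the first place.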

Solution

Serverless Machine Learning

Organizations and companies can reduce the maintenance costs of AI systems by using serverless machine learning.

Serverless infrastructure is a cloud computing paradigm that shifts the burden of provisioning and scaling infrastructure from developers to cloud vendors. With serverless computing, developers spend their time writing code and their cloud vendor of choice (Google Cloud Platform, Amazon Web Services, Microsoft Azure, IBM Cloud) does the rest. Serverless computing relies on the cloud provider, not the user, to automatically handle resource provisioning and management.

As ML practitioners adopt serverless, they lower technical debt (opinionated architecture and simplified code) and simplify ML workflows (managed infrastructure, automatic scaling, and built-in availability and fault tolerance). As companies adopt serverless, they reduce the maintenance burden of ML systems, give teams more time to pursue new ideas, and lower production costs (infrastructure and maintenance costs).

With serverless machine learning, companies can run machine learning models at scale - without servers, devops, or costly infrastructure.
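As an illustration of how little operational code this requires, the sketch below shows a minimal serverless inference function written as an AWS Lambda-style handler in Python. The model artifact, its path, and the request format are hypothetical assumptions; the point is that the function contains only inference logic, while provisioning, scaling, and availability are handled by the cloud vendor.

    import json

    import joblib  # assumes a scikit-learn-style model serialized with joblib (hypothetical)

    MODEL_PATH = "model.joblib"  # hypothetical artifact bundled with the deployment package
    _model = None                # loaded once per container and reused across invocations


    def handler(event, context):
        """AWS Lambda-style entry point: only inference logic lives here;
        servers, scaling, and fault tolerance are the cloud vendor's job."""
        global _model
        if _model is None:
            _model = joblib.load(MODEL_PATH)  # lazy load on cold start

        features = json.loads(event["body"])["features"]  # hypothetical request schema
        prediction = _model.predict([features])[0]

        return {
            "statusCode": 200,
            "body": json.dumps({"prediction": float(prediction)}),
        }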

What are the cost savings of using serverless machine learning?

Let's calculate the salary costs of ML practitioners.

When hiring machine learning practitioners, companies typically pay between:

  • $40 and $75 per Machine Learning Developer per hour.
  • $40 and $80 per Artificial Intelligence Developer per hour.
  • $45 and $80 per Data Scientist per hour.
  • $80 and $180 per DevOps per hour.

Let’s say the average ML practitioner works 40 hours per week and 52 weeks per year.

By reducing maintenance work, organizations free up as much as 42.1% of each developer's workweek.

Using serverless machine learning, companies save between:

  • $35,000 and $66,000 per Machine Learning Developer per year.
  • $35,000 and $70,000 per Artificial Intelligence Developer per year.
  • $39,000 and $70,000 per Data Scientist per year.
  • $70,000 and $158,000 per DevOps per year.
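The savings figures above follow directly from the assumptions already stated: each hourly rate multiplied by a 2,080-hour work year (40 hours x 52 weeks) and by the 42.1% maintenance share. A minimal sketch of the arithmetic:

    # Yearly savings = hourly rate x 2,080 hours x 42.1% maintenance share.
    RATES = {  # hourly rate ranges (USD) from the list above
        "Machine Learning Developer": (40, 75),
        "Artificial Intelligence Developer": (40, 80),
        "Data Scientist": (45, 80),
        "DevOps": (80, 180),
    }

    HOURS_PER_YEAR = 40 * 52   # 40 hours/week, 52 weeks/year
    MAINTENANCE_SHARE = 0.421  # share of the workweek spent on maintenance (Stripe)

    for role, (low, high) in RATES.items():
        low_saving = low * HOURS_PER_YEAR * MAINTENANCE_SHARE
        high_saving = high * HOURS_PER_YEAR * MAINTENANCE_SHARE
        print(f"{role}: ${low_saving:,.0f} - ${high_saving:,.0f} per year")

    # Machine Learning Developer: $35,027 - $65,676 per year
    # Artificial Intelligence Developer: $35,027 - $70,054 per year
    # Data Scientist: $39,406 - $70,054 per year
    # DevOps: $70,054 - $157,622 per year

Rounded, these results match the figures listed above.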

Interested in using serverless machine learning?

Tell us about your project and a member of our team will get back to you. Get started!