Kubernetes and Artificial Intelligence/Machine Learning (AI/ML) — Four Things to Understand Today

Lee Hylton / May 17, 2021

For many businesses, navigating the alphabet soup of tech buzzwords and new technologies can be daunting. In particular, deep learning, machine learning, and AI tend to be the three trickiest to pin down. All three are relatively complex, driven by cutting-edge technology, and highly dependent on digital transformation and tech-forward business models. Although machine learning and AI are becoming embedded in nearly every industry, both technologies are still very new, especially in the context of business fit.

So, this leaves CEOs and CTOs with a few understandable questions. Is ML applicable to my business? What types of technology fuel ML? And how can I apply ML to my current workflows and workloads to get tangible results?

Here are four things every CEO and CTO should understand about machine learning and AI.

1. Machine Learning is Use-case Drenched

Many business owners put machine learning in a “future” box. It’s an amazing technology, but it seems purpose-built for future problems — not today’s pesky woes. Alternatively, some business owners assume machine learning is for digitally-native brands with tech-soaked backgrounds — not an All-American trucking company with traditional roots. Here’s the secret: machine learning has nearly unlimited use cases for businesses in every industry and of every size. There is, without a doubt, a tangible use case for machine learning in your current business. From process automation and personalization to behavioral analysis and customer support, machine learning is deploy-ready and instantly applicable.

For example, oil and gas companies can use machine learning to create safety platforms that identify worker stress levels, equipment health, and proximity to dangerous areas. Paper companies can use machine learning to build sophisticated call center workflows that streamline customer requests and boost satisfaction ratings. Even construction companies — who have a tendency to struggle with change management — are finding value in machine learning across a variety of use cases, including supply chain management, inventory control, and safety management.

You have a machine learning use case. And applying machine learning to these use cases can completely transform your business. For example:

Banks applying machine learning to build recommendation engines have boosted sales by 10 percent, saved 20 percent in CAPEX, increased cash collections by 20 percent, and reduced churn by 20 percent.
Brick-and-mortar retailers see a 1 to 2 percent sales jump when using AI and ML to guide customer personalization.
Advanced manufacturers see 5 percent reductions in inventory costs and revenue increases of 2 to 3 percent when using ML and AI in forecasting.
Companies in producing industries (e.g., consumer goods, energy, healthcare, logistics, automotive, etc.) see conversion cost reductions of up to 20 percent when applying AI to core business processes.

No matter the industry, machine learning has value — today.

2. Machine Learning is Held Back by Infrastructure & GPUs

So, machine learning and AI are both incredibly powerful solutions capable of transforming nearly any business. But what’s holding them back in the real world? After all, 87 percent of ML projects never make it into production. What’s going on?

Deep learning algorithms are incredibly intelligent. They rapidly learn very complex workflows, and apply an inhuman amount of intelligence to a variety of use cases. But they’re held back by four specific issues:

Talent
Typical project frictions
Processing power
Infrastructure woes

Most companies don’t have the raw liquidity to single-handedly launch business-wide machine learning workloads. Even if they do, many projects get stuck in the ever-so-fragile SDLC. Many go over budget, over time, and get trapped in the bottomless pit of scalability. Most of these issues are solved by leveraging an outside firm. For example, we help companies launch incredible ML workloads on AWS. But our entire company is geared towards this specific vertical. We have container experts, tons of internal tech like IaC and IaC security, and a wealth of ML professionals on our payroll. Most companies don’t. So, successful ML launches are usually outsourced to a degree simply due to feasibility.

But what about those last two issues?

How do you increase the amount of processing power at your disposal without throwing money into the never-ending technology shredder?
How will you feasibly manage ML infrastructure when traditional methods require significant human capital?

To solve these two issues, we need to look towards cloud computing, Kubernetes, and IaC: three cutting-edge solutions aimed at maximizing the effectiveness of machine learning workloads.

3. Cloud Computing Holds the Key to ML Computing Woes

The first challenge (i.e., increasing computing power) is largely solved by AWS. Amazon EC2 P4d instances are built on NVIDIA A100 GPUs, and AWS advertises ML training at up to 60 percent lower cost and with 2.5x better performance than its previous-generation P3 instances. You can scale and cluster these resources at will. In fact, you can use hyperscale clusters with more than 4,000 GPUs, petabit-scale networking, and low-latency storage. So, for most workloads, the computing problem becomes a cloud computing problem. You just need to find a company to help you integrate AWS into your core.

Here’s the great thing: you can use AWS for both storage and data mining. You can build out stable data lakes to hold your raw data and use hyperscale clusters to run ML across that ocean of data. To learn more about using AWS for ML, get in touch with an AWS Partner.

4. ML & Infrastructure: IaC and Kubernetes Alleviate Sysadmin Headaches

Time to solve the infrastructure issue. To do this, we use two specific solutions: Kubernetes and IaC.

Kubernetes & ML

When we look at ML deployments, there are a ton of different platform and resource considerations to manage. CI/CD (Continuous Integration & Continuous Delivery) teams are often managing all of these resources across a variety of different microservices (typically packaged as Docker containers), while simultaneously pushing code to Git and deploying regularly. It’s a nightmare. On the infrastructure side, things aren’t much better. You have all of these different environments, systems, and stages (e.g., data ingestion, analysis, transformation, splitting, modeling, training, serving, logging, etc.) to contend with. So, you build out all of these containers, leverage deep learning solutions like TensorFlow, and create these amazing microservices that allow you to embrace the principles of CI/CD. Now you have to manage your avalanche of microservices to enable those machine learning workflows you’ve always dreamed about.

This is where Kubernetes comes into play. Not only does Kubernetes help you orchestrate microservices at scale (with must-haves like load balancing, autoscaling, and ingress management), but it provides superb failover protection to ensure that workloads run smoothly across all microservices. While this is immensely helpful for app development, it can also help you orchestrate complex and scale-ready machine learning workloads using tools like Kubeflow. In other words, Kubernetes abstracts some of the infrastructure layer, allows ML workloads to take advantage of containerized GPUs, and standardizes your data source ingestion. If TensorFlow is the brains of your ML operation, Kubernetes is the heart. It keeps things pumping.
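To make the "containerized GPUs" point a little more concrete, here is a minimal sketch of what a GPU training workload can look like when handed to Kubernetes. It builds a standard batch/v1 Job manifest as a Python dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The job name, the image name, and the helper function gpu_training_job are hypothetical and only for illustration, and the nvidia.com/gpu resource assumes the cluster runs the NVIDIA device plugin.

```python
import json


def gpu_training_job(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Kubernetes batch/v1 Job manifest that requests NVIDIA GPUs.

    The helper itself is hypothetical; the manifest fields are standard.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Retry a failed training pod up to two times before giving up.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,
                            # GPUs are requested as an extended resource,
                            # assuming the NVIDIA device plugin is installed.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder job and image names, not real artifacts.
    manifest = gpu_training_job("train-demo", "example.com/ml/trainer:latest", gpus=2)
    print(json.dumps(manifest, indent=2))
```

The scheduler, not a sysadmin, then decides which GPU node the training container lands on, and the Job controller handles retries if the pod fails.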

IaC & ML

One of the largest headaches associated with ML is spinning up infrastructure. Traditional infrastructure management involves a ton of manual touchpoints, like server deployment, network configuration, routing table configuration, and database installation. While that worked perfectly in the relatively slow tech ecosystem of the past, forcing sysadmins to spin up and down a bundle of microservices throughout the day in today’s hyper-speed DevSecOps landscape is simply infeasible. Things get even more complicated when we add a bunch of Kubernetes stacks into the equation. Suddenly, you’re dealing with a complex ML infrastructure built on Kubernetes stacks, which may need to be paired with VMs, legacy systems, and other software or data feeds. In fact, infrastructure frictions consume 60 to 80 percent of data scientists’ time. Only around 4 percent of their time is spent on actual testing.

So, how do you manage this type of infrastructure at scale without tearing your hair out? You use Infrastructure as Code (IaC). Instead of launching infrastructure using traditional, configuration-heavy manual workflows, IaC lets you automate and codify infrastructure spin-ups. You describe the end state of the infrastructure in a file, and the tooling works out and executes the steps to get there. Again, AWS tools like CloudFormation can help with this, and you may also use Chef or SaltStack.
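As a rough illustration of "configuring the end state," the sketch below assembles a small CloudFormation template in Python and prints it as JSON. The stack contents (one S3 bucket for training data and one GPU instance), the logical names TrainingDataBucket, TrainingInstance, and TrainingImageId, and the helper function are all assumptions made for this example. A real stack would also need networking, IAM roles, and security settings.

```python
import json


def training_stack_template(instance_type: str = "p4d.24xlarge") -> dict:
    """Return a minimal CloudFormation template describing the desired end state.

    Hypothetical example stack: a data bucket plus one GPU training instance.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical ML training stack (illustration only)",
        "Parameters": {
            # The machine image is supplied at deploy time, not hard-coded.
            "TrainingImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            # Bucket that plays the role of the data lake.
            "TrainingDataBucket": {"Type": "AWS::S3::Bucket"},
            # GPU instance that runs the training job.
            "TrainingInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": {"Ref": "TrainingImageId"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(training_stack_template(), indent=2))
```

Nothing in the template says how to create the bucket or the instance. It only states what should exist, and CloudFormation takes care of the ordering and the API calls.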

The easiest way to think about IaC is to imagine it as code deployment. It makes infrastructure deployments rapid, standardized, and cost-efficient at scale.
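One way to see why this makes deployments standardized is to look at what a declarative tool does under the hood. The sketch below is a simplified assumption about how such tools behave in general, not the actual algorithm of CloudFormation, Chef, or SaltStack. It compares the infrastructure you have with the infrastructure you declared and lists the changes needed. The function name plan and the resource names are hypothetical.

```python
def plan(current: dict, desired: dict) -> list:
    """Compute the actions needed to move from the current state to the desired one.

    Both arguments map a resource name to its configuration.
    """
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    current = {"web": {"size": "small"}, "old-db": {"size": "large"}}
    desired = {"web": {"size": "large"}, "gpu-trainer": {"gpus": 8}}
    for action, name in plan(current, desired):
        print(action, name)
```

Running the same declaration against infrastructure that already matches it produces an empty plan, which is what makes repeated deployments safe and predictable.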

Ready to Build Your Machine Learning Solution?

At Blue Sentry Cloud, we help companies build hyper-intelligent machine learning solutions that solve tangible business problems. We go beyond the grain. Our experts don’t just build machine learning programs — we solve real issues using world-class technology. Contact us to learn more.
