
Regulated Compute: Chasing H100s for a Real-Time Avatar

Frustrated engineer staring at a laptop

The reality of waiting for GPU quotas

In less than two weeks, I built a real-time AI conversational speech avatar. The project is the prototype for the business my TI:GER group and I are building. I deviated a bit from Azure's Speech Avatar service and decided to "roll my own." The model that produces the real-time avatar requires five H100 GPUs, so enter the big cloud providers.

I have been trying to get capacity on AWS for the past five days. The perception (real or imagined) is that H100 capacity is in such high demand that it is limited and "hard" to come by. And unlike other instance types, p5.48xlarge instances are not readily available to the independent entrepreneur. That's right: if you went to AWS and created an account today, you would not be able to spin up that instance class without first being granted permission.

I have spent the last five days filing support case after support case, appeal after appeal, email after email, phone call after phone call, to get the quota needed to create an on-demand p5.48xlarge EC2 instance, and I still do not have it in every US region. After three days they did finally approve me for us-east-1 and us-west-2. For the lay person, this means they have not approved me to use their computers across all of their US regions.

I got one message the other day that said... well, you read it.

Screenshot of a cloud provider quota response

One of several quota replies

Like I need your help. There was even one instance where they approved me for a quota of 8, but the instance I need requires a quota of 192. It makes no sense at all. That was one I had to appeal. I have started including the following in the initial quota request. We'll see what that gets me.

Screenshot of a quota request explanation

The statement I began adding to requests
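For context on that 8-versus-192 mismatch: as I understand it, the on-demand quotas for P-family instances are denominated in vCPUs rather than instances, and a p5.48xlarge uses 192 vCPUs (its published spec, alongside the 8 H100s). A minimal sketch of the arithmetic, assuming that vCPU-based accounting:

```python
# Rough arithmetic for vCPU-denominated EC2 on-demand quotas.
# Assumption: P-instance quotas are counted in vCPUs, and a
# p5.48xlarge uses 192 vCPUs per AWS's published instance specs.

P5_48XLARGE_VCPUS = 192


def required_vcpu_quota(instance_count: int,
                        vcpus_per_instance: int = P5_48XLARGE_VCPUS) -> int:
    """vCPU quota needed to run `instance_count` instances on demand."""
    return instance_count * vcpus_per_instance


def quota_sufficient(approved_vcpus: int, instance_count: int = 1) -> bool:
    """True if an approved vCPU quota covers the requested instances."""
    return approved_vcpus >= required_vcpu_quota(instance_count)
```

So an approved quota of 8 covers zero p5.48xlarge instances; a single one needs 192, and five of them would need 960.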

Regulated Compute

I am a credentialed user, and my account history shows years on the platform. I can handle whatever I take on. But hey, this isn't just about me or AWS. I experienced this type of gatekeeping from all of the major cloud providers and some of the smaller ones. AWS was the most frustrating because I've used their services for almost a decade. For whatever reason, this is how it's done. The technology is not as democratized as one AWS executive preaches. And to get what I need, I get to keep learning (silver lining).

Compute is regulated. I had to get it from a smaller provider, and their process was more streamlined; I did not have to jump through so many hoops. I have the money to pay for what I use, and I use what I need. AWS, GCP, and Azure will not miss the few hundred dollars I spend developing my application, but engineers take note: for some things we have to find better options, and they are out there.

Final Takeaway

It helps to include the use case in the request, as I did above, but that is still no guarantee that they will let you use their computers to do your work.

You may recognize the above image. It's the kind of idle video that real-time speech avatars play during inference pauses or simple idle time. That one was created by my real-time avatar model from my profile picture. My children love it. My five-year-old refers to it as "daddy's talking picture."

Another gotcha when first attempting to use a large instance on AWS: if you are provisioning the instance with Terraform, you may get an error that is extremely misleading. Take a look at the following error message:

Terraform apply error response for a large instance

Terraform error after requesting a large instance

Just reading that, you would assume you could simply come back later and try again. But the truth could be that you do not have enough quota to create the instance. And it took AWS an hour to return that message. Talk about a long feedback loop. After trying this in all of the availability zones, I contacted AWS and was told that I did indeed lack sufficient quota to create the instance. But I didn't know that. That was four hours wasted. I was super unhappy.
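One way to shorten that feedback loop is to check your quota yourself before running `terraform apply`. A sketch of that pre-flight check is below. Assumptions to verify before trusting it: the quota code `"L-417A185B"` is what I believe maps to "Running On-Demand P instances" (confirm it with `aws service-quotas list-service-quotas --service-code ec2`), and the 192-vCPU figure is the published spec for p5.48xlarge.

```python
# Pre-flight quota check before a long `terraform apply`.
# Assumption: the Service Quotas code "L-417A185B" corresponds to
# "Running On-Demand P instances" (vCPU-denominated) -- verify with
# `aws service-quotas list-service-quotas --service-code ec2`.

P5_48XLARGE_VCPUS = 192  # published vCPU count for p5.48xlarge


def enough_quota(approved_vcpus: float, instances: int = 1,
                 vcpus_per_instance: int = P5_48XLARGE_VCPUS) -> bool:
    """True if the approved vCPU quota covers the requested instances."""
    return approved_vcpus >= instances * vcpus_per_instance


def fetch_p_instance_quota(region: str) -> float:
    """Fetch the current on-demand P-instance vCPU quota for a region.

    Requires boto3 and AWS credentials; the quota code is an
    assumption, see the comment at the top of this sketch.
    """
    import boto3  # lazy import so the pure helpers work without it
    client = boto3.client("service-quotas", region_name=region)
    resp = client.get_service_quota(ServiceCode="ec2",
                                    QuotaCode="L-417A185B")
    return resp["Quota"]["Value"]
```

Had I run something like `enough_quota(fetch_p_instance_quota("us-east-1"))` first, I would have learned in seconds, not four hours, that the apply was doomed.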

I now have a source of compute, and I will keep you updated on the real-time inference. There are still some things to work out in the cloud regarding distributed inference; that's what I'm currently working through. It cost me about $200 yesterday for a few hours. It ain't cheap, so give me some time.

Regulated Compute
H100
AWS
GCP
Azure
GPU
Cloud Infrastructure
Startups
MLOps
Quotas