Sachin Subramanian, CEO

FedRAMP AI Models: How to Access GPT-5, Claude Opus 4.6, and Gemini 3.1 Pro Without Leaving a FedRAMP Boundary

A guide to accessing frontier AI models (GPT-5, Claude Opus 4.6, Gemini 3.1 Pro) within FedRAMP authorization boundaries for CMMC and ITAR workloads.

Every AI quickstart guide starts the same way: sign up for an API key, set it as an environment variable, start making requests.

That workflow is completely incompatible with handling Controlled Unclassified Information.

The commercial API endpoints that power most AI applications are not FedRAMP authorized. If your platform processes CUI and sends prompts to these endpoints, you have a compliance gap regardless of how good your encryption is in transit. The data is leaving the FedRAMP authorization boundary.

Every major frontier model is available through FedRAMP-authorized infrastructure. Using them that way is a meaningful engineering project, not a configuration change. And depending on whether your compliance requirement is CMMC or ITAR, the models available to you are very different.

CMMC vs. ITAR: Two Different Worlds of Model Access

This is the distinction most people miss when they hear “FedRAMP AI.”

CMMC Level 2 requires that cloud service providers handling CUI meet FedRAMP Moderate equivalent security standards. This is achievable on commercial cloud infrastructure: Azure commercial, GCP with Assured Workloads, standard AWS regions. Because it’s commercial cloud, you get access to the latest models as soon as they’re deployed. Today, that means GPT-5 and GPT-5.2 on Azure AI Foundry, Claude Opus 4.6 on Azure and Amazon Bedrock, and Gemini 3.1 Pro on Vertex AI. No compromises on model quality, no waiting for government region rollouts.

ITAR is a different story. ITAR-controlled technical data requires that all personnel with access are U.S. persons, and that data is stored and processed exclusively within U.S. jurisdiction with additional access restrictions. In practice, this means Azure Government, AWS GovCloud, or GCP Assured Workloads with ITAR controls. These environments are more restrictive by design, and model availability lags behind commercial cloud by a generation or more.

As of early 2026, Azure Government’s latest available models are GPT-4.1 and o3-mini. AWS GovCloud Bedrock offers Claude Sonnet 4.5 and Llama 3. Google Cloud doesn’t offer Vertex AI generative models under ITAR-scoped Assured Workloads at all. The frontier models (GPT-5, Opus 4.6, Gemini 3.1 Pro) are not yet available in these isolated government environments.

Provider | CMMC (FedRAMP Moderate) | ITAR (GovCloud / Gov Regions)
Azure | GPT-5, GPT-5.2, Claude Opus 4.6, Gemini 3.1 Pro via AI Foundry | GPT-4.1, o3-mini via Azure Government
AWS | Claude Opus 4.6, Sonnet 4.6, Llama, Nova via Bedrock | Claude Sonnet 4.5, Llama 3 via GovCloud Bedrock
GCP | Gemini 3.1 Pro, Claude Opus 4.6 via Vertex AI + Assured Workloads | No generative AI models under ITAR controls
Frontier models? | All available | 1-2 generations behind, varies by provider
Self-hosted | Full flexibility within your enclave | Full flexibility within your enclave

This gap matters. For CMMC contractors, there’s no reason to accept worse AI capabilities in the name of compliance. For ITAR contractors, the tradeoff is real, and your platform architecture needs to account for it.

The Three Paths

Azure AI Foundry

Microsoft’s Azure AI Foundry provides access to OpenAI models (GPT-5, GPT-5.2, GPT-4.1) and Anthropic models (Claude Opus 4.6, Claude Sonnet 4.6) within Azure’s FedRAMP authorization boundary. CMMC workloads on commercial Azure get the full model catalog. ITAR workloads on Azure Government get a more limited but expanding set.

Setting this up requires more than changing a base URL. You need to provision model deployments, configure authentication through workload identity federation (not static keys), and handle the differences in API surface area between Azure’s deployment model and the commercial APIs your code was probably written against.
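One concrete difference in API surface area is the request shape itself. A sketch of the contrast, with a hypothetical resource name, deployment name, and API version as placeholders:

```python
# Sketch of the request-shape difference between the commercial OpenAI API
# and an Azure AI Foundry deployment. Resource name, deployment name, and
# API version below are illustrative placeholders.

def commercial_url() -> str:
    # Commercial API: one shared endpoint; the model is a request parameter.
    return "https://api.openai.com/v1/chat/completions"

def azure_deployment_url(resource: str, deployment: str, api_version: str) -> str:
    # Azure: each model is a named deployment under your own resource,
    # and the API version is pinned in the query string.
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

url = azure_deployment_url("my-foundry-resource", "gpt-5-prod", "2024-10-21")
```

Code written against the commercial endpoint hardcodes the first shape; porting it means threading deployment names and API versions through your configuration.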

Google Vertex AI

Google Cloud’s Vertex AI provides access to Gemini models (including Gemini 3.1 Pro) and Anthropic Claude models within GCP’s FedRAMP authorization boundary. Google uses Assured Workloads to enforce data residency and compliance controls. You select a control package (FedRAMP Moderate, FedRAMP High, or ITAR) and Assured Workloads restricts your project to compliant services and regions.

For CMMC workloads, Assured Workloads with FedRAMP Moderate or High controls gives you access to frontier Gemini and Claude models. For ITAR workloads, Vertex AI generative models are not available under Google Cloud’s ITAR control package, making GCP a CMMC play for AI inference, not an ITAR play.

Authentication to Vertex AI works through Google’s workload identity federation. Your compute environment presents a token to Google’s STS service, which exchanges it for a short-lived GCP credential. No static service account keys required.
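A sketch of the token-exchange request behind that flow. The project number, pool, and provider names are placeholders; in practice the google-auth client library builds and sends this request for you.

```python
# Sketch of the STS token exchange used by GCP workload identity federation.
# Project number, pool, and provider are hypothetical placeholders.

GOOGLE_STS_URL = "https://sts.googleapis.com/v1/token"

def build_sts_request(external_jwt: str, project_number: str,
                      pool: str, provider: str) -> dict:
    # The audience identifies your workload identity pool provider in IAM.
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool}/providers/{provider}"
    )
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectToken": external_jwt,      # token from your orchestrator
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
    }

req = build_sts_request("<orchestrator-jwt>", "123456789", "my-pool", "my-provider")
```

The response is a short-lived access token, which is what your service presents to Vertex AI.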

AWS Bedrock

Amazon Bedrock provides access to Anthropic Claude models (including Opus 4.6 and Sonnet 4.6 in commercial regions), Meta Llama, and Amazon’s own Nova models within AWS’s FedRAMP boundary. If your infrastructure already runs on AWS, Bedrock has the lowest integration friction because authentication uses the same IAM role mechanisms as every other AWS service.

For ITAR workloads, AWS GovCloud Bedrock keeps all data within the GovCloud partition. Model availability in GovCloud is more limited: Claude Sonnet 4.5 and Llama 3 are available today, with the model catalog expanding over time.
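One reason GovCloud migration is low-friction: the Bedrock Converse API request shape is identical across partitions, and only the region and model ID change. A sketch, with a placeholder model ID (look up the exact ID available in your region):

```python
# Sketch of a Bedrock Converse-API request body. The shape is the same in
# commercial and GovCloud partitions; only region and model ID differ.
# The model ID below is a placeholder, not a real identifier.

def bedrock_converse_request(region: str, model_id: str, prompt: str) -> dict:
    return {
        "region": region,                  # e.g. "us-gov-west-1" for ITAR
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]}
        ],
        "inferenceConfig": {"maxTokens": 512},
    }

req = bedrock_converse_request(
    "us-gov-west-1", "example-claude-model-id", "Summarize this RFP section."
)
```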

Why Multi-Cloud Architecture Matters

If your platform is locked to a single cloud provider, your model access is constrained by that provider’s government region rollout timeline. A new frontier model might be available on Azure commercial within days but take months to reach AWS GovCloud, or vice versa.

A multi-cloud architecture lets you route inference to whichever provider has the best model available for a given compliance tier. CMMC customer needs Opus 4.6? Route through Azure AI Foundry. ITAR customer needs the best available Claude model in GovCloud? Route through AWS Bedrock. New Gemini model drops on Vertex AI first? You can adopt it immediately for customers whose compliance posture allows it.
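The routing logic above can be sketched as a lookup against a per-tier catalog. This catalog is illustrative, following the availability described in this article, not a live feed:

```python
# Minimal sketch of compliance-tier model routing. The catalog below is
# illustrative and mirrors the examples in the text.

CATALOG = {
    ("cmmc", "azure"): ["gpt-5", "gpt-5.2", "claude-opus-4.6"],
    ("cmmc", "gcp"):   ["gemini-3.1-pro", "claude-opus-4.6"],
    ("cmmc", "aws"):   ["claude-opus-4.6", "claude-sonnet-4.6"],
    ("itar", "azure"): ["gpt-4.1", "o3-mini"],
    ("itar", "aws"):   ["claude-sonnet-4.5", "llama-3"],
    # ("itar", "gcp") intentionally absent: no generative models under ITAR.
}

def route(tier: str, model: str) -> str:
    """Return a provider that serves `model` at this compliance tier."""
    for (t, provider), models in CATALOG.items():
        if t == tier and model in models:
            return provider
    raise LookupError(f"no {tier}-compliant provider offers {model}")

assert route("cmmc", "claude-opus-4.6") == "azure"   # first match wins
assert route("itar", "claude-sonnet-4.5") == "aws"
```

The interesting design decision is what to do on a miss: fail loudly, as here, or fall back to the best available model at that tier.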

This is the architecture we built at Sweetspot. We maintain provider integrations across Azure, GCP, and AWS, and we can adopt whatever security posture a customer’s compliance requirements demand, from FedRAMP Moderate on commercial cloud with the full frontier model catalog to GovCloud deployments for ITAR workloads.

The result for CMMC contractors: zero sacrifice in model quality. You get GPT-5, Claude Opus 4.6, and Gemini 3.1 Pro inside a FedRAMP-authorized boundary.

Authentication Without Static Keys

This is where most implementations go sideways. The path of least resistance for any of these providers is to generate an API key, store it somewhere, and call it done. That works, but it creates problems that compound over time.

  • Every static credential needs a rotation policy, a rotation mechanism, and a rollback plan. Multiply by the number of services that need model access, and rotation becomes a full-time job.
  • A leaked API key grants access to every model in the deployment until it’s rotated, with no per-workload scoping.
  • API key usage tells you which key was used, not which workload used it. Attributing model calls to specific services requires additional correlation.

The alternative is workload identity federation. Each service instance authenticates using a short-lived token issued by the orchestration platform. This token is exchanged with the cloud provider for a scoped credential that expires automatically, typically within an hour.

No API keys in environment variables. No secrets to rotate. No credentials stored on disk. If a workload is compromised, the attacker gets a token that expires in minutes to hours, scoped to a single service’s permissions.

Setting this up across all three providers, each with their own federation protocol, token exchange mechanism, and IAM model, is where the real engineering investment lives. It’s plumbing. But it’s the difference between “we use FedRAMP models” and “we use FedRAMP models with zero static credentials.”
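The common pattern underneath all three providers is a credential cache that refreshes before expiry instead of storing anything static. A minimal sketch, where `fetch_credential` stands in for the provider-specific token exchange:

```python
import time

# Sketch of a short-lived credential cache. `fetch_credential` is a
# hypothetical stand-in for the provider-specific token exchange; nothing
# is persisted to disk, and expired tokens are replaced transparently.

class CredentialCache:
    def __init__(self, fetch_credential, refresh_margin_s: float = 300.0):
        self._fetch = fetch_credential       # returns (token, expires_at)
        self._margin = refresh_margin_s      # refresh this early, in seconds
        self._token, self._expires_at = None, 0.0

    def token(self) -> str:
        # Refresh when inside the margin before expiry; callers always get
        # a token with comfortable remaining lifetime.
        if time.time() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token

cache = CredentialCache(lambda: ("short-lived-token", time.time() + 3600))
```

Each call site asks the cache for a token; nothing in the environment, on disk, or in configuration holds a long-lived secret.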

Self-Hosted Models

Some organizations want to self-host models for maximum control. For certain workloads, it’s the right call. Self-hosted models running in your own secure enclave give you complete control over data residency, eliminate third-party inference dependencies, and let you run specialized models that aren’t available through managed services.

We self-host embedding models in our secure infrastructure for this reason. Embeddings are generated constantly as documents are ingested and indexed, and running them locally means raw document content never leaves our environment for this high-volume workload.

For large language models, self-hosting is a heavier lift. You’re responsible for the full inference stack: GPU node security, model serving framework patching, artifact integrity, scaling, and redundancy. For most govcon use cases, the managed AI services from Azure, GCP, and AWS offer a better tradeoff for LLM inference. The cloud provider carries the FedRAMP compliance burden for the inference infrastructure, and you focus on securing the data flows in and out.

The ideal architecture supports both: managed cloud inference for frontier LLMs where compliance allows it, and self-hosted models for workloads where full control is required. That flexibility lets you meet each customer’s compliance requirements without forcing a one-size-fits-all tradeoff between model quality and security posture.

The Multiplier Effect

Supporting a single cloud AI provider is straightforward. Supporting all three, which you need to do if you want the best model available for every compliance tier, multiplies the identity federation work, the API integration work, and the ongoing maintenance.

Each provider has different authentication mechanisms, API schemas, model versioning, rate limiting, and failure modes. You can do this yourself. We did. It took months of identity federation work across multiple cloud providers, and it requires ongoing maintenance as providers evolve their APIs and security models.


Sweetspot routes 100% of AI inference through FedRAMP-authorized providers. No commercial API endpoints. No static API keys anywhere in the system. Every model call is authenticated with short-lived federated credentials, scoped to the minimum required permissions, and fully auditable to the originating workload.

For CMMC contractors, this means frontier models (GPT-5, Claude Opus 4.6, Gemini 3.1 Pro) with zero compliance compromises. For customers with stricter requirements, our multi-cloud architecture adapts to whatever security posture is needed, ensuring the best available models at every compliance tier.

We built this so defense contractors don’t have to. Your team should be focused on winning contracts, not debugging workload identity federation across three cloud providers.

Frequently Asked Questions

Is GPT-5 FedRAMP authorized?

GPT-5 is available through Azure AI Foundry, which operates within Azure's FedRAMP authorization boundary. For CMMC workloads on commercial Azure, GPT-5 is available today. For ITAR workloads requiring Azure Government, GPT-5 is not yet available. GPT-4.1 is the latest model in government regions as of early 2026.

Is Claude Opus 4.6 available within a FedRAMP boundary?

Claude Opus 4.6 is available through Azure AI Foundry, Amazon Bedrock, and Google Vertex AI, all within FedRAMP authorization boundaries. For CMMC contractors, Opus 4.6 is fully accessible. For ITAR workloads in AWS GovCloud, Claude Sonnet 4.5 is currently the latest available Claude model.

Does CMMC Level 2 require GovCloud?

No. CMMC Level 2 requires cloud service providers to meet FedRAMP Moderate equivalent standards, which commercial cloud regions (Azure commercial, GCP with Assured Workloads, standard AWS regions) satisfy. GovCloud and Azure Government are required for ITAR workloads, which have stricter data residency and personnel access requirements beyond what CMMC mandates.

Can ITAR workloads use AI models on Google Cloud?

Not for generative AI, as of early 2026. While Google Cloud's Assured Workloads supports ITAR control packages with data residency and CMEK requirements, Vertex AI generative models are not available under the ITAR control package. For ITAR AI inference, Azure Government and AWS GovCloud are currently the only options.

What is Google Cloud Assured Workloads?

Assured Workloads is Google Cloud's compliance framework that enforces data residency, personnel access controls, and encryption requirements on top of standard GCP services. You select a control package (FedRAMP Moderate, FedRAMP High, or ITAR) and Assured Workloads restricts your project to services and regions that meet those requirements.

Ready to get started?

Join hundreds of government contractors winning more contracts with Sweetspot.