About Foundry Map
Foundry Map answers one question Microsoft makes painful to answer: which Azure AI models are available in which regions, for which deployment type, at what price?
How it works
A Python pipeline calls Microsoft.CognitiveServices/locations/{location}/models
for every Azure region that hosts AI Foundry, merges the results with pricing from the Azure Retail Prices API, and commits
the normalised JSON to this repo. The static site you're reading is rebuilt on every commit.
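The merge step can be sketched like this. The two endpoint paths are the real ones named above; the api-version, the meter-name join key, and the field names are illustrative assumptions, not the repo's actual code.

```python
# Sketch of the pipeline's two data sources and the merge step.
# Endpoint paths are real; api-version and field names are assumptions.
ARM_MODELS = (
    "https://management.azure.com/subscriptions/{sub}/providers/"
    "Microsoft.CognitiveServices/locations/{location}/models"
    "?api-version=2024-10-01"  # assumed; check the current ARM api-version
)
RETAIL_PRICES = "https://prices.azure.com/api/retail/prices"

def merge_prices(models: list[dict], prices: list[dict]) -> list[dict]:
    """Attach a retail price to each model when a meter name matches.

    Models with no matching meter get retailPrice=None, which is why
    price coverage (see Known limitations) is best-effort.
    """
    by_meter = {p["meterName"]: p["retailPrice"] for p in prices}
    return [
        {**m, "retailPrice": by_meter.get(m.get("meterName"))}
        for m in models
    ]
```

The join-on-meter-name step is exactly where coverage is lost: Retail Prices meter names are an ad-hoc encoding per model family, so many models simply never match.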
Data freshness
Currently the pipeline is run manually. Daily automation via GitHub Actions + Azure OIDC is on the roadmap. The timestamp on each page shows when the data was last generated.
Model lifecycle
Each model in this catalogue carries a lifecycle badge sourced from Azure's ARM
Microsoft.CognitiveServices/locations/{location}/models response. Microsoft's API
enumerates four values; we add a fifth (Unknown) for partner models where the publisher
hasn't populated the field. Definitions below come from the
Azure AI Foundry model-retirements and
models-sold-directly-by-azure docs.
Generally Available
Production-ready. Covered by Azure SLAs and supported by Microsoft. Available
for at least 12 months from launch; Microsoft commits to at least 60 days' notice
before retirement. Safe default for new workloads. ARM enum: GenerallyAvailable.
Preview
Released for evaluation — Microsoft explicitly does not recommend Preview for
production. No GA SLA. Typical lifespan 90–120 days, with at least 30 days' notice
before Microsoft auto-upgrades existing deployments to a newer Preview or GA version.
ARM enum: Preview.
Retiring
A retirement date has been announced and the deprecation window is now open: existing
deployments continue to function, but new deployments cannot be created by
customers who haven't deployed the model before. Plan a migration before the retirement
date — after that point requests return errors. ARM enum: Deprecating.
Deprecated
Past the deprecation date and approaching or at retirement. No new deployments can be
created. Existing deployments may continue until the published retirement date, after
which Azure OpenAI returns error responses. Microsoft reserves the right to issue
emergency retirements for security or compliance, bypassing the standard notice
window. ARM enum: Deprecated.
Unknown
Not a Microsoft enum value. We use this when ARM returns no lifecycle_status
for a model — typically Foundry Models from partners and the community where the publisher
(not Microsoft) controls lifecycle, support and billing. Don't assume GA semantics or
Azure SLA coverage; check the publisher's terms.
Caveats Microsoft mentions: not every model passes through Retiring before
Deprecated — some are retired directly. The ARM REST schema enumerates the four values
above but ships them with no descriptions; semantic definitions live in the linked retirements
docs. The API enum (Deprecating) and the docs' terms
(Retiring / Deprecation) name the same phase; only the labels differ.
Deployment types
Each model in this catalogue lists the deployment types (SKUs) it supports. Microsoft's 11 SKU codes collapse into a 3 × 3 matrix — three billing/latency modes (Standard, Provisioned Managed, Batch) crossed with three data-routing scopes (single region, Data Zone, Global) — plus two special cases. Definitions below come from the Foundry deployment types and PTU onboarding docs.
| Mode ↓ Scope → | Single region | Data Zone (US / EU) | Global |
|---|---|---|---|
| Standard (pay-per-token, sync) | Standard | DataZoneStandard | GlobalStandard |
| Provisioned Managed (reserved PTU, low-variance latency) | ProvisionedManaged | DataZoneProvisionedManaged | GlobalProvisionedManaged |
| Batch (async, 24h SLA, 50% discount) | Batch | DataZoneBatch | GlobalBatch |
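Because the routing scope is encoded as a SKU prefix, the nine matrix codes decompose mechanically. A sketch (function name is ours; the two special-case SKUs below are handled separately, as described under Special tiers):

```python
def split_sku(sku: str) -> tuple[str, str]:
    """Split a matrix SKU code into (scope, mode).

    GlobalStandard -> ("global", "Standard")
    DataZoneBatch  -> ("data-zone", "Batch")
    Standard       -> ("single-region", "Standard")
    """
    for prefix, scope in (("Global", "global"), ("DataZone", "data-zone")):
        if sku.startswith(prefix):
            return scope, sku[len(prefix):]
    return "single-region", sku
```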
Pick the routing scope by residency
- Single region — inference and data at rest stay in one Azure region. Use when a single-region commitment (e.g. switzerlandnorth only) is a hard compliance requirement. Tightest quota, most variable latency at burst.
- Data Zone — request can be processed in any Azure region inside the same Microsoft data zone (US data zone = any US region; EU data zone = any EU member nation). Higher quotas than single-region; data stays inside the boundary. The natural fit for GDPR-bound EU workloads.
- Global — request may be routed to any Azure region worldwide where the model is deployed. Highest quotas, broadest model availability, and gets new models first. Data at rest still stays in your resource's geography. Avoid only when residency policy forbids cross-geo processing.
Pick the mode by traffic shape
Standard — pay-per-token, real-time
The default sync inference mode. You pay per input/output token; throughput is governed
by quota (TPM/RPM). Best for low-to-medium volume and bursty workloads. At sustained high
RPS you'll see latency variability — that's the cue to look at Provisioned Managed.
Available as Standard, DataZoneStandard, GlobalStandard.
Provisioned Managed (PTU) — reserved capacity
You buy a fixed number of Provisioned Throughput Units, billed per PTU/hour or via Azure
Reservations (1-month / 1-year discounts). In return you get guaranteed throughput and
much lower latency variance. Minimums vary by model and scope (e.g. gpt-5: 50 PTU regional,
15 PTU global/data zone). Reservations for Global, Data Zone and Regional are
not interchangeable. Best for production workloads with predictable load.
Available as ProvisionedManaged, DataZoneProvisionedManaged,
GlobalProvisionedManaged.
Batch — async, half price
Submit a JSONL file of requests, get results back within 24 hours. Pricing is roughly
50% of Standard. Uses a separate enqueued-token quota so it doesn't
compete with your online traffic. No real-time SLA — Microsoft says jobs "might take
longer" than 24 h. Perfect for embeddings backfills, bulk classification, document
summarisation. Never use for anything interactive.
Available as Batch, DataZoneBatch, GlobalBatch.
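A batch job starts from a JSONL file, one request per line. A minimal sketch of building one, assuming the Azure OpenAI batch request shape (custom_id / method / url / body); the deployment name and prompt are placeholders:

```python
import json

def batch_lines(texts, deployment="my-gpt-4o-batch"):
    """Yield one JSONL line per request, ready to write to the input file."""
    for i, text in enumerate(texts):
        yield json.dumps({
            "custom_id": f"task-{i}",          # your key for matching results
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": deployment,            # the batch deployment name
                "messages": [{"role": "user", "content": text}],
            },
        })
```

Upload the resulting file, create the batch job, then poll until results arrive (within 24 hours, usually).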
Special tiers
Developer — fine-tune evaluation only
Pay-per-token tier designed exclusively for testing fine-tuned models cheaply. Significant constraints: no data residency guarantees, no SLA, and a fixed 24-hour lifetime after which the deployment is auto-deleted. Routing is global. Use it for hourly-cost-sensitive smoke tests of a custom fine-tune before promoting to a real deployment type. Never use for production, regulated data, or anything requiring uptime.
Provisioned (no suffix) — legacy
ARM still returns a bare Provisioned SKU on some older models. This is the
pre-August-2024 Provisioned offering sold under the commitment payment model,
semantically equivalent to today's ProvisionedManaged (single-region, reserved
capacity) but purchased through the now-deprecated commitment model. Not available
to new customers or for models introduced after August 2024; treat it as legacy
and migrate to ProvisionedManaged.
Caveats Microsoft mentions: deploying a model isn't the same as having capacity — for Provisioned tiers, quota and capacity are separate concepts; you may need to try multiple regions. Data Zone definitions: the US zone covers all Azure US regions; the EU zone covers EU member nations (Microsoft's broader Azure EU Data Boundary also includes the EFTA states — Iceland, Liechtenstein, Norway, Switzerland — but the Foundry deployment-types page itself only explicitly mentions EU member nations).
Known limitations
- Price coverage is best-effort (~13%). Retail Prices meter names use an ad-hoc encoding per model family. Expanding coverage is a roadmap item.
- Anthropic, Mistral, Cohere, and some other Marketplace-billed models don't appear in the Retail Prices API at all.
- Default quota (TPM/RPM) isn't surfaced yet — pending a stable source.
- Fine-tuning availability is out of scope for MVP.
Source
Code + data live at github.com/waynegoosen/foundry-map. Issues and PRs welcome.
Not affiliated with Microsoft. "Azure" is a trademark of Microsoft Corporation; used here descriptively.