Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

7.2.4. Identity and Other Pre-built Models

Pre-built models for ID documents, business cards, health insurance, and other specialized document types.

Building on the input-output framework from Section 1.3, Document Intelligence handles document input (forms, invoices, receipts) and produces structured data output.

💡 First Principle: OCR extracts text; Document Intelligence extracts meaning. An invoice has text ("$500"), but Document Intelligence knows that's the "Total Amount" field because it understands document structure—headers, tables, key-value pairs, checkboxes. This is why Document Intelligence uses specialized models (Invoice, Receipt, W-2): each document type has a different layout grammar. The exam tests whether you know to use Document Intelligence for structured extraction, not just OCR for text.

🔧 Implementation Reference: Document Intelligence
Item      Value
Package   azure-ai-documentintelligence
Class     DocumentIntelligenceClient
Method    begin_analyze_document()
Header    Ocp-Apim-Subscription-Key
Endpoint  POST /documentintelligence/documentModels/{model}:analyze
Pre-built Models:
Model ID             Document Type
prebuilt-invoice     Invoices
prebuilt-receipt     Receipts
prebuilt-idDocument  IDs, passports
prebuilt-layout      Any document (text plus layout structure such as tables and selection marks)
prebuilt-read        Any document (text only)

[Figure: decision tree for selecting the appropriate Document Intelligence model]
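A minimal sketch of that selection logic in Python, derived only from the model table above. The `choose_model` helper, its category names, and the `need_structure` flag are hypothetical and not part of the SDK; only the model IDs come from the table.

```python
# Hypothetical helper: maps a coarse document category to a pre-built model ID.
# The model IDs come from the table above; the category names are illustrative.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
}


def choose_model(category: str, need_structure: bool = False) -> str:
    """Return a model ID for the given category.

    Falls back to prebuilt-layout when tables or other structure are needed,
    and to prebuilt-read when plain text is enough.
    """
    model = PREBUILT_MODELS.get(category.strip().lower())
    if model is not None:
        return model
    return "prebuilt-layout" if need_structure else "prebuilt-read"


if __name__ == "__main__":
    print(choose_model("Invoice"))
    print(choose_model("contract", need_structure=True))
    print(choose_model("letter"))
```

The design point is the order of the checks: prefer a specialized model when one matches the document type, and fall back to the general-purpose models only when none does.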

Testable Pattern:
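from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# endpoint and key come from your Document Intelligence resource
client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
# file_stream is a binary file object, e.g. open("invoice.pdf", "rb");
# the analyze_request keyword matches the preview SDK, so check your installed version
poller = client.begin_analyze_document("prebuilt-invoice", analyze_request=file_stream, content_type="application/pdf")
result = poller.result()  # blocks until the long-running analysis completes

for doc in result.documents:
    # fields.get() returns None when the model did not find the field
    vendor_field = doc.fields.get("VendorName")
    total_field = doc.fields.get("InvoiceTotal")
    vendor = vendor_field.value_string if vendor_field else None
    total = total_field.value_currency.amount if total_field and total_field.value_currency else None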
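The same field-access pattern applies to identity documents analyzed with prebuilt-idDocument. The sketch below shows one way to collect a few fields while skipping missing or low-confidence ones. It assumes field names such as FirstName, LastName, DocumentNumber, and DateOfBirth (verify them against the model's published schema), and it uses `SimpleNamespace` stand-ins with made-up values so it runs without an Azure resource. The `summarize_fields` helper and the 0.5 threshold are hypothetical.

```python
from types import SimpleNamespace


def summarize_fields(fields, wanted, min_confidence=0.5):
    """Collect the text of selected fields, skipping missing or weak ones.

    `fields` is a mapping of field name to an object exposing `content`
    and `confidence`, as the SDK's analyzed document fields do.
    """
    summary = {}
    for name in wanted:
        field = fields.get(name)
        if field is None:
            continue  # the model did not find this field
        confidence = field.confidence if field.confidence is not None else 0.0
        if confidence < min_confidence:
            continue  # too uncertain to trust without review
        summary[name] = field.content
    return summary


if __name__ == "__main__":
    # Stand-in for doc.fields from a prebuilt-idDocument result (made-up values).
    fake_fields = {
        "FirstName": SimpleNamespace(content="AVERY", confidence=0.98),
        "LastName": SimpleNamespace(content="LEE", confidence=0.97),
        "DocumentNumber": SimpleNamespace(content="X1234567", confidence=0.31),
    }
    wanted = ["FirstName", "LastName", "DocumentNumber", "DateOfBirth"]
    print(summarize_fields(fake_fields, wanted))
```

With a real result, `doc.fields` would be passed in place of `fake_fields`.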
Error Handling Pattern:
import logging

from azure.core.exceptions import HttpResponseError

try:
    poller = client.begin_analyze_document("prebuilt-invoice", analyze_request=file_stream, content_type="application/pdf")
    result = poller.result()
except HttpResponseError as e:
    if "InvalidPasswordProtectedDocument" in str(e):
        # Password-protected PDF: the service cannot open it
        logging.error("Document is password protected")
    elif e.status_code == 400:
        # Other bad requests, such as a corrupt or unsupported file
        logging.error("Invalid document format: %s", e.message)
    else:
        # Do not swallow unrelated failures (auth, throttling, server errors)
        raise
CLI Equivalent (REST):
curl -X POST "https://{endpoint}/documentintelligence/documentModels/prebuilt-invoice:analyze?api-version=2024-02-29-preview" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/pdf" \
  --data-binary @invoice.pdf
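The REST call is asynchronous: the POST returns 202 Accepted with an Operation-Location header, and the result is retrieved by sending GET requests to that URL until the reported status is succeeded or failed. The sketch below shows only the polling loop. The `wait_for_result` helper and its `fetch_status` callback are hypothetical; in real use the callback would GET the Operation-Location URL with the subscription key header and return the parsed JSON. Here it is fed simulated responses so the example runs offline.

```python
import time


def wait_for_result(fetch_status, interval_s=1.0, max_attempts=30, sleep=time.sleep):
    """Poll until the analyze operation finishes.

    `fetch_status` is a caller-supplied function that GETs the
    Operation-Location URL and returns the parsed JSON body as a dict.
    """
    for _ in range(max_attempts):
        body = fetch_status()
        status = body.get("status")
        if status == "succeeded":
            return body.get("analyzeResult")
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {body.get('error')}")
        sleep(interval_s)  # still notStarted or running
    raise TimeoutError("Analysis did not finish within the polling budget")


if __name__ == "__main__":
    # Simulated service responses so the sketch runs without network access.
    responses = iter([
        {"status": "notStarted"},
        {"status": "running"},
        {"status": "succeeded", "analyzeResult": {"content": "Invoice 42"}},
    ])
    result = wait_for_result(lambda: next(responses), sleep=lambda _s: None)
    print(result)
```

The SDK's `poller.result()` performs this loop internally, which is why the Python examples need no explicit polling.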
Limitations:
Factor           Limit
File size        500 MB (paid S0 tier; 4 MB on the free F0 tier)
Page count       2,000 pages (the free tier processes only the first 2 pages)
Supported types  PDF, JPEG, PNG, TIFF, BMP
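These limits can be checked locally before uploading, which avoids a round trip for files the service would reject. The sketch below is a hypothetical pre-flight helper, not an SDK feature. It assumes 500 MB means 500 × 1024 × 1024 bytes and identifies the file type from the extension only. It does not check page count or password protection.

```python
from pathlib import Path

# Limits taken from the table above; adjust if your pricing tier differs.
MAX_BYTES = 500 * 1024 * 1024  # assumption: 500 MB counted in binary megabytes
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def preflight_problems(filename: str, size_bytes: int) -> list[str]:
    """Return the reasons a file would be rejected (empty list if none found)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    print(preflight_problems("invoice.pdf", 1024))
    print(preflight_problems("notes.docx", 1024))
    print(preflight_problems("scan.TIFF", MAX_BYTES + 1))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a file at once.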

⚠️ Exam Trap: Password-protected PDFs cannot be processed. Analysis fails regardless of file size, so the password lock must be removed before the document is submitted.

Document Intelligence Documentation

Written by Alvin Varughese
Founder • 15 professional certifications