Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.1.2. OCR and Text Extraction

The Read API extracts text from images, handling both printed and handwritten content. Unlike simple OCR that returns only flat text, it preserves document structure (pages, lines, words) and returns bounding box coordinates for each element.

Read API capabilities:
  • Printed text extraction (documents, signs, screens)
  • Handwritten text recognition
  • Mixed content (printed + handwritten together)
  • Multi-language support (120+ languages)

Output structure:
  • Pages → Lines → Words hierarchy
  • Each element includes bounding polygon coordinates
  • Confidence scores for accuracy assessment
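
To make the output hierarchy concrete, here is a minimal sketch that walks a Read-style result from pages down to words. The field names (`readResults`, `lines`, `words`, `boundingBox`, `confidence`) are assumed from the v3.2 REST response shape and may differ in other API versions or SDKs. `SAMPLE_RESULT` is made-up illustrative data, and the helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Illustrative data only: a hand-written result in the assumed v3.2 shape.
SAMPLE_RESULT: Dict[str, Any] = {
    "readResults": [
        {
            "page": 1,
            "lines": [
                {
                    "text": "Hello world",
                    "boundingBox": [10, 10, 120, 10, 120, 30, 10, 30],
                    "words": [
                        {"text": "Hello", "confidence": 0.98,
                         "boundingBox": [10, 10, 60, 10, 60, 30, 10, 30]},
                        {"text": "world", "confidence": 0.61,
                         "boundingBox": [70, 10, 120, 10, 120, 30, 70, 30]},
                    ],
                }
            ],
        }
    ]
}


def to_points(flat: List[float]) -> List[Tuple[float, float]]:
    """Turn a flat [x1, y1, x2, y2, ...] polygon into (x, y) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def iter_words(result: Dict[str, Any]) -> Iterator[Tuple[int, str, Dict[str, Any]]]:
    """Yield (page number, parent line text, word dict) for every word."""
    for page in result.get("readResults", []):
        for line in page.get("lines", []):
            for word in line.get("words", []):
                yield page.get("page"), line.get("text", ""), word


def low_confidence_words(result: Dict[str, Any], threshold: float = 0.8) -> List[str]:
    """Return the text of words whose confidence falls below the threshold."""
    return [w["text"] for _, _, w in iter_words(result)
            if w.get("confidence", 0.0) < threshold]


if __name__ == "__main__":
    for page_no, line_text, word in iter_words(SAMPLE_RESULT):
        print(page_no, repr(line_text), word["text"], to_points(word["boundingBox"]))
    print("Needs review:", low_confidence_words(SAMPLE_RESULT))
```

A threshold such as 0.8 is an arbitrary choice here. In practice the cutoff depends on how costly a misread word is for the scenario.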

Exam tip: The Read API is asynchronous. You submit the image, receive an operation ID, and poll for results. Don't confuse it with the legacy synchronous OCR endpoint.
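
The submit-then-poll flow can be sketched as follows, using only the Python standard library. The URL path (`/vision/v3.2/read/analyze`), the `Ocp-Apim-Subscription-Key` header, the `Operation-Location` response header, and the status values are assumptions based on the v3.2 REST shape, so check them against the API version you actually use. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"succeeded", "failed"}


def submit_image(endpoint: str, key: str, image_url: str) -> str:
    """Submit an image URL; return the operation URL to poll (assumed v3.2 shape)."""
    request = urllib.request.Request(
        f"{endpoint.rstrip('/')}/vision/v3.2/read/analyze",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The service accepts the job and points to where results will appear.
        return response.headers["Operation-Location"]


def fetch_status(operation_url: str, key: str) -> Dict[str, Any]:
    """GET the operation URL once and return the parsed JSON body."""
    request = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_read_operation(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the status is terminal or attempts run out."""
    for _ in range(max_attempts):
        body = fetch()
        if str(body.get("status", "")).lower() in TERMINAL_STATUSES:
            return body
        sleep(interval_s)
    raise TimeoutError(f"Read operation not finished after {max_attempts} polls")
```

Typical usage would be `op = submit_image(endpoint, key, url)` followed by `poll_read_operation(lambda: fetch_status(op, key))`. Passing `fetch` and `sleep` as parameters keeps the polling loop testable without network access.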

Written by Alvin Varughese
Founder · 15 professional certifications