AI Summary
- DeepSeek-OCR is an end-to-end vision-language model built specifically for OCR tasks. Its DeepEncoder architecture minimizes the number of vision tokens by connecting a local-attention component (SAM) and a global-attention component (CLIP) in series
- Achieves near-lossless OCR performance at a compression ratio of approximately 10×, using a 16× convolutional compressor to reduce the vision-token count, which enables efficient processing of ultra-long contexts
- Supports multiple resolution modes (Tiny through Gundam), backed by a comprehensive data engine covering OCR 1.0, OCR 2.0 (charts, geometry), and general vision data
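
The token arithmetic behind the 16× compressor and the roughly 10× compression ratio can be sketched as follows. This is a minimal illustration, not the model's actual code: the 16-pixel patch size, the 1024×1024 input, and the 2,560-token page are assumptions chosen for the example, and the compression ratio is assumed here to mean text tokens divided by vision tokens.

```python
def vision_tokens(height: int, width: int,
                  patch_size: int = 16, compressor_factor: int = 16) -> int:
    """Count vision tokens left after patch embedding and token compression.

    patch_size=16 is an assumed value for illustration; compressor_factor=16
    mirrors the 16x convolutional compressor mentioned in the summary.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    # One token per non-overlapping patch.
    patch_tokens = (height // patch_size) * (width // patch_size)
    # The compressor reduces the token count by a fixed factor.
    return patch_tokens // compressor_factor


def compression_ratio(text_tokens: int, n_vision_tokens: int) -> float:
    """Assumed definition: text tokens represented per vision token."""
    return text_tokens / n_vision_tokens


# Hypothetical 1024x1024 page: 64 * 64 = 4096 patches, compressed to 256 tokens.
tokens = vision_tokens(1024, 1024)
# Hypothetical page whose text would take 2560 tokens: 2560 / 256 = 10.0.
ratio = compression_ratio(2560, tokens)
print(tokens, ratio)
```

Under these assumptions, a page that would cost 2,560 text tokens is carried by 256 vision tokens, which is the regime the summary describes as near-lossless.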