Model Configuration - Vision

Overview

The image understanding feature gives Fuwang vision capabilities, so it can interpret the images you share.

[Image: function example]

Usage

Using the feature takes three steps:

  1. Click the "+" button in the chat interface to upload an image.
  2. Enter a question or instruction about the image.
  3. Review the model's analysis.

You can also drag and drop images into the input box, or paste them from the clipboard.

Configuration

How you enable vision depends on the model type:

  1. Local Model: Settings → Model → Select the target model → Enable "Vision" in the capability options.
  2. Cloud Model: Enable the "Vision" option in the capability section of the model editing page.

Note: If a local model supports vision but this option is not enabled, the model may fail to load. Conversely, enabling the option for a model that does not support vision may cause errors.
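The two mismatches described in the note can be expressed as a small check. This is a minimal sketch only: `ModelEntry`, its fields, and `vision_config_warning` are hypothetical names chosen for illustration and do not reflect Fuwang's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    # Hypothetical record; field names are illustrative, not Fuwang's schema.
    name: str
    supports_vision: bool  # what the model can actually do
    vision_enabled: bool   # state of the "Vision" capability option
    is_local: bool = True


def vision_config_warning(model: ModelEntry) -> Optional[str]:
    """Return a warning for the two mismatches in the note, or None."""
    # Case 1: a local vision model with the option left off.
    if model.is_local and model.supports_vision and not model.vision_enabled:
        return (f"{model.name}: supports vision but 'Vision' is off; "
                "the model may fail to load.")
    # Case 2: the option is forced on for a model without vision support.
    if model.vision_enabled and not model.supports_vision:
        return (f"{model.name}: 'Vision' is on but the model does not support "
                "vision; errors are likely.")
    return None
```

A matching configuration (both flags on, or both off) yields `None`, so the check only reports the cases the note warns about.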

[Image: operation screenshot]

Preprocessing Mechanism

Preprocessing Flow

  • General Information Extraction: Identify the main content and elements of the image.
  • Optical Character Recognition (OCR): Extract text from the image.
  • QR Code Parsing: Automatically recognize and process QR codes in the image.

Model Selection Mechanism

  • A task model configured with vision support is used first.
  • If no task model is available, the visual backup model is used.
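The selection order above can be sketched as follows. The names `VisionModel` and `pick_preprocessing_model` are hypothetical, and one detail is an assumption rather than documented behavior: a task model that exists but lacks vision support is treated the same as a missing task model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisionModel:
    # Hypothetical model record used only for this illustration.
    name: str
    vision: bool


def pick_preprocessing_model(
    task_model: Optional[VisionModel],
    backup_model: Optional[VisionModel],
) -> Optional[VisionModel]:
    """Prefer a vision-capable task model, otherwise fall back to the backup."""
    if task_model is not None and task_model.vision:
        return task_model
    # Assumption: a task model without vision support also falls through here.
    return backup_model
```

The function returns `None` when neither model is configured, which a caller could treat as "skip preprocessing".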

Precautions

  • The current dialogue model does not participate in preprocessing.
  • EXIF metadata in the image, such as capture time and location, is removed.
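To illustrate what removing EXIF metadata involves, here is a minimal sketch for JPEG input using only the standard library. It is not Fuwang's implementation; `strip_exif` is a hypothetical helper. It relies on the JPEG layout in which EXIF data sits in an APP1 segment (marker `FF E1`) whose payload begins with `Exif\0\0`. Simplification: it does not handle fill bytes or standalone markers before the scan data, which are rare in practice.

```python
import struct


def strip_exif(jpeg: bytes) -> bytes:
    """Return the JPEG bytes with any EXIF APP1 segments dropped."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: the rest is compressed image data, copy as-is.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        payload = jpeg[pos + 4:end]
        is_exif = marker == 0xE1 and payload.startswith(b"Exif\x00\x00")
        if not is_exif:
            out += jpeg[pos:end]  # keep every non-EXIF segment unchanged
        pos = end
    out += jpeg[pos:]
    return bytes(out)
```

Because only the metadata segment is dropped, the pixel data is untouched and the image does not need to be re-encoded.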

Complete Dialogue

After preprocessing, the system will:

  1. Add the extracted text information to the dialogue context.
  2. Add the original image to the context as well, if the main dialogue model supports vision.
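The two steps above can be sketched as a function that assembles the context for the main dialogue model. The message format, the `Preprocessed` fields, and `build_context` are all hypothetical and chosen only for illustration; the actual context structure used by Fuwang is not documented here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Preprocessed:
    # Hypothetical container for the three preprocessing outputs.
    description: str = ""
    ocr_text: str = ""
    qr_payloads: List[str] = field(default_factory=list)


def build_context(
    question: str,
    pre: Preprocessed,
    image: bytes,
    main_model_has_vision: bool,
) -> List[Dict[str, Any]]:
    """Assemble context parts: extracted text always, the image conditionally."""
    parts: List[Dict[str, Any]] = [{"type": "text", "text": question}]
    extracted = []
    if pre.description:
        extracted.append(f"Image description: {pre.description}")
    if pre.ocr_text:
        extracted.append(f"Text in image: {pre.ocr_text}")
    for payload in pre.qr_payloads:
        extracted.append(f"QR code: {payload}")
    if extracted:
        # Step 1: the extracted text is added regardless of the main model.
        parts.append({"type": "text", "text": "\n".join(extracted)})
    if main_model_has_vision:
        # Step 2: the original image is added only for vision-capable models.
        parts.append({"type": "image", "data": image})
    return parts
```

With this shape, a text-only main model still receives the description, OCR text, and QR code contents, which is why preprocessing is useful even when the dialogue model cannot see images.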

Suggestions

  • Choose Professional Models: Large cloud-hosted vision models generally provide more accurate image understanding.
  • Enable Image Compression: Turn on image compression in the general settings to reduce transfer time and data usage.
  • Ask Questions Step-by-Step: First confirm that the model has understood the image content, then ask follow-up questions.

With appropriate configuration, Fuwang can handle a wide range of image tasks, from object recognition to chart analysis and document interpretation.