Image Analysis Enhancement Pipeline

Build a metadata extraction system for images that automatically detects colors, subjects, and resolution. This enhancement pipeline combines AI vision capabilities with structured data output.

Pipeline Setup

Create a targeted pipeline that processes only image files through MIME type filtering, ensuring efficient resource usage within the ImageBlock collection:

const imageEnhancementPipeline = new Enhancement.Pipeline(
  "enrich_image",  // Unique identifier for the pipeline
  "ImageBlock",    // Target collection for processing
  {
    // Conditional filter to process only image files
    if: (ctx) => ctx.record.mimeType?.startsWith("image/"),
  }
);

Metadata Extraction Step

Define the enhancement pipeline step that extracts structured metadata from images. Using the StructuredGenerator action with Zod schema validation, we process image properties, captions, and OCR text into typed metadata fields:

const extractImageInfo = new Enhancement.Step(
  "extract_image_info",
  new Enhancement.Action.StructuredGenerator({
    // Use advanced AI model for intelligent image analysis
    model: "openai-gpt-4o",
    
    // Dynamic prompt generation using image context
    prompt: (ctx) => `
      Analyze this image and extract colors and subjects:
      Image caption: ${ctx.record.autoCaption || "None"}
      OCR text: ${ctx.record.autoOCR || "None"}
      Width: ${ctx.record.width}
      Height: ${ctx.record.height}
      size in bytes: ${ctx.record.size}
    `,
    
    // Define structured schema for extracted metadata
    schema: (ctx, { z }) =>
      z.object({
        colors: z.array(z.string()).describe("Main colors in the image"),
        subjects: z.array(z.string()).describe("Main subjects or objects"),
        resolution: z.string().describe("resolution of the image"),
      }),
  }),
  {
    // Transform extracted data for storage
    output: {
      xColors: (ctx) => JSON.stringify(ctx.result.json.colors),
      xSubjects: (ctx) => JSON.stringify(ctx.result.json.subjects),
      xResolution: (ctx) => ctx.result.json.resolution,
    },
  }
);
 
// Add the extraction step to the pipeline
imageEnhancementPipeline.add(extractImageInfo);

Project Setup

Set up project settings with necessary vectorizers and custom schema:

Configure text and image vectorization models
Add vision processing capabilities
Extend ImageBlock schema with metadata fields

const projectSettings = new ProjectSettings();
projectSettings
  // Configure text embedding for semantic understanding
  .set(new TextVectorizer(TextVectorizer.OpenAI.TextEmbedding3Small))
  
  // Set up image vectorization for visual similarity
  .set(new ImageVectorizer(ImageVectorizer.Img2VecNeural.Default))
  
  // Add generative AI capabilities
  .set(new Generative(Generative.OpenAI.GPT4o))
  
  // Enable automatic summarization
  .set(new AutoSummary(AutoSummary.OpenAI.GPT4o))
  
  // Implement advanced vision processing
  .set(new AutoVision(AutoVision.OpenAI.GPT4o))
  
  // Add the custom image enhancement pipeline
  .set(enhancement);
 
// Creating the project with the project settings
const project = admin.projects.ref({
    name: "Image Analysis Project",
    settings: projectSettings,
});
 
// Extend ImageBlock with those extracted fields and set the custom scheme in project settings
project.settings.set(
    new CustomSchema().extend(
    new CustomSchema.Collection("ImageBlock")
        .add(
        new CustomSchema.Field.Text(
            "xColors",
            "Main colors detected in the image"
        )
        )
        .add(
        new CustomSchema.Field.Text(
            "xSubjects",
            "Main subjects or objects detected in the image"
        )
        )
        .add(
        new CustomSchema.Field.Text(
            "xResolution",
            "Resolution information of the image"
        )
        )
    )
);
 
// Save the project
await project.save();

Data Retrieval

Use TypeScript interfaces to ensure type safety when querying enhanced images, retrieving structured metadata about colors, subjects, and resolution:

import { StringField } from "unbody";
 
// Define extended interface for enhanced image metadata
interface ExtendedImageBlock extends IImageBlock {
  xColors: StringField;
  xSubjects: StringField;
  xResolution: StringField;
}
 
// Fetch images with extracted metadata
const {
  data: { payload },
} = await unbody.get
  .collection<ExtendedImageBlock>("ImageBlock")
  .select("originalName", "xColors", "xSubjects", "xResolution")
  .exec();

Example Response

Example of the structured metadata extracted from processed images:

[
    {
        "originalName": "5.jpg",
        "xColors": "[\"blue\",\"green\",\"black\"]",
        "xResolution": "1440x960",
        "xSubjects": "[\"classic car\",\"country road\",\"person\"]"
    },
    {
        "originalName": "images (2).jpeg",
        "xColors": "[\"transparent\"]",
        "xResolution": "275x183",
        "xSubjects": "[\"futuristic ride-sharing vehicle\"]"
    },
    {
        "originalName": "Honda_e-3.jpeg",
        "xColors": "[\"lime green\"]",
        "xResolution": "1920x1168",
        "xSubjects": "[\"Honda e electric car\"]"
    },
    // ... more image entries
]

Learn More

Advanced Data Schema Build FAQ Generator From PDFs