Vectorizers
Vectorizers are a core component of Unbody, converting unstructured data like text or images into numerical vectors that AI systems can understand. This transformation enables powerful operations like semantic search, recommendations, and more.
Unbody supports multiple vectorization models, ensuring flexibility and performance for various use cases. You can configure vectorizers programmatically using the Admin API or through the dashboard.
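To make the idea of semantic search over vectors concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It does not use the Unbody SDK. The `Doc` type, the function names, and the three-dimensional vectors are all made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```typescript
// Toy example: rank documents by cosine similarity to a query vector.
// All names and vectors here are illustrative assumptions.
type Doc = { id: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a new array sorted from most to least similar to the query.
function rankBySimilarity(query: number[], docs: Doc[]): Doc[] {
  return [...docs].sort(
    (x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector),
  );
}

const docs: Doc[] = [
  { id: 'cat-care', vector: [0.9, 0.1, 0.0] },
  { id: 'dog-training', vector: [0.7, 0.3, 0.1] },
  { id: 'tax-filing', vector: [0.0, 0.2, 0.9] },
];
const queryVector = [1.0, 0.0, 0.0];
const ranked = rankBySimilarity(queryVector, docs);
console.log(ranked.map(d => d.id));
```

Documents whose vectors point in a similar direction to the query rank first, which is why the choice of vectorizer directly shapes search quality.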
Why Configure Vectorizers?
Each application has unique requirements. Some need highly accurate vectorization for large datasets, while others prioritize speed or compatibility with specific languages or content types. By configuring vectorizers, you control how data is represented and processed.
Types of Vectorizers
Unbody supports two main types of vectorizers:
- Text Vectorizers
  - Convert textual data into vectors.
  - Supported models include:
    - Transformers: Open-source model for general-purpose vectorization.
    - OpenAI: Advanced proprietary models like ada-002 and text-embedding-3-small.
    - Cohere: High-performance multilingual models.
    - Contextionary: Lightweight vectorizer optimized for fast lookups.
- Image Vectorizers
  - Convert image data into vectors.
  - Supported models include:
    - Img2Vec-Neural: Default vectorizer for image processing.
    - Multi2Vec-Clip: Advanced multimodal image-to-text vectorizer (coming soon).
For a complete list of available vectorizers, refer to the Vectorizers Reference.
Configuring Vectorizers via Admin API
To configure vectorizers for your project, use the Admin API. Follow these steps:
1. Initialize the Admin Client
import { UnbodyAdmin } from 'unbody/admin';

// Authenticate with your admin key ID and secret
const admin = new UnbodyAdmin({
  auth: {
    username: '[admin-key-id]',
    password: '[admin-key-secret]',
  },
});
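Rather than hardcoding the key ID and secret, you may prefer to read them from the environment. The following is a minimal sketch of that idea, not part of the Unbody SDK: the helper `loadAdminAuth`, the `AdminAuth` type, and the variable names `UNBODY_ADMIN_KEY_ID` and `UNBODY_ADMIN_KEY_SECRET` are hypothetical and chosen only for this example.

```typescript
// Shape matching the `auth` object used when creating the admin client.
type AdminAuth = { username: string; password: string };

// Hypothetical helper: the environment variable names are assumptions.
function loadAdminAuth(env: Record<string, string | undefined>): AdminAuth {
  const username = env['UNBODY_ADMIN_KEY_ID'];
  const password = env['UNBODY_ADMIN_KEY_SECRET'];
  if (!username || !password) {
    // Fail early instead of sending empty credentials.
    throw new Error('Missing UNBODY_ADMIN_KEY_ID or UNBODY_ADMIN_KEY_SECRET');
  }
  return { username, password };
}

// Demo with an in-memory object; in Node.js you would pass process.env.
const auth = loadAdminAuth({
  UNBODY_ADMIN_KEY_ID: 'example-key-id',
  UNBODY_ADMIN_KEY_SECRET: 'example-key-secret',
});
console.log(auth.username);
```

The returned object can then be passed as the `auth` option of `new UnbodyAdmin({ auth })` from step 1, which keeps secrets out of source control.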
2. Set a Text Vectorizer
import { ProjectSettings, TextVectorizer } from 'unbody/admin';
const settings = new ProjectSettings();
settings.set(new TextVectorizer(TextVectorizer.OpenAI.Ada002));
// Apply vectorizer settings to your project
const project = admin.projects.ref({ name: 'My Project', settings });
await project.save();
console.log(`Configured text vectorizer for project: ${project.name}`);
3. Set an Image Vectorizer
import { ImageVectorizer, ProjectSettings } from 'unbody/admin';
const settings = new ProjectSettings();
settings.set(new ImageVectorizer(ImageVectorizer.Img2VecNeural.Default));
// Apply image vectorizer settings to your project
const project = admin.projects.ref({ name: 'My Image Project', settings });
await project.save();
console.log(`Configured image vectorizer for project: ${project.name}`);
Configuring Vectorizers via Dashboard
You can also configure vectorizers directly from the Unbody dashboard:
- Navigate to Project Settings.
- Select Vectorizers under the configuration section.
- Choose the vectorizer model for text or image data.
- Save your changes.
For detailed instructions, see the Dashboard Configuration Guide.
Advanced Configuration
Using Multiple Vectorizers
Unbody allows assigning different vectorizers to specific data types or workflows. For example:
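// Text: multilingual Cohere model
settings.set(new TextVectorizer(TextVectorizer.Cohere.MultilingualV3));
// Image: Multi2Vec-Clip (listed above as coming soon, so it may not be available yet)
settings.set(new ImageVectorizer(ImageVectorizer.Multi2VecClip));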
Dynamic Model Selection
You can dynamically select vectorizers based on runtime conditions:
const vectorizer = new TextVectorizer(ctx => {
  return ctx.language === 'en'
    ? TextVectorizer.OpenAI.TextEmbedding3Small
    : TextVectorizer.Cohere.MultilingualV3;
});
settings.set(vectorizer);
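One way to keep a selection rule like this easy to test is to factor it into a pure function that you can exercise without the SDK. The sketch below is a standalone illustration under stated assumptions: `SelectionContext`, `selectTextModel`, and the string labels are hypothetical placeholders, not SDK identifiers, and the normalization of tags such as `en-US` is an addition that the snippet above does not perform.

```typescript
// Placeholder labels for illustration; they are not SDK identifiers.
type ModelLabel = 'openai/text-embedding-3-small' | 'cohere/multilingual-v3';

// Hypothetical shape of the runtime context; only `language` is assumed.
type SelectionContext = { language?: string };

function selectTextModel(ctx: SelectionContext): ModelLabel {
  // Normalize tags such as "EN" or "en-US" to the primary subtag "en".
  const primary = (ctx.language ?? '').toLowerCase().split('-')[0];
  // English content goes to the OpenAI label; everything else,
  // including a missing language, falls back to the multilingual label.
  return primary === 'en'
    ? 'openai/text-embedding-3-small'
    : 'cohere/multilingual-v3';
}

console.log(selectTextModel({ language: 'en-US' }));
console.log(selectTextModel({ language: 'de' }));
```

Falling back to the multilingual model when the language is unknown is a design choice for this sketch; a project with mostly English content might prefer the opposite default.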
Key Considerations
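- Performance: Hosted proprietary models such as OpenAI's may offer better accuracy, but they are called over an external API, which adds network latency and a dependency on the provider.
- Cost: Open-source models carry no per-request licensing fee, while proprietary models are typically billed by usage.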
- Scalability: Choose vectorizers optimized for your dataset size and operational scale.
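For the scalability point, a back-of-the-envelope estimate of raw vector storage can help when comparing models. The sketch below assumes each vector component is stored as a 32-bit float (4 bytes) and ignores index and metadata overhead, so treat the result as a lower bound. The function name is hypothetical; the 1536-dimension figure matches OpenAI's ada-002 and the default output size of text-embedding-3-small.

```typescript
// Rough lower bound on storage for raw vectors.
// Assumes float32 components (4 bytes each); real stores add index overhead.
function estimateVectorStorageBytes(
  vectorCount: number,
  dimensions: number,
  bytesPerValue: number = 4,
): number {
  return vectorCount * dimensions * bytesPerValue;
}

// One million 1536-dimensional vectors.
const bytes = estimateVectorStorageBytes(1000000, 1536);
console.log(`${(bytes / 1e9).toFixed(2)} GB`); // prints "6.14 GB"
```

Halving the dimension count halves this figure, which is one reason smaller embedding models can be attractive for very large datasets.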
Next Steps
Once vectorizers are configured:
- Test vectorization with sample data using the GraphQL API.
- Optimize pipeline performance by combining vectorizers with rerankers.
For further details, explore: