From Data to Knowledge
The Backend Layer is the foundation of Unbody. It handles everything required to make your data AI-ready: ingestion, processing, enrichment, and storage.
Traditional backends are built to handle static data. Unbody’s backend operates with an AI-native mindset: we treat data as knowledge to be understood, not just raw information to be stored. Every step in the pipeline is modular, scalable, and designed to integrate seamlessly into your workflows.
How It Works:
1. Data Ingestion
The pipeline starts with ingestion—bringing data into Unbody from wherever it lives. Data is often fragmented, stored in different formats, or scattered across platforms. To address this, Unbody’s ingestion is agnostic to location, format, and structure.
Unbody supports two main ingestion methods:
- Prebuilt Integrations: These are ready-made connectors for platforms like Google Drive, Discord, and GitHub, allowing you to quickly sync your data.
- Push API: For custom data sources, you can define your own schema and send data to Unbody programmatically (a minimal sketch follows below).
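To make the Push API concrete, here is a minimal sketch of sending a custom record over HTTP. The endpoint URL, schema name, and payload shape are illustrative assumptions, not Unbody's documented contract; consult the Push API reference for the real details.

```ts
// A minimal sketch of pushing a custom record to Unbody over HTTP.
// NOTE: the endpoint path, header names, and payload shape below are
// illustrative assumptions, not Unbody's documented Push API contract.
interface ArticleRecord {
  id: string;
  title: string;
  body: string;
  publishedAt: string;
}

async function pushRecord(record: ArticleRecord): Promise<void> {
  const res = await fetch("https://api.unbody.example/push/articles", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.UNBODY_API_KEY}`,
    },
    body: JSON.stringify(record),
  });
  if (!res.ok) {
    throw new Error(`Push failed: ${res.status} ${await res.text()}`);
  }
}
```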
2. Data Processing
Once data is ingested, it needs to be prepared for further use. Processing ensures that raw data becomes consistent, manageable, and ready for downstream workflows.
This involves:
- Cleaning: Removing duplicates, irrelevant data, and inconsistencies.
- Chunking: Splitting data into smaller, logical units; for example, breaking a document into paragraphs or extracting individual pages (see the sketch after this list).
- Standardization: Normalizing data formats so they can flow seamlessly through the pipeline.
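As an illustration of the chunking step, here is a minimal sketch of paragraph-level chunking with a size cap. Unbody handles this internally; the code only shows the general idea, and the 1,000-character limit is an arbitrary assumption.

```ts
// A minimal sketch of paragraph-level chunking for plain-text input.
function chunkByParagraph(text: string, maxChars = 1000): string[] {
  const paragraphs = text
    .split(/\n{2,}/) // split on blank lines
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    // Start a new chunk when adding this paragraph would exceed the limit.
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```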
3. Data Enrichment
Enrichment turns raw data into knowledge by adding context and meaning. For example, you can enrich images by applying OCR or generating captions, or enrich PDF files by extracting metadata or performing Q&A tasks on the content.
This process is carried out using Unbody’s Enhancer module, which is plugin-based. In general, there are two types of plugins:
- Prebuilt Enhancers: Plugins Unbody offers out of the box. Examples include AutoSummary and AutoKeywords for textual enrichment, or AutoVision for processing images by generating captions, applying OCR, and tagging objects.
- Custom Enhancers: Plugins you build yourself, tailored to your specific needs, chaining tasks into a pipeline for more complex workflows. For example, a custom pipeline could extract metadata, summarize content, and tag entities, all in one sequence (see the sketch after this list).
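Below is a minimal sketch of what chaining enhancement steps could look like. The `EnhancerStep` interface, the step implementations, and `runPipeline` are hypothetical illustrations of the chaining idea, not Unbody's actual plugin API.

```ts
// Hypothetical shapes for illustration only; not Unbody's plugin API.
interface EnhancerStep<In, Out> {
  name: string;
  run(input: In): Promise<Out>;
}

type Doc = { text: string; metadata: Record<string, unknown> };

const extractMetadata: EnhancerStep<Doc, Doc> = {
  name: "extract-metadata",
  async run(doc) {
    return { ...doc, metadata: { ...doc.metadata, charCount: doc.text.length } };
  },
};

const summarize: EnhancerStep<Doc, Doc> = {
  name: "summarize",
  async run(doc) {
    // Placeholder: a real enhancer would call a summarization model here.
    const summary = doc.text.slice(0, 200);
    return { ...doc, metadata: { ...doc.metadata, summary } };
  },
};

async function runPipeline(doc: Doc, steps: EnhancerStep<Doc, Doc>[]): Promise<Doc> {
  let current = doc;
  for (const step of steps) {
    current = await step.run(current); // each step enriches the previous output
  }
  return current;
}

// Usage: const enriched = await runPipeline(doc, [extractMetadata, summarize]);
```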
With enriched data, your applications don’t just store information—they understand it and generate insights. This forms the foundation for a generative feedback loop, where enriched outputs refine and improve future inputs.
4. Data Vectorization
After enrichment, data is transformed into vectors—numerical representations that AI models can process. Vectorization converts text, images, and other types of data into high-dimensional embeddings that enable advanced AI tasks.
In Unbody, vectorization works in two ways:
- Prebuilt Vectorizers: Unbody supports ready-to-use vectorizers like OpenAI embeddings or Transformer-based models.
- Custom Models: If you have specific requirements, you can plug in your own vectorizer to encode data as needed.
Vectors are essential for powering semantic search, recommendation engines, and generative workflows.
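To see why embeddings matter, consider how similarity between two vectors is computed. The sketch below implements cosine similarity, the measure commonly used to rank results in semantic search; the vectors themselves would come from whichever vectorizer you configure.

```ts
// Embeddings pointing in similar directions score near 1; unrelated ones near 0.
// e.g. cosineSimilarity(embed("solar power"), embed("renewable energy")) → high,
// where embed() stands in for whichever vectorizer you plug in.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0,
    normA = 0,
    normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```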
5. Database
Unbody relies on Weaviate as its database, powering vector storage, retrieval, and semantic search. While nearly every other step in the pipeline is modular, we haven't yet seen a compelling reason to make the database swappable: Weaviate has proven highly reliable, flexible, and performant for AI-native workflows.
That said, Unbody’s architecture remains open to change. If future requirements arise that call for database modularity, we’re prepared to adapt this part of the stack as well. We value feedback, so if you have thoughts on this topic, we’d love to hear from you.
For developers familiar with Weaviate, you’ll find Unbody’s database integration straightforward, as it builds on the same principles and functionality.
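For a flavor of what this looks like in practice, the sketch below issues a Weaviate-style `nearText` query over GraphQL. The `TextDocument` class, its fields, and the endpoint are assumptions for illustration; the `Get` / `nearText` structure follows Weaviate's conventions.

```ts
// A Weaviate-style semantic search over GraphQL. The class name,
// fields, and endpoint below are illustrative assumptions.
const query = `{
  Get {
    TextDocument(
      nearText: { concepts: ["renewable energy"] }
      limit: 5
    ) {
      title
      text
    }
  }
}`;

async function semanticSearch(): Promise<unknown> {
  const res = await fetch("https://graphql.unbody.example/", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.UNBODY_API_KEY}`,
    },
    body: JSON.stringify({ query }),
  });
  return res.json();
}
```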
6. CDN and Caching
To make enriched and vectorized data accessible globally, Unbody uses a built-in CDN. This ensures fast retrieval of processed data with minimal latency. Advanced caching features, like expiration rules and invalidation, are under development to provide even more control.
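As a small illustration, fetching a processed asset through the CDN is an ordinary HTTP request, and the response's cache headers show how the edge handled it. The URL is an assumption here; real asset URLs come from Unbody's API responses.

```ts
// Fetch a processed asset from the CDN edge. The URL is illustrative;
// actual asset URLs are returned by the API.
async function fetchAsset(assetUrl: string): Promise<Blob> {
  const res = await fetch(assetUrl);
  if (!res.ok) throw new Error(`Fetch failed: ${res.status}`);
  // Cache headers reveal the edge's caching policy for this asset.
  console.log("cache-control:", res.headers.get("cache-control"));
  return res.blob();
}
```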
Why Unbody’s Backend is Different
Traditional backends were designed for static storage and retrieval. Unbody’s backend stands out because it’s:
- AI-Native by Design: Enrichment, vectorization, and delivery are optimized for AI applications.
- Flexible: Works with fragmented data from any platform or format.
- Modular: Each part of the pipeline is independent, so you can use only what you need.
What’s Next?
The Backend Layer transforms your data. To interact with it in your application, explore the API Layer.