Admin API

Overview

UnbodyAdmin is a TypeScript/JavaScript client for interacting with Unbody’s Admin API. It provides a convenient and unified interface to manage projects, sources, webhooks, keys, and more within your Unbody environment.

Installation

# using npm
npm install unbody
 
# using yarn
yarn add unbody
 

Initialization

To start using UnbodyAdmin, you need to create an instance of it. You’ll need to provide valid authentication credentials, such as Admin keys or a JWT access token.

Example Initialization:

import { UnbodyAdmin } from 'unbody/admin'
 
const admin = new UnbodyAdmin({
  auth: {
    // credentials go here
  },
})
 

Authentication Methods

  1. Admin Keys (Basic Auth) (recommended)

Generate an admin key from the Unbody dashboard and use its ID and secret as the username and password:

const admin = new UnbodyAdmin({
  auth: {
    username: '[admin-key-id]',
    password: '[admin-key-secret]',
  },
})
 
  2. Access Token (Bearer Token)

If you have a JWT access token, you can provide it directly:

const admin = new UnbodyAdmin({
  auth: {
    accessToken: 'Bearer [your-access-token]',
  },
})
 

Usage

Creating a Project

Creating a project in Unbody requires configuring its settings and saving the project instance. You can use the provided helper classes to define the project settings easily.

Here’s an example of how to create a new project with a specific configuration:

import { ProjectSettings, TextVectorizer } from 'unbody/admin'
 
// Define project settings
const settings = new ProjectSettings().set(
  new TextVectorizer(TextVectorizer.Transformers.Default),
)
 
// Create a project reference
const project = admin.projects.ref({
  name: 'New Project',
  settings,
})
 
// Save the project
await project.save()
 
console.log(`Created "${project.name}"; id: ${project.id}`)
 

For more details on configuring project settings, see the Project Settings Guide below.

Creating a Source

To create a source within a project, you first need to reference the project. Here’s an example:

import { SourceTypes } from 'unbody/admin'
 
// Fetch the project instance
const project = await admin.projects.get({
  id: '[project id]',
})
 
// Alternatively, reference the project directly
const project = admin.projects.ref({
  id: '[project id]',
})
 
// Create a source within the project
const source = project.sources.ref({
  name: '[source name]',
  type: SourceTypes['[source type]'],
})
 
// Save the source
await source.save()
 

Ensure the SourceTypes value corresponds to a valid type supported by the API. For a list of available source types, see the [Source Types Reference](Link to be added later).

Creating a Project API Key

You can create an API key for a specific project to enable secure access to its resources. Here’s an example:

const apiKey = await project.apiKeys
  .ref({
    name: 'API Key',
  })
  .save()
 
console.log(apiKey.id, apiKey.key)
 

The apiKey.key should be stored securely, as it will not be retrievable after creation. If the key is lost, it must be deleted, and a new one should be created.
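
If a key does need to be rotated, delete the old one and create a replacement. A minimal sketch using the delete call listed under Endpoints below:

// Delete the lost key, then create a replacement
await project.apiKeys.delete({ id: '[old-api-key-id]' })
 
const rotatedKey = await project.apiKeys
  .ref({
    name: 'API Key (rotated)',
  })
  .save()
 
// Store the new secret immediately; it cannot be retrieved later
console.log(rotatedKey.id, rotatedKey.key)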

Listing Resources

You can list resources such as projects or sources using the provided API methods. Here’s how to retrieve a list of projects or sources:

// List projects
const { projects, pagination } = await admin.projects.list({})
 
// List sources for a specific project
const project = admin.projects.ref({ id: '[project id]' })
const { sources, pagination } = await project.sources.list({})
 

Optional Properties

Both methods accept the following optional properties:

  • limit: number - Maximum number of results to return.
  • offset: number - Number of results to skip before starting the list.
  • sort: Object - Sorting configuration with the following fields:
    • field: string - The field to sort by.
    • order: "asc" | "desc" - Sorting order.
  • filter: Object - Filtering criteria specific to the resource type. See the Filterable Properties for Projects and Filterable Properties for Sources.

Returned Values

Each method returns the following (a combined example appears after this list):

  1. An array of resources (e.g., projects or sources).
  2. A pagination object with the following properties:
  • count: number - The number of results returned.
  • total: number - The total number of results matching the filter criteria.
  • offset: number - The offset value used in the request.
  • limit: number - The limit value used in the request.
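
For example, fetching the second page of ten projects, newest first, and inspecting the pagination object (the printed values are illustrative):

// Fetch the second page of 10 projects, newest first
const { projects, pagination } = await admin.projects.list({
  limit: 10,
  offset: 10,
  sort: { field: 'createdAt', order: 'desc' },
})
 
console.log(pagination) // e.g. { count: 10, total: 42, offset: 10, limit: 10 }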

Endpoints

Projects

  • Create Project

    await admin.projects.ref({ ...params }).save()
     
    • Method: POST
    • Path: /projects
  • List Projects

    const { projects } = await admin.projects.list({ ...params })
     
    • Method: GET
    • Path: /projects
    • Filterable Fields: Reference
    • Pagination Parameters: limit, offset, sort
  • Delete Project

    await admin.projects.delete(projectRef) // or
    await admin.projects.delete({ id: '[project-id]' })
     
    • Method: DELETE
    • Path: /projects/{projectId}

Project API Keys

  • Create API Key

    await projectRef.apiKeys.ref({ ...params }).save()
     
    • Method: POST
    • Path: /projects/{projectId}/api-keys
  • Delete API Key

    await projectRef.apiKeys.delete({ id: '[api-key-id]' })
     
    • Method: DELETE
    • Path: /projects/{projectId}/api-keys/{apiKeyId}

Project Webhooks

  • Create Webhook

    await project.webhooks.ref({ ...params }).save()
     
    • Method: POST
    • Path: /projects/{projectId}/webhooks
  • List Webhooks

    const { webhooks } = await project.webhooks.list({ ...params })
     
    • Method: GET
    • Path: /projects/{projectId}/webhooks
    • Filterable Fields: Reference
  • Delete Webhook

    await project.webhooks.delete({ id: '[webhook-id]' })
     
    • Method: DELETE
    • Path: /projects/{projectId}/webhooks/{webhookId}

Project Sources

  • Create Source

    import { SourceTypes } from 'unbody/admin'
     
    await project.sources
      .ref({
        name: 'New Source',
        type: SourceTypes.PushApi,
      })
      .save()
     
    • Method: POST
    • Path: /projects/{projectId}/sources
  • List Sources

    const { sources } = await project.sources.list({ ...params })
     
    • Method: GET
    • Path: /projects/{projectId}/sources
    • Filterable Fields: Reference
  • Delete Source

    await project.sources.delete({ id: '[source-id]' })
     
    • Method: DELETE
    • Path: /projects/{projectId}/sources/{sourceId}
  • Update Source

    await project.sources.update(sourceRef) // or
    await project.sources.update({ id: '[source-id]' })
     
    • Method: PATCH
    • Path: /projects/{projectId}/sources/{sourceId}
  • Initialize Source

    await project.sources.initialize(sourceRef) // or
    await project.sources.initialize({ id: '[source-id]' })
     
    • Method: POST
    • Path: /projects/{projectId}/sources/{sourceId}/indexing/initialize
  • Rebuild Source

    await project.sources.rebuild(sourceRef) // or
    await project.sources.rebuild({ id: '[source-id]' })
     
    • Method: POST
    • Path: /projects/{projectId}/sources/{sourceId}/indexing/rebuild

Admin API Keys

  • Create Admin Key

    await admin.adminKeys.ref({ ...params }).save()
     
    • Method: POST
    • Path: /api-keys
  • Delete Admin Key

    await admin.adminKeys.delete({ id: '[admin-key-id]' })
     
    • Method: DELETE
    • Path: /api-keys/{apiKeyId}

Filterable Fields

Project Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| name | string | String Filter Operators |
| state | enum | Enum Filter Operators |
| pausedAt | Date | Date Filter Operators |
| restoredAt | Date | Date Filter Operators |
| createdAt | Date | Date Filter Operators |
| updatedAt | Date | Date Filter Operators |

Project State Possible Values

To use these states in your code, import the ProjectStates enum:

import { ProjectStates } from 'unbody/admin'
 

These are the possible states a project can be in, along with their corresponding TypeScript enumeration values:

| State | TypeScript Enum |
| --- | --- |
| created | ProjectStates.Created |
| initializing | ProjectStates.Initializing |
| initialized | ProjectStates.Initialized |
| paused | ProjectStates.Paused |
| pausing | ProjectStates.Pausing |
| restoring | ProjectStates.Restoring |

Source Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| name | string | String Filter Operators |
| initialized | boolean | Boolean Filter Operators |
| state | enum | Enum Filter Operators |
| type | enum | Enum Filter Operators |
| pausedAt | Date | Date Filter Operators |
| restoredAt | Date | Date Filter Operators |
| createdAt | Date | Date Filter Operators |
| updatedAt | Date | Date Filter Operators |

Source State Possible Values

To use these states in your code, import the SourceStates enum:

import { SourceStates } from 'unbody/admin'
 

These are the possible states a source can be in, along with their corresponding TypeScript enumeration values:

| State | TypeScript Enum |
| --- | --- |
| initializing | SourceStates.Initializing |
| updating | SourceStates.Updating |
| deleting | SourceStates.Deleting |
| paused | SourceStates.Paused |
| idle | SourceStates.Idle |

Source Type Possible Values

To use these types in your code, import the SourceTypes enum:

import { SourceTypes } from 'unbody/admin'
 

These are the possible source types, along with their corresponding TypeScript enumeration values:

| Type | TypeScript Enum |
| --- | --- |
| discord | SourceTypes.Discord |
| github_branch | SourceTypes.GithubBranch |
| github_issues | SourceTypes.GithubIssues |
| google_calendar | SourceTypes.GoogleCalendar |
| google_drive | SourceTypes.GoogleDrive |
| push_api | SourceTypes.PushApi |

Webhook Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| createdAt | Date | Date Filter Operators |

Filter Operators

String and Enum Filter Operators

  • $eq, $ne, $in, $nin, $null, $like

Number Filter Operators

  • $eq, $ne, $gt, $lt, $gte, $lte, $null

Boolean Filter Operators

  • $eq, $ne, $null

Date Filter Operators

  • $gt, $gte, $lt, $lte, $null
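
As a sketch of how these operators combine with the filterable fields listed above: the exact shape of the filter object is not shown elsewhere in this document, so mapping field names to operator objects (and the date format used here) is an assumption:

import { ProjectStates } from 'unbody/admin'
 
// Hypothetical filter: initialized projects created on or after a given date
const { projects } = await admin.projects.list({
  filter: {
    state: { $eq: ProjectStates.Initialized },
    createdAt: { $gte: '2024-01-01T00:00:00.000Z' },
  },
})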

Project Settings Guide

The ProjectSettings class serves as a factory for creating and managing project settings. It provides methods to define and retrieve specific settings, which can then be used to configure a project’s behavior.

Using .set Method

Use the .set method to define a project setting. For example:

import { ProjectSettings, TextVectorizer } from 'unbody/admin'
 
const settings = new ProjectSettings()
 
settings.set(new TextVectorizer(TextVectorizer.Transformers.Default))
 

Using .get Method

Use the .get method to retrieve a specific setting. For example:

const textVectorizer = settings.get(TextVectorizer) // returns an instance of TextVectorizer
 

TextVectorizer Configuration

The TextVectorizer defines the text vectorization model used for representing text as vectors, which is essential for search operations.

Usage Example

To configure a project with the default TextVectorizer:

import { TextVectorizer } from 'unbody/admin'
 
const textVectorizer = new TextVectorizer(TextVectorizer.Transformers.Default)
console.log(textVectorizer.name) // "text2vec-transformers"
 

Available Models

The table below lists the available text vectorization models, their providers, and the corresponding details:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| TextVectorizer.OpenAI.Ada002 | text2vec-openai-ada-002 | OpenAI | OpenAI Ada 002 |
| TextVectorizer.OpenAI.TextEmbedding3Large | text2vec-openai-text-embedding-3-large | OpenAI | OpenAI Text Embedding 3 Large |
| TextVectorizer.OpenAI.TextEmbedding3Small | text2vec-openai-text-embedding-3-small | OpenAI | OpenAI Text Embedding 3 Small |
| TextVectorizer.Cohere.MultilingualV3 | text2vec-cohere-multilingual-v3.0 | Cohere | Cohere Multilingual V3 |
| TextVectorizer.Cohere.MultilingualLightV3 | text2vec-cohere-multilingual-light-v3.0 | Cohere | Cohere Multilingual Light V3 |
| TextVectorizer.Cohere.EnglishV3 | text2vec-cohere-english-v3.0 | Cohere | Cohere English V3 |
| TextVectorizer.Cohere.EnglishLightV3 | text2vec-cohere-english-light-v3.0 | Cohere | Cohere English Light V3 |
| TextVectorizer.Cohere.EnglishV2 | text2vec-cohere-english-v2.0 | Cohere | Cohere English V2 |
| TextVectorizer.Cohere.EnglishLightV2 | text2vec-cohere-english-light-v2.0 | Cohere | Cohere English Light V2 |
| TextVectorizer.Contextionary.Default | text2vec-contextionary | Unbody | Contextionary |
| TextVectorizer.Transformers.Default | text2vec-transformers | Unbody | Transformers Default |

ImageVectorizer Configuration

The ImageVectorizer sets the model used for representing images as vectors.

Usage Example

To configure a project with the ImageVectorizer:

import { ImageVectorizer } from 'unbody/admin'
 
const imageVectorizer = new ImageVectorizer(
  ImageVectorizer.Img2VecNeural.Default,
)
console.log(imageVectorizer.name) // "img2vec-neural"
 

Available Models

The table below lists the available image vectorization model:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| ImageVectorizer.Img2VecNeural.Default | img2vec-neural | Unbody | Img2Vec Neural |

QnA Configuration

The QnA configuration defines the provider used for Q&A capabilities within the project.

Usage Example

To configure a project with the QnA provider:

import { QnA } from 'unbody/admin'
 
const qna = new QnA(QnA.Transformers.Default)
console.log(qna.name) // "qna-transformers"
 

Available Providers

The table below lists the available QnA providers:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| QnA.Transformers.Default | qna-transformers | Unbody | Transformers Default |
| QnA.OpenAI.GPT3_5TurboInstruct | qna-openai-gpt-3.5-turbo-instruct | OpenAI | OpenAI GPT-3.5 Turbo Instruct |

Generative Configuration

The Generative configuration enables RAG (Retrieval-Augmented Generation) capabilities within the project.

Usage Example

To configure a project with the Generative provider:

import { Generative } from 'unbody/admin'
 
const generative = new Generative(Generative.OpenAI.GPT4o)
console.log(generative.options.model) // "gpt-4o"
 

Available Models

The table below lists the available models for Generative configuration:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Generative.OpenAI.GPT3_5Turbo | gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| Generative.OpenAI.GPT4 | gpt-4 | OpenAI | GPT-4 |
| Generative.OpenAI.GPT4Turbo | gpt-4-turbo | OpenAI | GPT-4 Turbo |
| Generative.OpenAI.GPT4o | gpt-4o | OpenAI | GPT-4o |
| Generative.OpenAI.GPT4oMini | gpt-4o-mini | OpenAI | GPT-4o Mini |
| Generative.Cohere.Command | command | Cohere | Command |
| Generative.Cohere.CommandLight | command-light | Cohere | Command Light |
| Generative.Cohere.CommandR | command-r | Cohere | Command R |
| Generative.Cohere.CommandRPlus | command-r-plus | Cohere | Command R Plus |
| Generative.Mistral.OpenMistral7b | open-mistral-7b | Mistral | Open Mistral 7b |
| Generative.Mistral.OpenMixtral8x7b | open-mixtral-8x7b | Mistral | Open Mixtral 8x7b |

Reranker Configuration

The Reranker specifies the reranking model used to prioritize search results, ensuring that the most relevant results appear at the top.

Usage Example

To configure a reranker for a project:

import { Reranker } from 'unbody/admin'
 
const reranker = new Reranker(Reranker.Transformers.Default)
console.log(reranker.name) // "reranker-transformers"
 

Available Models

The table below lists the available reranking models:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Reranker.Cohere.MultilingualV3 | reranker-cohere-multilingual-v3.0 | Cohere | Multilingual V3 |
| Reranker.Cohere.MultilingualV2 | reranker-cohere-multilingual-v2.0 | Cohere | Multilingual V2 |
| Reranker.Cohere.EnglishV3 | reranker-cohere-english-v3.0 | Cohere | English V3 |
| Reranker.Cohere.EnglishV2 | reranker-cohere-english-v2.0 | Cohere | English V2 |
| Reranker.Transformers.Default | reranker-transformers | Unbody | Transformers Default |

Spellcheck Configuration

The Spellcheck configuration specifies the model used for text spellchecking within the project.

Usage Example

To configure spellchecking for a project:

import { Spellcheck } from 'unbody/admin'
 
const spellcheck = new Spellcheck(Spellcheck.TextSpellcheck.Default)
console.log(spellcheck.name) // "text-spellcheck"
 

Available Models

The table below lists the available spellchecking models:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Spellcheck.TextSpellcheck.Default | text-spellcheck | Unbody | Text Spellcheck Default |

AutoSummary Configuration

Automatically generates summaries for text content.

Usage Example

import { AutoSummary } from 'unbody/admin'
 
const autoSummary = new AutoSummary(AutoSummary.OpenAI.GPT4o)
console.log(autoSummary.name) // "autosum-openai-gpt-4o"
 

Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoSummary.OpenAI.GPT3_5Turbo | autosum-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoSummary.OpenAI.GPT4o | autosum-openai-gpt-4o | OpenAI | GPT-4o |
| AutoSummary.OpenAI.GPT4oMini | autosum-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
| AutoSummary.Cohere.CommandR | autosum-cohere-command-r | Cohere | Command R |

AutoKeywords Configuration

Generates keywords automatically from text content to enhance search and categorization.

Usage Example

import { AutoKeywords } from 'unbody/admin'
 
const autoKeywords = new AutoKeywords(AutoKeywords.OpenAI.GPT4o)
console.log(autoKeywords.name) // "autokeywords-openai-gpt-4o"
 

Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoKeywords.OpenAI.GPT3_5Turbo | autokeywords-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoKeywords.OpenAI.GPT4o | autokeywords-openai-gpt-4o | OpenAI | GPT-4o |
| AutoKeywords.OpenAI.GPT4oMini | autokeywords-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |

AutoEntities Configuration

Automatically identifies and extracts named entities from text.

Usage Example

import { AutoEntities } from 'unbody/admin'
 
const autoEntities = new AutoEntities(AutoEntities.OpenAI.GPT4o)
console.log(autoEntities.name) // "autoentities-openai-gpt-4o"
 

Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoEntities.OpenAI.GPT3_5Turbo | autoentities-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoEntities.OpenAI.GPT4o | autoentities-openai-gpt-4o | OpenAI | GPT-4o |
| AutoEntities.OpenAI.GPT4oMini | autoentities-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |

AutoTopics Configuration

Automatically generates topics from the content.

Usage Example

import { AutoTopics } from 'unbody/admin'
 
const autoTopics = new AutoTopics(AutoTopics.OpenAI.GPT4o)
console.log(autoTopics.name) // "autotopics-openai-gpt-4o"
 

Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoTopics.OpenAI.GPT3_5Turbo | autotopics-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoTopics.OpenAI.GPT4o | autotopics-openai-gpt-4o | OpenAI | GPT-4o |
| AutoTopics.OpenAI.GPT4oMini | autotopics-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |

AutoVision Configuration

Generates captions, labels, and extracts texts from image files.

Usage Example

import { AutoVision } from 'unbody/admin'
 
const autoVision = new AutoVision(AutoVision.OpenAI.GPT4o)
console.log(autoVision.name) // "autovision-openai-gpt-4o"
 

Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoVision.OpenAI.GPT4o | autovision-openai-gpt-4o | OpenAI | GPT-4o |
| AutoVision.OpenAI.GPT4oMini | autovision-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
| AutoVision.OpenAI.GPT4Turbo | autovision-openai-gpt-4-turbo | OpenAI | GPT-4 Turbo |

PdfParser Configuration

The PdfParser configuration enables parsing and extracting content from PDF files.

Usage Example

import { PdfParser } from 'unbody/admin'
 
settings.set(new PdfParser(PdfParser.NlmSherpa.Default))
 
console.log(settings.get(PdfParser).name) // "pdfparser-nlmsherpa"
 

Available Parsers

| TypeScript Type | Name | Description |
| --- | --- | --- |
| PdfParser.NlmSherpa.Default | pdfparser-nlmsherpa | NLM Sherpa; extracts text from PDF files. |
| PdfParser.Pdf2Image.Default | pdfparser-pdf2image | pdf2image; converts PDF files to images. |

CustomSchema Configuration

The CustomSchema configuration allows defining custom data schemas for projects. It enables the creation of custom collections and fields, which can be used to store, organize, and extend data structures within a project.

Usage Example

Below is a comprehensive example that demonstrates how to define a CustomSchema, add collections and fields, and extend built-in schemas:

import { CustomSchema, ProjectSettings } from 'unbody/admin'
 
// Create a new custom schema
const customSchema = new CustomSchema()
 
// Define a custom collection with a text field
const customCollection = new CustomSchema.Collection('CustomCollection')
customCollection.add(new CustomSchema.Field.Text('name', 'Name'))
 
// Extend a built-in schema (e.g., VideoFile) with an additional field
const videoFileCollection = new CustomSchema.Collection('VideoFile').add(
  new CustomSchema.Field.Text(
    'xChapters', // Field name
    'Extracted Chapters', // Field description
    true, // Indicates the field is an array
    'word', // Tokenization strategy
  ),
)
 
// Add the custom collection and set the schema
customSchema.add(customCollection).extend(videoFileCollection)
 
// Apply the schema to project settings
const settings = new ProjectSettings()
settings.set(customSchema)
 

Field Definition

When defining fields, the following arguments are used (a sketch follows the list):

  • name: string – Field name in camelCase format.
  • description: string – Description of the field.
  • array?: boolean – Indicates whether the field is an array. Defaults to false.
  • tokenization?: "field" | "word" | "lowercase" | "whitespace" – Tokenization strategy for text fields.
  • skipVectorization?: boolean – If true, skips vectorization for the field.
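
A minimal sketch of a field defined with these arguments. The first four positional arguments follow the VideoFile example above; passing skipVectorization as a fifth positional argument is an assumption:

const tagsField = new CustomSchema.Field.Text(
  'tags', // camelCase field name
  'Content tags', // description
  true, // array field
  'word', // tokenization strategy
  false, // skipVectorization (assumed fifth argument)
)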

Supported Field Types

The following field types are available for use:

| Field Type | Description |
| --- | --- |
| Field.Int | Represents an integer field. |
| Field.Number | Represents a numerical field. |
| Field.Text | Represents a text field, with optional tokenization and vectorization. |
| Field.UUID | Represents a universally unique identifier field. |
| Field.Date | Represents a date field. |
| Field.Boolean | Represents a boolean field. |
| Field.Object | Represents a nested object field. |
| Field.PhoneNumber | Represents a phone number field with advanced formatting. |
| Field.GeoCoordinates | Represents a geographical coordinates field (latitude and longitude). |
| Field.Cref | Represents a cross-reference field pointing to another collection. |

Examples of Advanced Field Definitions

Cref Field Example

// Define a cref field referencing the VideoFile collection's document field
const crefField = new CustomSchema.Field.Cref(
  'videos',
  'References Video Files',
)
crefField.add('VideoFile', 'document')
 

Object Field Example

// Define a complex object field for an address
const objectField = new CustomSchema.Field.Object('address', 'Address')
objectField
  .add(new CustomSchema.Field.Text('street', 'Street'))
  .add(new CustomSchema.Field.Text('city', 'City'))
  .add(new CustomSchema.Field.Text('state', 'State'))
  .add(new CustomSchema.Field.Text('zip', 'Zip'))
 

Notes and Important Details

  1. Naming Conventions:
    • Custom collection names must follow PascalCase and end with Collection (e.g., CustomCollection).
    • Field names must be in camelCase format.
  2. Built-in Collections:
    • You can extend the following built-in collections with custom fields:
      • ImageBlock
      • AudioFile
      • VideoFile
      • TextDocument
      • SpreadsheetDocument
      • SubtitleFile
  3. Extension Rules:
    • Extended field names must start with x and use camelCase format (e.g., xCustomField).
  4. Field Specific Notes:
    • Text Field:
      • Supports tokenization strategies: word, field, lowercase, whitespace.
      • Can skip vectorization using the skipVectorization flag.
    • PhoneNumber Field:
      • Input/Output Schema:
{
  "input": "020 1234567", // Required. Raw input in string format
  "defaultCountry": "nl", // Required if only a national number is provided, ISO 3166-1 alpha-2 country code. Only set if explicitly set by the user.
  "internationalFormatted": "+31 20 1234567", // Read-only string
  "countryCode": 31, // Read-only unsigned integer, numerical country code
  "national": 201234567, // Read-only unsigned integer, numerical representation of the national number
  "nationalFormatted": "020 1234567", // Read-only string
  "valid": true // Read-only boolean. Whether the parser recognized the phone number as valid
}
 
    • GeoCoordinates Field:
      • Stores latitude and longitude as numbers.
      • Input/Output Schema:
{
  "latitude": 52.366667,
  "longitude": 4.9
}
 
    • Object Field:
      • Can contain nested fields of types int, number, text, date, boolean, or uuid.
      • Objects are not vectorized, and filter operators are unsupported.
    • Cref Field: The Cref field enables creating cross-references between collections. It is crucial to understand the rules and requirements for using cross-references effectively:

1. Built-in Collections: You can reference only the following built-in collections, and exclusively their document field:
    - `ImageBlock` → `document`
    - `AudioFile` → `document`
    - `VideoFile` → `document`
    - `TextDocument` → `document`

    These are the only built-in collections you can reference, and the `document` field is the only field available for cross-referencing in these collections.

2. **Custom Cross-References**:

    When referencing custom collections, it is mandatory to define the cross-reference relationship in both collections. This means:

    - If `CollectionA` has a `Cref` field pointing to `CollectionB`, then `CollectionB` must also have a `Cref` field explicitly defined to reference `CollectionA`.

    Built-in collections are an exception to this rule—they do not require you to define reciprocal references when referencing their `document` field.
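
To illustrate the reciprocal-reference rule, here is a sketch with two hypothetical custom collections that point at each other. The collection and field names are examples, and chaining two .add() calls on the schema is an assumption based on the chaining shown earlier:

// Two hypothetical custom collections that reference each other
const authorCollection = new CustomSchema.Collection('AuthorCollection')
const bookCollection = new CustomSchema.Collection('BookCollection')
 
// AuthorCollection.books -> BookCollection, resolved through its 'author' field
const booksRef = new CustomSchema.Field.Cref('books', 'Books by this author')
booksRef.add('BookCollection', 'author')
 
// BookCollection.author -> AuthorCollection, resolved through its 'books' field
const authorRef = new CustomSchema.Field.Cref('author', 'Author of this book')
authorRef.add('AuthorCollection', 'books')
 
authorCollection.add(booksRef)
bookCollection.add(authorRef)
 
customSchema.add(authorCollection).add(bookCollection)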

Enhancement Configuration

The Enhancement configuration allows you to define custom pipelines to enrich your data with generated or transformed data. While Unbody provides built-in automatic enhancement pipelines, such as AutoSummary, AutoVision, AutoKeywords, and more, you can create custom pipelines tailored to your needs. These custom pipelines enable tasks like extracting or generating data using LLMs, crawling websites (coming soon), and customizing chunking strategies with different approaches.

Custom pipelines are defined to process records in specific collections and can include conditional logic, reusable variables, and multiple steps for complex workflows.


Defining a Custom Enhancement Pipeline

Here’s how you can define a custom pipeline:

import { Enhancement } from 'unbody/admin'
 
const enrichVideoPipeline = new Enhancement.Pipeline(
  'enrich_video', // Unique pipeline name
  'VideoFile', // Target collection
)
 

This example defines a custom enhancement pipeline that will be run for records in the VideoFile collection.

Optional Parameters for a Pipeline

  • if: A condition that determines whether the pipeline should run for a given record. It should return true or false. If false, the pipeline will not execute for that record. Example:

    if: (ctx) => ctx.source.type === 'google_drive'
     
  • vars: A reusable map of variables that can be accessed in the pipeline’s steps. These variables are helpful for injecting dynamic instructions or configurations into the pipeline. Example:

    vars: {
      transcription: (ctx) =>
        JSON.stringify(ctx.record.subtitles[0].entries),
    }
     
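
A sketch combining both options. Passing them as a third options argument to the Pipeline constructor is an assumption, since this document only shows the two-argument form:

const enrichVideoPipeline = new Enhancement.Pipeline(
  'enrich_video', // Unique pipeline name
  'VideoFile', // Target collection
  {
    // Only run for records coming from Google Drive sources
    if: (ctx) => ctx.source.type === 'google_drive',
    // Reusable variables, available to every step via ctx.vars
    vars: {
      transcription: (ctx) =>
        JSON.stringify(ctx.record.subtitles[0].entries),
    },
  },
)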

Pipeline Context (ctx)

The ctx object provides essential information about the current pipeline execution. Below is the structure of the context:

export type EnhancementContext = {
  source: {
    id: string
    type: string
    name: string
    createdAt: Date
    updatedAt: Date
  }
 
  project: {
    id: string
    name: string
    createdAt: Date
    updatedAt: Date
    settings: Record<string, any>
  }
 
  vars: Record<string, any>
 
  record: Record<string, any> // The record being enhanced
  steps: Record<
    string,
    {
      output: Record<string, any> // Output of the step
      result: Record<string, any> // Result of the step
      skipped?: boolean
      failedAt?: Date
      startedAt?: Date
      finishedAt?: Date
    }
  >
  result: Record<string, any> // Current step's result after execution
}
 

Key Context Components

  • record: The record being enhanced. Includes all GraphQL record properties.
  • steps: A map of step names to their output, result, and execution details (e.g., start and finish times).
  • result: The result of the current step after it has been executed.
  • vars: Custom variables defined in the pipeline.
  • source and project: Metadata about the source and project associated with the pipeline.

Steps in a Pipeline

Each pipeline must include at least one step. A step is defined by:

  • A unique name within the pipeline.
  • An action specifying the operation to perform.
  • An output map that defines how the results will be stored.
  • Optional parameters such as if and onFailure for advanced control.

Optional Parameters for a Step

  • if: A condition that determines whether the step should run for a given record. It should return true or false. If false, the step will not execute. Example: if: (ctx) => ctx.vars.transcription.length > 0
  • onFailure: Specifies the behavior when the step fails. Possible values:
    • "continue": Ignore errors and proceed to the next step.
    • "stop": Stop the pipeline execution on failure.

Step Example

const action = new Enhancement.Action.StructuredGenerator({
  model: 'openai-gpt-4o', // Action model
  prompt: (ctx) =>
    `Extract chapters from the video's transcription:\n ${ctx.vars.transcription}`,
  schema: (ctx, { z }) =>
    z
      .object({
        chapters: z.array(
          z.object({
            start: z.string().describe('Start time of the chapter'),
            end: z.string().describe('End time of the chapter'),
            title: z.string().describe('Title of the chapter'),
          }),
        ),
      })
      .describe('Extracted info'),
})
 
const extractChapters = new Enhancement.Step('extract_chapters', action, {
  output: {
    xChapters: (ctx) => ctx.result.json.chapters || [],
  },
  if: (ctx) => ctx.vars.transcription.length > 0,
  onFailure: 'continue',
})
 
enrichVideoPipeline.add(extractChapters)
 

Explanation of the Example

  1. Action: Specifies the operation to perform using a model (e.g., openai-gpt-4o) and includes parameters like a prompt and a schema.
  2. Step: Ties the action to the pipeline and specifies how the action’s result (ctx.result) is stored in the output.
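
To illustrate the ctx.steps map described earlier, a hypothetical follow-up step could reuse what extract_chapters produced. This is a sketch; assuming the step's stored result exposes the same json property as ctx.result:

const summarizeChapters = new Enhancement.Step(
  'summarize_chapters',
  new Enhancement.Action.TextGenerator({
    model: 'openai-gpt-4o',
    prompt: (ctx) =>
      // Read the previous step's result via ctx.steps (assumed to mirror ctx.result)
      `Write a one-paragraph summary of these chapters:\n${JSON.stringify(
        ctx.steps.extract_chapters.result.json.chapters || [],
      )}`,
  }),
  {
    output: {
      autoSummary: (ctx) => ctx.result.content,
    },
  },
)
 
enrichVideoPipeline.add(summarizeChapters)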

Setting the Enhancement in Project Settings

Once the pipeline is defined, it can be added to the enhancement configuration and applied to the project settings:

const enhancement = new Enhancement()
 
enhancement.add(enrichVideoPipeline)
 
settings.set(enhancement)
 

Available Actions

The Enhancement configuration supports various built-in actions. Each action performs specific operations based on the inputs provided and generates a result. The following sections detail each action, its input parameters, and its result structure.


TextGenerator

The TextGenerator action uses a language model to generate text based on the provided prompt. It accepts the following input parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string \| Function | Yes | The language model used to generate the text. |
| prompt | string \| Function | Yes | The prompt sent to the model. |
| temperature | number \| Function | No | Sampling temperature. |
| maxTokens | number \| Function | No | Maximum number of tokens to generate. |
| topP | number \| Function | No | Nucleus-sampling parameter. |
| frequencyPenalty | number \| Function | No | Frequency penalty applied during generation. |
| presencePenalty | number \| Function | No | Presence penalty applied during generation. |

For all parameters, you can provide a literal value (e.g., 'openai-gpt-4') or a function that dynamically determines the value based on the pipeline context (ctx).


Result Structure

The result object generated by the TextGenerator action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| content | string | The text generated by the language model. |

Example: Summarizing Video Transcriptions

Below is an example of using the TextGenerator action to summarize video transcriptions:

const summarizeVideo = new Enhancement.Step(
  'summarize_video',
  new Enhancement.Action.TextGenerator({
    model: (ctx) => ctx.vars.preferredModel || 'openai-gpt-4', // Dynamically choosing the model
    prompt: (ctx) =>
      `Summarize the video transcription:\n${JSON.stringify(
        ctx.record.subtitles?.[0]?.entries || [],
      )}`,
    temperature: 0.7,
  }),
  {
    output: {
      autoSummary: (ctx) => ctx.result.content, // Storing the generated summary
    },
  },
)
 

StructuredGenerator

The StructuredGenerator action generates structured JSON data based on a prompt and schema.


Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string \| Function | Yes | The language model used to generate the structured output. |
| prompt | string \| Function | Yes | The prompt sent to the model. |
| schema | T \| Function | Yes | The Zod schema describing the expected JSON output. |
| temperature | number \| Function | No | Sampling temperature. |
| maxTokens | number \| Function | No | Maximum number of tokens to generate. |
| topP | number \| Function | No | Nucleus-sampling parameter. |
| frequencyPenalty | number \| Function | No | Frequency penalty applied during generation. |
| presencePenalty | number \| Function | No | Presence penalty applied during generation. |
| images | { url: string }[] \| Function | No | Image inputs passed alongside the prompt. |

For all parameters, you can provide a literal value (e.g., 'openai-gpt-4o') or a function that dynamically determines the value based on the pipeline context (ctx) or helpers.


Result Structure

The result object generated by the StructuredGenerator action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| json | Record<string, any> | The structured JSON data produced by the action. |

Helpers

The z (Zod) helper library is included to define and validate the schema for the output. Zod is a validation library that ensures the generated JSON matches the expected structure.


Example: Extracting Chapters from a Video Transcription

const extractChapters = new Enhancement.Step(
  'extract_chapters',
  new Enhancement.Action.StructuredGenerator({
    model: 'openai-gpt-4o', // Language model
    prompt: (ctx) =>
      `Extract chapters from the video's transcription:\n ${JSON.stringify(
        ctx.record.subtitles,
      )}`,
    schema: (ctx, { z }) =>
      z
        .object({
          chapters: z.array(
            z.object({
              start: z.string().describe('Start time of the chapter'),
              end: z.string().describe('End time of the chapter'),
              title: z.string().describe('Title of the chapter'),
            }),
          ),
        })
        .describe('Extracted chapter information'),
    temperature: 0.7,
  }),
  {
    output: {
      xChapters: (ctx) => ctx.result.json.chapters || [],
    },
  },
)
 

Summarizer

The Summarizer action is designed to summarize long texts using AI models. It employs a map-reduce method to efficiently handle large content by breaking it into smaller chunks and generating a cohesive summary.


Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string \| Function | Yes | The language model used for summarization. |
| text | string \| Function | Yes | The long text content to summarize. |
| metadata | string \| Function | No | Metadata that provides additional context for the summary. |
| prompt | string \| Function | No | A custom prompt; see the guidelines below. |

Prompt Guidelines

When providing a custom prompt, ensure it includes the {text} and {metadata} placeholders, which will be dynamically replaced during execution:

  • {text}: Represents the actual long content to be summarized.
  • {metadata}: Includes metadata for additional context. If not provided, this placeholder will be omitted.

Example Prompt:

Summarize the following text based on the provided metadata:
Text: {text}
Metadata: {metadata}

Output Structure

The result object generated by the Summarizer action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| summary | string | The summarized text produced by the action. |

Example: Summarizing a Long Text

const summarizeText = new Enhancement.Step(
  'summarize_text',
  new Enhancement.Action.Summarizer({
    model: 'openai-gpt-4o', // Language model
    text: (ctx) => ctx.record.content, // Long text content
    metadata: '- title: document title...', // Metadata for context
    prompt: `Summarize the following content:\nText: {text}\nMetadata: {metadata}`, // Custom prompt
  }),
  {
    output: {
      autoSummary: (ctx) => ctx.result.summary, // Storing the summary
    },
  },
)
 