## Overview

`UnbodyAdmin` is a TypeScript/JavaScript client for interacting with Unbody's Admin API. It provides a convenient, unified interface for managing projects, sources, webhooks, API keys, and more within your Unbody environment.
## Installation

```bash
# using npm
npm install unbody

# using yarn
yarn add unbody
```
## Initialization

To start using `UnbodyAdmin`, create an instance of it, providing valid authentication credentials such as admin keys or a JWT access token.

Example initialization:

```ts
import { UnbodyAdmin } from 'unbody/admin'

const admin = new UnbodyAdmin({
  auth: {
    // credentials go here
  },
})
```
### Authentication Methods

- **Admin Keys (Basic Auth)** (recommended)

  Generate an admin key from the Unbody dashboard and use its ID and secret as username and password:

  ```ts
  const admin = new UnbodyAdmin({
    auth: {
      username: '[admin-key-id]',
      password: '[admin-key-secret]',
    },
  })
  ```

- **Access Token (Bearer Token)**

  If you have a JWT access token, provide it directly:

  ```ts
  const admin = new UnbodyAdmin({
    auth: {
      accessToken: 'Bearer [your-access-token]',
    },
  })
  ```
## Usage

### Creating a Project

Creating a project in Unbody involves configuring its settings and saving the project instance. You can use the provided helper classes to define the project settings.

Here's an example of how to create a new project with a specific configuration:

```ts
import { ProjectSettings, TextVectorizer } from 'unbody/admin'

// Define project settings
const settings = new ProjectSettings().set(
  new TextVectorizer(TextVectorizer.Transformers.Default),
)

// Create a project reference
const project = admin.projects.ref({
  name: 'New Project',
  settings,
})

// Save the project
await project.save()

console.log(`Created "${project.name}"; id: ${project.id}`)
```

For more details on configuring project settings, see the Project Settings Guide below.
### Creating a Source

To create a source within a project, you first need a project reference. Here's an example:

```ts
import { SourceTypes } from 'unbody/admin'

// Fetch the project instance
const project = await admin.projects.get({
  id: '[project id]',
})

// Alternatively, reference the project directly without fetching it:
// const project = admin.projects.ref({
//   id: '[project id]',
// })

// Create a source within the project
const source = project.sources.ref({
  name: '[source name]',
  type: SourceTypes['[source type]'],
})

// Save the source
await source.save()
```

Ensure the `SourceTypes` value corresponds to a valid type supported by the API. For a list of available source types, see the Source Type Possible Values table below.
### Creating a Project API Key

You can create an API key for a specific project to enable secure access to its resources. Here's an example:

```ts
const apiKey = await project.apiKeys
  .ref({
    name: 'API Key',
  })
  .save()

console.log(apiKey.id, apiKey.key)
```

Store `apiKey.key` securely: it cannot be retrieved after creation. If the key is lost, delete it and create a new one.
### Listing Resources

You can list resources such as projects or sources using the provided API methods. Here's how to retrieve a list of projects or sources:

```ts
// List projects
const { projects, pagination } = await admin.projects.list({})

// List sources for a specific project
const project = admin.projects.ref({ id: '[project id]' })
const { sources, pagination: sourcesPagination } = await project.sources.list({})
```
#### Optional Properties

Both methods accept the following optional properties (a combined usage sketch follows the list):

- `limit`: `number` - Maximum number of results to return.
- `offset`: `number` - Number of results to skip before starting the list.
- `sort`: `Object` - Sorting configuration with the following fields:
  - `field`: `string` - The field to sort by.
  - `order`: `"asc" | "desc"` - Sorting order.
- `filter`: `Object` - Filtering criteria specific to the resource type. See the Filterable Properties for Projects and Filterable Properties for Sources.
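Putting these options together, here is a minimal sketch of a paginated, sorted, and filtered listing call. The filter value shown is an assumption based on the filter operators documented later in this guide:

```ts
// A minimal sketch combining limit, offset, sort, and filter.
// The exact filter value syntax is an assumption; see the Filter
// Operators section below for the supported operators.
const { projects, pagination } = await admin.projects.list({
  limit: 10,
  offset: 0,
  sort: { field: 'createdAt', order: 'desc' },
  filter: { name: { $like: 'demo' } }, // hypothetical pattern value
})
```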
#### Returned Values

Each method returns (a paging sketch follows the list):

- An array of resources (e.g., `projects` or `sources`).
- A `pagination` object with the following properties:
  - `count`: `number` - The number of results returned.
  - `total`: `number` - The total number of results matching the filter criteria.
  - `offset`: `number` - The offset value used in the request.
  - `limit`: `number` - The limit value used in the request.
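As an illustration, the returned `pagination` fields can drive a simple page-through loop. This is a sketch based only on the fields described above:

```ts
// Page through all projects using the pagination fields described above.
let offset = 0
const limit = 25
const all: any[] = []

while (true) {
  const { projects, pagination } = await admin.projects.list({ limit, offset })
  all.push(...projects)
  offset += pagination.count
  if (offset >= pagination.total || pagination.count === 0) break
}
```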
## Endpoints

### Projects

- **Create Project**

  ```ts
  await admin.projects.ref({ ...params }).save()
  ```

  - Method: `POST`
  - Path: `/projects`

- **List Projects**

  ```ts
  const { projects } = await admin.projects.list({ ...params })
  ```

  - Method: `GET`
  - Path: `/projects`
  - Filterable Fields: see Project Filterable Fields
  - Pagination Parameters: `limit`, `offset`, `sort`

- **Delete Project**

  ```ts
  await admin.projects.delete(projectRef)
  // or
  await admin.projects.delete({ id: '[project-id]' })
  ```

  - Method: `DELETE`
  - Path: `/projects/{projectId}`
### Project API Keys

- **Create API Key**

  ```ts
  await projectRef.apiKeys.ref({ ...params }).save()
  ```

  - Method: `POST`
  - Path: `/projects/{projectId}/api-keys`

- **Delete API Key**

  ```ts
  await projectRef.apiKeys.delete({ id: '[api-key-id]' })
  ```

  - Method: `DELETE`
  - Path: `/projects/{projectId}/api-keys/{apiKeyId}`
### Project Webhooks

- **Create Webhook**

  ```ts
  await project.webhooks.ref({ ...params }).save()
  ```

  - Method: `POST`
  - Path: `/projects/{projectId}/webhooks`

- **List Webhooks**

  ```ts
  const { webhooks } = await project.webhooks.list({ ...params })
  ```

  - Method: `GET`
  - Path: `/projects/{projectId}/webhooks`
  - Filterable Fields: see Webhook Filterable Fields

- **Delete Webhook**

  ```ts
  await project.webhooks.delete({ id: '[webhook-id]' })
  ```

  - Method: `DELETE`
  - Path: `/projects/{projectId}/webhooks/{webhookId}`
### Project Sources

- **Create Source**

  ```ts
  import { SourceTypes } from 'unbody/admin'

  await project.sources
    .ref({
      name: 'New Source',
      type: SourceTypes.PushApi,
    })
    .save()
  ```

  - Method: `POST`
  - Path: `/projects/{projectId}/sources`

- **List Sources**

  ```ts
  const { sources } = await project.sources.list({ ...params })
  ```

  - Method: `GET`
  - Path: `/projects/{projectId}/sources`
  - Filterable Fields: see Source Filterable Fields

- **Delete Source**

  ```ts
  await project.sources.delete({ id: '[source-id]' })
  ```

  - Method: `DELETE`
  - Path: `/projects/{projectId}/sources/{sourceId}`

- **Update Source**

  ```ts
  await project.sources.update(sourceRef)
  // or
  await project.sources.update({ id: '[source-id]' })
  ```

  - Method: `PATCH`
  - Path: `/projects/{projectId}/sources/{sourceId}`

- **Initialize Source**

  ```ts
  await project.sources.initialize(sourceRef)
  // or
  await project.sources.initialize({ id: '[source-id]' })
  ```

  - Method: `POST`
  - Path: `/projects/{projectId}/sources/{sourceId}/indexing/initialize`

- **Rebuild Source**

  ```ts
  await project.sources.rebuild(sourceRef)
  // or
  await project.sources.rebuild({ id: '[source-id]' })
  ```

  - Method: `POST`
  - Path: `/projects/{projectId}/sources/{sourceId}/indexing/rebuild`
### Admin API Keys

- **Create Admin Key**

  ```ts
  await admin.adminKeys.ref({ ...params }).save()
  ```

  - Method: `POST`
  - Path: `/api-keys`

- **Delete Admin Key**

  ```ts
  await admin.adminKeys.delete({ id: '[admin-key-id]' })
  ```

  - Method: `DELETE`
  - Path: `/api-keys/{apiKeyId}`
## Filterable Fields

### Project Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| name | string | String Filter Operators |
| state | enum | Enum Filter Operators |
| pausedAt | Date | Date Filter Operators |
| restoredAt | Date | Date Filter Operators |
| createdAt | Date | Date Filter Operators |
| updatedAt | Date | Date Filter Operators |

#### Project State Possible Values

To use these states in your code, import the `ProjectStates` enum:

```ts
import { ProjectStates } from 'unbody/admin'
```

These are the possible states a project can be in, along with their corresponding TypeScript enum values:

| State | TypeScript Enum |
| --- | --- |
| created | ProjectStates.Created |
| initializing | ProjectStates.Initializing |
| initialized | ProjectStates.Initialized |
| paused | ProjectStates.Paused |
| pausing | ProjectStates.Pausing |
| restoring | ProjectStates.Restoring |
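For example, the enum can be combined with the filterable `state` field when listing projects. This is a sketch that assumes enum values are accepted directly as filter values:

```ts
// Sketch: filter projects by state using the ProjectStates enum.
const { projects } = await admin.projects.list({
  filter: { state: { $eq: ProjectStates.Initialized } },
})
```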
### Source Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| name | string | String Filter Operators |
| initialized | boolean | Boolean Filter Operators |
| state | enum | Enum Filter Operators |
| type | enum | Enum Filter Operators |
| pausedAt | Date | Date Filter Operators |
| restoredAt | Date | Date Filter Operators |
| createdAt | Date | Date Filter Operators |
| updatedAt | Date | Date Filter Operators |

#### Source State Possible Values

To use these states in your code, import the `SourceStates` enum:

```ts
import { SourceStates } from 'unbody/admin'
```

These are the possible states a source can be in, along with their corresponding TypeScript enum values:

| State | TypeScript Enum |
| --- | --- |
| initializing | SourceStates.Initializing |
| updating | SourceStates.Updating |
| deleting | SourceStates.Deleting |
| paused | SourceStates.Paused |
| idle | SourceStates.Idle |

#### Source Type Possible Values

To use these types in your code, import the `SourceTypes` enum:

```ts
import { SourceTypes } from 'unbody/admin'
```

These are the possible source types, along with their corresponding TypeScript enum values:

| Type | TypeScript Enum |
| --- | --- |
| discord | SourceTypes.Discord |
| github_branch | SourceTypes.GithubBranch |
| github_issues | SourceTypes.GithubIssues |
| google_calendar | SourceTypes.GoogleCalendar |
| google_drive | SourceTypes.GoogleDrive |
| push_api | SourceTypes.PushApi |
### Webhook Filterable Fields

| Field | Type | Filter Operators |
| --- | --- | --- |
| createdAt | Date | Date Filter Operators |

## Filter Operators

### String and Enum Filter Operators

`$eq`, `$ne`, `$in`, `$nin`, `$null`, `$like`

### Number Filter Operators

`$eq`, `$ne`, `$gt`, `$lt`, `$gte`, `$lte`, `$null`

### Boolean Filter Operators

`$eq`, `$ne`, `$null`

### Date Filter Operators

`$gt`, `$gte`, `$lt`, `$lte`, `$null`
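A sketch combining several of these operators in a source listing; the exact shape of each operator value is an assumption based on the lists above:

```ts
// Sketch: combine enum and date operators when listing sources.
const { sources } = await project.sources.list({
  filter: {
    type: { $in: [SourceTypes.GoogleDrive, SourceTypes.PushApi] },
    createdAt: { $gte: new Date('2024-01-01') }, // assumes Date values are accepted
  },
})
```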
## Project Settings Guide

The `ProjectSettings` class serves as a factory for creating and managing project settings. It provides methods to define and retrieve specific settings, which can then be used to configure a project's behavior.

### Using the `.set` Method

Use the `.set` method to define a project setting. For example:

```ts
import { ProjectSettings, TextVectorizer } from 'unbody/admin'

const settings = new ProjectSettings()
settings.set(new TextVectorizer(TextVectorizer.Transformers.Default))
```

### Using the `.get` Method

Use the `.get` method to retrieve a specific setting. For example:

```ts
const textVectorizer = settings.get(TextVectorizer) // returns an instance of TextVectorizer
```
### TextVectorizer Configuration

The `TextVectorizer` defines the text vectorization model used for representing text as vectors, which is essential for search operations.

#### Usage Example

To configure a project with the default `TextVectorizer`:

```ts
import { TextVectorizer } from 'unbody/admin'

const textVectorizer = new TextVectorizer(TextVectorizer.Transformers.Default)
console.log(textVectorizer.name) // "text2vec-transformers"
```

#### Available Models

The table below lists the available text vectorization models, their providers, and the corresponding details:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| TextVectorizer.OpenAI.Ada002 | text2vec-openai-ada-002 | OpenAI | OpenAI Ada 002 |
| TextVectorizer.OpenAI.TextEmbedding3Large | text2vec-openai-text-embedding-3-large | OpenAI | OpenAI Text Embedding 3 Large |
| TextVectorizer.OpenAI.TextEmbedding3Small | text2vec-openai-text-embedding-3-small | OpenAI | OpenAI Text Embedding 3 Small |
| TextVectorizer.Cohere.MultilingualV3 | text2vec-cohere-multilingual-v3.0 | Cohere | Cohere Multilingual V3 |
| TextVectorizer.Cohere.MultilingualLightV3 | text2vec-cohere-multilingual-light-v3.0 | Cohere | Cohere Multilingual Light V3 |
| TextVectorizer.Cohere.EnglishV3 | text2vec-cohere-english-v3.0 | Cohere | Cohere English V3 |
| TextVectorizer.Cohere.EnglishLightV3 | text2vec-cohere-english-light-v3.0 | Cohere | Cohere English Light V3 |
| TextVectorizer.Cohere.EnglishV2 | text2vec-cohere-english-v2.0 | Cohere | Cohere English V2 |
| TextVectorizer.Cohere.EnglishLightV2 | text2vec-cohere-english-light-v2.0 | Cohere | Cohere English Light V2 |
| TextVectorizer.Contextionary.Default | text2vec-contextionary | Unbody | Contextionary |
| TextVectorizer.Transformers.Default | text2vec-transformers | Unbody | Transformers Default |
### ImageVectorizer Configuration

The `ImageVectorizer` sets the model used for representing images as vectors.

#### Usage Example

To configure a project with the `ImageVectorizer`:

```ts
import { ImageVectorizer } from 'unbody/admin'

const imageVectorizer = new ImageVectorizer(
  ImageVectorizer.Img2VecNeural.Default,
)
console.log(imageVectorizer.name) // "img2vec-neural"
```

#### Available Models

The table below lists the available image vectorization model:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| ImageVectorizer.Img2VecNeural.Default | img2vec-neural | Unbody | Img2Vec Neural |
### QnA Configuration

The `QnA` configuration defines the provider used for Q&A capabilities within the project.

#### Usage Example

To configure a project with the `QnA` provider:

```ts
import { QnA } from 'unbody/admin'

const qna = new QnA(QnA.Transformers.Default)
console.log(qna.name) // "qna-transformers"
```

#### Available Providers

The table below lists the available QnA providers:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| QnA.Transformers.Default | qna-transformers | Unbody | Transformers Default |
| QnA.OpenAI.GPT3_5TurboInstruct | qna-openai-gpt-3.5-turbo-instruct | OpenAI | OpenAI GPT-3.5 Turbo Instruct |
### Generative Configuration

The `Generative` configuration enables RAG (Retrieval-Augmented Generation) capabilities within the project.

#### Usage Example

To configure a project with the `Generative` provider:

```ts
import { Generative } from 'unbody/admin'

const generative = new Generative(Generative.OpenAI.GPT4o)
console.log(generative.options.model) // "gpt-4o"
```

#### Available Models

The table below lists the available models for the `Generative` configuration:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Generative.OpenAI.GPT3_5Turbo | gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| Generative.OpenAI.GPT4 | gpt-4 | OpenAI | GPT-4 |
| Generative.OpenAI.GPT4Turbo | gpt-4-turbo | OpenAI | GPT-4 Turbo |
| Generative.OpenAI.GPT4o | gpt-4o | OpenAI | GPT-4o |
| Generative.OpenAI.GPT4oMini | gpt-4o-mini | OpenAI | GPT-4o Mini |
| Generative.Cohere.Command | command | Cohere | Command |
| Generative.Cohere.CommandLight | command-light | Cohere | Command Light |
| Generative.Cohere.CommandR | command-r | Cohere | Command R |
| Generative.Cohere.CommandRPlus | command-r-plus | Cohere | Command R Plus |
| Generative.Mistral.OpenMistral7b | open-mistral-7b | Mistral | Open Mistral 7b |
| Generative.Mistral.OpenMixtral8x7b | open-mixtral-8x7b | Mistral | Open Mixtral 8x7b |
### Reranker Configuration

The `Reranker` specifies the reranking model used to prioritize search results, ensuring that the most relevant results appear at the top.

#### Usage Example

To configure a reranker for a project:

```ts
import { Reranker } from 'unbody/admin'

const reranker = new Reranker(Reranker.Transformers.Default)
console.log(reranker.name) // "reranker-transformers"
```

#### Available Models

The table below lists the available reranking models:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Reranker.Cohere.MultilingualV3 | reranker-cohere-multilingual-v3.0 | Cohere | Multilingual V3 |
| Reranker.Cohere.MultilingualV2 | reranker-cohere-multilingual-v2.0 | Cohere | Multilingual V2 |
| Reranker.Cohere.EnglishV3 | reranker-cohere-english-v3.0 | Cohere | English V3 |
| Reranker.Cohere.EnglishV2 | reranker-cohere-english-v2.0 | Cohere | English V2 |
| Reranker.Transformers.Default | reranker-transformers | Unbody | Transformers Default |
### Spellcheck Configuration

The `Spellcheck` configuration specifies the model used for text spellchecking within the project.

#### Usage Example

To configure spellchecking for a project:

```ts
import { Spellcheck } from 'unbody/admin'

const spellcheck = new Spellcheck(Spellcheck.TextSpellcheck.Default)
console.log(spellcheck.name) // "text-spellcheck"
```

#### Available Models

The table below lists the available spellchecking models:

| TypeScript Type | Name | Provider | Model |
| --- | --- | --- | --- |
| Spellcheck.TextSpellcheck.Default | text-spellcheck | Unbody | Text Spellcheck Default |
### AutoSummary Configuration

The `AutoSummary` configuration automatically generates summaries for text content.

#### Usage Example

```ts
import { AutoSummary } from 'unbody/admin'

const autoSummary = new AutoSummary(AutoSummary.OpenAI.GPT4o)
console.log(autoSummary.name) // "autosum-openai-gpt-4o"
```

#### Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoSummary.OpenAI.GPT3_5Turbo | autosum-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoSummary.OpenAI.GPT4o | autosum-openai-gpt-4o | OpenAI | GPT-4o |
| AutoSummary.OpenAI.GPT4oMini | autosum-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
| AutoSummary.Cohere.CommandR | autosum-cohere-command-r | Cohere | Command R |
### AutoKeywords Configuration

The `AutoKeywords` configuration automatically generates keywords from text content to enhance search and categorization.

#### Usage Example

```ts
import { AutoKeywords } from 'unbody/admin'

const autoKeywords = new AutoKeywords(AutoKeywords.OpenAI.GPT4o)
console.log(autoKeywords.name) // "autokeywords-openai-gpt-4o"
```

#### Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoKeywords.OpenAI.GPT3_5Turbo | autokeywords-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoKeywords.OpenAI.GPT4o | autokeywords-openai-gpt-4o | OpenAI | GPT-4o |
| AutoKeywords.OpenAI.GPT4oMini | autokeywords-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
### AutoEntities Configuration

The `AutoEntities` configuration automatically identifies and extracts named entities from text.

#### Usage Example

```ts
import { AutoEntities } from 'unbody/admin'

const autoEntities = new AutoEntities(AutoEntities.OpenAI.GPT4o)
console.log(autoEntities.name) // "autoentities-openai-gpt-4o"
```

#### Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoEntities.OpenAI.GPT3_5Turbo | autoentities-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoEntities.OpenAI.GPT4o | autoentities-openai-gpt-4o | OpenAI | GPT-4o |
| AutoEntities.OpenAI.GPT4oMini | autoentities-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
### AutoTopics Configuration

The `AutoTopics` configuration automatically generates topics from the content.

#### Usage Example

```ts
import { AutoTopics } from 'unbody/admin'

const autoTopics = new AutoTopics(AutoTopics.OpenAI.GPT4o)
console.log(autoTopics.name) // "autotopics-openai-gpt-4o"
```

#### Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoTopics.OpenAI.GPT3_5Turbo | autotopics-openai-gpt-3.5-turbo | OpenAI | GPT-3.5 Turbo |
| AutoTopics.OpenAI.GPT4o | autotopics-openai-gpt-4o | OpenAI | GPT-4o |
| AutoTopics.OpenAI.GPT4oMini | autotopics-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
### AutoVision Configuration

The `AutoVision` configuration generates captions and labels and extracts text from image files.

#### Usage Example

```ts
import { AutoVision } from 'unbody/admin'

const autoVision = new AutoVision(AutoVision.OpenAI.GPT4o)
console.log(autoVision.name) // "autovision-openai-gpt-4o"
```

#### Available Models

| TypeScript Type | Name | Model Provider | Model |
| --- | --- | --- | --- |
| AutoVision.OpenAI.GPT4o | autovision-openai-gpt-4o | OpenAI | GPT-4o |
| AutoVision.OpenAI.GPT4oMini | autovision-openai-gpt-4o-mini | OpenAI | GPT-4o Mini |
| AutoVision.OpenAI.GPT4Turbo | autovision-openai-gpt-4-turbo | OpenAI | GPT-4 Turbo |
### PdfParser Configuration

The `PdfParser` configuration enables parsing and extracting content from PDF files.

#### Usage Example

```ts
import { PdfParser } from 'unbody/admin'

settings.set(new PdfParser(PdfParser.NlmSherpa.Default))
console.log(settings.get(PdfParser).name) // "pdfparser-nlmsherpa"
```

#### Available Parsers

| TypeScript Type | Name | Description |
| --- | --- | --- |
| PdfParser.NlmSherpa.Default | pdfparser-nlmsherpa | NLM Sherpa; extracts text from PDF files. |
| PdfParser.Pdf2Image.Default | pdfparser-pdf2image | pdf2image; converts PDF files to images. |
### CustomSchema Configuration

The `CustomSchema` configuration allows you to define custom data schemas for projects. It enables the creation of custom collections and fields, which can be used to store, organize, and extend data structures within a project.

#### Usage Example

Below is a comprehensive example that demonstrates how to define a `CustomSchema`, add collections and fields, and extend built-in schemas:

```ts
import { CustomSchema, ProjectSettings } from 'unbody/admin'

// Create a new custom schema
const customSchema = new CustomSchema()

// Define a custom collection with a text field
const customCollection = new CustomSchema.Collection('CustomCollection')
customCollection.add(new CustomSchema.Field.Text('name', 'Name'))

// Extend a built-in schema (e.g., VideoFile) with an additional field
const videoFileCollection = new CustomSchema.Collection('VideoFile').add(
  new CustomSchema.Field.Text(
    'xChapters', // Field name
    'Extracted Chapters', // Field description
    true, // Indicates the field is an array
    'word', // Tokenization strategy
  ),
)

// Add the custom collection and extend the built-in one
customSchema.add(customCollection).extend(videoFileCollection)

// Apply the schema to project settings
const settings = new ProjectSettings()
settings.set(customSchema)
```
#### Field Definition

When defining fields, the following arguments are used (a brief sketch follows the list):

- `name: string` - Field name in camelCase format.
- `description: string` - Description of the field.
- `array?: boolean` - Indicates whether the field is an array. Defaults to `false`.
- `tokenization?: "field" | "word" | "lowercase" | "whitespace"` - Tokenization strategy for text fields.
- `skipVectorization?: boolean` - If `true`, skips vectorization for the field.
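For illustration, here is a text field passing all five arguments positionally, in the order listed above. Treating `skipVectorization` as the fifth positional argument is an assumption based on this list:

```ts
// Hypothetical field: a non-array, field-tokenized, non-vectorized text field.
const slugField = new CustomSchema.Field.Text(
  'slug', // name
  'URL slug', // description
  false, // array
  'field', // tokenization
  true, // skipVectorization (assumed fifth positional argument)
)
```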
#### Supported Field Types

The following field types are available for use:

| Field Type | Description |
| --- | --- |
| Field.Int | Represents an integer field. |
| Field.Number | Represents a numerical field. |
| Field.Text | Represents a text field, with optional tokenization and vectorization. |
| Field.UUID | Represents a universally unique identifier field. |
| Field.Date | Represents a date field. |
| Field.Boolean | Represents a boolean field. |
| Field.Object | Represents a nested object field. |
| Field.PhoneNumber | Represents a phone number field with advanced formatting. |
| Field.GeoCoordinates | Represents a geographical coordinates field (latitude and longitude). |
| Field.Cref | Represents a cross-reference field pointing to another collection. |
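As a quick sketch, several of these field types can be combined in one custom collection. The collection and field names here are illustrative, and each constructor is assumed to follow the `(name, description, ...)` pattern shown for `Field.Text`:

```ts
// Illustrative collection mixing several supported field types.
const events = new CustomSchema.Collection('EventCollection')
events
  .add(new CustomSchema.Field.Text('title', 'Event title'))
  .add(new CustomSchema.Field.Int('capacity', 'Maximum attendees'))
  .add(new CustomSchema.Field.Date('startsAt', 'Start time'))
  .add(new CustomSchema.Field.Boolean('isPublic', 'Publicly visible'))
  .add(new CustomSchema.Field.GeoCoordinates('venue', 'Venue location'))
```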
#### Examples of Advanced Field Definitions

##### `Cref` Field Example

```ts
// Define a cref field referencing the VideoFile collection's document field
const crefField = new CustomSchema.Field.Cref(
  'videos',
  'References Video Files',
)
crefField.add('VideoFile', 'document')
```

##### `Object` Field Example

```ts
// Define a complex object field for an address
const objectField = new CustomSchema.Field.Object('address', 'Address')
objectField
  .add(new CustomSchema.Field.Text('street', 'Street'))
  .add(new CustomSchema.Field.Text('city', 'City'))
  .add(new CustomSchema.Field.Text('state', 'State'))
  .add(new CustomSchema.Field.Text('zip', 'Zip'))
```
#### Notes and Important Details

- **Naming Conventions**:
  - Custom collection names must follow PascalCase and end with `Collection` (e.g., `CustomCollection`).
  - Field names must be in camelCase format.
- **Built-in Collections**: You can extend the following built-in collections with custom fields:
  - `ImageBlock`
  - `AudioFile`
  - `VideoFile`
  - `TextDocument`
  - `SpreadsheetDocument`
  - `SubtitleFile`
- **Extension Rules**: Extended field names must start with `x` and use camelCase format (e.g., `xCustomField`).
- **Field-Specific Notes**:
  - `Text` field:
    - Supports tokenization strategies: `word`, `field`, `lowercase`, `whitespace`.
    - Can skip vectorization using the `skipVectorization` flag.
  - `PhoneNumber` field input/output schema:

    ```jsonc
    {
      "input": "020 1234567", // Required. Raw input in string format
      "defaultCountry": "nl", // Required if only a national number is provided; ISO 3166-1 alpha-2 country code. Only set if explicitly set by the user.
      "internationalFormatted": "+31 20 1234567", // Read-only string
      "countryCode": 31, // Read-only unsigned integer, numerical country code
      "national": 201234567, // Read-only unsigned integer, numerical representation of the national number
      "nationalFormatted": "020 1234567", // Read-only string
      "valid": true // Read-only boolean. Whether the parser recognized the phone number as valid
    }
    ```

  - `GeoCoordinates` field:
    - Stores `latitude` and `longitude` as numbers.
    - Input/output schema:

      ```json
      {
        "latitude": 52.366667,
        "longitude": 4.9
      }
      ```

  - `Object` field:
    - Can contain nested fields of types `int`, `number`, `text`, `date`, `boolean`, or `uuid`.
    - Objects are not vectorized, and filter operators are unsupported.
  - `Cref` field: enables creating cross-references between collections. It is crucial to understand the rules and requirements for using cross-references effectively (a sketch of rule 2 follows this list):
    1. **Built-in Collections**: You can reference only the following built-in collections, and exclusively their `document` field:
       - `ImageBlock` → `document`
       - `AudioFile` → `document`
       - `VideoFile` → `document`
       - `TextDocument` → `document`

       These are the only built-in collections you can reference, and the `document` field is the only field available for cross-referencing in these collections.
    2. **Custom Cross-References**: When referencing custom collections, it is mandatory to define the cross-reference relationship in both collections. If `CollectionA` has a `Cref` field pointing to `CollectionB`, then `CollectionB` must also have a `Cref` field explicitly defined to reference `CollectionA`. Built-in collections are an exception to this rule; they do not require reciprocal references when you reference their `document` field.
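A minimal sketch of the reciprocal requirement in rule 2, using two hypothetical custom collections; the `(collectionName, fieldName)` signature of `Cref.add` is assumed from the earlier `Cref` example:

```ts
// Hypothetical reciprocal cross-references between two custom collections.
const authorCollection = new CustomSchema.Collection('AuthorCollection')
const bookCollection = new CustomSchema.Collection('BookCollection')

const booksField = new CustomSchema.Field.Cref('books', 'Books by this author')
booksField.add('BookCollection', 'authors') // points to the reciprocal field

const authorsField = new CustomSchema.Field.Cref('authors', 'Authors of this book')
authorsField.add('AuthorCollection', 'books')

authorCollection.add(booksField)
bookCollection.add(authorsField)
customSchema.add(authorCollection).add(bookCollection)
```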
### Enhancement Configuration

The `Enhancement` configuration allows you to define custom pipelines that enrich your data with generated or transformed data. While Unbody provides built-in automatic enhancement pipelines, such as `AutoSummary`, `AutoVision`, `AutoKeywords`, and more, you can create custom pipelines tailored to your needs. These custom pipelines enable tasks like extracting or generating data using LLMs, crawling websites (coming soon), and customizing chunking strategies with different approaches.

Custom pipelines process records in specific collections and can include conditional logic, reusable variables, and multiple steps for complex workflows.

#### Defining a Custom Enhancement Pipeline

Here's how you can define a custom pipeline:

```ts
import { Enhancement } from 'unbody/admin'

const enrichVideoPipeline = new Enhancement.Pipeline(
  'enrich_video', // Unique pipeline name
  'VideoFile', // Target collection
)
```

This example defines a custom enhancement pipeline that runs for records in the `VideoFile` collection.
#### Optional Parameters for a Pipeline

A pipeline accepts the following optional parameters (a combined sketch follows the list):

- `if`: A condition that determines whether the pipeline should run for a given record. It should return `true` or `false`; if `false`, the pipeline does not execute for that record. Example:

  ```ts
  if: (ctx) => ctx.source.type === 'google_drive'
  ```

- `vars`: A reusable map of variables that can be accessed in the pipeline's steps. These variables are helpful for injecting dynamic instructions or configurations into the pipeline. Example:

  ```ts
  vars: {
    transcription: (ctx) => JSON.stringify(ctx.record.subtitles[0].entries),
  }
  ```
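Putting both parameters together; passing them as a third options argument to the `Pipeline` constructor is an assumption, since the snippets above show them only as object properties:

```ts
// Sketch: a pipeline with a run condition and reusable variables.
// The options-object position is assumed, not confirmed by the docs above.
const enrichVideoPipeline = new Enhancement.Pipeline('enrich_video', 'VideoFile', {
  if: (ctx) => ctx.source.type === 'google_drive',
  vars: {
    transcription: (ctx) => JSON.stringify(ctx.record.subtitles[0].entries),
  },
})
```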
#### Pipeline Context (`ctx`)

The `ctx` object provides essential information about the current pipeline execution. Below is the structure of the context:

```ts
export type EnhancementContext = {
  source: {
    id: string
    type: string
    name: string
    createdAt: Date
    updatedAt: Date
  }
  project: {
    id: string
    name: string
    createdAt: Date
    updatedAt: Date
    settings: Record<string, any>
  }
  vars: Record<string, any>
  record: Record<string, any> // The record being enhanced
  steps: Record<
    string,
    {
      output: Record<string, any> // Output of the step
      result: Record<string, any> // Result of the step
      skipped?: boolean
      failedAt?: Date
      startedAt?: Date
      finishedAt?: Date
    }
  >
  result: Record<string, any> // Current step's result after execution
}
```
##### Key Context Components

- `record`: The record being enhanced. Includes all GraphQL record properties.
- `steps`: A map of step names to their `output`, `result`, and execution details (e.g., start and finish times).
- `result`: The result of the current step after it has been executed.
- `vars`: Custom variables defined in the pipeline.
- `source` and `project`: Metadata about the source and project associated with the pipeline.
#### Steps in a Pipeline

Each pipeline must include at least one step. A step is defined by:

- A unique name within the pipeline.
- An action specifying the operation to perform.
- An `output` map that defines how the results will be stored.
- Optional parameters such as `if` and `onFailure` for advanced control.

##### Optional Parameters for a Step

- `if`: A condition that determines whether the step should run for a given record. It should return `true` or `false`; if `false`, the step does not execute. Example:

  ```ts
  if: (ctx) => ctx.vars.transcription.length > 0
  ```

- `onFailure`: Specifies the behavior when the step fails. Possible values:
  - `"continue"`: Ignore errors and proceed to the next step.
  - `"stop"`: Stop the pipeline execution on failure.
##### Step Example

```ts
const action = new Enhancement.Action.StructuredGenerator({
  model: 'openai-gpt-4o', // Action model
  prompt: (ctx) =>
    `Extract chapters from the video's transcription:\n${ctx.vars.transcription}`,
  schema: (ctx, { z }) =>
    z
      .object({
        chapters: z.array(
          z.object({
            start: z.string().describe('Start time of the chapter'),
            end: z.string().describe('End time of the chapter'),
            title: z.string().describe('Title of the chapter'),
          }),
        ),
      })
      .describe('Extracted info'),
})

const extractChapters = new Enhancement.Step('extract_chapters', action, {
  output: {
    xChapters: (ctx) => ctx.result.json.chapters || [],
  },
  if: (ctx) => ctx.vars.transcription.length > 0,
  onFailure: 'continue',
})

enrichVideoPipeline.add(extractChapters)
```
##### Explanation of the Example

- **Action**: Specifies the operation to perform using a model (e.g., `openai-gpt-4o`) and includes parameters like a `prompt` and a `schema`.
- **Step**: Ties the action to the pipeline and specifies how the action's result (`ctx.result`) is stored in the `output`.
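To show how steps chain, here is a hypothetical follow-up step that reads the previous step's result through `ctx.steps`, as described in the context structure above. It uses the `TextGenerator` action documented below; the step and output names are illustrative:

```ts
// Hypothetical second step: consume the chapters produced by 'extract_chapters'.
const tagChapters = new Enhancement.Step(
  'tag_chapters',
  new Enhancement.Action.TextGenerator({
    model: 'openai-gpt-4o',
    prompt: (ctx) =>
      `Suggest tags for these chapters:\n${JSON.stringify(
        ctx.steps['extract_chapters']?.result?.json?.chapters ?? [],
      )}`,
  }),
  {
    output: {
      xChapterTags: (ctx) => ctx.result.content, // TextGenerator results expose `content`
    },
  },
)

enrichVideoPipeline.add(tagChapters)
```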
#### Setting the Enhancement in Project Settings

Once the pipeline is defined, it can be added to the enhancement configuration and applied to the project settings:

```ts
const enhancement = new Enhancement()
enhancement.add(enrichVideoPipeline)

settings.set(enhancement)
```
#### Available Actions

The `Enhancement` configuration supports various built-in actions. Each action performs specific operations based on the inputs provided and generates a result. The sections below detail each action, its input parameters, and its result structure.

##### TextGenerator

The `TextGenerator` action uses a language model to generate text based on the provided prompt. It accepts the following input parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | `string \| Function` | Yes | The language model used for generation. |
| prompt | `string \| Function` | Yes | The prompt passed to the model. |
| temperature | `number \| Function` | No | Sampling temperature. |
| maxTokens | `number \| Function` | No | Maximum number of tokens to generate. |
| topP | `number \| Function` | No | Nucleus sampling parameter. |
| frequencyPenalty | `number \| Function` | No | Penalty applied to frequently repeated tokens. |
| presencePenalty | `number \| Function` | No | Penalty applied to tokens already present. |

For all parameters, you can provide a literal value (e.g., `'openai-gpt-4'`) or a function that dynamically determines the value based on the pipeline context (`ctx`).
**Result Structure**

The result object generated by the `TextGenerator` action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| content | string | The text generated by the language model. |
**Example: Summarizing Video Transcriptions**

Below is an example of using the `TextGenerator` action to summarize video transcriptions:

```ts
const summarizeVideo = new Enhancement.Step(
  'summarize_video',
  new Enhancement.Action.TextGenerator({
    model: (ctx) => ctx.vars.preferredModel || 'openai-gpt-4', // Dynamically choose the model
    prompt: (ctx) =>
      `Summarize the video transcription:\n${JSON.stringify(
        ctx.record.subtitles?.[0]?.entries || [],
      )}`,
    temperature: 0.7,
  }),
  {
    output: {
      autoSummary: (ctx) => ctx.result.content, // Store the generated summary
    },
  },
)
```
##### StructuredGenerator

The `StructuredGenerator` action generates structured JSON data based on a prompt and schema.

**Input Parameters**

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | `string \| Function` | Yes | The language model used for generation. |
| prompt | `string \| Function` | Yes | The prompt passed to the model. |
| schema | `T \| Function` | Yes | A Zod schema describing the expected JSON output. |
| temperature | `number \| Function` | No | Sampling temperature. |
| maxTokens | `number \| Function` | No | Maximum number of tokens to generate. |
| topP | `number \| Function` | No | Nucleus sampling parameter. |
| frequencyPenalty | `number \| Function` | No | Penalty applied to frequently repeated tokens. |
| presencePenalty | `number \| Function` | No | Penalty applied to tokens already present. |
| images | `{ url: string }[] \| Function` | No | Image inputs passed to the model. |

For all parameters, you can provide a literal value (e.g., `'openai-gpt-4o'`) or a function that dynamically determines the value based on the pipeline context (`ctx`) or helpers.
**Result Structure**

The result object generated by the `StructuredGenerator` action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| json | Record<string, any> | The structured JSON data produced by the action. |

**Helpers**

The `z` (Zod) helper library is included to define and validate the schema for the output. Zod is a validation library that ensures the generated JSON matches the expected structure.
**Example: Extracting Chapters from a Video Transcription**

```ts
const extractChapters = new Enhancement.Step(
  'extract_chapters',
  new Enhancement.Action.StructuredGenerator({
    model: 'openai-gpt-4o', // Language model
    prompt: (ctx) =>
      `Extract chapters from the video's transcription:\n${JSON.stringify(
        ctx.record.subtitles,
      )}`,
    schema: (ctx, { z }) =>
      z
        .object({
          chapters: z.array(
            z.object({
              start: z.string().describe('Start time of the chapter'),
              end: z.string().describe('End time of the chapter'),
              title: z.string().describe('Title of the chapter'),
            }),
          ),
        })
        .describe('Extracted chapter information'),
    temperature: 0.7,
  }),
  {
    output: {
      xChapters: (ctx) => ctx.result.json.chapters || [],
    },
  },
)
```
##### Summarizer

The `Summarizer` action summarizes long texts using AI models. It employs a map-reduce method to handle large content efficiently, breaking it into smaller chunks and generating a cohesive summary.

**Input Parameters**

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | `string \| Function` | Yes | The language model used for summarization. |
| text | `string \| Function` | Yes | The long text content to summarize. |
| metadata | `string \| Function` | No | Metadata providing additional context. |
| prompt | `string \| Function` | No | A custom prompt template; see the guidelines below. |
**Prompt Guidelines**

When providing a custom prompt, ensure it includes the `{text}` and `{metadata}` placeholders, which are dynamically replaced during execution:

- `{text}`: The actual long content to be summarized.
- `{metadata}`: Metadata for additional context. If not provided, this placeholder is omitted.

Example prompt:

```
Summarize the following text based on the provided metadata:
Text: {text}
Metadata: {metadata}
```
**Output Structure**

The result object generated by the `Summarizer` action contains the following property:

| Property | Type | Description |
| --- | --- | --- |
| summary | string | The summarized text produced by the action. |

**Example: Summarizing a Long Text**

```ts
const summarizeText = new Enhancement.Step(
  'summarize_text',
  new Enhancement.Action.Summarizer({
    model: 'openai-gpt-4o', // Language model
    text: (ctx) => ctx.record.content, // Long text content
    metadata: '- title: document title...', // Metadata for context
    prompt: `Summarize the following content:\nText: {text}\nMetadata: {metadata}`, // Custom prompt
  }),
  {
    output: {
      autoSummary: (ctx) => ctx.result.summary, // Store the summary
    },
  },
)
```