Interface LlamaCppEmbeddingsParams

Note that `modelPath` is the only required parameter. For testing, you can set it via the environment variable `LLAMA_PATH`.

Hierarchy

LlamaBaseCppInputs
EmbeddingsParams
- LlamaCppEmbeddingsParams

Properties

modelPath

modelPath: string

Path to the model on the filesystem.
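As a minimal sketch of how the required path could be supplied from `LLAMA_PATH`, the helper below builds a params object from an environment map. The `MinimalLlamaParams` interface and the `paramsFromEnv` function are hypothetical names introduced for illustration; they mirror only a few of the fields listed on this page and are not part of the library. The model path in the example is a placeholder.

```typescript
// Hypothetical local mirror of a few fields from LlamaCppEmbeddingsParams.
interface MinimalLlamaParams {
  modelPath: string; // the only required field
  batchSize?: number;
  contextSize?: number;
  threads?: number;
}

// Build params from an environment map (for example process.env in Node.js).
// Throws if LLAMA_PATH is missing, since modelPath is required.
function paramsFromEnv(
  env: Record<string, string | undefined>,
  overrides: Omit<MinimalLlamaParams, "modelPath"> = {}
): MinimalLlamaParams {
  const modelPath = env["LLAMA_PATH"];
  if (!modelPath) {
    throw new Error("LLAMA_PATH is not set; modelPath is required.");
  }
  return { ...overrides, modelPath };
}

// Example with a placeholder path (an assumption, not a real file).
const exampleParams = paramsFromEnv(
  { LLAMA_PATH: "/path/to/model.gguf" },
  { threads: 4 }
);
```

In a Node.js test setup, `process.env` could be passed as the first argument.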

`Optional` batchSize

batchSize?: number

Prompt processing batch size.

`Optional` contextSize

contextSize?: number

Text context size.

`Optional` embedding

embedding?: boolean

Embedding mode only.

`Optional` f16Kv

f16Kv?: boolean

Use fp16 for KV cache.

`Optional` gpuLayers

gpuLayers?: number

Number of layers to store in VRAM.

`Optional` logitsAll

logitsAll?: boolean

The llama_eval() call computes all logits, not just the last one.

`Optional` maxConcurrency

maxConcurrency?: number

The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.

`Optional` maxRetries

maxRetries?: number

The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
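To make "exponential backoff" concrete, here is a rough sketch of how the wait before each retry could grow. The base delay of 1000 ms and the doubling factor are assumptions chosen for illustration; this page does not state the delays the library actually uses, and `backoffDelays` is a hypothetical helper.

```typescript
// Compute the wait (in milliseconds) before each retry attempt.
// baseDelayMs and factor are illustrative assumptions, not library defaults.
function backoffDelays(
  maxRetries = 6,
  baseDelayMs = 1000,
  factor = 2
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Each attempt waits `factor` times longer than the previous one.
    delays.push(baseDelayMs * Math.pow(factor, attempt));
  }
  return delays;
}

// With the default of 6 retries: six delays, each double the previous one.
const schedule = backoffDelays();
```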

`Optional` maxTokens

maxTokens?: number

`Optional` onFailedAttempt

onFailedAttempt?: FailedAttemptHandler

Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
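A handler of this kind might look like the sketch below. `AttemptHandler` is a hypothetical local stand-in for `FailedAttemptHandler`, and treating a missing file (error code `"ENOENT"`) as non-retryable is an assumption made for the example, not documented library behavior.

```typescript
// Hypothetical local stand-in for the FailedAttemptHandler type:
// a function that receives the thrown error and may rethrow it.
type AttemptHandler = (error: unknown) => void;

// Assumption for illustration: a missing model file (code "ENOENT")
// will not fix itself, so retrying is pointless.
const stopOnMissingFile: AttemptHandler = (error) => {
  const code = (error as { code?: string } | null)?.code;
  if (code === "ENOENT") {
    throw error; // rethrow to signal "not retryable"
  }
  // Returning normally leaves the error eligible for another attempt.
};
```

A function with this shape could then be supplied as the `onFailedAttempt` value.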

`Optional` prependBos

prependBos?: boolean

Add the beginning-of-sentence token.

`Optional` seed

seed?: null | number

If null, a random seed will be used.

`Optional` temperature

temperature?: number

Controls the randomness of the responses, e.g. 0.1 is nearly deterministic, 0.8 is balanced, 1.5 is creative; 0 disables.
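The effect of temperature can be illustrated with the usual temperature-scaled softmax. This is a sketch of the general technique, not the library's or llama.cpp's actual sampling code; `softmaxWithTemperature` is a hypothetical helper, and mapping temperature 0 to a greedy pick is an assumption about what "disables" means.

```typescript
// Convert raw logits to probabilities, scaled by temperature.
// Assumption: temperature 0 means greedy selection (all mass on the top logit).
function softmaxWithTemperature(
  logits: number[],
  temperature: number
): number[] {
  if (logits.length === 0) return [];
  const maxLogit = Math.max(...logits);
  if (temperature <= 0) {
    const top = logits.indexOf(maxLogit);
    return logits.map((_, i) => (i === top ? 1 : 0));
  }
  // Subtract the max logit before exponentiating for numerical stability.
  const exps = logits.map((l) => Math.exp((l - maxLogit) / temperature));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const exampleLogits = [2.0, 1.0, 0.1];
const sharp = softmaxWithTemperature(exampleLogits, 0.1); // close to one-hot
const flat = softmaxWithTemperature(exampleLogits, 1.5); // more spread out
```

Lower temperatures concentrate probability on the top token, while higher ones spread it across alternatives.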

`Optional` threads

threads?: number

Number of threads to use to evaluate tokens.

`Optional` topK

topK?: number

Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables the limit (the full vocabulary is used). Note: only applies when temperature > 0.
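A top-k filter can be sketched as follows. `topKIndices` is a hypothetical helper showing the general idea, with 0 keeping the full vocabulary as described above; it is not taken from the library.

```typescript
// Return the indices of the k most likely tokens, most likely first.
// k = 0 disables the filter and keeps the full vocabulary.
function topKIndices(probs: number[], k: number): number[] {
  const order = probs
    .map((p, index) => ({ p, index }))
    .sort((a, b) => b.p - a.p)
    .map((entry) => entry.index);
  return k <= 0 ? order : order.slice(0, k);
}

// Keeps the two most likely tokens: index 1 (0.6) and index 3 (0.25).
const keptTopK = topKIndices([0.1, 0.6, 0.05, 0.25], 2);
```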

`Optional` topP

topP?: number

Selects the smallest token set whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables the filter. Note: only applies when temperature > 0.
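The selection rule for top-p (often called nucleus sampling) can be sketched like this. `topPIndices` is a hypothetical helper illustrating the general technique, not the library's implementation: tokens are taken in order of decreasing probability until their running total exceeds P.

```typescript
// Return the indices of the smallest set of tokens whose cumulative
// probability exceeds p, most likely first. p >= 1 disables the filter.
function topPIndices(probs: number[], p: number): number[] {
  const order = probs
    .map((prob, index) => ({ prob, index }))
    .sort((a, b) => b.prob - a.prob);
  if (p >= 1) return order.map((entry) => entry.index);

  const kept: number[] = [];
  let cumulative = 0;
  for (const entry of order) {
    kept.push(entry.index);
    cumulative += entry.prob;
    if (cumulative > p) break; // the set now exceeds the threshold
  }
  return kept;
}

// 0.5 alone does not exceed 0.7, but 0.5 + 0.3 does: indices 0 and 1 are kept.
const nucleus = topPIndices([0.5, 0.3, 0.15, 0.05], 0.7);
```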

`Optional` trimWhitespaceSuffix

trimWhitespaceSuffix?: boolean

Trim whitespace from the end of the generated text. Disabled by default.

`Optional` useMlock

useMlock?: boolean

Force system to keep model in RAM.

`Optional` useMmap

useMmap?: boolean

Use mmap if possible.

`Optional` vocabOnly

vocabOnly?: boolean

Only load the vocabulary, no weights.
