API Reference
ggml
This module is the core of the ggml-python library. It exposes a low-level ctypes-based interface for ggml.
Structures and functions in the ggml.ggml module map directly to the original ggml C library and
they operate at a fairly low level.
No additional runtime checks are performed, nor is memory management handled automatically.
You've been warned :).
With that said, here are some useful things to keep in mind:
- While runtime checks are avoided for performance reasons, this module attempts to provide a type-safe interface by using Python's type annotations. Please report any issues you find.
- Functions accept both ctypes types (c_int, c_bool, c_float, etc.) and Python types (int, bool, float, etc.) as parameters.
- Functions return Python types for simple values (int, bool, float, etc.) and ctypes types for complex values (ggml_context_p, ggml_tensor_p, etc.).
- Memory management is the responsibility of the user. The user must call ggml.ggml_free on the context after calling ggml.ggml_init.
- Opaque pointers that are returned by ggml functions (e.g. ggml.ggml_init) are returned as ints or None in Python. For some additional static type safety these pointers are wrapped in NewType definitions (e.g. ggml.ggml_context_p).
Example
import ggml
import ctypes
# Allocate a new context with 16 MB of memory
params = ggml.ggml_init_params(mem_size=16 * 1024 * 1024, mem_buffer=None)
ctx = ggml.ggml_init(params)
# Instantiate tensors
x = ggml.ggml_new_tensor_1d(ctx, ggml.GGML_TYPE_F32, 1)
a = ggml.ggml_new_tensor_1d(ctx, ggml.GGML_TYPE_F32, 1)
b = ggml.ggml_new_tensor_1d(ctx, ggml.GGML_TYPE_F32, 1)
# Use ggml operations to build a computational graph
x2 = ggml.ggml_mul(ctx, x, x)
f = ggml.ggml_add(ctx, ggml.ggml_mul(ctx, a, x2), b)
gf = ggml.ggml_new_graph(ctx)
ggml.ggml_build_forward_expand(gf, f)
# Set the input values
ggml.ggml_set_f32(x, 2.0)
ggml.ggml_set_f32(a, 3.0)
ggml.ggml_set_f32(b, 4.0)
# Compute the graph
ggml.ggml_graph_compute_with_ctx(ctx, gf, 1)
# Get the output value
output = ggml.ggml_get_f32_1d(f, 0)
assert output == 16.0
# Free the context
ggml.ggml_free(ctx)
ggml_context_p = NewType('ggml_context_p', int)
module-attribute
Opaque pointer to a ggml_context.
ggml_context structs are not accessed directly; instead, they must be created using
ggml_init and freed using ggml_free.
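As a rough illustration of what the NewType wrapper means in practice, the sketch below uses only the standard library. The function `fake_init` is a hypothetical stand-in for an initialization call, not part of ggml; it only shows that the value is a plain int at runtime and that a None result should be checked before use.

```python
from typing import NewType, Optional

# Same shape of definition as the module attribute documented above.
ggml_context_p = NewType("ggml_context_p", int)


def fake_init(ok: bool) -> Optional[ggml_context_p]:
    """Hypothetical stand-in: returns an address on success, None on failure."""
    return ggml_context_p(0x1000) if ok else None


ctx = fake_init(True)
if ctx is None:
    raise RuntimeError("failed to initialize context")

# At runtime a NewType value is just the underlying int;
# the distinct name only helps static type checkers.
ctx_is_plain_int = isinstance(ctx, int)
```

Because the wrapper exists only for type checkers, nothing stops an arbitrary int from being passed at runtime, which is consistent with the "no runtime checks" warning above.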
ggml_object
Bases: Structure
ggml object
Attributes:
-
offs(int) –offset
-
size(int) –size
-
next(pointer[ggml_object]) –pointer to next object
-
type(int) –ggml object type
-
padding(bytes) –padding
Source code in ggml/ggml.py
ggml_tensor
Bases: Structure
n-dimensional tensor
Attributes:
-
type(int) –ggml_type
-
buffer(pointer[ggml_backend_buffer]) –pointer to backend buffer
-
ne(Array[c_int64]) –number of elements in each dimension
-
nb(Array[c_size_t]) –stride in bytes for each dimension
-
op(int) –ggml operation
-
op_params(Array[c_int32]) –GGML_MAX_OP_PARAMS-length array of operation parameters
-
flags(int) –tensor flags
-
src(Array[ggml_tensor_p]) –GGML_MAX_SRC-length array of source tensors
-
view_src(ggml_tensor_p) –pointer to tensor if this tensor is a view, None if the tensor is not a view
-
view_offs(c_size_t) –offset into the data pointer of the view tensor
-
data(c_void_p) –reference to raw tensor data
-
name(bytes) –name of tensor
-
extra(c_void_p) –extra data (e.g. for CUDA)
Source code in ggml/ggml.py
ggml_tensor_p = 'ctypes._Pointer[ggml_tensor]'
module-attribute
ctypes pointer to a ggml_tensor
Can be dereferenced to a ggml_tensor object using
the .contents attribute.
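The following sketch shows how `.contents` behaves on a ctypes pointer in general. The `demo_tensor` structure is hypothetical and far smaller than the real ggml_tensor; it is used here only so the example runs with the standard library alone.

```python
import ctypes


class demo_tensor(ctypes.Structure):
    """Hypothetical two-field struct, used only to demonstrate pointer access."""

    _fields_ = [
        ("type", ctypes.c_int),
        ("ne", ctypes.c_int64 * 4),
    ]


value = demo_tensor(type=0, ne=(ctypes.c_int64 * 4)(3, 1, 1, 1))
ptr = ctypes.pointer(value)

# .contents dereferences the pointer and gives access to the struct fields.
shape = list(ptr.contents.ne)

# A NULL ctypes pointer is falsy, so it can be checked before dereferencing.
null_ptr = ctypes.POINTER(demo_tensor)()
```

Dereferencing a NULL pointer through `.contents` raises a ValueError in ctypes, so a truthiness check before access is a cheap safeguard.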
ggml_cplan
Bases: Structure
Compute plan for a ggml computation graph
Attributes:
-
work_size(int) –size of work buffer
-
work_data(pointer[c_uint8]) –work buffer
-
n_threads(int) –number of threads
-
threadpool(c_void_p) –optional ggml_threadpool pointer
-
abort_callback(ggml_abort_callback) –abort callback
-
abort_callback_data(c_void_p) –abort callback data
-
use_ref(bool) –use only reference implementations
Source code in ggml/ggml.py
ggml_cplan_p = 'ctypes._Pointer[ggml_cplan]'
module-attribute
ctypes pointer to a ggml_cplan
Can be dereferenced to a ggml_cplan object using
the .contents attribute.
ggml_hash_set
Bases: Structure
ggml hash set
Attributes:
-
size(int) –size
-
used(pointer[c_uint32]) –bitset of used entries
-
keys(Array[POINTER(ggml_tensor)]) –array of tensor keys
Source code in ggml/ggml.py
ggml_cgraph
Bases: Structure
ggml computation graph
Attributes:
-
size(int) –size
-
n_nodes(int) –number of nodes
-
n_leafs(int) –number of leafs
-
nodes(Array[ggml_tensor_p]) –n_nodes-length array of compute tensors
-
grads(Array[ggml_tensor_p]) –n_nodes-length array of gradient tensors
-
grad_accs(Array[ggml_tensor_p]) –n_nodes-length array of gradient accumulators
-
leafs(Array[ggml_tensor_p]) –n_leafs-length array of parameter tensors
-
use_counts(Array[c_int32]) –tensor use counts indexed by hash slot
-
visited_hash_set(ggml_hash_set) –hash set of visited tensors
-
order(int) –evaluation order
-
uid(int) –optional graph identifier
Source code in ggml/ggml.py
ggml_cgraph_p = 'ctypes._Pointer[ggml_cgraph]'
module-attribute
ctypes pointer to a ggml_cgraph
Can be dereferenced to a ggml_cgraph object using
the .contents attribute.
ggml_scratch
Bases: Structure
Scratch memory for ggml
Attributes:
Source code in ggml/ggml.py
ggml_init_params
Bases: Structure
Initialization parameters for a ggml context
NOTE: Reference counting does not cross into ggml. If you allocate a memory buffer in Python using ctypes arrays or a NumPy array, you must keep a reference to it until you free the ggml context; otherwise you will encounter a segmentation fault.
Attributes:
-
mem_size(int) –size of memory pool in bytes
-
mem_buffer(c_void_p) –pointer to memory pool, if None, memory will be allocated internally
-
no_alloc(bool) –don't allocate memory for tensor data
Source code in ggml/ggml.py
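The note about keeping a reference to a user-supplied buffer can be sketched as follows. `demo_init_params` is a hypothetical mirror of the three fields listed above, defined here so the example is self-contained; it is not the library's structure.

```python
import ctypes


class demo_init_params(ctypes.Structure):
    """Hypothetical mirror of the three fields listed above."""

    _fields_ = [
        ("mem_size", ctypes.c_size_t),
        ("mem_buffer", ctypes.c_void_p),
        ("no_alloc", ctypes.c_bool),
    ]


mem_size = 1024

# Keep `buffer` bound to a name for as long as anything uses its address.
# If it were garbage collected, the stored pointer would dangle.
buffer = (ctypes.c_uint8 * mem_size)()

params = demo_init_params(
    mem_size=mem_size,
    mem_buffer=ctypes.cast(buffer, ctypes.c_void_p),
    no_alloc=False,
)
```

A common way to satisfy the requirement is to store the buffer on the same object that owns the context, so both are released together.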
ggml_compute_params
Bases: Structure
Compute parameters for ggml
Attributes:
-
type(int) –task type
-
ith(int) –thread index
-
nth(int) –number of threads
-
wsize(int) –work buffer size
-
wdata(c_void_p) –work buffer data
Source code in ggml/ggml.py
ggml_nelements(tensor)
Get the number of elements in a tensor
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
int–number of elements
Source code in ggml/ggml.py
ggml_nrows(tensor)
Get the number of rows in a tensor
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
int–number of rows
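A minimal pure-Python model of the two counts above, assuming the usual ggml convention that `ne[0]` is the row length and the remaining dimensions count rows. The helper names are illustrative and are not the library functions.

```python
from math import prod


def nelements(ne):
    """Total element count: the product of all dimension sizes."""
    return prod(ne)


def nrows(ne):
    """Row count: the product of every dimension except the first."""
    return prod(ne[1:])


ne = [4, 3, 2, 1]  # 4 elements per row, 3 * 2 * 1 rows
total = nelements(ne)
rows = nrows(ne)
```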
ggml_nbytes(tensor)
Get the number of bytes required to store tensor data
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
int–number of bytes
Source code in ggml/ggml.py
ggml_nbytes_pad(tensor)
Get the number of bytes required to store tensor data, padded to GGML_MEM_ALIGN
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
int–number of bytes
Source code in ggml/ggml.py
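Padding to an alignment boundary is typically done by rounding up to the next multiple of a power of two. The sketch below assumes an alignment of 16 bytes; the actual value of GGML_MEM_ALIGN depends on the platform and build, so treat the constant as an assumption.

```python
MEM_ALIGN = 16  # assumed alignment in bytes; must be a power of two


def pad(nbytes, align=MEM_ALIGN):
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) & ~(align - 1)


padded = pad(17)
```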
ggml_is_transposed(tensor)
Check if a tensor is transposed
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is transposed else False
Source code in ggml/ggml.py
ggml_is_contiguous(tensor)
Check if a tensor is contiguous
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is contiguous else False
Source code in ggml/ggml.py
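The transposed and contiguous checks can be understood in terms of the `ne` (sizes) and `nb` (byte strides) arrays of ggml_tensor. The following is a simplified pure-Python model that assumes a non-quantized element type; it is not the library's implementation, and the helper names are illustrative.

```python
def is_contiguous(ne, nb, type_size):
    """True if each stride equals the packed size of the dimensions before it."""
    expected = type_size
    for size, stride in zip(ne, nb):
        if stride != expected:
            return False
        expected *= size
    return True


def is_transposed(nb):
    """True if the first dimension has a larger stride than the second."""
    return nb[0] > nb[1]


# A packed 4 x 3 float32 tensor: strides grow with each dimension.
packed_ne, packed_nb = [4, 3, 1, 1], [4, 16, 48, 48]

# The same data viewed with its first two dimensions swapped.
swapped_ne, swapped_nb = [3, 4, 1, 1], [16, 4, 48, 48]
```

In this model the swapped view is transposed and therefore not contiguous, which is why operations that require contiguous input usually need a copy first.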
ggml_is_contiguous_0(tensor)
Check if a tensor is contiguous (same as ggml_is_contiguous)
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is contiguous else False
Source code in ggml/ggml.py
ggml_is_contiguous_1(tensor)
Check if a tensor is contiguous for dimensions >= 1
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is contiguous for dims >= 1 else False
Source code in ggml/ggml.py
ggml_is_contiguous_2(tensor)
Check if a tensor is contiguous for dimensions >= 2
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is contiguous for dims >= 2 else False
Source code in ggml/ggml.py
ggml_is_contiguously_allocated(tensor)
Check if a tensor is allocated as one contiguous block
ggml_is_contiguous_channels(tensor)
Check if a tensor is stored as contiguous channels
ggml_is_contiguous_rows(tensor)
ggml_is_permuted(tensor)
Check if a tensor is permuted
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bool–True if tensor is permuted else False
Source code in ggml/ggml.py
ggml_is_view(tensor)
ggml_is_scalar(tensor)
ggml_is_vector(tensor)
ggml_is_matrix(tensor)
ggml_is_3d(tensor)
ggml_n_dims(tensor)
ggml_are_same_shape(t0, t1)
Check if two tensors have the same shape
Parameters:
-
t0(ggml_tensor_p) –tensor 0
-
t1(ggml_tensor_p) –tensor 1
Returns:
-
bool–True if tensors have the same shape else False
Source code in ggml/ggml.py
ggml_are_same_stride(t0, t1)
Check if two tensors have the same stride
Parameters:
-
t0(ggml_tensor_p) –tensor 0
-
t1(ggml_tensor_p) –tensor 1
Returns:
-
bool–True if tensors have the same stride else False
Source code in ggml/ggml.py
ggml_tensor_overhead()
ggml_init(params)
Instantiate a new ggml context with params.
You must call ggml_free() to free the context.
Parameters:
-
params(ggml_init_params) –ggml init params
Returns:
-
Optional[ggml_context_p]–Pointer to ggml_context or None if failed to initialize context.
Source code in ggml/ggml.py
ggml_reset(ctx)
ggml_free(ctx)
ggml_used_mem(ctx)
Return the amount of memory used by the ggml context in bytes.
Parameters:
-
ctx(ggml_context_p) –ggml context
Returns:
-
int–amount of memory used in bytes
Source code in ggml/ggml.py
ggml_set_scratch(ctx, scratch)
Set the scratch buffer for the ggml context.
Source code in ggml/ggml.py
ggml_get_no_alloc(ctx)
ggml_set_no_alloc(ctx, no_alloc)
Set the no_alloc flag for the ggml context.
ggml_get_mem_buffer(ctx)
ggml_get_mem_size(ctx)
Return the size of the memory buffer for the ggml context in bytes.
ggml_get_max_tensor_size(ctx)
ggml_new_tensor(ctx, type, n_dims, ne)
Create a new tensor with the given type, number of dimensions, and number of elements in each dimension.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
type(Union[c_int, int]) –ggml type
-
n_dims(Union[c_int, int]) –number of dimensions
-
ne(Array[c_int64]) –number of elements in each dimension (array of length n_dims)
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
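The `ne` argument is a ctypes array of c_int64 values. A minimal sketch of building one from a Python shape tuple, using only the standard library (the variable names are illustrative):

```python
import ctypes

shape = (3, 4)  # ne0 = 3, ne1 = 4
n_dims = len(shape)

# Create a c_int64 array type of the right length and fill it from the tuple.
ne = (ctypes.c_int64 * n_dims)(*shape)
```

The resulting `ne` and `n_dims` are the kind of values the parameters above describe; the array must hold at least n_dims entries.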
ggml_new_tensor_1d(ctx, type, ne0)
Create a new 1-dimensional tensor with the given type and number of elements.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
type(Union[c_int, int]) –ggml type
-
ne0(Union[c_int64, int]) –number of elements in dimension 0
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_new_tensor_2d(ctx, type, ne0, ne1)
Create a new 2-dimensional tensor with the given type and number of elements in each dimension.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
type(Union[c_int, int]) –ggml type
-
ne0(Union[c_int64, int]) –number of elements in dimension 0
-
ne1(Union[c_int64, int]) –number of elements in dimension 1
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_new_tensor_3d(ctx, type, ne0, ne1, ne2)
Create a new 3-dimensional tensor with the given type and number of elements in each dimension.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
type(Union[c_int, int]) –ggml type
-
ne0(Union[c_int64, int]) –number of elements in dimension 0
-
ne1(Union[c_int64, int]) –number of elements in dimension 1
-
ne2(Union[c_int64, int]) –number of elements in dimension 2
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_new_tensor_4d(ctx, type, ne0, ne1, ne2, ne3)
Create a new 4-dimensional tensor with the given type and number of elements in each dimension.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
type(Union[c_int, int]) –ggml type
-
ne0(Union[c_int64, int]) –number of elements in dimension 0
-
ne1(Union[c_int64, int]) –number of elements in dimension 1
-
ne2(Union[c_int64, int]) –number of elements in dimension 2
-
ne3(Union[c_int64, int]) –number of elements in dimension 3
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_new_i32(ctx, value)
Create a 1 element tensor with the given integer value.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
value(Union[c_int32, int]) –integer value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_new_f32(ctx, value)
Create a 1 element tensor with the given float value.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
value(Union[c_float, float]) –float value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_dup_tensor(ctx, src)
Create a new tensor with the same type and dimensions as the source tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
src(ggml_tensor_p) –source tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_view_tensor(ctx, src)
Create a new tensor with the same type, dimensions and data as the source tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
src(ggml_tensor_p) –source tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_get_first_tensor(ctx)
Get the first tensor from the ggml context.
Parameters:
-
ctx(ggml_context_p) –ggml context
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_get_next_tensor(ctx, tensor)
Get the next tensor from the ggml context.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
tensor(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_get_tensor(ctx, name)
Get a tensor from the ggml context by name.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
name(bytes) –name of tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_set_zero(tensor)
Zero all elements in a tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_set_i32(tensor, value)
Set all elements in a tensor to the given integer value.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
value(Union[c_int32, int]) –integer value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_set_f32(tensor, value)
Set all elements in a tensor to the given float value.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
value(Union[c_float, float]) –float value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_unravel_index(tensor, i, i0, i1, i2, i3)
Convert a flat index into coordinates.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i(Union[c_int64, int]) –flat index
-
i0(CtypesPointer[c_int64]) –pointer to index 0
-
i1(CtypesPointer[c_int64]) –pointer to index 1
-
i2(CtypesPointer[c_int64]) –pointer to index 2
-
i3(CtypesPointer[c_int64]) –pointer to index 3
Source code in ggml/ggml.py
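To show how the out-pointer parameters are meant to be used, here is a pure-Python model of the index arithmetic that writes its results through ctypes pointers. It assumes dimension 0 varies fastest and takes the shape as a plain list instead of a tensor; it is a sketch, not the library function.

```python
import ctypes


def unravel_index(ne, i, i0, i1, i2, i3):
    """Split flat index i into four coordinates and write them via pointers."""
    ne0, ne1, ne2 = ne[0], ne[1], ne[2]
    c3, rest = divmod(i, ne2 * ne1 * ne0)
    c2, rest = divmod(rest, ne1 * ne0)
    c1, c0 = divmod(rest, ne0)
    for out, value in ((i0, c0), (i1, c1), (i2, c2), (i3, c3)):
        if out is not None:  # a missing output is skipped
            out[0] = value


# Allocate four c_int64 outputs and pass pointers to them.
outs = [ctypes.c_int64() for _ in range(4)]
unravel_index([4, 3, 2, 1], 17, *(ctypes.pointer(o) for o in outs))
coords = [o.value for o in outs]
```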
ggml_get_i32_1d(tensor, i)
Get the integer value of the i-th element in a 1-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i(Union[c_int, int]) –index of element
Returns:
-
int–integer value of element at index i
Source code in ggml/ggml.py
ggml_set_i32_1d(tensor, i, value)
Set the integer value of the i-th element in a 1-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i(Union[c_int, int]) –index of element
-
value(Union[c_int32, int]) –integer value to set element to
Source code in ggml/ggml.py
ggml_get_i32_nd(tensor, i0, i1, i2, i3)
Get the integer value of the element at the given coordinates in a 4-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i0(Union[c_int, int]) –index of element in dimension 0
-
i1(Union[c_int, int]) –index of element in dimension 1
-
i2(Union[c_int, int]) –index of element in dimension 2
-
i3(Union[c_int, int]) –index of element in dimension 3
Returns:
-
int–integer value of element at coordinates
Source code in ggml/ggml.py
ggml_set_i32_nd(tensor, i0, i1, i2, i3, value)
Set the integer value of the element at the given coordinates in a 4-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i0(Union[c_int, int]) –index of element in dimension 0
-
i1(Union[c_int, int]) –index of element in dimension 1
-
i2(Union[c_int, int]) –index of element in dimension 2
-
i3(Union[c_int, int]) –index of element in dimension 3
-
value(Union[c_int32, int]) –integer value to set element to
Source code in ggml/ggml.py
ggml_get_f32_1d(tensor, i)
Get the float value of the i-th element in a 1-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i(Union[c_int, int]) –index of element
Returns:
-
float–float value of element at index i
Source code in ggml/ggml.py
ggml_set_f32_1d(tensor, i, value)
Set the float value of the i-th element in a 1-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i(Union[c_int, int]) –index of element
-
value(Union[c_float, float]) –float value to set element to
Source code in ggml/ggml.py
ggml_get_f32_nd(tensor, i0, i1, i2, i3)
Get the float value of the element at the given coordinates in a 4-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i0(Union[c_int, int]) –index of element in dimension 0
-
i1(Union[c_int, int]) –index of element in dimension 1
-
i2(Union[c_int, int]) –index of element in dimension 2
-
i3(Union[c_int, int]) –index of element in dimension 3
Returns:
-
float–float value of element at coordinates
Source code in ggml/ggml.py
ggml_set_f32_nd(tensor, i0, i1, i2, i3, value)
Set the float value of the element at the given coordinates in a 4-dimensional tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
i0(Union[c_int, int]) –index of element in dimension 0
-
i1(Union[c_int, int]) –index of element in dimension 1
-
i2(Union[c_int, int]) –index of element in dimension 2
-
i3(Union[c_int, int]) –index of element in dimension 3
-
value(Union[c_float, float]) –float value to set element to
Source code in ggml/ggml.py
ggml_get_data(tensor)
Get the data pointer of a tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
Source code in ggml/ggml.py
ggml_get_data_f32(tensor)
Get the data pointer of a tensor as a float array.
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
Source code in ggml/ggml.py
ggml_get_unary_op(tensor)
Get the unary operation of a tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
int–unary operation
Source code in ggml/ggml.py
ggml_get_glu_op(tensor)
ggml_get_name(tensor)
Get the name of a tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
Returns:
-
bytes–name of tensor
ggml_set_name(tensor, name)
Set the name of a tensor.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
name(bytes) –name to set tensor to
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_format_name(tensor, fmt, /, *args)
Format the name of a tensor using the given format c string and arguments.
Parameters:
-
tensor(ggml_tensor_p) –tensor
-
fmt(bytes) –format c string
-
args(Sequence[Union[bool, int, float, str]], default:()) –arguments to format string
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
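Tensor names and format strings are passed as bytes, not str. As a loose analogy only, Python's own `%` formatting on bytes behaves similarly to a C format string; the sketch below does not call ggml, and the names are illustrative.

```python
# A str must be encoded before it can be used as a bytes name.
base = "layer".encode("utf-8")

# Python's bytes % formatting, shown here as an analogy for a C format string.
fmt = b"%s_%d.%s"
name = fmt % (base, 3, b"weight")
```

Note that in this Python analogy `%s` requires a bytes value, so any str arguments would need encoding first.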
ggml_add(ctx, a, b)
Add two tensors together and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_add_inplace(ctx, a, b)
Add two tensors together and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_add_cast(ctx, a, b, type)
Add two tensors together and cast the result to the given type.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
-
type(Union[c_int, int]) –type to cast result to
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sub(ctx, a, b)
Subtract two tensors and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sub_inplace(ctx, a, b)
Subtract two tensors and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_mul(ctx, a, b)
Element-wise multiply two tensors and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_mul_inplace(ctx, a, b)
Element-wise multiply two tensors and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_div(ctx, a, b)
Element-wise divide two tensors and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_div_inplace(ctx, a, b)
Element-wise divide two tensors and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sqr(ctx, a)
Square all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sqr_inplace(ctx, a)
Square all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sqrt(ctx, a)
Square root all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sqrt_inplace(ctx, a)
Square root all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_log(ctx, a)
Take the natural logarithm of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_log_inplace(ctx, a)
Take the natural logarithm of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_expm1(ctx, a)
Compute exp(a) - 1 for all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_expm1_inplace(ctx, a)
Compute exp(a) - 1 for all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_softplus(ctx, a)
Apply the softplus activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_softplus_inplace(ctx, a)
Apply the softplus activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
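For reference, the element-wise formulas behind expm1 and softplus can be written with the standard math module. This is a scalar sketch of the mathematical definitions, not the library's implementation, which may treat large inputs differently.

```python
import math


def expm1(x):
    """exp(x) - 1, computed accurately for small x."""
    return math.expm1(x)


def softplus(x):
    """log(1 + exp(x)), a smooth approximation of max(0, x)."""
    return math.log1p(math.exp(x))


values = [softplus(x) for x in (-2.0, 0.0, 2.0)]
```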
ggml_sin(ctx, a)
Compute the sine of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sin_inplace(ctx, a)
Compute the sine of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_cos(ctx, a)
Compute the cosine of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_cos_inplace(ctx, a)
Compute the cosine of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sum(ctx, a)
Sum all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sum_rows(ctx, a)
Sum all elements in a tensor along the first axis and return the result.
Sums along rows: an input of shape [a,b,c,d] produces a result of shape [1,b,c,d].
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
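The shape rule for row sums can be illustrated with nested lists, assuming each inner list is one row of length ne0. This is a pure-Python model of the result shape, not a call into ggml.

```python
# Two rows of three elements each: ne0 = 3, ne1 = 2.
rows = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Each row collapses to a single value, so the first dimension becomes 1.
summed = [[sum(row)] for row in rows]
```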
ggml_mean(ctx, a)
Take the mean of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_argmax(ctx, a)
Take the argmax along the rows of a tensor and return the result.
One index is produced per row.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
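A small pure-Python model of a row-wise argmax, assuming each inner list is one row. It illustrates the operation only and does not use ggml.

```python
rows = [
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
]

# For each row, find the position of its largest element.
indices = [max(range(len(row)), key=row.__getitem__) for row in rows]
```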
ggml_repeat(ctx, a, b)
Repeat a tensor to fit the shape of another tensor.
If a has the same shape as b and a is not a parameter, a is returned.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor to repeat
-
b(ggml_tensor_p) –tensor to fit
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_concat(ctx, a, b, dim)
Concatenate two tensors along the given dimension and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –first tensor
-
b(ggml_tensor_p) –second tensor
-
dim(int) –dimension to concatenate along
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_abs(ctx, a)
Take the absolute value of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_abs_inplace(ctx, a)
Take the absolute value of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sgn(ctx, a)
Get the sign of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sgn_inplace(ctx, a)
Get the sign of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_neg(ctx, a)
Negate all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_neg_inplace(ctx, a)
Negate all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_step_inplace(ctx, a)
Apply the step function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_tanh(ctx, a)
Apply the tanh activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_tanh_inplace(ctx, a)
Apply the tanh activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_elu(ctx, a)
Apply the ELU activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_elu_inplace(ctx, a)
Apply the ELU activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_relu(ctx, a)
Apply the ReLU activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_leaky_relu(ctx, a, negative_slope, inplace)
Apply the Leaky ReLU activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
negative_slope(float) –negative slope
-
inplace(bool) –whether to store the result in the first tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
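The role of `negative_slope` is easiest to see in the scalar definition of Leaky ReLU. The function below is a sketch of the mathematical formula applied to one element; it is not the library's code.

```python
def leaky_relu(x, negative_slope):
    """Pass positive values through; scale negative values by negative_slope."""
    return x if x > 0 else negative_slope * x


outputs = [leaky_relu(x, 0.1) for x in (-2.0, 0.0, 2.0)]
```

With a slope of 0 this reduces to the ordinary ReLU.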
ggml_relu_inplace(ctx, a)
Apply the ReLU activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sigmoid(ctx, a)
Apply the Sigmoid activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_sigmoid_inplace(ctx, a)
Apply the Sigmoid activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu(ctx, a)
Apply the Gaussian Error Linear Unit activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu_inplace(ctx, a)
Apply the Gaussian Error Linear Unit activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu_erf(ctx, a)
Apply the exact GELU activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu_erf_inplace(ctx, a)
Apply the exact GELU activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu_quick(ctx, a)
Apply the quick approximation of the Gaussian Error Linear Unit activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_gelu_quick_inplace(ctx, a)
Apply the quick approximation of the Gaussian Error Linear Unit activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
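The three GELU entries above differ only in which formula they evaluate. As a rough pure-Python sketch (it does not call ggml, and the constants are the ones commonly quoted for these variants; that ggml uses exactly these values is an assumption), the exact, tanh-approximated, and quick forms can be compared like this:

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed to correspond to plain ggml_gelu)."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def gelu_quick(x: float) -> float:
    """Sigmoid approximation of GELU: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


# Print the three variants side by side for a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  erf={gelu_erf(x):+.4f}  "
          f"tanh={gelu_tanh(x):+.4f}  quick={gelu_quick(x):+.4f}")
```

The quick form trades some accuracy for a cheaper evaluation, which is why it is listed as a separate operation.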
Source code in ggml/ggml.py
ggml_silu(ctx, a)
Apply the Sigmoid Linear Unit activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_silu_inplace(ctx, a)
Apply the Sigmoid Linear Unit activation function to all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_hardswish(ctx, a)
Apply the Hardswish activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_hardsigmoid(ctx, a)
Apply the Hardsigmoid activation function to all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
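For reference, the SiLU, Hardsigmoid, and Hardswish activations listed above are usually defined as below. This is a scalar pure-Python sketch of the textbook formulas, not the ggml kernels; the helper names are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    """Sigmoid Linear Unit: x * sigmoid(x)."""
    return x * sigmoid(x)


def hardsigmoid(x: float) -> float:
    """Piecewise-linear sigmoid: (x + 3) / 6 clipped to [0, 1]."""
    return min(1.0, max(0.0, (x + 3.0) / 6.0))


def hardswish(x: float) -> float:
    """Hardswish: x * hardsigmoid(x)."""
    return x * hardsigmoid(x)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"{x:+.1f}  silu={silu(x):+.4f}  "
          f"hardsigmoid={hardsigmoid(x):.4f}  hardswish={hardswish(x):+.4f}")
```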
Source code in ggml/ggml.py
ggml_exp(ctx, a)
Compute the exponential of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_exp_inplace(ctx, a)
Compute the exponential of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_floor(ctx, a)
Compute the floor of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_floor_inplace(ctx, a)
Compute the floor of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_ceil(ctx, a)
Compute the ceiling of all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_ceil_inplace(ctx, a)
Compute the ceiling of all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_round(ctx, a)
Round all elements in a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_round_inplace(ctx, a)
Round all elements in a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_trunc(ctx, a)
Truncate all elements in a tensor toward zero and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_trunc_inplace(ctx, a)
Truncate all elements in a tensor toward zero and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_norm(ctx, a, eps)
Normalize all elements in a tensor along the first axis (that is, along rows) and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_norm_inplace(ctx, a, eps)
Normalize all elements in a tensor along the first axis (that is, along rows) and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_rms_norm(ctx, a, eps)
Compute the RMS norm of a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
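To make the role of eps concrete, here is a pure-Python sketch of the two row-wise normalizations as they are commonly defined. It works on a plain list standing in for one tensor row and does not call ggml; the function names are hypothetical, and the exact placement of eps inside the square root is an assumption about the usual formulation.

```python
import math
from typing import List


def norm_row(row: List[float], eps: float) -> List[float]:
    """Subtract the mean, then divide by sqrt(variance + eps)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in row]


def rms_norm_row(row: List[float], eps: float) -> List[float]:
    """Divide by sqrt(mean(x^2) + eps); the mean is not subtracted."""
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in row]


print(norm_row([1.0, 2.0, 3.0], 1e-5))
print(rms_norm_row([3.0, 4.0], 1e-5))
# With eps > 0 an all-zero row is handled without a division by zero.
print(rms_norm_row([0.0, 0.0], 1e-5))
```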
Source code in ggml/ggml.py
ggml_group_norm(ctx, a, n_groups, eps)
Group normalize a tensor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
n_groups(Union[c_int, int]) –number of groups
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_group_norm_inplace(ctx, a, n_groups, eps)
Group normalize a tensor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
n_groups(Union[c_int, int]) –number of groups
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_l2_norm(ctx, a, eps)
L2 normalize a tensor along rows and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_l2_norm_inplace(ctx, a, eps)
L2 normalize a tensor along rows and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
eps(Union[c_float, float]) –minimum value to avoid division by zero
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_mul_mat(ctx, a, b)
Multiply two matrices and return the result.
A: k columns, n rows => [ne03, ne02, n, k]
B: k columns, m rows (i.e. we transpose it internally) => [ne03 * x, ne02 * y, m, k]
Result: n columns, m rows => [ne03 * x, ne02 * y, m, n]
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
b(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
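The shape rule above can be written out as a small checker. This is a pure-Python sketch with a hypothetical helper name; shapes are given in ggml's ne order (ne0, ne1, ne2, ne3), which is the reverse of the bracketed notation in the description, and the broadcasting check (the factors x and y must be whole numbers) is an assumption derived from that description.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (ne0, ne1, ne2, ne3)


def mul_mat_result_ne(a_ne: Shape, b_ne: Shape) -> Shape:
    """Return the result shape of a matrix multiplication, or raise ValueError."""
    if a_ne[0] != b_ne[0]:
        raise ValueError("a and b must have the same number of columns (k)")
    if b_ne[2] % a_ne[2] != 0 or b_ne[3] % a_ne[3] != 0:
        raise ValueError("batch dims of b must be multiples of those of a")
    # n columns (rows of a), m rows (rows of b), batch dims taken from b.
    return (a_ne[1], b_ne[1], b_ne[2], b_ne[3])


# a: k=64 columns, n=8 rows, 2 batches; b: k=64 columns, m=5 rows, 4 batches.
print(mul_mat_result_ne((64, 8, 2, 1), (64, 5, 4, 1)))
```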
Source code in ggml/ggml.py
ggml_mul_mat_set_prec(a, prec)
Change the precision of a matrix multiplication.
Set to GGML_PREC_F32 for higher precision (useful for phi-2).
Parameters:
-
a(ggml_tensor_p) –tensor
-
prec(Union[c_int, int]) –precision
Source code in ggml/ggml.py
ggml_mul_mat_id(ctx, as_, b, ids)
Multiply two matrices indirectly and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
as_(ggml_tensor_p) –tensor
-
b(ggml_tensor_p) –tensor
-
ids(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_out_prod(ctx, a, b)
Compute the outer product of two matrices and return the result.
A: m columns, n rows, B: p columns, n rows, result is m columns, p rows
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
b(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_scale(ctx, a, s)
Scale a tensor by a scalar factor and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
s(Union[c_float, float]) –scale factor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_scale_inplace(ctx, a, s)
Scale a tensor by a scalar factor and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
s(Union[c_float, float]) –scale factor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_cont(ctx, a)
Make a tensor contiguous and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_transpose(ctx, a)
Transpose the first two dimensions of a tensor and return the result.
Alias for ggml_permute(ctx, a, 1, 0, 2, 3).
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_rope(ctx, a, b, n_dims, mode)
Rotary position embedding
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
b(ggml_tensor_p) –int32 vector with size a->ne[2], it contains the positions
-
n_dims(Union[c_int, int]) –number of dimensions
-
mode(Union[c_int, int]) –bit flags: if bit mode & 1 is set, skip n_past elements (DEPRECATED); if bit mode & 2 is set, GPT-NeoX style; if bit mode & 4 is set, ChatGLM style
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
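The mode argument is a set of bit flags, so several styles can be encoded in one integer. A minimal sketch of decoding it follows; the constant names are hypothetical and are not taken from ggml, only the bit values 1, 2, and 4 come from the parameter description.

```python
from typing import List

# Hypothetical names for the documented bit values.
ROPE_FLAG_SKIP_N_PAST = 1  # deprecated
ROPE_FLAG_NEOX = 2
ROPE_FLAG_GLM = 4


def describe_rope_mode(mode: int) -> List[str]:
    """List the behaviours selected by the bits set in mode."""
    selected = []
    # Test each bit with "mode & FLAG" being non-zero, not "== 1".
    if mode & ROPE_FLAG_SKIP_N_PAST:
        selected.append("skip n_past elements (deprecated)")
    if mode & ROPE_FLAG_NEOX:
        selected.append("GPT-NeoX style")
    if mode & ROPE_FLAG_GLM:
        selected.append("ChatGLM style")
    return selected


print(describe_rope_mode(0))
print(describe_rope_mode(2))
```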
Source code in ggml/ggml.py
ggml_rope_inplace(ctx, a, b, n_dims, mode)
Rotary position embedding inplace
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
b(ggml_tensor_p) –int32 vector with size a->ne[2], it contains the positions
-
n_dims(Union[c_int, int]) –number of dimensions
-
mode(Union[c_int, int]) –bit flags: if bit mode & 1 is set, skip n_past elements (DEPRECATED); if bit mode & 2 is set, GPT-NeoX style; if bit mode & 4 is set, ChatGLM style
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_rope_custom(ctx, a, b, n_dims, mode, n_ctx_orig, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow)
Custom rotary position embedding
Source code in ggml/ggml.py
ggml_rope_custom_inplace(ctx, a, b, n_dims, mode, n_ctx_orig, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow)
Custom rotary position embedding inplace
Source code in ggml/ggml.py
ggml_rope_yarn_corr_dims(n_dims, n_orig_ctx, freq_base, beta_fast, beta_slow, dims)
Compute correction dims for YaRN RoPE scaling
Source code in ggml/ggml.py
ggml_rope_back(ctx, a, b, c, n_dims, mode, n_ctx, n_orig_ctx, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow, xpos_base, xpos_down)
Rotary position embedding backward pass
Source code in ggml/ggml.py
ggml_clamp(ctx, a, min, max)
Clamp tensor values between min and max
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
min(Union[c_float, float]) –minimum value
-
max(Union[c_float, float]) –maximum value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_conv_1d(ctx, a, b, s0, p0, d0)
Convolution 1D
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
-
s0(Union[c_int, int]) –stride
-
p0(Union[c_int, int]) –padding
-
d0(Union[c_int, int]) –dilation
Returns:
-
ggml_tensor_p–output tensor
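How stride, padding, and dilation interact is easiest to see from the output length. The helper below is a pure-Python sketch of the standard convolution size formula; the function name is hypothetical, and that ggml computes the output size in exactly this way is an assumption.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int, padding: int, dilation: int) -> int:
    """Output length of a 1D convolution along one axis."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_kernel) // stride + 1


# No padding: the output shrinks by kernel_size - 1.
print(conv_output_size(10, 3, stride=1, padding=0, dilation=1))
# "Half" padding (kernel_size // 2) with stride 1 keeps the length for odd kernels.
print(conv_output_size(10, 3, stride=1, padding=3 // 2, dilation=1))
# Stride 2 roughly halves the length.
print(conv_output_size(10, 3, stride=2, padding=1, dilation=1))
```

The same formula applies independently to each axis of a 2D convolution, using s0/p0/d0 for the first axis and s1/p1/d1 for the second.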
Source code in ggml/ggml.py
ggml_conv_1d_ph(ctx, a, b, s, d)
Convolution 1D with padding = half
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
-
s(Union[c_int, int]) –stride
-
d(Union[c_int, int]) –dilation
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_conv_transpose_1d(ctx, a, b, s0, p0, d0)
Convolution transpose 1D
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
-
s0(Union[c_int, int]) –stride
-
p0(Union[c_int, int]) –padding
-
d0(Union[c_int, int]) –dilation
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_conv_2d(ctx, a, b, s0, s1, p0, p1, d0, d1)
Convolution 2D
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
-
s0(Union[c_int, int]) –stride
-
s1(Union[c_int, int]) –stride
-
p0(Union[c_int, int]) –padding
-
p1(Union[c_int, int]) –padding
-
d0(Union[c_int, int]) –dilation
-
d1(Union[c_int, int]) –dilation
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_conv_2d_sk_p0(ctx, a, b)
Convolution 2D with stride = kernel size and padding = zero
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_conv_2d_s1_ph(ctx, a, b)
Convolution 2D with stride = 1 and padding = half
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_conv_transpose_2d_p0(ctx, a, b, stride)
Convolution Transpose 2D with padding = zero
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –filter tensor
-
stride(Union[c_int, int]) –stride
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_pool_1d(ctx, a, op, k0, s0, p0)
1D Pooling
Parameters:
-
a(ggml_tensor_p) –input tensor
-
op(Union[c_int, int]) –pooling operation
-
k0(Union[c_int, int]) –kernel size
-
s0(Union[c_int, int]) –stride
-
p0(Union[c_int, int]) –padding
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_pool_2d(ctx, a, op, k0, k1, s0, s1, p0, p1)
2D Pooling
Parameters:
-
a(ggml_tensor_p) –input tensor
-
op(Union[c_int, int]) –pooling operation
-
k0(Union[c_int, int]) –kernel size
-
k1(Union[c_int, int]) –kernel size
-
s0(Union[c_int, int]) –stride
-
s1(Union[c_int, int]) –stride
-
p0(Union[c_float, float]) –padding
-
p1(Union[c_float, float]) –padding
Returns:
-
ggml_tensor_p–output tensor
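Note that the padding parameters of the 2D pooling function are floats, unlike the integer kernel size and stride. A sketch of the usual pooling output-size computation along one axis is shown below; the helper name is hypothetical, and both the formula and the truncation to an integer are assumptions about how the size is derived, not a statement of ggml's implementation.

```python
def pool_output_size(input_size: int, kernel_size: int,
                     stride: int, padding: float) -> int:
    """Output length of a pooling window sliding along one axis."""
    # Float arithmetic first, then truncate toward zero.
    return int((input_size + 2 * padding - kernel_size) / stride + 1)


print(pool_output_size(8, 2, 2, 0.0))
print(pool_output_size(7, 2, 2, 0.0))
print(pool_output_size(7, 3, 2, 1.0))
```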
Source code in ggml/ggml.py
ggml_upscale(ctx, a, scale_factor, mode)
Upscale
Multiply ne0 and ne1 by scale factor
Parameters:
-
a(ggml_tensor_p) –input tensor
-
scale_factor(Union[c_int, int]) –scale factor
-
mode(Union[c_int, int]) –scale mode
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_upscale_ext(ctx, a, ne0, ne1, ne2, ne3, mode)
Upscale to specified dimensions
Parameters:
-
a(ggml_tensor_p) –input tensor
-
ne0(Union[c_int, int]) –dimension 0
-
ne1(Union[c_int, int]) –dimension 1
-
ne2(Union[c_int, int]) –dimension 2
-
ne3(Union[c_int, int]) –dimension 3
-
mode(Union[c_int, int]) –scale mode
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_pad(ctx, a, p0, p1, p2, p3)
Pad tensor with zeros
Parameters:
-
a(ggml_tensor_p) –input tensor
-
p0(Union[c_int, int]) –padding
-
p1(Union[c_int, int]) –padding
-
p2(Union[c_int, int]) –padding
-
p3(Union[c_int, int]) –padding
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_fill(ctx, a, c)
Fill a tensor with a constant and return the result.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
c(Union[c_float, float]) –fill value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_fill_inplace(ctx, a, c)
Fill a tensor with a constant and store the result in the first tensor.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
a(ggml_tensor_p) –tensor
-
c(Union[c_float, float]) –fill value
Returns:
-
ggml_tensor_p–Pointer to ggml_tensor
Source code in ggml/ggml.py
ggml_timestep_embedding(ctx, timesteps, dim, max_period)
Timestep embedding
Parameters:
-
timesteps(ggml_tensor_p) –input tensor
-
dim(Union[c_int, int]) –embedding dimension
-
max_period(Union[c_int, int]) –maximum period
Source code in ggml/ggml.py
ggml_argsort(ctx, a, order)
Argsort
Parameters:
-
a(ggml_tensor_p) –input tensor
-
order(Union[c_int, int]) –sort order
Returns:
-
ggml_tensor_p–output tensor
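An argsort returns indices rather than sorted values. The following pure-Python sketch shows that behaviour for a single row; it uses a boolean instead of ggml's sort-order constant and a hypothetical function name, and it does not call ggml.

```python
from typing import List


def argsort_row(row: List[float], descending: bool = False) -> List[int]:
    """Return the indices that would sort the row."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=descending)


row = [3.0, 1.0, 2.0]
ascending = argsort_row(row)
print(ascending)
# Indexing the row with the result yields the sorted values.
print([row[i] for i in ascending])
print(argsort_row(row, descending=True))
```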
Source code in ggml/ggml.py
ggml_arange(ctx, start, stop, step)
Arange
Parameters:
-
start(Union[c_float, float]) –start
-
stop(Union[c_float, float]) –stop
-
step(Union[c_float, float]) –step
Source code in ggml/ggml.py
ggml_top_k(ctx, a, k)
Top k elements per row
Parameters:
-
a(ggml_tensor_p) –input tensor
-
k(Union[c_int, int]) –number of elements
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_custom1_op_f32_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor))
module-attribute
Unary operator function type
ggml_custom2_op_f32_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor))
module-attribute
Binary operator function type
ggml_custom3_op_f32_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor))
module-attribute
Ternary operator function type
ggml_map_custom1_f32(ctx, a, fun)
Custom unary operator on a tensor.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
fun(ggml_custom1_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
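The fun argument is a ctypes callback created from one of the CFUNCTYPE prototypes above. The sketch below illustrates the mechanics with the standard library only: demo_tensor is a stand-in structure invented for this example (it is not the layout of ggml_tensor), and the callback is invoked directly from Python rather than by ggml.

```python
import ctypes


class demo_tensor(ctypes.Structure):
    """Stand-in for ggml_tensor; NOT the real layout."""
    _fields_ = [
        ("n", ctypes.c_int),
        ("data", ctypes.POINTER(ctypes.c_float)),
    ]


# Same call shape as ggml_custom1_op_f32_t: void (*)(dst, a).
demo_custom1_op_t = ctypes.CFUNCTYPE(
    None, ctypes.POINTER(demo_tensor), ctypes.POINTER(demo_tensor)
)


@demo_custom1_op_t
def double_op(dst, a):
    # Both arguments arrive as pointers; .contents dereferences them.
    for i in range(a.contents.n):
        dst.contents.data[i] = 2.0 * a.contents.data[i]


def make_tensor(values):
    """Build a demo_tensor; the buffer is returned too so it stays alive."""
    buf = (ctypes.c_float * len(values))(*values)
    tensor = demo_tensor(len(values), ctypes.cast(buf, ctypes.POINTER(ctypes.c_float)))
    return tensor, buf


src, src_buf = make_tensor([1.0, 2.0, 3.0])
dst, dst_buf = make_tensor([0.0, 0.0, 0.0])
double_op(ctypes.pointer(dst), ctypes.pointer(src))
result = list(dst_buf)
print(result)
```

When a callback like double_op is handed to a C library, keep a Python reference to it for as long as the library may call it; ctypes does not keep the callback alive on its own.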
Source code in ggml/ggml.py
ggml_map_custom1_inplace_f32(ctx, a, fun)
Custom unary operator on a tensor inplace.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
fun(ggml_custom1_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_map_custom2_f32(ctx, a, b, fun)
Custom binary operator on two tensors.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –input tensor
-
fun(ggml_custom2_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_map_custom2_inplace_f32(ctx, a, b, fun)
Custom binary operator on two tensors inplace.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –input tensor
-
fun(ggml_custom2_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_map_custom3_f32(ctx, a, b, c, fun)
Custom ternary operator on three tensors.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –input tensor
-
c(ggml_tensor_p) –input tensor
-
fun(ggml_custom3_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_map_custom3_inplace_f32(ctx, a, b, c, fun)
Custom ternary operator on three tensors inplace.
Parameters:
-
a(ggml_tensor_p) –input tensor
-
b(ggml_tensor_p) –input tensor
-
c(ggml_tensor_p) –input tensor
-
fun(ggml_custom3_op_f32_t) –function to apply to each element
Returns:
-
ggml_tensor_p–output tensor
Source code in ggml/ggml.py
ggml_custom1_op_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
module-attribute
Custom unary operator on a tensor.
ggml_custom2_op_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
module-attribute
Custom binary operator on two tensors.
ggml_custom3_op_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.POINTER(ggml_tensor), ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
module-attribute
Custom ternary operator on three tensors.
ggml_custom_op_t = ctypes.CFUNCTYPE(None, ctypes.POINTER(ggml_tensor), ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
module-attribute
Custom operator on a tensor with variadic tensor inputs.
ggml_build_forward_expand(cgraph, tensor)
Add a tensor to the forward computation graph. This is used to compute and save the value of the tensor.
Parameters:
-
cgraph(ggml_cgraph_p) –The graph.
-
tensor(ggml_tensor_p) –The tensor.
Source code in ggml/ggml.py
ggml_build_backward_expand(*args)
Add backward pass nodes to a graph.
Parameters:
-
args(Any, default:()) –Either (ctx, cgraph, grad_accs) or the legacy (ctx, gf, gb, keep) call shape.
Source code in ggml/ggml.py
ggml_new_graph(ctx)
Create a new graph.
Parameters:
-
ctx(ggml_context_p) –The context.
Returns:
-
ggml_cgraph_p–The graph.
Source code in ggml/ggml.py
ggml_new_graph_custom(ctx, size, grads)
Create a new graph with custom size and grads.
Parameters:
-
ctx(ggml_context_p) –The context.
-
size(Union[c_size_t, int]) –The size of the graph.
-
grads(Union[c_bool, bool]) –Whether to keep the gradients.
Returns:
-
ggml_cgraph_p–The graph.
Source code in ggml/ggml.py
ggml_graph_dup(ctx, cgraph, force_grads=False)
Duplicate a graph.
Parameters:
-
ctx(ggml_context_p) –The context.
-
cgraph(ggml_cgraph_p) –The graph.
-
force_grads(Union[c_bool, bool], default:False) –Whether to force allocation of graph gradients.
Returns:
-
ggml_cgraph_p–The graph.
Source code in ggml/ggml.py
ggml_graph_view(cgraph, i0, i1)
View a graph.
Parameters:
-
cgraph(ggml_cgraph_p) –The graph.
-
i0(Union[c_int, int]) –The start index.
-
i1(Union[c_int, int]) –The end index.
Returns:
-
ggml_cgraph–The graph.
Source code in ggml/ggml.py
ggml_graph_cpy(src, dst)
Copy a graph.
Parameters:
-
src(ggml_cgraph_p) –The source graph.
-
dst(ggml_cgraph_p) –The destination graph.
Source code in ggml/ggml.py
ggml_graph_reset(cgraph)
ggml_graph_clear(cgraph)
ggml_graph_overhead()
ggml_graph_plan(cgraph, n_threads=GGML_DEFAULT_N_THREADS, threadpool=None)
Plan the computation graph.
Parameters:
-
cgraph(ggml_cgraph_p) –The graph.
-
n_threads(Union[c_int, int], default:GGML_DEFAULT_N_THREADS) –The number of threads to use.
-
threadpool(Union[c_void_p, int, None], default:None) –Optional ggml_threadpool pointer.
Returns:
-
ggml_cplan–The plan.
Source code in ggml/ggml.py
ggml_graph_compute_with_ctx(ctx, cgraph, n_threads)
Compute the graph with a context.
Parameters:
-
ctx(ggml_context_p) –The context.
-
cgraph(ggml_cgraph_p) –The graph.
-
n_threads(Union[c_int, int]) –The number of threads to use.
Source code in ggml/ggml.py
ggml_graph_get_tensor(cgraph, name)
Get a tensor from the graph by name.
Parameters:
-
cgraph(ggml_cgraph_p) –The graph.
-
name(bytes) –The name of the tensor.
Returns:
-
ggml_tensor_p–The tensor.
Source code in ggml/ggml.py
gguf_init_params
Bases: Structure
Initialization parameters for gguf.
Attributes:
-
no_alloc(bool) –No allocation.
-
ctx(CtypesPointer[ggml_context_p]) –The context.
Source code in ggml/ggml.py
ggml_type_traits
Bases: Structure
Internal types and functions exposed for tests and benchmarks.
Attributes:
-
type_name(bytes)–Name of the type
-
blck_size(int)–Block size
-
blck_size_interleave(int)–Interleaved block size
-
type_size(int)–Size of the type
-
is_quantized(bool)–Is quantized
-
to_float(ggml_to_float_t)–Convert to float
-
from_float_ref(ggml_from_float_t)–Reference conversion from float
Source code in ggml/ggml.py
ggml_internal_get_type_traits(type)
Compatibility alias for the removed ggml_internal_get_type_traits API.
ggml_type_traits_cpu
Bases: Structure
CPU-specific conversion and dot-product functions.
Source code in ggml/ggml.py
ggml_tallocr
Bases: Structure
Tensor allocator
Attributes:
-
buffer(ggml_backend_buffer_t) –ggml_backend_buffer_t
-
base(c_void_p) –ctypes.c_void_p
-
alignment(int) –ctypes.c_size_t
-
offset(int) –ctypes.c_size_t
Source code in ggml/ggml.py
ggml_gallocr_reserve(galloc, graph)
Pre-allocate buffers from a measure graph. Does not allocate or modify the graph. Call with a worst-case graph to avoid buffer reallocations. Not strictly required for single-buffer usage: ggml_gallocr_alloc_graph will reallocate the buffers automatically if needed. Returns false if the buffer allocation failed.
Source code in ggml/ggml.py
ggml_gallocr_reserve_n_size(galloc, graph, node_buffer_ids, leaf_buffer_ids, sizes)
Write the buffer sizes that would be allocated by ggml_gallocr_reserve_n.
Source code in ggml/ggml.py
ggml_gallocr_alloc_graph(galloc, graph)
Automatically reallocates if the topology changes when using a single buffer. Returns false if using multiple buffers and a re-allocation is needed (call ggml_gallocr_reserve_n first to set the node buffers).
Source code in ggml/ggml.py
ggml_backend_alloc_ctx_tensors_from_buft_size(ctx, buft)
Get the size of the buffer that would be allocated for all tensors in a context.
Source code in ggml/ggml.py
ggml_backend_alloc_ctx_tensors_from_buft(ctx, buft)
Create a buffer and allocate all the tensors in a ggml_context
Source code in ggml/ggml.py
ggml_backend_sched_reserve_size(sched, measure_graph, sizes)
Initialize backend buffers from a measure graph and write per-backend sizes.
Source code in ggml/ggml.py
ggml_backend_sched_reserve(sched, measure_graph)
Initialize backend buffers from a measure graph.
Source code in ggml/ggml.py
ggml_backend_sched_get_n_splits(sched)
Get the number of splits of the last graph.
Source code in ggml/ggml.py
ggml_backend_sched_graph_compute(sched, graph)
Allocate and compute graph on the backend scheduler.
Source code in ggml/ggml.py
ggml_backend_sched_reset(sched)
Reset all assignments and allocators - must be called before changing the node backends.
ggml_backend_graph_copy
Bases: Structure
Structure for ggml_backend_graph_copy.
Attributes:
-
buffer(ggml_backend_buffer_t) –ggml_backend_buffer_t
-
ctx_allocated(ggml_context_p) –ggml_context_p
-
ctx_unallocated(ggml_context_p) –ggml_context_p
-
graph(CtypesPointer[ggml_cgraph]) –ctypes.POINTER(ggml_cgraph)
Source code in ggml/ggml.py
ggml.utils
Utility functions for ggml-python.
to_numpy(tensor, shape=None)
Get the data of a ggml tensor as a numpy array.
Parameters:
-
tensor(ggml_tensor_p) –ggml tensor
Returns:
Source code in ggml/utils.py
from_numpy(x, ctx)
Create a new ggml tensor with data copied from a numpy array.
Parameters:
-
x(NDArray[Any]) –numpy array
-
ctx(ggml_context_p) –ggml context
Returns:
-
ggml_tensor_p–New ggml tensor with data copied from x
Source code in ggml/utils.py
copy_to_cpu(ctx, tensor)
Copy a ggml tensor from a GPU backend to CPU.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
tensor(ggml_tensor_p) –ggml tensor
Returns:
-
ggml_tensor_p–New ggml tensor with data copied from tensor on CPU backend
Source code in ggml/utils.py
quantize_0(data_f32, nelements, ne0, ttype, work=None, imatrix=None)
Quantize a float32 array.
Parameters:
-
data_f32(CtypesArray[c_float]) –float32 array
-
nelements(int) –number of elements in data_f32
-
ne0(int) –number of elements per row of data_f32 (size of the first dimension)
-
ttype(GGML_TYPE) –ggml type to quantize to
-
work(Optional[CtypesArray[c_float]], default:None) –work buffer
-
imatrix(Optional[CtypesArray[c_float]], default:None) –quantization matrix
Returns:
-
(work, cur_size)–output buffer and number of bytes in the output buffer
Source code in ggml/utils.py
quantize_row(data_f32, nelements, ttype, work=None)
Quantize a row of a ggml tensor.
Parameters:
-
data_f32(CtypesArray[c_float]) –float32 array
-
nelements(int) –number of elements in data_f32
-
ttype(GGML_TYPE) –ggml type to quantize to
-
work(Optional[c_void_p], default:None) –work buffer
Returns:
-
c_void_p–output buffer
Source code in ggml/utils.py
dequantize_row(data_q, nelements, ttype, work=None)
Dequantize a row of a ggml tensor.
Parameters:
-
data_q(c_void_p) –quantized data
-
nelements(int) –number of elements in data_q
-
ttype(GGML_TYPE) –ggml type to dequantize from
-
work(Optional[c_void_p], default:None) –work buffer
Returns:
-
c_void_p–output buffer
Source code in ggml/utils.py
get_ndims(tensor)
Get the number of dimensions of a ggml tensor.
Parameters:
-
tensor(ggml_tensor_p) –ggml tensor
Returns:
-
int–Number of dimensions of tensor
get_shape(tensor)
get_strides(tensor)
slice_tensor(ctx, tensor, indices)
Slice a ggml tensor along multiple dimensions.
The slice is a view of the original tensor with the same number of dimensions.
Parameters:
-
ctx(ggml_context_p) –ggml context
-
tensor(ggml_tensor_p) –ggml tensor
-
indices(Sequence[slice]) –indices to slice along
Returns:
-
ggml_tensor_p–New ggml tensor slice view
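Because indices is a sequence of ordinary Python slice objects, the extent of the resulting view can be worked out with slice.indices. The helper below is a pure-Python sketch with a hypothetical name; it only handles unit-step slices, and whether slice_tensor itself accepts other steps is not stated here.

```python
from typing import List, Sequence, Tuple


def slice_extents(shape: Sequence[int],
                  indices: Sequence[slice]) -> List[Tuple[int, int]]:
    """Return (start, length) per dimension for unit-step slices."""
    extents = []
    for size, index in zip(shape, indices):
        # slice.indices resolves None and negative bounds against the size.
        start, stop, step = index.indices(size)
        if step != 1:
            raise ValueError("this sketch only handles step == 1")
        extents.append((start, max(0, stop - start)))
    return extents


# Keep elements 2..5 of the first dimension and all of the second.
print(slice_extents((8, 4), [slice(2, 6), slice(None)]))
```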
Source code in ggml/utils.py