ls_mlkit.util.offload.split module

ls_mlkit.util.offload.split.get_model_memory(model: Module, forward_factor: float = 1.3)

Estimates the memory usage of a model in gigabytes.

Parameters:
  • model (torch.nn.Module) – The model whose memory usage is to be calculated.

  • forward_factor (float, optional) – A factor to account for additional memory usage during the forward pass. Defaults to 1.3.

Returns:
  The estimated memory usage of the model in gigabytes.

Return type:
  float
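
The docstring does not spell out the formula. Below is a minimal sketch of how such an estimate is typically computed (summing parameter and buffer bytes, then padding by forward_factor); the helper name estimate_model_memory_gb is illustrative, and the actual ls_mlkit implementation may differ:

    import torch
    from torch.nn import Module

    def estimate_model_memory_gb(model: Module, forward_factor: float = 1.3) -> float:
        # Bytes occupied by parameters and persistent buffers
        param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
        buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
        # forward_factor pads the estimate for extra memory (e.g. activations)
        # allocated during the forward pass; 1024**3 assumes binary gigabytes (GiB)
        return (param_bytes + buffer_bytes) * forward_factor / 1024**3

    model = torch.nn.Linear(4096, 4096)      # ~16.8M fp32 params, ~0.0625 GiB
    print(estimate_model_memory_gb(model))   # ~0.081 with the 1.3 factor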

ls_mlkit.util.offload.split.get_split_num(origin_type: str = 'bf16', quant_type: str = 'int8')

Calculates the ratio of the original data type's size to the quantized data type's size.

Parameters:
  • origin_type (str, optional) – The data type of the original tensor. Defaults to “bf16”. Options are “fp32” and “bf16”.

  • quant_type (str, optional) – The data type of the quantized tensor. Defaults to “int8”. Options are “int8” and “nf4”.

Raises:
  • ValueError – If the origin_type is not “fp32” or “bf16”.

  • ValueError – If the quant_type is not “int8” or “nf4”.

Returns:
  The ratio of the original type size to the quantized type size.

Return type:
  int
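
The return value follows from standard bit widths: fp32 is 32 bits, bf16 is 16, int8 is 8, and nf4 is 4, so, for example, the bf16/int8 default gives 16 / 8 = 2. A sketch of equivalent logic under those assumed widths (not necessarily the library's exact code):

    def split_num(origin_type: str = "bf16", quant_type: str = "int8") -> int:
        # Assumed bit widths for the supported data types
        origin_bits = {"fp32": 32, "bf16": 16}
        quant_bits = {"int8": 8, "nf4": 4}
        if origin_type not in origin_bits:
            raise ValueError(f'origin_type must be "fp32" or "bf16", got {origin_type!r}')
        if quant_type not in quant_bits:
            raise ValueError(f'quant_type must be "int8" or "nf4", got {quant_type!r}')
        return origin_bits[origin_type] // quant_bits[quant_type]

    print(split_num())               # 2: bf16 (16 bits) / int8 (8 bits)
    print(split_num("fp32", "nf4"))  # 8: fp32 (32 bits) / nf4 (4 bits)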