ls_mlkit.util.llm module¶
- ls_mlkit.util.llm.add_maybe_special_tokens(model, tokenizer)[source]¶
Add special tokens to the tokenizer and model if they are missing, and resize the model’s token embeddings if needed (a usage sketch follows below)
- Parameters:
model (torch.nn.Module) – the model to add the special tokens to
tokenizer (Tokenizer) – the tokenizer to add the special tokens to
- Returns:
the model and the tokenizer
- Return type:
tuple
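A minimal usage sketch, assuming a Hugging Face transformers checkpoint (the "gpt2" name is illustrative; any compatible model/tokenizer pair should work the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from ls_mlkit.util.llm import add_maybe_special_tokens

# "gpt2" is an illustrative checkpoint; GPT-2 ships without a pad token,
# a common case where extra special tokens are needed.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Add any missing special tokens and resize the embedding matrix so the
# model's vocabulary matches the (possibly grown) tokenizer.
model, tokenizer = add_maybe_special_tokens(model, tokenizer)
```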
- ls_mlkit.util.llm.compute_metrics(eval_prediction: EvalPrediction)[source]¶
Compute evaluation metrics for a model’s predictions (a usage sketch follows below)
- Parameters:
eval_prediction (EvalPrediction) – the evaluation prediction
- Returns:
the metrics
- Return type:
dict
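A sketch of the typical call site: the function matches the Hugging Face Trainer’s `compute_metrics` hook (EvalPrediction in, dict out). The model, dataset, and training arguments below are hypothetical placeholders, not part of ls_mlkit:

```python
from transformers import Trainer, TrainingArguments

from ls_mlkit.util.llm import compute_metrics

# `model` and `eval_dataset` are hypothetical placeholders for whatever
# model and tokenized evaluation set the surrounding project defines.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # Trainer calls this with an EvalPrediction
)
metrics = trainer.evaluate()  # dict of metric name -> value
```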
- ls_mlkit.util.llm.get_data_collator(tokenizer, max_length=-1, ignore_masked_token=True, model_type='causal')[source]¶
Get a data collator for a model. Shifting the inputs and labels to align them happens inside the model, so the data collator simply copies the inputs to create the labels (a usage sketch follows below).
- Parameters:
tokenizer (Tokenizer) – the tokenizer to use
max_length (int, optional) – the maximum length of the input text. Defaults to -1.
ignore_masked_token (bool, optional) – whether to ignore the masked tokens. Defaults to True.
model_type (str, optional) – the type of the model. Defaults to “causal”.
- Returns:
the data collator
- Return type:
callable
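A sketch of wiring the collator into a Hugging Face Trainer for causal-LM fine-tuning; the checkpoint name, `max_length` value, and `train_dataset` are illustrative placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

from ls_mlkit.util.llm import add_maybe_special_tokens, get_data_collator

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
model, tokenizer = add_maybe_special_tokens(model, tokenizer)

# The collator copies input_ids into labels; the causal shift that aligns
# inputs and labels happens inside the model's forward pass.
data_collator = get_data_collator(tokenizer, max_length=512, model_type="causal")

# `train_dataset` is a hypothetical pre-tokenized dataset from the user's own code.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```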