ls_mlkit.pipeline package

Module contents

class ls_mlkit.pipeline.BasePipeline(model: Module, dataset: Dataset | Dataset, optimizers: Tuple[Optimizer, LambdaLR], training_config: TrainingConfig, log_config: LogConfig, logger: Logger, collate_fn: Callable | None = None, callbacks: List[BaseCallback] | None = None, *args, **kwargs)[source]

Bases: object

add_callbacks(callbacks: List[BaseCallback])[source]

Add a list of callbacks

Parameters:

callbacks (List[BaseCallback]) – the callbacks to add

abstractmethod compute_loss(model: Module, batch: dict) → Tensor | dict[source]

Compute the loss

Parameters:
  • model (torch.nn.Module) – the model to train

  • batch (dict) – the batch of data

Returns:

the loss tensor, or a dictionary containing a "loss" key

Return type:

Tensor | dict
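
Since compute_loss is abstract, every concrete pipeline must override it. A minimal sketch, assuming dict batches with "x" (features) and "y" (integer labels) and a model that maps features to logits; these names are illustrative assumptions, not part of ls_mlkit:

    import torch
    import torch.nn.functional as F

    from ls_mlkit.pipeline import BasePipeline


    class MyPipeline(BasePipeline):
        def compute_loss(self, model: torch.nn.Module, batch: dict) -> dict:
            # Assumed batch layout: {"x": features, "y": integer class labels}.
            logits = model(batch["x"])
            loss = F.cross_entropy(logits, batch["y"])
            return {"loss": loss}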

eval()[source]

Evaluate the model

get_latest_checkpoint_dir() → str[source]

Get the directory of the latest checkpoint

gradient_clip() → None[source]

Clip the gradient

Returns:

None

load() → None[source]

Load the checkpoint

Returns:

None

save() → None[source]

Save the checkpoint

Returns:

None

train() → None[source]

Train the model

Returns:

None

train_a_step(batch: Dict[str, Any]) → dict[source]

Train the model for one step

Parameters:

batch (Dict[str, Any]) – the batch of data

Returns:

a dictionary containing a "loss" key

Return type:

dict

train_an_epoch()[source]

Train the model for one epoch

Returns:

None

trigger_callbacks(event: CallbackEvent, *args, **kwargs)[source]

Trigger all callbacks for a given event

Parameters:
  • event (CallbackEvent) – the event to trigger

  • *args – the arguments to pass to the callback

  • **kwargs – the keyword arguments to pass to the callback
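
A sketch of wiring a concrete pipeline together and running it, continuing the MyPipeline example above; the toy dataset, model, and hyperparameter values are illustrative assumptions:

    import logging

    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import LambdaLR
    from torch.utils.data import Dataset

    from ls_mlkit.pipeline import LogConfig, TrainingConfig


    class ToyDataset(Dataset):
        # Yields dict samples matching the compute_loss sketch above.
        def __init__(self, n: int = 64):
            self.x = torch.randn(n, 16)
            self.y = torch.randint(0, 2, (n,))

        def __len__(self):
            return len(self.x)

        def __getitem__(self, i):
            return {"x": self.x[i], "y": self.y[i]}


    model = torch.nn.Linear(16, 2)
    optimizer = AdamW(model.parameters(), lr=1e-3)
    scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 1.0)

    pipeline = MyPipeline(  # the BasePipeline subclass sketched above
        model=model,
        dataset=ToyDataset(),
        optimizers=(optimizer, scheduler),
        training_config=TrainingConfig(
            n_epochs=3, batch_size=8, device="cpu",
            save_strategy="epochs", save_dir="checkpoints",
        ),
        log_config=LogConfig(log_strategy="steps", log_steps=10),
        logger=logging.getLogger("toy_pipeline"),
    )

    pipeline.train()  # run the training loop
    pipeline.save()   # write a checkpoint under save_dir
    pipeline.eval()   # evaluate the model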

class ls_mlkit.pipeline.DistributedPipeline(model: Module, dataset: Dataset | Dataset, optimizers: Tuple[Optimizer, LambdaLR], training_config: DistributedTrainingConfig, log_config: LogConfig, logger: Logger, collate_fn: Callable | None = None, seed: int = 42, callbacks: List[BaseCallback] | None = None, *args, **kwargs)[source]

Bases: BasePipeline

gradient_clip() → None[source]

Clip the gradient

Returns:

None

load() → None[source]

Load the checkpoint

Returns:

None

save() → None[source]

Save the checkpoint

Returns:

None

train_a_step(batch: Dict[str, Any])[source]

Train the model for one step

Parameters:

batch (Dict[str, Any]) – the batch of data

Returns:

a dictionary containing a "loss" key

Return type:

dict

trigger_callbacks(event: CallbackEvent, *args, **kwargs)[source]

Trigger all callbacks for a given event

Parameters:
  • event (CallbackEvent) – the event to trigger

  • *args – the arguments to pass to the callback

  • **kwargs – the keyword arguments to pass to the callback

class ls_mlkit.pipeline.DistributedTrainingConfig(n_epochs: int = 100, batch_size: int = 4, device: str = 'cuda', save_strategy: Literal['epochs', 'steps', None] = 'epochs', save_dir: str = None, save_steps: int = 10, save_epochs: int = 1, save_total_limit: int = 5, num_workers: int = 4, train_shuffle: bool = True, eval_strategy: Literal['epochs', 'steps'] = None, eval_steps: int = 500, eval_epochs: int = 1, grad_clip_strategy: Literal['norm', 'value', None] = 'norm', max_grad_norm: float = 1.0, max_grad_value: float = 1.0, gradient_accumulation_steps: int = 1, mixed_precision: str = 'fp16', *args, **kwargs)[source]

Bases: TrainingConfig
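
A sketch of a typical distributed training configuration; the values are illustrative, only fields from the signature above are used, and field semantics are inferred from their names:

    from ls_mlkit.pipeline import DistributedTrainingConfig

    config = DistributedTrainingConfig(
        n_epochs=10,
        batch_size=32,
        save_strategy="epochs",
        save_dir="checkpoints",
        save_epochs=1,
        save_total_limit=3,
        eval_strategy="steps",
        eval_steps=500,
        grad_clip_strategy="norm",
        max_grad_norm=1.0,
        gradient_accumulation_steps=4,
        mixed_precision="fp16",
    )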

class ls_mlkit.pipeline.LogConfig(log_dir: str = 'logs', log_steps: int = 100, log_epochs: int = 1, log_strategy: Literal['epochs', 'steps'] = 'epochs', *args, **kwargs)[source]

Bases: object
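
A sketch of the two logging cadences exposed by the signature above, with illustrative values:

    from ls_mlkit.pipeline import LogConfig

    # Log once per epoch (the default strategy) ...
    epoch_logs = LogConfig(log_dir="logs", log_strategy="epochs", log_epochs=1)

    # ... or log every 50 steps instead.
    step_logs = LogConfig(log_dir="logs", log_strategy="steps", log_steps=50)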

class ls_mlkit.pipeline.MyDistributedPipeline(model: Module, train_dataset: Dataset | Dataset, eval_dataset: Dataset | Dataset, optimizers: tuple[Optimizer, LambdaLR | CosineAnnealingLR], training_config: MyTrainingConfig, log_config: LogConfig, logger: Logger | None, collate_fn: Callable | None = None, seed: int = 42, callbacks: List[BaseCallback] | None = None, *args, **kwargs)[source]

Bases: DistributedPipeline

compute_loss(model, batch: dict) → Tensor[source]

Compute the loss

Parameters:
  • model (torch.nn.Module) – the model to train

  • batch (dict) – the batch of data

Returns:

the loss tensor, or a dictionary containing a "loss" key

Return type:

Tensor | dict

eval(disable_grad: bool = False)[source]

Evaluate the model

eval_a_step(batch: dict) → dict[source]

Evaluate the model for one step

Parameters:

batch (dict) – the batch of data

Returns:

a dictionary containing the evaluation loss

Return type:

dict

train()[source]

Train the model

Returns:

None
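
A sketch of wiring MyDistributedPipeline together with separate train and eval datasets. The toy model, dict-shaped samples, and hyperparameter values are assumptions for illustration; in practice the model and batch layout must match what this pipeline's compute_loss implementation expects:

    import logging

    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR
    from torch.utils.data import Dataset

    from ls_mlkit.pipeline import LogConfig, MyDistributedPipeline, MyTrainingConfig


    class DictDataset(Dataset):
        # Illustrative dataset yielding dict samples.
        def __init__(self, n: int = 128):
            self.x = torch.randn(n, 16)
            self.y = torch.randint(0, 2, (n,))

        def __len__(self):
            return len(self.x)

        def __getitem__(self, i):
            return {"x": self.x[i], "y": self.y[i]}


    model = torch.nn.Linear(16, 2)
    optimizer = AdamW(model.parameters(), lr=3e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=1000)

    pipeline = MyDistributedPipeline(
        model=model,
        train_dataset=DictDataset(),
        eval_dataset=DictDataset(32),
        optimizers=(optimizer, scheduler),
        training_config=MyTrainingConfig(
            train_strategy="steps",
            n_steps=1000,
            batch_size=16,
            save_strategy="steps",
            save_dir="checkpoints",
            save_steps=200,
            eval_strategy="steps",
            eval_steps=200,
        ),
        log_config=LogConfig(log_strategy="steps", log_steps=50),
        logger=logging.getLogger("distributed_pipeline"),
        seed=42,
    )

    pipeline.train()
    pipeline.eval(disable_grad=True)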

class ls_mlkit.pipeline.MyTrainingConfig(train_strategy: Literal['epochs', 'steps'] = 'epochs', n_epochs: int = 100, batch_size: int = 16, device: str = 'cuda', save_strategy: Literal['epochs', 'steps', None] = 'epochs', save_dir: str | None = None, save_steps: int = 10, save_epochs: int = 1, save_total_limit: int = 5, num_workers: int = 4, train_shuffle: bool = True, eval_strategy: Literal['epochs', 'steps'] | None = None, eval_steps: int = 500, eval_epochs: int = 1, n_steps: int | None = None, grad_clip_strategy: Literal['norm', 'value', None] = 'norm', max_grad_norm: float = 1.0, max_grad_value: float = 1.0, gradient_accumulation_steps: int = 1, mixed_precision: str = 'fp16', *args, **kwargs)[source]

Bases: DistributedTrainingConfig

get_gradient_accumulation_steps(real_batch_size, per_device_batch_size) → int[source]

Get the gradient accumulation steps

Parameters:
  • real_batch_size (int) – the real batch size

  • per_device_batch_size (int) – the batch size per device

Returns:

the gradient accumulation steps

Return type:

int
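
For example, to reach an effective batch of 64 samples per optimizer update when only 16 fit on a device, the helper should yield 4 accumulation steps. This sketch assumes a simple ratio; the exact rounding is up to the implementation:

    from ls_mlkit.pipeline import MyTrainingConfig

    config = MyTrainingConfig(batch_size=16)
    steps = config.get_gradient_accumulation_steps(
        real_batch_size=64,        # effective batch size per optimizer update
        per_device_batch_size=16,  # what fits on a single device
    )
    print(steps)  # expected: 4 (64 / 16)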

class ls_mlkit.pipeline.TrainingConfig(n_epochs: int = 100, batch_size: int = 4, device: str = 'cuda', save_strategy: Literal['epochs', 'steps', None] = 'epochs', save_dir: str = None, save_steps: int = 10, save_epochs: int = 1, save_total_limit: int = 5, num_workers: int = 4, train_shuffle: bool = True, eval_strategy: Literal['epochs', 'steps'] = None, eval_steps: int = 500, eval_epochs: int = 1, grad_clip_strategy: Literal['norm', 'value', None] = 'norm', max_grad_norm: float = 1.0, max_grad_value: float = 1.0, gradient_accumulation_steps: int = 1, *args, **kwargs)[source]

Bases: object

class ls_mlkit.pipeline.TrainingState(current_epoch: int = 0, current_step_in_epoch: int = 0, current_global_step: int = 0)[source]

Bases: object
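
TrainingState is a small record of training progress. A sketch of constructing and reading one, assuming the constructor arguments are stored as attributes of the same name:

    from ls_mlkit.pipeline import TrainingState

    state = TrainingState(current_epoch=2, current_step_in_epoch=15, current_global_step=215)
    print(state.current_epoch, state.current_step_in_epoch, state.current_global_step)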