ls_mlkit.pipeline.dist_pipeline_impl module¶
- class ls_mlkit.pipeline.dist_pipeline_impl.MyDistributedPipeline(model: Module, train_dataset: Dataset | Dataset, eval_dataset: Dataset | Dataset, optimizers: tuple[Optimizer, LambdaLR | CosineAnnealingLR], training_config: MyTrainingConfig, log_config: LogConfig, logger: Logger | None, collate_fn: Callable | None = None, seed: int = 42, callbacks: List[BaseCallback] | None = None, *args, **kwargs)[source]¶
Bases: DistributedPipeline

- compute_loss(model, batch: dict) → Tensor[source]¶
Compute the loss for a single batch.
- Parameters:
model (torch.nn.Module) – the model to train
batch (dict) – the batch of data
- Returns:
the loss tensor, or a dictionary containing a "loss" key
- Return type:
Tensor | dict
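The contract above allows returning either a bare tensor or a dictionary that carries a "loss" entry. Below is a minimal sketch of a pipeline overriding compute_loss for a classification setup; the subclass name and the "inputs"/"labels" batch keys are hypothetical placeholders for whatever the configured collate_fn actually produces.

```python
import torch
import torch.nn.functional as F

from ls_mlkit.pipeline.dist_pipeline_impl import MyDistributedPipeline


class MyClassificationPipeline(MyDistributedPipeline):
    def compute_loss(self, model, batch: dict) -> torch.Tensor:
        # Hypothetical batch layout: "inputs" and "labels" tensors
        # produced by the collate_fn passed to the pipeline.
        inputs, labels = batch["inputs"], batch["labels"]
        logits = model(inputs)
        # Returning a bare tensor; a dict such as {"loss": loss} is also accepted.
        return F.cross_entropy(logits, labels)
```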
- class ls_mlkit.pipeline.dist_pipeline_impl.MyTrainingConfig(train_strategy: Literal['epochs', 'steps'] = 'epochs', n_epochs: int = 100, batch_size: int = 16, device: str = 'cuda', save_strategy: Literal['epochs', 'steps', None] = 'epochs', save_dir: str | None = None, save_steps: int = 10, save_epochs: int = 1, save_total_limit: int = 5, num_workers: int = 4, train_shuffle: bool = True, eval_strategy: Literal['epochs', 'steps'] | None = None, eval_steps: int = 500, eval_epochs: int = 1, n_steps: int | None = None, grad_clip_strategy: Literal['norm', 'value', None] = 'norm', max_grad_norm: float = 1.0, max_grad_value: float = 1.0, gradient_accumulation_steps: int = 1, mixed_precision: str = 'fp16', *args, **kwargs)[source]¶
Bases: DistributedTrainingConfig
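A minimal usage sketch, assuming a toy PyTorch model and TensorDataset stand-ins for real data; only fields that appear in the MyTrainingConfig signature above are set explicitly, and the remaining ones keep their documented defaults.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import TensorDataset

from ls_mlkit.pipeline.dist_pipeline_impl import MyTrainingConfig

# Toy model and datasets; replace with real ones.
model = nn.Linear(8, 2)
train_dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
eval_dataset = TensorDataset(torch.randn(16, 8), torch.randint(0, 2, (16,)))

# The pipeline expects an (optimizer, scheduler) tuple.
optimizer = AdamW(model.parameters(), lr=3e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=100)

training_config = MyTrainingConfig(
    train_strategy="epochs",
    n_epochs=10,
    batch_size=16,
    save_strategy="epochs",
    save_dir="./checkpoints",
    eval_strategy="epochs",
    gradient_accumulation_steps=2,
    mixed_precision="fp16",
)
```

The resulting training_config, together with the model, the two datasets, the (optimizer, scheduler) tuple, a LogConfig instance, and the optional logger, collate_fn, and callbacks arguments, is then passed to the MyDistributedPipeline constructor documented above; the construction of LogConfig itself is not covered on this page.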