ls_mlkit.dataset.lda_dataset module

class ls_mlkit.dataset.lda_dataset.CyclicGroupGenerator(a: int, p: int)[source]

Bases: object

next()[source]
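The behavior of `next()` is not documented here. A minimal sketch of one plausible implementation, assuming the generator walks the cyclic subgroup generated by `a` modulo `p` (i.e. successive powers `a^k mod p`) — the library's own class may differ:

```python
class CyclicGroupGenerator:
    """Iterate over the cyclic subgroup generated by `a` modulo `p`.

    Hypothetical reimplementation for illustration only.
    """

    def __init__(self, a: int, p: int):
        self.a = a
        self.p = p
        self.current = 1  # start at the multiplicative identity

    def next(self) -> int:
        # Multiply by the generator and reduce modulo p.
        self.current = (self.current * self.a) % self.p
        return self.current


gen = CyclicGroupGenerator(a=3, p=7)
values = [gen.next() for _ in range(6)]
# successive powers of 3 mod 7: 3, 2, 6, 4, 5, 1 (then it cycles)
```

Since 3 is a primitive root mod 7, the sequence visits every nonzero residue before repeating — a convenient way to assign topics to documents in a fixed, repeating order (the `'cyclic'` strategy below).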
class ls_mlkit.dataset.lda_dataset.LDADataset(n_samples: int = 1000, n_local_topics: int = 1, n_total_topics: int = 10, n_words_per_topic: int = 7, seq_len: int = 100, fix_seq_len: bool = True, seed: int = 31, per_topic_strategy: str = 'cyclic', fix_local_topics_num: bool = True, topic_distribution: str = 'uniform')[source]

Bases: Dataset

Dataset of synthetic LDA-generated documents. Samples are sequences of integer word IDs, so no tokenization step is needed.

data_list_to_tensor(data_list: list[list[int]])[source]
static generate_dataset(n_samples: int, n_local_topics: int, n_total_topics: int, n_words_per_topic: int, seq_len: int, fix_local_topics_num: bool = True, fix_seq_len: bool = True, seed: int = 31, per_topic_strategy: str = 'cyclic', topic_distribution: str = 'uniform')[source]
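The signature of `generate_dataset` suggests the classic LDA generative process: pick `n_local_topics` of the `n_total_topics` for each document, then draw `seq_len` words from the chosen topics' vocabularies. A self-contained sketch of that process (assumed semantics, not the library's exact code; here each topic owns a disjoint slice of `n_words_per_topic` word IDs):

```python
import random


def generate_dataset(n_samples, n_local_topics, n_total_topics,
                     n_words_per_topic, seq_len, seed=31):
    """Sketch of an LDA-style generative process (illustrative only).

    Topic t owns word IDs [t * n_words_per_topic, (t+1) * n_words_per_topic);
    each document mixes words uniformly from its sampled topics.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        # Sample this document's local topic set.
        topics = rng.sample(range(n_total_topics), n_local_topics)
        doc = []
        for _ in range(seq_len):
            t = rng.choice(topics)
            doc.append(t * n_words_per_topic + rng.randrange(n_words_per_topic))
        data.append(doc)
    return data


docs = generate_dataset(n_samples=4, n_local_topics=2, n_total_topics=10,
                        n_words_per_topic=7, seq_len=20)
```

The defaults above mirror the documented signature (`n_total_topics=10`, `n_words_per_topic=7`), giving a vocabulary of 70 word IDs; `fix_seq_len`, `fix_local_topics_num`, and the topic-selection strategies are omitted from this sketch.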
class ls_mlkit.dataset.lda_dataset.RandomGenerator[source]

Bases: object

static get_probabilities_by_distribution(n_numbers: int, number_list: list[int], distribution: Literal['uniform', 'normal', 'binomial', 'linear'] = 'uniform')[source]
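The `distribution` literal suggests this helper returns a normalized probability vector shaped by the named distribution. A sketch assuming that contract, showing only the `'uniform'` and `'linear'` cases (the library's weighting for `'normal'` and `'binomial'` is not documented here):

```python
def get_probabilities_by_distribution(n_numbers: int,
                                      distribution: str = "uniform") -> list[float]:
    """Return a length-n_numbers probability vector (assumed semantics)."""
    if distribution == "uniform":
        weights = [1.0] * n_numbers
    elif distribution == "linear":
        # Linearly increasing weights 1, 2, ..., n_numbers.
        weights = [float(i + 1) for i in range(n_numbers)]
    else:
        raise ValueError(f"unsupported in this sketch: {distribution!r}")
    total = sum(weights)
    return [w / total for w in weights]


probs = get_probabilities_by_distribution(4, "linear")
# weights 1, 2, 3, 4 normalized -> [0.1, 0.2, 0.3, 0.4]
```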
ls_mlkit.dataset.lda_dataset.get_lda_dataset(seed: int = 31, n_samples: int = 100, n_local_topics: int = 1, n_total_topics: int = 10, n_words_per_topic: int = 7, seq_len: int = 100, eval_ratio: float = 0.1, fix_seq_len: bool = True, fix_local_topics_num: bool = True, per_topic_strategy: str = 'cyclic', topic_distribution: str = 'uniform')[source]
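`get_lda_dataset` presumably builds an `LDADataset` and splits off an evaluation fraction controlled by `eval_ratio`. The split itself might look like this (illustrative helper, not the library's code):

```python
def train_eval_split(samples: list, eval_ratio: float = 0.1) -> tuple[list, list]:
    """Split samples into (train, eval) portions (illustrative only)."""
    n_eval = int(len(samples) * eval_ratio)
    return samples[n_eval:], samples[:n_eval]


train, eval_set = train_eval_split(list(range(100)), eval_ratio=0.1)
# 100 samples with eval_ratio=0.1 -> 90 train, 10 eval
```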
ls_mlkit.dataset.lda_dataset.softmax(x)[source]
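A standard, numerically stable softmax matching this signature (the max is subtracted before exponentiating so large inputs do not overflow; whether the library's version does this is not documented here):

```python
import math


def softmax(x: list[float]) -> list[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(x)  # shift by the max to avoid overflow in exp
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


out = softmax([1.0, 2.0, 3.0])
```

The output always sums to 1 and preserves the ordering of the inputs.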