the_utils.evaluation package
Submodules
the_utils.evaluation.unsupervised_graph_learning module
Evaluation utils for unsupervised graph learning tasks.
- the_utils.evaluation.unsupervised_graph_learning.generate_split(num_nodes: int, train_ratio: int, valid_ratio: int) → Tuple[ndarray]
Generate the training, validation, and test sets.
- Parameters:
num_nodes (int) – number of nodes.
train_ratio (int) – ratio of nodes assigned to the training set.
valid_ratio (int) – ratio of nodes assigned to the validation set.
- Returns:
[train set node ids, val set node ids, test set node ids]
- Return type:
Tuple[np.ndarray]
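The split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name sketch_generate_split and the seed argument are hypothetical, and it assumes the ratios are integer percentages (as the default ratios such as 10 and 20 elsewhere in this module suggest) with the test set taking the remaining nodes.

```python
import numpy as np


def sketch_generate_split(num_nodes, train_ratio, valid_ratio, seed=0):
    """Hypothetical stand-in for generate_split (ratios assumed to be percentages)."""
    rng = np.random.default_rng(seed)
    # Shuffle the node ids once, then cut the permutation into three parts.
    perm = rng.permutation(num_nodes)
    n_train = num_nodes * train_ratio // 100
    n_valid = num_nodes * valid_ratio // 100
    train_idx = perm[:n_train]
    valid_idx = perm[n_train:n_train + n_valid]
    test_idx = perm[n_train + n_valid:]  # assumption: test set is the remainder
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = sketch_generate_split(100, 10, 10)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Cutting a single permutation guarantees that the three index sets are disjoint and together cover every node.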
- the_utils.evaluation.unsupervised_graph_learning.split_train_test_nodes(num_nodes: int, train_ratio: int, valid_ratio: int, data_name: str, split_id: int = 0, split_times: int = 10, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[ndarray, ndarray, ndarray]
Split the nodes into training, validation, and test sets.
- Parameters:
num_nodes (int) – number of nodes.
train_ratio (int) – ratio of nodes assigned to the training set.
valid_ratio (int) – ratio of nodes assigned to the validation set.
data_name (str) – dataset name.
split_id (int, optional) – index of the split. Defaults to 0.
split_times (int, optional) – number of random splits. Defaults to 10.
fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.
split_save_dir (str, optional) – directory in which the fixed split is saved. Defaults to './data'.
- Returns:
[train_idx, val_idx, test_idx]
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
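The fixed_split behaviour (save once, load afterwards) can be sketched as below. This is an assumption-laden illustration: the function name sketch_fixed_split, the .npz file format, and the file naming scheme are all hypothetical, and the ratios are again assumed to be percentages.

```python
import os
import tempfile

import numpy as np


def sketch_fixed_split(num_nodes, train_ratio, valid_ratio, data_name,
                       split_id=0, fixed_split=True, split_save_dir="./data"):
    """Hypothetical sketch of caching a random split on disk."""
    # Hypothetical naming scheme; the real library may name files differently.
    file_name = f"{data_name}_{train_ratio}_{valid_ratio}_{split_id}.npz"
    path = os.path.join(split_save_dir, file_name)

    # Later calls: reuse the split that was saved earlier.
    if fixed_split and os.path.exists(path):
        with np.load(path) as saved:
            return saved["train"], saved["valid"], saved["test"]

    # First call: draw a fresh random split.
    perm = np.random.default_rng().permutation(num_nodes)
    n_train = num_nodes * train_ratio // 100
    n_valid = num_nodes * valid_ratio // 100
    split = (perm[:n_train],
             perm[n_train:n_train + n_valid],
             perm[n_train + n_valid:])

    if fixed_split:
        os.makedirs(split_save_dir, exist_ok=True)
        np.savez(path, train=split[0], valid=split[1], test=split[2])
    return split


with tempfile.TemporaryDirectory() as tmp_dir:
    first = sketch_fixed_split(50, 20, 20, "toy", split_save_dir=tmp_dir)
    second = sketch_fixed_split(50, 20, 20, "toy", split_save_dir=tmp_dir)
    same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```

Because the second call finds the saved file, it returns the same indices even though the random generator is unseeded.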
- the_utils.evaluation.unsupervised_graph_learning.cluster_eval(y_true: Tensor | ndarray, y_pred: Tensor | ndarray) → Tuple[float]
Evaluate clustering results.
- Parameters:
y_true (Union[torch.Tensor, np.ndarray]) – ground-truth labels.
y_pred (Union[torch.Tensor, np.ndarray]) – predicted cluster labels.
- Returns:
[ACC, NMI, AMI, ARI, MacroF1, Purity]
- Return type:
Tuple[float]
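Two of the listed metrics, ACC and Purity, can be sketched in plain Python to show what they measure. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, and the brute-force search over label mappings is only practical for a handful of clusters and assumes there are no more predicted clusters than true classes.

```python
from collections import Counter
from itertools import permutations


def purity(y_true, y_pred):
    """Fraction of nodes that belong to the majority class of their cluster."""
    members = {}
    for t, p in zip(y_true, y_pred):
        members.setdefault(p, []).append(t)
    majority_total = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority_total / len(y_true)


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to classes."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    best_hits = 0
    # Try every injective assignment of predicted ids to true class ids.
    for assigned in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, assigned))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
purity_score = purity(y_true, y_pred)
acc_score = clustering_accuracy(y_true, y_pred)
print(purity_score, acc_score)
```

Cluster ids are arbitrary, so ACC must search for the best relabelling before counting matches, whereas Purity only needs the majority class inside each cluster.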
- the_utils.evaluation.unsupervised_graph_learning.kmeans_test(X: Tensor | ndarray, y: Tensor | ndarray, n_clusters: int, repeat: int = 1) → Tuple[float]
Evaluate embeddings with k-means clustering.
- Parameters:
X (Union[torch.Tensor, np.ndarray]) – embeddings.
y (Union[torch.Tensor, np.ndarray]) – ground-truth labels.
n_clusters (int) – number of clusters.
repeat (int, optional) – number of times k-means is repeated. Defaults to 1.
- Returns:
[acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]
- Return type:
Tuple[float]
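The interleaved mean/std layout of the return value can be sketched as follows. The per-run scores here are made-up toy numbers standing in for the metrics of repeated k-means runs, and the use of the population standard deviation (ddof=0) is an assumption rather than a documented detail.

```python
import numpy as np

# Toy per-run scores for two metrics (ACC, NMI) over three hypothetical runs.
runs = np.array([
    [0.70, 0.50],
    [0.80, 0.60],
    [0.90, 0.70],
])

means = runs.mean(axis=0)
stds = runs.std(axis=0)  # assumption: population std (ddof=0)

# Interleave as (acc_mean, acc_std, nmi_mean, nmi_std).
summary = tuple(float(v) for pair in zip(means, stds) for v in pair)
print(summary)
```

With repeat left at 1 there is a single run, so every standard deviation in such a summary would be zero.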
- the_utils.evaluation.unsupervised_graph_learning.svm_test(num_nodes: int, data_name: str, embeddings: tensor, labels: ndarray, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), repeat: int = 3, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]]]
Node classification using an SVM on the embeddings.
- Parameters:
num_nodes (int) – number of nodes.
data_name (str) – dataset name.
embeddings (torch.Tensor) – node embeddings.
labels (np.ndarray) – ground-truth labels.
train_ratios (tuple, optional) – split ratios of the training set. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratios of the validation set. Defaults to (10, 20, 30, 40).
repeat (int, optional) – number of times the SVM evaluation is repeated. Defaults to 3.
fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.
split_save_dir (str, optional) – directory in which the fixed split is saved. Defaults to './data'.
- Returns:
([(mean, std), (mean, std), …] of macro_f1 for all train_ratios, [(mean, std), (mean, std), …] of micro_f1 for all train_ratios)
- Return type:
Tuple[List[Tuple[float]]]
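The two scores reported per training ratio, macro-F1 and micro-F1, differ in how per-class results are pooled. The plain-Python sketch below shows the standard definitions; the function name f1_scores is hypothetical and this is not claimed to be how the library computes them.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(per_class_f1) / len(per_class_f1)
    # Micro: F1 over pooled counts; equals accuracy in the single-label case.
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)
    return macro, micro


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(macro_f1, micro_f1)
```

On imbalanced label distributions the two values can diverge noticeably, which is one reason to report both.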
- the_utils.evaluation.unsupervised_graph_learning.evaluate_clf_cls(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]], float]
Evaluate node classification and node clustering.
- Parameters:
labels (np.ndarray) – labels.
num_classes (int) – number of classes.
num_nodes (int) – number of nodes.
data_name (str) – dataset name.
embeddings (torch.Tensor) – node embedding matrix.
quiet (bool, optional) – if True, suppress printed info. Defaults to True.
method (str, optional) – evaluation method: "clf" for node classification, "cls" for node clustering, "both" for both. Defaults to "both".
clf_repeat (int, optional) – number of times node classification is repeated. Defaults to 3.
cls_repeat (int, optional) – number of times node clustering is repeated. Defaults to 1.
train_ratios (tuple, optional) – split ratios of the training set for node classification. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratios of the validation set. Defaults to (10, 20, 30, 40).
split_times (int, optional) – number of random splits. Defaults to 10.
fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.
split_save_dir (str, optional) – directory in which the fixed split is saved. Defaults to './data'.
- Returns:
[svm_macro_f1_list, svm_micro_f1_list, acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]
- Return type:
Tuple[List[Tuple[float]], float]
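The role of the method argument can be sketched as a small dispatcher. Everything here is hypothetical: sketch_dispatch and the two stand-in callables are illustration only, and returning None for a skipped evaluation is an assumption, not documented behaviour.

```python
def sketch_dispatch(method, run_clf, run_cls):
    """Hypothetical dispatcher: run only the evaluations selected by `method`."""
    if method not in ("clf", "cls", "both"):
        raise ValueError(f"unknown method: {method!r}")
    # "clf" selects node classification, "cls" selects node clustering.
    clf_out = run_clf() if method in ("clf", "both") else None
    cls_out = run_cls() if method in ("cls", "both") else None
    return clf_out, cls_out


calls = []


def fake_clf():
    # Stand-in for the SVM-based classification evaluation.
    calls.append("clf")
    return "clf-results"


def fake_cls():
    # Stand-in for the k-means-based clustering evaluation.
    calls.append("cls")
    return "cls-results"


only_cls = sketch_dispatch("cls", fake_clf, fake_cls)
both = sketch_dispatch("both", fake_clf, fake_cls)
print(only_cls, both, calls)
```

Skipping the unselected branch matters in practice because repeated SVM training over several split ratios is usually the more expensive of the two evaluations.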
- the_utils.evaluation.unsupervised_graph_learning.evaluate_from_embeddings(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[Dict, Dict]
Evaluate embeddings with node classification and node clustering.
- Parameters:
labels (np.ndarray) – labels.
num_classes (int) – number of classes.
num_nodes (int) – number of nodes.
data_name (str) – dataset name.
embeddings (torch.Tensor) – node embeddings.
quiet (bool, optional) – if True, suppress printed info. Defaults to True.
method (str, optional) – evaluation method: "clf" for node classification, "cls" for node clustering, "both" for both. Defaults to "both".
clf_repeat (int, optional) – number of times node classification is repeated. Defaults to 3.
cls_repeat (int, optional) – number of times node clustering is repeated. Defaults to 1.
train_ratios (tuple, optional) – split ratios of the training set for node classification. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratios of the validation set. Defaults to (10, 20, 30, 40).
fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.
split_save_dir (str, optional) – directory in which the fixed split is saved. Defaults to './data'.
- Returns:
(clustering_results, classification_results)
- Return type:
Tuple[Dict, Dict]
Module contents
Evaluation Utils