graph_datasets package
Subpackages
- graph_datasets.datasets package
- graph_datasets.utils package
Submodules
graph_datasets.data_info module
Data source information.
- graph_datasets.data_info.DEFAULT_DATA_DIR = './data'
Default directory for saving data.
- graph_datasets.data_info.SDCN_URL = 'https://github.com/bdy9527/SDCN/blob/da6bb007b7'
Download URL for the datasets used in the SDCN paper.
- graph_datasets.data_info.COLA_URL = 'https://github.com/GRAND-Lab/CoLA/blob/main'
Download URL for the datasets used in the CoLA paper.
- graph_datasets.data_info.LINKX_URL = 'https://github.com/CUAI/Non-Homophily-Large-Scale/blob/82f8f05c5c/data'
Download URL for the datasets used in the LINKX paper.
- graph_datasets.data_info.CRITICAL_URL = 'https://github.com/yandex-research/heterophilous-graphs/blob/a431395/data'
Download URL for the datasets used in the paper "A Critical Look at the Evaluation of GNNs Under Heterophily: Are We Really Making Progress?".
- graph_datasets.data_info.PYG_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon', 'squirrel', 'actor', 'cornell', 'texas', 'wisconsin', 'computers', 'photo', 'cs', 'physics', 'wikics']
Datasets available from pyG.
Note
- Main differences between the dgl and pyG versions of the datasets:
dgl graphs have self-loops, while pyG removes them.
dgl row-normalizes features, while pyG does not.
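The two differences in the note can be illustrated on a toy graph. The following is a minimal pure-Python sketch, not code from either library: it assumes that "row normalizes" means scaling each feature row to sum to 1, and the helper names (row_normalize, add_self_loops, remove_self_loops) are hypothetical.

```python
def row_normalize(features):
    """Scale each row so its entries sum to 1 (all-zero rows are left unchanged).

    Assumption: this is what "row normalizes" refers to in the note.
    """
    normalized = []
    for row in features:
        total = sum(row)
        normalized.append([v / total for v in row] if total != 0 else list(row))
    return normalized


def add_self_loops(edges, num_nodes):
    """Append one (i, i) edge per node, skipping loops that already exist."""
    existing = set(edges)
    loops = [(i, i) for i in range(num_nodes) if (i, i) not in existing]
    return list(edges) + loops


def remove_self_loops(edges):
    """Drop every edge whose two endpoints are the same node."""
    return [(u, v) for u, v in edges if u != v]


# Toy data: 3 nodes, 3 features per node, a path graph stored in both directions.
features = [[1.0, 1.0, 2.0], [0.0, 0.0, 0.0], [3.0, 1.0, 0.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]

dgl_like_features = row_normalize(features)              # normalized rows
dgl_like_edges = add_self_loops(edges, num_nodes=3)      # with self-loops
pyg_like_edges = remove_self_loops(dgl_like_edges)       # self-loops removed
```

When comparing results across the two sources, it may be worth checking both properties, since either one can change the input that a model sees.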
- graph_datasets.data_info.DGL_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon', 'squirrel', 'actor', 'cornell', 'texas', 'wisconsin']
Datasets available from dgl.
- graph_datasets.data_info.LINKX_DATASETS = ['snap-patents', 'pokec', 'genius', 'arxiv-year', 'Penn94', 'twitch-gamers', 'wiki', 'cornell', 'chameleon', 'film', 'squirrel', 'texas', 'wisconsin', 'yelp-chi', 'deezer-europe', 'Amherst41', 'Cornell5', 'Johns Hopkins55', 'Reed98']
Datasets used in the LINKX paper.
- graph_datasets.data_info.CRITICAL_DATASETS = ['roman-empire', 'amazon-ratings', 'minesweeper', 'tolokers', 'questions', 'squirrel', 'chameleon']
Datasets used in the paper "A Critical Look at the Evaluation of GNNs Under Heterophily: Are We Really Making Progress?".
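Because several names appear in more than one list, it can help to check which lists contain a given dataset. The sketch below copies the four lists from this page so that it runs without the package installed. The helper sources_for and the dictionary keys "linkx" and "critical" are hypothetical labels chosen for this illustration; only "pyg" and "dgl" appear elsewhere on this page, and the keys are not confirmed values of the source parameter.

```python
# Lists copied verbatim from graph_datasets.data_info as documented above.
PYG_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon',
                'squirrel', 'actor', 'cornell', 'texas', 'wisconsin',
                'computers', 'photo', 'cs', 'physics', 'wikics']
DGL_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon',
                'squirrel', 'actor', 'cornell', 'texas', 'wisconsin']
LINKX_DATASETS = ['snap-patents', 'pokec', 'genius', 'arxiv-year', 'Penn94',
                  'twitch-gamers', 'wiki', 'cornell', 'chameleon', 'film',
                  'squirrel', 'texas', 'wisconsin', 'yelp-chi', 'deezer-europe',
                  'Amherst41', 'Cornell5', 'Johns Hopkins55', 'Reed98']
CRITICAL_DATASETS = ['roman-empire', 'amazon-ratings', 'minesweeper', 'tolokers',
                     'questions', 'squirrel', 'chameleon']

# Hypothetical labels for this sketch, mapped to the documented lists.
LISTS = {
    'pyg': PYG_DATASETS,
    'dgl': DGL_DATASETS,
    'linkx': LINKX_DATASETS,
    'critical': CRITICAL_DATASETS,
}


def sources_for(dataset_name):
    """Return the labels of every list that contains dataset_name (case-sensitive)."""
    return [label for label, names in LISTS.items() if dataset_name in names]


# Datasets listed for pyG but not for dgl, in their original order.
pyg_only = [name for name in PYG_DATASETS if name not in DGL_DATASETS]

print(sources_for('squirrel'))
print(pyg_only)
```

The lookup is case-sensitive on purpose: the LINKX list contains both 'cornell' and 'Cornell5', so normalizing case could hide a typo.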
graph_datasets.load_data module
Load Graph Datasets
- graph_datasets.load_data.load_data(dataset_name: str, directory: str = './data', verbosity: int = 0, source: str = 'pyg', return_type: str = 'dgl', rm_self_loop: bool = True, to_simple: bool = True) → Tuple[DGLGraph, Tensor, int]
Load graphs.
- Parameters:
dataset_name (str) – Dataset name.
directory (str, optional) – Directory for loading or saving the raw data. Defaults to DEFAULT_DATA_DIR=os.path.abspath(“./data”).
verbosity (int, optional) – Level of debug output; larger values print more detail. Defaults to 0.
source (str, optional) – Source for data loading. Defaults to “pyg”.
return_type (str, optional) – Return type of the graphs within [“dgl”, “pyg”]. Defaults to “dgl”.
rm_self_loop (bool, optional) – Remove self-loops. Defaults to True.
to_simple (bool, optional) – Convert to a simple graph with no duplicate undirected edges. Defaults to True.
- Raises:
NotImplementedError – Dataset unknown.
- Returns:
[graph, label, n_clusters]
- Return type:
Tuple[dgl.DGLGraph, torch.Tensor, int]
Example
    from graph_datasets import load_data

    # dgl graph
    graph, label, n_clusters = load_data(
        dataset_name='cora',
        directory="./data",
        return_type="dgl",
        source='pyg',
        verbosity=3,
        rm_self_loop=True,
        to_simple=True,
    )

    # pyG data
    data = load_data(
        dataset_name='cora',
        directory="./data",
        return_type="pyg",
        source='pyg',
        verbosity=3,
        rm_self_loop=True,
        to_simple=True,
    )
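The effect of rm_self_loop and to_simple can be pictured on a plain edge list. This is a rough pure-Python sketch of one reading of those two options, not the library's implementation: it assumes that (u, v) and (v, u) count as the same undirected edge, and it stores each edge once as a sorted pair, whereas the actual graph objects may keep both directions. The function names are hypothetical.

```python
def drop_self_loops(edges):
    """Illustrates rm_self_loop=True: discard edges of the form (u, u)."""
    return [(u, v) for u, v in edges if u != v]


def dedupe_undirected(edges):
    """Illustrates to_simple=True under the assumption stated above.

    Each undirected edge is kept once, as (min, max), in order of first appearance.
    """
    seen = set()
    simple = []
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if key not in seen:
            seen.add(key)
            simple.append(key)
    return simple


# Toy edge list with a reversed duplicate, an exact duplicate, and a self-loop.
raw_edges = [(0, 1), (1, 0), (0, 1), (2, 2), (1, 2)]

cleaned = dedupe_undirected(drop_self_loops(raw_edges))
print(cleaned)
```

Removing self-loops first keeps the two steps independent; applying them in the other order gives the same result on this input.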
Module contents
Graph Datasets