graph_datasets package

Subpackages

Submodules

graph_datasets.data_info module

Data source information.

graph_datasets.data_info.DEFAULT_DATA_DIR = './data': Default directory for data saving.

graph_datasets.data_info.SDCN_URL = 'https://github.com/bdy9527/SDCN/blob/da6bb007b7': Download URL for the datasets used in the SDCN paper.

graph_datasets.data_info.COLA_URL = 'https://github.com/GRAND-Lab/CoLA/blob/main': Download URL for the datasets used in the CoLA paper.

graph_datasets.data_info.LINKX_URL = 'https://github.com/CUAI/Non-Homophily-Large-Scale/blob/82f8f05c5c/data': Download URL for the datasets used in the LINKX paper.

graph_datasets.data_info.CRITICAL_URL = 'https://github.com/yandex-research/heterophilous-graphs/blob/a431395/data': Download URL for the datasets used in the paper A Critical Look at the Evaluation of GNNs Under Heterophily: Are We Really Making Progress?

graph_datasets.data_info.PYG_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon', 'squirrel', 'actor', 'cornell', 'texas', 'wisconsin', 'computers', 'photo', 'cs', 'physics', 'wikics']

Datasets supported through pyG.

Note

Main differences between the dgl and pyG versions of the datasets:

dgl keeps self-loops, while pyG removes them.
dgl row-normalizes node features, while pyG does not.
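
The two differences in the note above can be illustrated on a toy graph. This is a minimal pure-Python sketch, not code from either library: the helper names `remove_self_loops` and `row_normalize` are hypothetical, and it assumes that "row normalization" means scaling each feature row to sum to 1, which the note does not spell out.

```python
# Toy graph: 3 nodes, with one self-loop on node 1.
edges = [(0, 1), (1, 1), (1, 2)]
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]


def remove_self_loops(edge_list):
    """Drop edges whose source and destination are the same node."""
    return [(u, v) for u, v in edge_list if u != v]


def row_normalize(rows):
    """Scale each row to sum to 1 (assumed convention); all-zero rows are kept as-is."""
    normalized_rows = []
    for row in rows:
        total = sum(row)
        if total != 0:
            normalized_rows.append([x / total for x in row])
        else:
            normalized_rows.append(list(row))
    return normalized_rows


no_loops = remove_self_loops(edges)   # what a loader that drops self-loops would keep
normalized = row_normalize(features)  # what a loader that row-normalizes would return
print(no_loops)
print(normalized)
```

When comparing results across the two sources, applying the same two steps to both is one way to put them on an equal footing.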

graph_datasets.data_info.DGL_DATASETS = ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon', 'squirrel', 'actor', 'cornell', 'texas', 'wisconsin']: Supported datasets of dgl.

graph_datasets.data_info.SDCN_DATASETS = ['dblp', 'acm']: Datasets in paper SDCN.

graph_datasets.data_info.COLA_DATASETS = ['blogcatalog', 'flickr']: Datasets in paper CoLA.

graph_datasets.data_info.LINKX_DATASETS = ['snap-patents', 'pokec', 'genius', 'arxiv-year', 'Penn94', 'twitch-gamers', 'wiki', 'cornell', 'chameleon', 'film', 'squirrel', 'texas', 'wisconsin', 'yelp-chi', 'deezer-europe', 'Amherst41', 'Cornell5', 'Johns Hopkins55', 'Reed98']: Datasets in paper LINKX.

graph_datasets.data_info.CRITICAL_DATASETS = ['roman-empire', 'amazon-ratings', 'minesweeper', 'tolokers', 'questions', 'squirrel', 'chameleon']: Datasets in paper A Critical Look at the Evaluation of GNNs Under Heterophily: Are We Really Making Progress?
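
Several names (for example 'squirrel' and 'chameleon') appear in more than one of the lists above. The sketch below copies the lists verbatim and looks up every list that contains a given name. The helper `sources_for` and the dictionary keys are hypothetical labels chosen for illustration; only "pyg" is documented as a value of the `source` parameter, so the other keys should not be assumed to be valid `source` arguments.

```python
# Lists copied from graph_datasets.data_info; the keys are illustrative labels only.
DATASETS_BY_SOURCE = {
    "pyg": ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon',
            'squirrel', 'actor', 'cornell', 'texas', 'wisconsin', 'computers',
            'photo', 'cs', 'physics', 'wikics'],
    "dgl": ['cora', 'citeseer', 'pubmed', 'corafull', 'reddit', 'chameleon',
            'squirrel', 'actor', 'cornell', 'texas', 'wisconsin'],
    "sdcn": ['dblp', 'acm'],
    "cola": ['blogcatalog', 'flickr'],
    "linkx": ['snap-patents', 'pokec', 'genius', 'arxiv-year', 'Penn94',
              'twitch-gamers', 'wiki', 'cornell', 'chameleon', 'film',
              'squirrel', 'texas', 'wisconsin', 'yelp-chi', 'deezer-europe',
              'Amherst41', 'Cornell5', 'Johns Hopkins55', 'Reed98'],
    "critical": ['roman-empire', 'amazon-ratings', 'minesweeper', 'tolokers',
                 'questions', 'squirrel', 'chameleon'],
}


def sources_for(dataset_name):
    """Return the labels of every list that contains dataset_name (exact, case-sensitive match)."""
    return [label for label, names in DATASETS_BY_SOURCE.items() if dataset_name in names]


print(sources_for("squirrel"))
print(sources_for("acm"))
```

Because the same name can refer to differently preprocessed data in different sources, it is worth checking which source a result was loaded from before comparing numbers.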

graph_datasets.load_data module

Load Graph Datasets

graph_datasets.load_data.load_data(dataset_name: str, directory: str = './data', verbosity: int = 0, source: str = 'pyg', return_type: str = 'dgl', rm_self_loop: bool = True, to_simple: bool = True) → Tuple[DGLGraph, Tensor, int]

Load graphs.

Parameters:

dataset_name (str) – Dataset name.
directory (str, optional) – Raw data directory for loading or saving. Defaults to DEFAULT_DATA_DIR=os.path.abspath(“./data”).
verbosity (int, optional) – Verbosity of the debug output; higher values print more detail. Defaults to 0.
source (str, optional) – Source for data loading. Defaults to “pyg”.
return_type (str, optional) – Return type of the graph, one of [“dgl”, “pyg”]. Defaults to “dgl”.
rm_self_loop (bool, optional) – Remove self-loops. Defaults to True.
to_simple (bool, optional) – Convert to a simple graph with no duplicate undirected edges. Defaults to True.

Raises:

NotImplementedError – If the dataset name is unknown.

Returns:

[graph, label, n_clusters]

Return type:

Tuple[dgl.DGLGraph, torch.Tensor, int]

Example

from graph_datasets import load_data
# dgl graph
graph, label, n_clusters = load_data(
    dataset_name='cora',
    directory="./data",
    return_type="dgl",
    source='pyg',
    verbosity=3,
    rm_self_loop=True,
    to_simple=True,
)
# pyG data
data = load_data(
    dataset_name='cora',
    directory="./data",
    return_type="pyg",
    source='pyg',
    verbosity=3,
    rm_self_loop=True,
    to_simple=True,
)
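
Since an unknown dataset name raises NotImplementedError, callers that iterate over many names may want to guard the call. The following is a minimal sketch under stated assumptions: `try_load` is a hypothetical helper, and `fake_load_data` is a stand-in that only mimics the documented error behaviour so the snippet runs without the package installed. Its return values are placeholders, not real graph objects.

```python
def try_load(loader, dataset_name, **kwargs):
    """Call loader(dataset_name=..., **kwargs); return None if the dataset is unknown."""
    try:
        return loader(dataset_name=dataset_name, **kwargs)
    except NotImplementedError as err:
        print(f"Skipping unknown dataset {dataset_name!r}: {err}")
        return None


def fake_load_data(dataset_name, **kwargs):
    """Stand-in for graph_datasets.load_data (assumption: same keyword-based calling style)."""
    if dataset_name != "cora":
        raise NotImplementedError(f"dataset {dataset_name} is not supported")
    # Placeholder values in the documented [graph, label, n_clusters] order.
    return ("graph", "label", 7)


result = try_load(fake_load_data, "cora", directory="./data")
missing = try_load(fake_load_data, "not-a-dataset", directory="./data")
```

With the real package, `load_data` imported from `graph_datasets` would be passed in place of `fake_load_data`.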

Module contents

Graph Datasets