gnnwr.datasets module
- gnnwr.datasets.BasicDistance(x, y)
Calculate the pairwise distances between two sets of points
- Parameters:
x – Input point coordinate data
y – Input target point coordinate data
- Returns:
distance matrix
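To illustrate the shape of the returned distance matrix, the following plain-Python sketch computes pairwise distances between two sets of points. It assumes a Euclidean metric, which the description above does not state, and `pairwise_euclidean` is a hypothetical helper rather than part of gnnwr.

```python
import math

def pairwise_euclidean(x, y):
    """Return a len(x) x len(y) matrix of Euclidean distances.

    Hypothetical helper for illustration; the metric is an assumption.
    """
    return [[math.dist(p, q) for q in y] for p in x]

x = [(0.0, 0.0), (3.0, 4.0)]              # two input points
y = [(0.0, 0.0), (6.0, 8.0), (3.0, 0.0)]  # three target points
dist = pairwise_euclidean(x, y)
# dist[i][j] is the distance from x[i] to y[j], so dist has 2 rows and 3 columns
```

Each row corresponds to one input point and each column to one target point.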
- gnnwr.datasets.Manhattan_distance(x, y)
Calculate the pairwise Manhattan distances between two sets of points
- Parameters:
x – Input point coordinate data
y – Input target point coordinate data
- Returns:
distance matrix
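The Manhattan distance between two points is the sum of the absolute differences of their coordinates. A minimal sketch, using one-dimensional time stamps as sample data; `pairwise_manhattan` is a hypothetical helper, not the library's implementation.

```python
def pairwise_manhattan(x, y):
    """Return a len(x) x len(y) matrix of Manhattan (L1) distances.

    Hypothetical helper for illustration only.
    """
    return [[sum(abs(a - b) for a, b in zip(p, q)) for q in y] for p in x]

t_x = [(2019.0,), (2021.0,)]  # input time stamps, one coordinate each
t_y = [(2020.0,), (2021.0,)]  # target time stamps
t_dist = pairwise_manhattan(t_x, t_y)
# t_dist[i][j] is |t_x[i] - t_y[j]| for one-dimensional input
```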
- class gnnwr.datasets.baseDataset(data=None, x_column: list | None = None, y_column: list | None = None, id_column=None, is_need_STNN=False)
Bases: Dataset
baseDataset is the base dataset class. It stores the data and related information, and provides data scaling, saving and loading.
- Parameters:
data – dataset containing the columns named in x_column and y_column
x_column – names of the independent variable columns
y_column – names of the dependent variable columns
id_column – ID column name
is_need_STNN – whether to use STNN
- rescale(x, y)
- Parameters:
x – Input independent variable data
y – Input dependent variable data
- Returns:
rescaled x and y
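The entry above does not say in which direction the data is rescaled. Assuming it maps min-max scaled values back to their original range, the operation can be sketched as follows; `minmax_inverse` and the sample bounds are hypothetical and are not taken from gnnwr.

```python
def minmax_inverse(scaled, lo, hi):
    """Map values from [0, 1] back to [lo, hi], column by column.

    Hypothetical helper; assumes the data was min-max scaled beforehand.
    """
    return [[v * (h - l) + l for v, l, h in zip(row, lo, hi)] for row in scaled]

lo, hi = [10.0, 0.0], [20.0, 4.0]   # per-column minimum and maximum
scaled = [[0.0, 0.5], [1.0, 1.0]]   # values in the scaled range
restored = minmax_inverse(scaled, lo, hi)
```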
- gnnwr.datasets.init_dataset(data, test_ratio, valid_ratio, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, from_for_cv=0, is_need_STNN=False, Reference=None, simple_distance=True)
Initialize the dataset and return the training set, validation set and test set for the model
- Parameters:
data – dataset
test_ratio – test data ratio
valid_ratio – validation data ratio
x_column – input attribute column name
y_column – output attribute column name
spatial_column – spatial attribute column name
temp_column – temporal attribute column name
id_column – ID column name
sample_seed – random seed
process_fn – data pre-processing function
batch_size – batch size
max_val_size – maximum number of validation samples fed to the model at one time
max_test_size – maximum number of test samples fed to the model at one time
shuffle – whether to shuffle the data
use_class – dataset class
spatial_fun – spatial distance calculation function
temporal_fun – temporal distance calculation function
from_for_cv – the start index of the data for cross validation
is_need_STNN – whether to use STNN
Reference – reference points to calculate the distance
simple_distance – whether to use simple distance function to calculate the distance
- Returns:
train dataset, valid dataset, test dataset
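The roles of test_ratio, valid_ratio and sample_seed can be illustrated with a seeded random split of row indices. This is one common way to partition data and is only a sketch; how gnnwr performs the split internally is not described here, and `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n, test_ratio, valid_ratio, seed=100):
    """Split range(n) into train, validation and test index lists.

    Hypothetical helper; a fixed seed makes the split reproducible.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded shuffle
    n_test = round(n * test_ratio)
    n_valid = round(n * valid_ratio)
    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]        # everything left over
    return train, valid, test

train, valid, test = split_indices(100, test_ratio=0.2, valid_ratio=0.1)
```

Calling the function again with the same seed yields the same three index lists, which is the usual reason for exposing a seed parameter.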
- gnnwr.datasets.init_dataset_cv(data, test_ratio, k_fold, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, is_need_STNN=False, Reference=None, simple_distance=True)
Initialize the datasets for cross-validation
- Parameters:
data – input data
test_ratio – test set ratio
k_fold – number of folds (k) in k-fold cross-validation
x_column – attribute column name
y_column – label column name
spatial_column – spatial attribute column name
temp_column – temporal attribute column name
id_column – ID column name
sample_seed – random seed
process_fn – data pre-processing function
batch_size – batch size
shuffle – whether to shuffle the data
use_class – dataset class
spatial_fun – spatial distance calculation function
temporal_fun – temporal distance calculation function
max_val_size – maximum validation data size
max_test_size – maximum test data size
is_need_STNN – whether to use STNN
Reference – reference points used to calculate the distance
simple_distance – whether to use the simple distance function to calculate the distance
- Returns:
cv_data_set, test_dataset
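In k-fold cross-validation the non-test data is divided into k folds, and each fold serves once as the validation set while the remaining folds are used for training. The sketch below shows the partitioning in general terms; `k_fold_indices` is a hypothetical helper and does not reproduce the structure of the returned cv_data_set.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds.

    Hypothetical helper; the first n % k folds get one extra sample.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size

folds = list(k_fold_indices(10, k=3))
# one (train, valid) pair per fold; every index is validated exactly once
```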
- gnnwr.datasets.init_predict_dataset(data, train_dataset, x_column, spatial_column=None, temp_column=None, process_fn='minmax_scale', scale_sync=True, use_class=<class 'gnnwr.datasets.predictDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_size=-1, is_need_STNN=False)
Initialize the prediction dataset
- Parameters:
data – input data
train_dataset – training dataset
x_column – attribute column name
spatial_column – spatial attribute column name
temp_column – temporal attribute column name
process_fn – data pre-processing function
scale_sync – whether to synchronize scaling with train_dataset
max_size – maximum size of the prediction dataset
use_class – dataset class
spatial_fun – spatial distance calculation function
temporal_fun – temporal distance calculation function
is_need_STNN – whether to use STNN
- Returns:
predict_dataset
- class gnnwr.datasets.predictDataset(data, x_column, process_fn='minmax_scale', scale_info=[], is_need_STNN=False)
Bases: Dataset
predictDataset holds the data for which the dependent variable is to be predicted.
- Parameters:
data – DataFrame
x_column – independent variable column name
process_fn – name of the pre-processing function
scale_info – parameters of the pre-processing function
is_need_STNN – whether to use STNN
- minmax_scaler(x, min=[], max=[])
Apply min-max scaling to the attribute data
- Parameters:
x – Input attribute data
min – minimum value of each attribute
max – maximum value of each attribute
- Returns:
scaled attribute data
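Min-max scaling maps each attribute to the range [0, 1] using that attribute's minimum and maximum. A minimal sketch in plain Python, assuming the bounds are computed from the data when none are supplied; `minmax_scale` here is a hypothetical stand-in and not the method documented above.

```python
def minmax_scale(rows, col_min=None, col_max=None):
    """Scale each column of rows to [0, 1].

    Hypothetical helper. Bounds are taken from the data unless given.
    Assumes no column is constant (max > min), otherwise division by zero.
    """
    cols = list(zip(*rows))
    if col_min is None:
        col_min = [min(c) for c in cols]
    if col_max is None:
        col_max = [max(c) for c in cols]
    scaled = [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, col_min, col_max)]
        for row in rows
    ]
    return scaled, col_min, col_max

train_rows = [[10.0, 0.0], [20.0, 4.0], [15.0, 2.0]]
train_scaled, lo, hi = minmax_scale(train_rows)
# reuse the training bounds so new data lands on the same scale
new_scaled, _, _ = minmax_scale([[12.5, 1.0]], lo, hi)
```

Passing the training minimum and maximum when scaling new data keeps both sets on the same scale, which is one plausible reading of what the min and max arguments are for.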