Title: | Fusing Machine Learning in R |
---|---|
Description: | Recent technological advances have enabled the simultaneous collection of multi-omics data, i.e., different types or modalities of molecular data. Integrative prediction modeling with such data is challenging because of its heterogeneous, high-dimensional nature and because some modalities may be missing for some individuals. We introduce this package for late integrative prediction modeling, which enables modality-specific variable selection and prediction modeling, followed by the aggregation of the modality-specific predictions to train a final meta-model. This package facilitates late integration predictive modeling in a systematic, structured, and reproducible way. |
Authors: | Cesaire J. K. Fouodo [aut, cre] |
Maintainer: | Cesaire J. K. Fouodo <[email protected]> |
License: | GPL-3 |
Version: | 0.0.1 |
Built: | 2024-12-18 05:36:45 UTC |
Source: | https://github.com/imbs-hl/fusemlr |
With this meta-learner, the meta-model is the best-performing layer-specific learner. This function is intended to be used (internally) as a meta-learner in fuseMLR.
bestLayerLearner(x, y, perf = NULL)
x |
|
y |
|
perf |
|
A model object of class bestLayerLearner.
set.seed(20240624L)
x <- data.frame(x1 = runif(n = 50L, min = 0, max = 1))
y <- sample(x = 0L:1L, size = 50L, replace = TRUE)
my_best_model <- bestLayerLearner(x = x, y = y)
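The selection rule can be illustrated with a small base-R sketch. It is not the package's implementation: it assumes that each column of x holds one layer's predicted probabilities for a binary 0/1 target and that performance is measured by the Brier score (lower is better). The helpers best_layer_sketch() and predict_best_layer_sketch() are hypothetical.

```r
# Hypothetical sketch of the "best layer" meta-learner idea (not fuseMLR code).
# Assumption: each column of `x` holds one layer's predicted probabilities for
# a binary 0/1 target; the layer with the lowest Brier score is selected.
best_layer_sketch <- function(x, y) {
  # Brier score (mean squared error) per layer, ignoring missing predictions
  brier <- vapply(x, function(pred) mean((pred - y)^2, na.rm = TRUE),
                  numeric(1))
  list(best_layer = names(which.min(brier)), brier = brier)
}

# Prediction simply returns the column of the selected layer
predict_best_layer_sketch <- function(model, data) {
  data[[model$best_layer]]
}

# Usage with three toy layers of differing quality
set.seed(1)
y <- sample(0:1, size = 100L, replace = TRUE)
x <- data.frame(
  methylation = ifelse(y == 1, 0.8, 0.2),  # informative layer
  genexpr     = runif(100L),               # pure noise
  proteinexpr = rep(0.5, 100L)             # uninformative constant
)
model <- best_layer_sketch(x, y)
model$best_layer  # "methylation"
```

Because the informative toy layer has the lowest Brier score (0.04, against roughly 0.33 and exactly 0.25 for the others), its predictions become the meta-predictions.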
The function cobra implements COBRA (COmBined Regression Alternative), an aggregation method that combines predictions from multiple individual learners. The method tunes key parameters to achieve optimal predictions and predicts by averaging the target values of training points whose learner predictions are similar to those of the test point. Only training points that are sufficiently similar to the test point (based on the proximity threshold epsilon) are used for prediction. If no suitable training points are found, the prediction is NA.
cobra(x, y, tune = "epsilon", k_folds = NULL, eps = NULL)
x |
|
y |
|
tune |
|
k_folds |
|
eps |
|
An object of class cobra containing the training data, target values, and chosen parameters.
Biau, G., Fischer, A., Guedj, B., & Malley, J. D. (2016). COBRA: A combined regression strategy. Journal of Multivariate Analysis 146:18-28.
# Example usage
set.seed(123)
x_train <- data.frame(a = runif(10L), b = runif(10L))
y_train <- sample(0L:1L, size = 10L, replace = TRUE)
# Train the model with epsilon optimization
cobra_model <- cobra(x = x_train, y = y_train, tune = "epsilon", k_folds = 2)
# Make predictions on new data
set.seed(156)
x_new <- data.frame(a = runif(5L), b = runif(5L))
prediction <- predict(object = cobra_model, data = x_new)
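To make the roles of eps and k_folds concrete, the following base-R sketch tunes epsilon by k-fold cross-validation. It follows the COBRA idea with the strictest agreement rule (a training point counts only if it lies within epsilon of the test point for every learner); the candidate grid, the squared-error criterion and the helper names cobra_predict_sketch() and tune_eps_sketch() are assumptions, not the package's internals.

```r
# Hypothetical sketch (not fuseMLR code): COBRA prediction with the strictest
# agreement rule. `train` and `test` hold learner predictions, one column per learner.
cobra_predict_sketch <- function(train, test, train_target, eps) {
  apply(test, 1L, function(p) {
    # TRUE for training points within eps of the test point for every learner
    close <- apply(abs(sweep(train, 2L, p)) <= eps, 1L, all)
    if (any(close)) mean(train_target[close]) else NA_real_
  })
}

# Hypothetical sketch: choose eps on a grid by k-fold cross-validation.
tune_eps_sketch <- function(x, y, k_folds = 2L, n_grid = 20L) {
  x <- as.matrix(x)
  # candidate thresholds between 0 (excluded) and the overall prediction range
  eps_grid <- seq(0, max(x) - min(x), length.out = n_grid + 1L)[-1L]
  folds <- sample(rep_len(seq_len(k_folds), nrow(x)))
  cv_error <- vapply(eps_grid, function(eps) {
    errs <- unlist(lapply(seq_len(k_folds), function(k) {
      held_out <- folds == k
      pred <- cobra_predict_sketch(x[!held_out, , drop = FALSE],
                                   x[held_out, , drop = FALSE],
                                   y[!held_out], eps)
      (pred - y[held_out])^2
    }))
    # NA predictions (no neighbours) are ignored; an all-NA grid point loses
    if (all(is.na(errs))) Inf else mean(errs, na.rm = TRUE)
  }, numeric(1))
  eps_grid[which.min(cv_error)]
}

# Usage on toy data with two "learners"
set.seed(42)
x <- data.frame(a = runif(30L), b = runif(30L))
y <- as.numeric(x$a + x$b > 1)
eps <- tune_eps_sketch(x, y, k_folds = 3L)
```

Ignoring NA predictions during tuning is a simplification: very small thresholds then score on only a few held-out points, so a real implementation may prefer to penalise them.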
The createCobraPred function calculates predictions by averaging the target values of all the nearest candidates in the training dataset. Only the training points that are within the specified proximity (eps) of the test point are used to determine the prediction. If no suitable training points are found, the function returns NA as the prediction.
createCobraPred( train, test, n_train, n_test, nlearners, eps, alpha, train_target )
train |
A |
test |
A |
n_train |
An |
n_test |
An |
nlearners |
An |
eps |
A |
alpha |
A value that determines the optimal number of learners in the neighborhood (only for alpha optimization). |
train_target |
A |
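The averaging rule can be sketched in base R as follows. The sketch is not the package's code: it assumes that alpha is the minimum number of learners (out of nlearners) for which a training point must lie within eps of the test point, and it builds a 0/1 weight matrix before averaging the training targets. The function names are hypothetical.

```r
# Hypothetical sketch (not fuseMLR code) of the COBRA aggregation step.
# train: n_train x nlearners matrix of learner predictions on training points
# test:  n_test  x nlearners matrix of learner predictions on test points
# Assumption: alpha = minimum number of learners that must agree.
cobra_weights_sketch <- function(train, test, eps, alpha) {
  w <- matrix(0, nrow = nrow(test), ncol = nrow(train))
  for (i in seq_len(nrow(test))) {
    # per training point: number of learners within eps of test point i
    n_agree <- rowSums(abs(sweep(train, 2L, test[i, ])) <= eps)
    w[i, ] <- as.numeric(n_agree >= alpha)
  }
  w
}

cobra_pred_sketch <- function(train, test, train_target, eps, alpha) {
  w <- cobra_weights_sketch(train, test, eps, alpha)
  n_neighbours <- rowSums(w)
  pred <- drop(w %*% train_target) / n_neighbours
  pred[n_neighbours == 0] <- NA_real_  # no close training point: NA
  pred
}

# Usage with two learners, three training points and three test points
train <- cbind(l1 = c(0.1, 0.5, 0.9), l2 = c(0.2, 0.4, 0.8))
test  <- rbind(c(0.12, 0.21), c(0.5, 0.9), c(0.3, 0.3))
target <- c(0, 1, 1)
cobra_pred_sketch(train, test, target, eps = 0.05, alpha = 2)  # c(0, NA, NA)
cobra_pred_sketch(train, test, target, eps = 0.05, alpha = 1)  # c(0, 1, NA)
```

Lowering alpha admits training points on which only some learners agree, which is why the second test point gets a prediction only in the second call.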
The createDif function computes the difference between the maximum and minimum predictions in a dataset.
createDif(x)
x |
Predictions vector |
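The quantity can be written as a one-line base-R sketch. Ignoring missing predictions is an assumption, not documented behaviour, and create_dif_sketch() is a hypothetical name. Such a spread is a natural upper bound when searching for a proximity threshold like eps, although how the package uses it is not stated here.

```r
# Hypothetical sketch (not fuseMLR code): spread of a prediction vector.
# Assumption: missing predictions are ignored.
create_dif_sketch <- function(x) {
  max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
}

create_dif_sketch(c(0.25, 1, NA, 0.5))  # 0.75
```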
Create Loss
createLoss(pred, target)
pred |
A |
target |
A |
Creates a Testing object.
createTesting(id, ind_col, verbose = TRUE)
id |
|
ind_col |
|
verbose |
|
A Testing object.
Creates and stores a TestLayer on the Testing object passed as argument.
createTestLayer(testing, test_layer_id, test_data)
testing |
|
test_layer_id |
|
test_data |
|
The updated Testing object (with the new layer) is returned.
Creates a Training object. A training object is designed to encapsulate training layers and a training meta-layer. The functions createTrainLayer and createTrainMetaLayer add training layers and the training meta-layer to a training object.
createTraining( id, target_df, ind_col, target, problem_type = "classification", verbose = TRUE )
id |
|
target_df |
|
ind_col |
|
target |
|
problem_type |
|
verbose |
|
The created Training object is returned.
createTrainLayer, createTrainMetaLayer and fusemlr.
Creates and stores a TrainLayer on the Training object passed as argument. The main components of a training layer are a training data modality, a variable selection method, and a modality-specific learner.
createTrainLayer( training, train_layer_id, train_data, varsel_package = NULL, varsel_fct = NULL, varsel_param = list(), lrner_package = NULL, lrn_fct, param_train_list = list(), param_pred_list = list(), na_action = "na.rm", x_varsel = "x", y_varsel = "y", x_lrn = "x", y_lrn = "y", object = "object", data = "data", extract_pred_fct = NULL, extract_var_fct = NULL )
training |
|
train_layer_id |
|
train_data |
|
varsel_package |
|
varsel_fct |
|
varsel_param |
|
lrner_package |
|
lrn_fct |
|
param_train_list |
|
param_pred_list |
|
na_action |
|
x_varsel |
|
y_varsel |
|
x_lrn |
|
y_lrn |
|
object |
|
data |
|
extract_pred_fct |
|
extract_var_fct |
|
The updated Training object (with the new layer) is returned.
Fouodo C.J.K, Bleskina M. and Szymczak S. (2024). fuseMLR: An R package for integrative prediction modeling of multi-omics data, paper submitted.
createTrainMetaLayer and fusemlr.
Creates and stores a TrainMetaLayer on the Training object passed as argument. The meta-layer encapsulates the meta-learner and the (internally created) fold predictions of the layer-specific base models.
createTrainMetaLayer( training, meta_layer_id, lrner_package = NULL, lrn_fct, param_train_list = list(), param_pred_list = list(), na_action = "na.impute", x_lrn = "x", y_lrn = "y", object = "object", data = "data", extract_pred_fct = NULL )
training |
|
meta_layer_id |
|
lrner_package |
|
lrn_fct |
|
param_train_list |
|
param_pred_list |
|
na_action |
|
x_lrn |
|
y_lrn |
|
object |
|
data |
|
extract_pred_fct |
|
Internal meta-learners are available in the package.
The cobra meta-learner implements COBRA (COmBined Regression Alternative), an aggregation method that combines predictions from multiple individual learners (Biau et al. 2016). The method tunes key parameters to achieve optimal predictions and predicts by averaging the target values of training points whose learner predictions are similar to those of the test point. Only training points that are sufficiently similar to the test point (based on the proximity threshold epsilon) are used for prediction. If no suitable training points are found, the prediction is NA.
The weightedMeanLearner evaluates the prediction performance of modality-specific learners and uses these estimates to weight the base models, aggregating their predictions accordingly.
The bestLayerLearner evaluates the prediction performance of modality-specific learners and returns predictions made by the best learner as the meta-prediction.
Beyond the internal meta-learners, any other learning algorithm can be used.
The updated Training object (with the new layer) is returned.
Fouodo C.J.K, Bleskina M. and Szymczak S. (2024). fuseMLR: An R package for integrative prediction modeling of multi-omics data, paper submitted.
Biau, G., Fischer, A., Guedj, B., & Malley, J. D. (2016). COBRA: A combined regression strategy. Journal of Multivariate Analysis 146:18-28.
createTrainLayer, varSelection, and fusemlr.
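The weighting idea behind weightedMeanLearner can be sketched in base R. How the package turns performance estimates into weights is not described here, so the rule below (weights proportional to the inverse Brier score, renormalised over the layers available for each individual) is an assumption, and both helper names are hypothetical.

```r
# Hypothetical sketch (not fuseMLR code) of a performance-weighted mean.
# Assumption: weights are proportional to 1 / Brier score of each layer.
weighted_mean_sketch <- function(x, y) {
  brier <- vapply(x, function(pred) mean((pred - y)^2, na.rm = TRUE),
                  numeric(1))
  w <- 1 / brier
  w / sum(w)  # normalised weights, named by layer
}

predict_weighted_mean_sketch <- function(weights, data, na_rm = TRUE) {
  data <- as.matrix(data[, names(weights), drop = FALSE])
  apply(data, 1L, function(p) {
    # with na_rm, weights are renormalised over the layers that are available
    keep <- if (na_rm) !is.na(p) else rep(TRUE, length(p))
    sum(weights[keep] * p[keep]) / sum(weights[keep])
  })
}

# Usage: layer "a" predicts better than layer "b" and gets a larger weight
x <- data.frame(a = c(0.9, 0.1, 0.8, 0.2), b = c(0.6, 0.4, 0.6, 0.4))
y <- c(1, 0, 1, 0)
w <- weighted_mean_sketch(x, y)
new_data <- data.frame(a = c(0.9, 1), b = c(NA, 0))
predict_weighted_mean_sketch(w, new_data)
```

Renormalising over the available layers lets an individual with a missing modality still receive a meta-prediction; with na_rm = FALSE, that individual gets NA instead.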
The createWeights function is used to calculate weights for predictions.
createWeights(train, test, n_train, n_test, nlearners, eps, alpha)
train |
A |
test |
A |
n_train |
An |
n_test |
An |
nlearners |
An |
eps |
A |
alpha |
A value that determines the optimal number of learners in the neighborhood (only for alpha optimization). |
Because Data is an abstract class, a Data object cannot be stored on any layer. Instead, objects of the extending classes TrainData or TestData can be stored on a layer.
new()
Constructor of class Data.
Data$new(id, ind_col, data_frame)
id
character
Object ID.
ind_col
character
Column name containing individual IDs.
data_frame
data.frame
data.frame containing data.
print()
Printer
Data$print(...)
...
any
getIndSubset()
Retrieves the subset of data for which the given variable takes the given values.
Data$getIndSubset(var_name, value)
var_name
character
Variable name of interest.
value
vector
Values of interest.
The data subset is returned.
impute()
Imputes missing values in modality-specific predictions. Only mode- and median-based imputations are currently supported.
Data$impute(impute_fct, impute_param, target_name)
impute_fct
character
An imputation function to use instead of median or mode imputation. Not yet implemented!
impute_param
list
The list of parameters to call the imputation function.
target_name
character
Name of the target variable.
A new object with the predicted values is returned.
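Median and mode imputation of the meta-data can be sketched in base R as below. The split (median for numeric columns, mode for all other columns) and leaving the target column untouched are assumptions made for illustration; impute_sketch() is a hypothetical helper, not Data$impute().

```r
# Hypothetical sketch (not fuseMLR code) of median/mode imputation.
# Assumptions: numeric columns get the median, other columns the mode,
# and the target column is left untouched.
impute_sketch <- function(meta_data, target_name) {
  for (col in setdiff(names(meta_data), target_name)) {
    v <- meta_data[[col]]
    if (!anyNA(v)) next
    if (is.numeric(v)) {
      fill <- median(v, na.rm = TRUE)
    } else {
      counts <- table(v)  # table() drops NA by default
      fill <- names(counts)[which.max(counts)]
    }
    v[is.na(v)] <- fill
    meta_data[[col]] <- v
  }
  meta_data
}

meta_data <- data.frame(
  geneexpr = c(0.2, NA, 0.8, 0.6),
  protein  = c("a", "b", NA, "a"),
  disease  = c(1, NA, 0, 1),
  stringsAsFactors = FALSE
)
impute_sketch(meta_data, target_name = "disease")
```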
getVarSubset()
Retrieve a subset of variables from data.
Data$getVarSubset(var_name)
var_name
character
Variable names of interest.
The data subset is returned.
getSetDiff()
For the given variable name, the values that do not exist in the current dataset are returned.
Data$getSetDiff(var_name, value)
var_name
character
Variable name of interest.
value
vector
Values of interest.
The subset difference is returned.
getDataFrame()
Getter of the data.frame.
Data$getDataFrame()
The data.frame of the current object is returned.
setDataFrame()
Sets a new data.frame on the current object.
Data$setDataFrame(data_frame)
data_frame
data.frame
The current object is returned.
getCompleteData()
Getter of the complete dataset without missing values.
Data$getCompleteData()
The complete dataset is returned.
getId()
Getter of the current object ID.
Data$getId()
The current object ID is returned.
getData()
Getter of the current Data. This function is re-implemented by TrainData and TestData.
Data$getData()
Do not use on this class.
getIndCol()
Getter of the individual column variable.
Data$getIndCol()
clone()
The objects of this class are cloneable with this method.
Data$clone(deep = FALSE)
deep
Whether to make a deep clone.
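The documented semantics of getIndSubset(), getSetDiff() and getCompleteData() map onto familiar base-R idioms. The sketch below reflects one reading of the descriptions above, not the package's source; the ID column name IDS and the toy data are hypothetical.

```r
# Hypothetical base-R equivalents of three Data getters (illustration only).
df <- data.frame(
  IDS = c("id1", "id2", "id3", "id4"),
  x1  = c(1.2, NA, 0.7, 2.1),
  stringsAsFactors = FALSE
)

# like getIndSubset(var_name = "IDS", value = c("id1", "id3"))
ind_subset <- df[df[["IDS"]] %in% c("id1", "id3"), , drop = FALSE]

# like getSetDiff(var_name = "IDS", value = c("id1", "id5")):
# requested values that do not occur in the dataset
set_diff <- setdiff(c("id1", "id5"), df[["IDS"]])

# like getCompleteData(): rows without missing values
complete_data <- df[complete.cases(df), , drop = FALSE]
```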
Extracts the data stored on each layer; the base data and the modality-specific predictions (for Training) are extracted.
extractData(object)
object |
|
A list of data is returned.
Extracts the models stored on each layer; both base and meta models are extracted.
extractModel(training)
training |
|
A list of models is returned.
Trains the Training object passed as argument. A training object must contain
the training layers and a training meta-layer. A training layer encapsulates
a data modality, a variable selection method and a learner. Use the function
createTraining to create a training object, createTrainLayer to add training
layers to the created training object, and createTrainMetaLayer to add a meta-layer
with the corresponding meta-learner to the training object. The function fusemlr
is designed to train all training layers and the meta-learner. After training,
the layer-specific base models and the meta-model are stored in the training
object, which can then be used for predictions.
fusemlr( training, ind_subset = NULL, use_var_sel = FALSE, resampling_method = NULL, resampling_arg = list(), seed = NULL )
training |
|
ind_subset |
|
use_var_sel |
|
resampling_method |
|
resampling_arg |
|
seed |
|
The current object is returned, with each learner trained on each layer.
Fouodo C.J.K, Bleskina M. and Szymczak S. (2024). fuseMLR: An R package for integrative prediction modeling of multi-omics data, paper submitted.
createTrainLayer, createTrainMetaLayer, extractModel and extractData.
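The late-integration scheme that fusemlr automates can be sketched in base R. This is a simplified stand-in, not the package's implementation: linear regression (lm) replaces both the modality-specific learners and the meta-learner, the target is continuous, folds come from plain k-fold cross-validation, and variable selection and missing modalities are ignored. All object names are hypothetical.

```r
# Hypothetical base-R sketch of late integration (stacking); not fuseMLR code.
set.seed(2024)
n <- 120L
target <- rnorm(n)
layers <- list(
  methylation = data.frame(m1 = target + rnorm(n, sd = 0.5), m2 = rnorm(n)),
  genexpr     = data.frame(g1 = target + rnorm(n, sd = 1), g2 = rnorm(n))
)
k_folds <- 5L
folds <- sample(rep_len(seq_len(k_folds), n))

# 1) Out-of-fold predictions of each modality-specific base model
oof <- sapply(layers, function(layer) {
  pred <- numeric(n)
  for (k in seq_len(k_folds)) {
    held_out <- folds == k
    fit <- lm(y ~ ., data = cbind(layer[!held_out, , drop = FALSE],
                                  y = target[!held_out]))
    pred[held_out] <- predict(fit, newdata = layer[held_out, , drop = FALSE])
  }
  pred
})

# 2) Meta-model trained on the out-of-fold predictions
meta_fit <- lm(y ~ ., data = data.frame(oof, y = target))

# 3) Base models refitted on all individuals, then used for new data
base_fits <- lapply(layers, function(layer) lm(y ~ ., data = cbind(layer, y = target)))
new_layers <- lapply(layers, function(layer) layer[1:5, , drop = FALSE])
new_base_pred <- sapply(names(base_fits), function(nm)
  predict(base_fits[[nm]], newdata = new_layers[[nm]]))
meta_pred <- predict(meta_fit, newdata = as.data.frame(new_base_pred))
```

Training the meta-model on out-of-fold rather than in-sample predictions keeps it from over-trusting base models that merely memorise the training data.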
Hash table used to store objects such as data modalities. Storage objects like Training and TrainLayer are extensions of this class.
new()
Initialize a default parameters list.
HashTable$new(id)
id
character
ID of the hash table. It must be unique.
print()
Printer
HashTable$print(...)
...
any
add2HashTable()
Function to add a key-value pair to the hash table.
HashTable$add2HashTable(key, value, .class)
key
character
The key to be added.
value
object
Object to be added.
.class
character
Class of the object to be added.
getFromHashTable()
Getter of the object stored under the key passed as argument.
HashTable$getFromHashTable(key)
key
character
Key of the required object.
getKeyClass()
Getter of the data.frame that stores all key-class pairs.
HashTable$getKeyClass()
removeFromHashTable()
Remove the object with the corresponding key from the hashtable.
HashTable$removeFromHashTable(key)
key
Key of the object to be removed.
getId()
Getter of the current object ID.
HashTable$getId()
getHashTable()
Getter of the current hashtable.
HashTable$getHashTable()
checkClassExist()
Checks whether an object of the given class has already been stored.
HashTable$checkClassExist(.class)
.class
character
Boolean value
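The key-value storage with a key-class register described above can be sketched with a closure over an R environment. This is a hypothetical illustration of the idea, not the HashTable class itself; new_hash_table_sketch() and its member names are invented for the example.

```r
# Hypothetical sketch (not fuseMLR's HashTable class): an environment-backed
# key-value store that also records the class of each stored object.
new_hash_table_sketch <- function() {
  store <- new.env(hash = TRUE, parent = emptyenv())
  key_class <- data.frame(key = character(0), class = character(0),
                          stringsAsFactors = FALSE)
  list(
    add = function(key, value, .class) {
      assign(key, value, envir = store)
      key_class <<- rbind(key_class, data.frame(key = key, class = .class,
                                                stringsAsFactors = FALSE))
      invisible(NULL)
    },
    get = function(key) get(key, envir = store, inherits = FALSE),
    remove = function(key) {
      rm(list = key, envir = store)
      key_class <<- key_class[key_class$key != key, , drop = FALSE]
      invisible(NULL)
    },
    key_class = function() key_class,
    class_exists = function(.class) .class %in% key_class$class
  )
}

ht <- new_hash_table_sketch()
ht$add("geneexpr", data.frame(a = 1), .class = "TrainLayer")
ht$add("meta", list(), .class = "TrainMetaLayer")
ht$class_exists("TrainMetaLayer")  # TRUE
ht$remove("meta")
ht$class_exists("TrainMetaLayer")  # FALSE
```

A class check of this kind is what allows a container to enforce rules such as "only one meta layer per training object".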
This class implements a learner. A Lrner object can only exist as a component of a TrainLayer or a TrainMetaLayer object.
new()
Initialize a default parameters list.
Lrner$new( id, package = NULL, lrn_fct, param_train_list, param_pred_list = list(), train_layer, na_action = "na.rm" )
id
character
Learner ID.
package
character
Package that implements the learn function. If NULL, the learn function is called from the current environment.
lrn_fct
character
Name of the learn function.
param_train_list
list
List of parameters for training.
param_pred_list
list
List of parameters for testing.
Learn parameters.
train_layer
TrainLayer
Layer on which the learner is stored.
na_action
character
Handling of missing values. Set to "na.keep" to keep missing values, "na.rm" to remove individuals with missing values, or "na.impute" (only applicable to meta-data) to impute missing values in meta-data. Only median- and mode-based imputations are currently supported. With the "na.keep" option, ensure that the provided learner can handle missing values.
print()
Printer
Lrner$print(...)
...
any
summary()
Summary
Lrner$summary(...)
...
any
interface()
Learner and prediction parameter interface. Use this function to specify how the following parameters are named in the learning function (lrn_fct) you provided when creating the learner, or in the predicting function.
Lrner$interface( x = "x", y = "y", object = "object", data = "data", extract_pred_fct = NULL )
x
character
Name of the argument to pass the matrix of independent variables in the original learning function.
y
character
Name of the argument to pass the response variable in the original learning function.
object
character
Name of the argument to pass the model in the original predicting function.
data
character
Name of the argument to pass new data in the original predicting function.
extract_pred_fct
character or function
If the predict function called for the model does not return a vector, use this argument to specify a function (or the name of a function) that extracts the vector of predictions. The default value is NULL, which is appropriate when predictions are already returned as a vector.
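To illustrate what interface() configures, the base-R sketch below maps the generic names x, y, object and data onto a learner and a predict function that use different argument names, and applies an extraction function to a list-valued prediction. It is a hypothetical stand-in for the internal mapping: mean_lrn, mean_pred, train_with_interface and predict_with_interface are illustrative names, not fuseMLR functions.

```r
# A toy learner whose arguments are not named x and y
mean_lrn <- function(features, response) {
  structure(list(mu = mean(response)), class = "mean_model")
}
# A toy predict function that returns a list instead of a vector
mean_pred <- function(model, newdata) {
  list(predictions = rep(model$mu, nrow(newdata)))
}

# Interface: generic names -> argument names used by the toy functions
interface <- list(x = "features", y = "response", object = "model", data = "newdata")
extract_pred <- function(pred) pred$predictions

# Hypothetical helpers that call the functions under their own argument names
train_with_interface <- function(lrn_fct, x, y, interface, param_train_list = list()) {
  args <- c(setNames(list(x, y), c(interface$x, interface$y)), param_train_list)
  do.call(lrn_fct, args)
}
predict_with_interface <- function(pred_fct, model, data, interface,
                                   extract_pred_fct = NULL) {
  args <- setNames(list(model, data), c(interface$object, interface$data))
  pred <- do.call(pred_fct, args)
  # extract a plain vector only when the predict function does not return one
  if (is.null(extract_pred_fct)) pred else extract_pred_fct(pred)
}

x <- data.frame(a = 1:4)
y <- c(2, 4, 6, 8)
model <- train_with_interface(mean_lrn, x, y, interface)
predict_with_interface(mean_pred, model, x[1:2, , drop = FALSE], interface,
                       extract_pred_fct = extract_pred)  # c(5, 5)
```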
train()
Trains the current learner (from class Lrner) on the current training data (from class TrainData).
Lrner$train(ind_subset = NULL, use_var_sel = FALSE, verbose = TRUE)
ind_subset
vector
Individual ID subset on which the training will be performed.
use_var_sel
boolean
If TRUE, variable selection is performed before training.
verbose
boolean
Warning messages will be displayed if set to TRUE.
The resulting model, from class Model, is returned.
getTrainLayer()
The current layer is returned.
Lrner$getTrainLayer()
TrainLayer object.
getNaRm()
The current layer is returned.
Lrner$getNaRm()
getNaAction()
The current layer is returned.
Lrner$getNaAction()
getId()
Getter of the current learner ID.
Lrner$getId()
The current learner ID.
getPackage()
Getter of the learner package implementing the learn function.
Lrner$getPackage()
The name of the package implementing the learn function.
getIndSubset()
Getter of the individual ID subset used for training.
Lrner$getIndSubset()
The individual ID subset used for training is returned.
getVarSubset()
Getter of the variable subset used for training.
Lrner$getVarSubset()
The list of variables used for training is returned.
getParamPred()
Getter of the predicting parameter list.
Lrner$getParamPred()
The list of predicting parameters.
getParamInterface()
The current parameter interface is returned.
Lrner$getParamInterface()
A data.frame of interface.
getExtractPred()
The function to extract predicted values is returned.
Lrner$getExtractPred()
The function (or function name) used to extract predicted values.
This class implements a model. A Model object can only exist as an element of a TrainLayer or a TrainMetaLayer object. A Model object is automatically created by fitting a learner on training data.
A Model object can compute predictions for a TestData object. See the predict function below.
new()
Constructor of Model class.
Model$new(lrner, train_data, base_model, train_layer)
lrner
Lrner
The learner.
train_data
TrainData(1)
Training data.
base_model
object
Base model as returned by the original learn function.
train_layer
TrainLayer
The current training layer on which the model is stored.
An object is returned.
print()
Printer
Model$print(...)
...
any
summary()
Summary
Model$summary(...)
...
any
getBaseModel()
Getter of the base model
Model$getBaseModel()
getTrainData()
Getter of the training data
Model$getTrainData()
getTrainLabel()
Getter of the individual ID column in the training data.
Model$getTrainLabel()
...
any
getLrner()
Getter of the learner used to fit the model.
Model$getLrner()
setId()
Setter of the model ID.
Model$setId(id)
id
character
ID value
predict()
Predicts target values for the new data (from class TestData) passed as argument.
Model$predict(testing_data, use_var_sel, ind_subset = NULL)
testing_data
TestData
An object from class TestData.
use_var_sel
boolean
If TRUE, selected variables available at each layer are used.
ind_subset
vector
Subset of individual IDs to be predicted.
...
Further parameters to be passed to the basic predict function.
The predicted object is returned. It must be either a vector or a list containing a field predictions with the predictions.
clone()
The objects of this class are cloneable with this method.
Model$clone(deep = FALSE)
deep
Whether to make a deep clone.
Simulated multi-omics data generated with interSIM. The dataset is a list containing training and testing data, called training and testing, respectively. Each of them is a list containing the following data:
data(multi_omics)
A list with training and testing data containing methylation, gene expression and protein expression data.
methylation: A data.frame containing the simulated methylation dataset.
genexpr: A data.frame containing the gene expression dataset.
proteinexpr: A data.frame containing the protein expression dataset.
target: A data.frame with two columns, containing patient IDs and values of the target variable.
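The nesting described above can be mimicked with toy data, which is handy for testing code before loading the real object. The dimensions, the ID column IDS and the target column disease are hypothetical, not those of the shipped dataset; with fuseMLR loaded, str(multi_omics, max.level = 2) shows the real structure.

```r
# Hypothetical toy object with the same nesting as multi_omics (made-up data;
# the ID column "IDS" and target column "disease" are assumptions).
set.seed(7)
make_split <- function(n, prefix) {
  ids <- sprintf("%s%03d", prefix, seq_len(n))
  list(
    methylation = data.frame(IDS = ids, cg1 = runif(n), cg2 = runif(n)),
    genexpr     = data.frame(IDS = ids, gene1 = rnorm(n), gene2 = rnorm(n)),
    proteinexpr = data.frame(IDS = ids, prot1 = rnorm(n)),
    target      = data.frame(IDS = ids, disease = sample(0:1, n, replace = TRUE))
  )
}
toy_omics <- list(training = make_split(20L, "train"),
                  testing  = make_split(5L, "test"))
str(toy_omics, max.level = 2)
```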
Predict function for models from class bestLayerLearner.
## S3 method for class 'bestLayerLearner' predict(object, data, ...)
object |
|
data |
|
... |
|
Predicted target values are returned.
set.seed(20240625)
x <- data.frame(x1 = runif(n = 50L, min = 0, max = 1))
y <- sample(x = 0:1, size = 50L, replace = TRUE)
my_model <- bestLayerLearner(x = x, y = y)
x_new <- data.frame(x1 = rnorm(10L))
my_predictions <- predict(object = my_model, data = x_new)
The predict.cobra function makes predictions on new data using a trained COBRA object.
## S3 method for class 'cobra' predict(object, data, ...)
object |
An object of class "cobra" created by the |
data |
A |
... |
Additional arguments (currently not used). |
A vector of predictions for the new data.
# Example usage
set.seed(123)
x_train <- data.frame(a = rnorm(10L), b = rnorm(10L))
y_train <- sample(0L:1L, size = 10L, replace = TRUE)
# Train the model with epsilon optimization
cobra_model <- cobra(x = x_train, y = y_train, tune = "epsilon")
# Make predictions on new data
set.seed(156)
x_new <- data.frame(a = rnorm(5L), b = rnorm(5L))
prediction <- predict(object = cobra_model, data = x_new)
Computes predictions for the Testing object passed as argument.
## S3 method for class 'Training' predict(object, testing, ind_subset = NULL, ...)
object |
|
testing |
|
ind_subset |
|
... |
|
The final predicted object. All layers and the meta layer are predicted.
Predict function for models from class weightedMeanLearner.
## S3 method for class 'weightedMeanLearner' predict(object, data, na_rm = FALSE, ...)
object |
|
data |
|
na_rm |
|
... |
|
Predicted target values are returned.
set.seed(20240625)
x <- data.frame(x1 = rnorm(50L))
y <- sample(x = 0:1, size = 50L, replace = TRUE)
my_model <- weightedMeanLearner(x = x, y = y)
x_new <- data.frame(x1 = rnorm(10L))
my_predictions <- predict(object = my_model, data = x_new)
This class implements a PredictData object to be predicted. A PredictData object can only exist as a component of a PredictLayer or a PredictMetaLayer object.
fuseMLR::Data
-> PredictData
new()
Initialize a new object from the current class.
PredictData$new(id, ind_col, data_frame)
id
character
Object ID.
ind_col
character
Column name containing individual IDs.
data_frame
data.frame
data.frame containing data.
print()
Printer
PredictData$print(...)
...
any
getPredictData()
Getter of the current predicted data.frame without the individual ID variable.
PredictData$getPredictData()
The data.frame without the individual ID or target variables is returned.
getPredictLayer()
Getter of the current layer.
PredictData$getPredictLayer()
The layer (from class PredictLayer) on which the current predicted data are stored is returned.
setPredictLayer()
Assigns a predicted layer to the predicted data.
PredictData$setPredictLayer(predict_layer)
predict_layer
PredictLayer(1)
The current object
clone()
The objects of this class are cloneable with this method.
PredictData$clone(deep = FALSE)
deep
Whether to make a deep clone.
This class is designed for predictions.
A Predicting object is structured as follows:
PredictLayer: Exists for each modality.
PredictData: Related class for modality-specific predictions.
PredictMetaLayer: Related class for meta predictions.
PredictData: Specific to the meta layer, it is set up internally after cross-validation.
Use the function train for training and predict for predicting.
TODO: Do not export me.
fuseMLR::HashTable
-> Predicting
new()
constructor
Predicting$new(id, ind_col)
id
character
Predicting id.
ind_col
character
Name of the column containing individual IDs.
print()
Printer
Predicting$print(...)
...
any
createMetaTestData()
Creates a new modality-specific predictions dataset based on layer predictions.
Predicting$createMetaTestData(meta_layer_id)
meta_layer_id
character(1)
ID of the meta layer where the testing meta data will be stored.
A TestData is returned.
getIndIDs()
Gathers individual IDs from all layers.
Predicting$getIndIDs()
A data.frame containing individual IDs.
getPredictMetaLayer()
Getter of the meta layer.
Predicting$getPredictMetaLayer()
Object from class PredictMetaLayer
getIndCol()
Getter of the individual column name.
Predicting$getIndCol()
This class implements a layer. A PredictLayer object can only exist as a component of a Predicting object.
A predicted layer can only contain PredictData.
fuseMLR::HashTable
-> PredictLayer
new()
constructor
PredictLayer$new(id)
id
character
The layer ID.
print()
Printer
PredictLayer$print(...)
...
any
getPredicting()
Getter of the current predicting object
PredictLayer$getPredicting()
The current predicting object is returned.
getIndIDs()
Getter of the individual IDs from the current layer.
PredictLayer$getIndIDs()
A data.frame containing individual ID values.
getPredictData()
Getter of the predicted data stored on the current layer.
PredictLayer$getPredictData()
The stored PredictData object is returned.
setPredicting()
Assigns a predicting object to the predicted layer.
PredictLayer$setPredicting(predicting)
predicting
Predicting
The current object
summary()
Generate summary.
PredictLayer$summary()
Training, Lrner, TrainData, TestData and Model
This class implements a predicted meta layer. A PredictMetaLayer can only exist as a unique element of a Training object.
A predicted meta layer can only contain a PredictData object.
fuseMLR::HashTable
-> PredictMetaLayer
new()
constructor
PredictMetaLayer$new(id, predicting)
id
character
predicting
Predicting
print()
Printer
PredictMetaLayer$print(...)
...
any
getPredicting()
Getter of the current predicting object
PredictMetaLayer$getPredicting()
The current predicting object is returned.
getIndIDs()
Getter of the individual IDs from the current layer.
PredictMetaLayer$getIndIDs()
A data.frame containing individual ID values.
getPredictData()
Getter of the predicted data.
PredictMetaLayer$getPredictData()
The stored PredictData object is returned.
openAccess()
Open access to the meta layer. A meta learner is only modifiable if the access is opened.
PredictMetaLayer$openAccess()
closeAccess()
Close access to the meta layer to avoid accidental modification.
PredictMetaLayer$closeAccess()
getAccess()
Getter of the current access to the meta layer.
PredictMetaLayer$getAccess()
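The open/close access pattern described above can be sketched as a closure that holds an access flag, so that modifications fail while access is closed. This is a hypothetical illustration of the pattern, not the PredictMetaLayer implementation; the setData() and getData() members are invented for the example.

```r
# Hypothetical sketch of an access guard (not fuseMLR's implementation).
new_guarded_layer_sketch <- function() {
  access <- FALSE
  data <- NULL
  list(
    openAccess  = function() access <<- TRUE,
    closeAccess = function() access <<- FALSE,
    getAccess   = function() access,
    setData = function(value) {
      # refuse modifications unless access has been opened explicitly
      if (!access) stop("Access to the meta layer is closed.")
      data <<- value
      invisible(NULL)
    },
    getData = function() data
  )
}

layer <- new_guarded_layer_sketch()
blocked <- tryCatch({ layer$setData(1); FALSE }, error = function(e) TRUE)
layer$openAccess()
layer$setData(42)
layer$closeAccess()
```

Closing access again right after the intended update is what protects the meta layer from accidental modification.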
Summarizes a fuseMLR Testing object.
## S3 method for class 'Testing' summary(object, ...)
object |
|
... |
|
Summarizes a fuseMLR Training object.
## S3 method for class 'Training' summary(object, ...)
object |
|
... |
|
This class implements the target object. A Target object can only exist as a component of a Training object.
fuseMLR::Data
-> Target
new()
Initialize a new object from the current class.
Target$new(id, data_frame, training)
id
character
The Object ID.
data_frame
data.frame
data.frame containing data.
training
Training
Training where to store the current object.
print()
Printer
Target$print(...)
...
any
summary()
Summary
Target$summary(...)
...
any
getData()
Getter of the current data.frame without the individual ID and target variables.
Target$getData()
The data.frame without the individual ID or target variables is returned.
getTargetValues()
Getter of target values stored on the current training layer.
Target$getTargetValues()
The observed target values stored on the current training layer are returned.
getTargetName()
Getter of the target variable name.
Target$getTargetName()
getTraining()
Getter of the current training object.
Target$getTraining()
The training object (from class Training) on which the current target data are stored is returned.
setData()
Setter of the current data.frame.
Target$setData(data_frame)
data_frame
data.frame
data.frame to be set.
clone()
The objects of this class are cloneable with this method.
Target$clone(deep = FALSE)
deep
Whether to make a deep clone.
TrainLayer, Lrner, Model, TestData
This class implements a TestData object to be predicted. A TestData object can only exist as a component of a TestLayer or a TestMetaLayer object.
fuseMLR::Data
-> TestData
new()
Initialize a new object from the current class.
TestData$new(id, data_frame, new_layer)
id
character
Object ID.
data_frame
data.frame
data.frame containing data.
new_layer
TestLayer
Layer where to store the current object.
ind_col
character
Column name containing individual IDs.
print()
Printer
TestData$print(...)
...
any
getData()
Getter of the current data.frame without the individual ID variable.
TestData$getData()
The data.frame without the individual ID or target variables is returned.
getTestLayer()
Getter of the current layer.
TestData$getTestLayer()
The layer (from class TestLayer) on which the current test data are stored is returned.
clone()
The objects of this class are cloneable with this method.
TestData$clone(deep = FALSE)
deep
Whether to make a deep clone.
This is one of the primary classes of fuseMLR. An object of this class is designed to contain multiple layers but only one meta layer.
A Testing object is structured as follows:
fuseMLR::HashTable
-> Testing
new()
constructor
Testing$new(id, ind_col, verbose = TRUE)
id
character
Testing id.
ind_col
character
Name of the column containing individual IDs in the testing data.frame.
verbose
boolean
Warning messages will be displayed if set to TRUE.
print()
Printer
Testing$print(...)
...
any
getIndIDs()
Gathers individual IDs from all layers.
Testing$getIndIDs()
A data.frame containing individual IDs.
getTestMetaLayer()
Getter of the meta layer.
Testing$getTestMetaLayer()
Object from class TestMetaLayer
getIndCol()
Getter of the individual column name.
Testing$getIndCol()
getVerbose()
Getter of the verbose setting.
Testing$getVerbose()
getData()
Retrieve modality-specific prediction data.
Testing$getData()
A list containing all (base and meta) models.
upset()
UpSet plot to show an overview of the overlap of individuals across various layers.
Testing$upset(...)
...
any
Further parameters to be passed to the upset function from package UpSetR.
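An UpSet plot summarises which individuals are present in which layers. The base-R sketch below builds the underlying 0/1 membership matrix and counts each combination of layers; the layer names and IDs are hypothetical, and the union of all IDs corresponds to what getIndIDs() is described as gathering. UpSetR's fromList() builds a comparable 0/1 table from a list of ID vectors.

```r
# Hypothetical sketch: the membership matrix behind an UpSet-style overview.
layer_ids <- list(
  methylation = c("id1", "id2", "id3"),
  genexpr     = c("id2", "id3", "id4"),
  proteinexpr = c("id3", "id5")
)
all_ids <- sort(unique(unlist(layer_ids)))  # union of IDs across layers
membership <- sapply(layer_ids, function(ids) as.integer(all_ids %in% ids))
rownames(membership) <- all_ids

# one pattern per individual, e.g. "111" = present in all three layers
pattern <- apply(membership, 1L, paste, collapse = "")
table(pattern)
```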
summary()
Generate testing summary
Testing$summary()
This class implements a layer. A TestLayer object can only exist as a component of a Testing object.
A testing layer can only contain TestData.
fuseMLR::HashTable
-> TestLayer
new()
constructor
TestLayer$new(id, testing)
id
character
Testing layer id.
testing
Testing
print()
Printer
TestLayer$print(...)
...
any
getTesting()
Getter of the current Testing object.
TestLayer$getTesting()
The current Testing object is returned.
getIndIDs()
Getter of individual IDs from the current layer.
TestLayer$getIndIDs()
A data.frame containing individual ID values.
getTestData()
Getter of the predicted data stored on the current layer.
TestLayer$getTestData()
The stored TestData object is returned.
checkTestDataExist()
Check whether test data have already been stored.
TestLayer$checkTestDataExist()
Boolean value
summary()
Generate summary.
TestLayer$summary()
Training, Lrner, TrainData, TestData and Model
This class implements a test meta layer. A TestMetaLayer can only exist as the unique meta layer of a Testing object.
A test meta layer can only contain a TestData object.
fuseMLR::HashTable
-> TestMetaLayer
new()
constructor
TestMetaLayer$new(id, testing)
id
character
Testing meta-layer id.
testing
Testing
print()
Printer
TestMetaLayer$print(...)
...
any
getTesting()
Getter of the current testing object.
TestMetaLayer$getTesting()
The current testing object is returned.
getTestData()
Getter of the test dataset stored on the current layer.
TestMetaLayer$getTestData()
The stored TestData object is returned.
openAccess()
Open access to the meta layer. A meta learner is only modifiable if the access is opened.
TestMetaLayer$openAccess()
closeAccess()
Close access to the meta layer to avoid accidental modification.
TestMetaLayer$closeAccess()
getAccess()
Getter of the current access to the meta layer.
TestMetaLayer$getAccess()
setTestData()
Create a TestData object and set it on the current test meta layer.
TestMetaLayer$setTestData(id, ind_col, data_frame)
id
character(1)
ID of the TestData object to be instantiated.
ind_col
character(1)
Name of individual column IDs.
data_frame
data.frame
A data.frame of layer-specific predictions.
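In late integration, the meta-level data.frame holds one column of predictions per layer, keyed by the individual ID column. A base-R sketch of how layer-specific prediction tables could be combined into such a data.frame; the ID column name IDS, the layer names and the values are hypothetical. Individuals missing a modality end up with NA in that layer's column.

```r
# Hypothetical layer-specific predictions, each keyed by individual ID
pred_gene <- data.frame(IDS = c("i1", "i2", "i3"), geneexpr = c(0.8, 0.3, 0.6))
pred_prot <- data.frame(IDS = c("i2", "i3", "i4"), proteinexpr = c(0.4, 0.7, 0.2))

# Full outer join on the ID column; individuals lacking a modality get NA
meta_df <- Reduce(
  function(a, b) merge(a, b, by = "IDS", all = TRUE),
  list(pred_gene, pred_prot)
)
print(meta_df)
```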
checkTestDataExist()
Check whether test data have already been stored.
TestMetaLayer$checkTestDataExist()
Boolean value
This class implements the training data. A TrainData object can only exist as a component of a TrainLayer or a TrainMetaLayer object.
fuseMLR::Data
-> TrainData
new()
Initialize a new object from the current class.
TrainData$new(id, data_frame, train_layer)
id
character
The Object ID.
data_frame
data.frame
A data.frame containing the data.
train_layer
TrainLayer
Training layer in which to store the current object.
print()
Printer
TrainData$print(...)
...
any
summary()
Summary
TrainData$summary(...)
...
any
getData()
Getter of the current data.frame without the individual ID and target variables.
TrainData$getData()
The data.frame without the individual ID and target variables is returned.
getTargetValues()
Getter of target values stored on the current training layer.
TrainData$getTargetValues()
The observed target values stored on the current training layer are returned.
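Conceptually, the stored data.frame keeps the individual ID column and the target column alongside the predictors; getData() hands back only the predictors, while getTargetValues() exposes the observed targets. A base-R sketch of that split, with hypothetical column names IDS and disease (this mimics the documented behaviour, it is not the package code):

```r
# Hypothetical layer data: ID column, target column and two predictors
layer_df <- data.frame(
  IDS     = c("i1", "i2", "i3"),
  disease = c(0L, 1L, 1L),
  x1      = c(1.2, 0.4, 2.1),
  x2      = c(5.0, 3.3, 4.8)
)
ind_col <- "IDS"
target  <- "disease"

# Predictors only: drop the ID and target columns
predictors <- layer_df[, setdiff(names(layer_df), c(ind_col, target)), drop = FALSE]

# Observed targets, kept together with the individual IDs
target_values <- layer_df[, c(ind_col, target)]
```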
getTargetName()
Getter of the target variable name.
TrainData$getTargetName()
getTrainLayer()
Getter of the current training layer.
TrainData$getTrainLayer()
The training layer (from class TrainLayer) on which the current train data are stored is returned.
getTestLayer()
Getter of the current layer.
TrainData$getTestLayer()
The layer (from class TestLayer) on which the current train data are stored is returned.
setDataFrame()
Set a new data.frame on the current object.
TrainData$setDataFrame(data_frame)
data_frame
data.frame
The current object is returned.
clone()
The objects of this class are cloneable with this method.
TrainData$clone(deep = FALSE)
deep
Whether to make a deep clone.
TrainLayer, Lrner, Model, TestData
This is one of the primary classes of fuseMLR. An object from this class is designed to contain multiple training layers, but only one meta training layer.
The Training class is structured as follows:
TrainLayer: Specific layer containing:
TrainMetaLayer: Basically a TrainLayer, but with some specific properties.
Use the function train for training and predict for predicting.
fuseMLR::HashTable
-> Training
new()
constructor
Training$new( id, ind_col, target, target_df, problem_type = "classification", verbose = TRUE )
id
character
ind_col
character
Name of the column containing individual IDs.
target
character
Name of the target variable.
target_df
data.frame
Data frame with two columns: individual IDs and response variable values.
problem_type
character
Either "classification" or "regression".
verbose
boolean
Warning messages will be displayed if set to TRUE.
print()
Printer
Training$print(...)
...
any
trainLayer()
Train each layer of the current Training.
Training$trainLayer(ind_subset = NULL, use_var_sel = FALSE, verbose = TRUE)
ind_subset
character
Subset of individual IDs to be used for training.
use_var_sel
boolean
If TRUE, selected variables available at each layer are used.
verbose
boolean
Warning messages will be displayed if set to TRUE.
Returns the object itself, with a model for each layer.
predictLayer()
Predicts values given new data.
Training$predictLayer(testing, ind_subset = NULL)
testing
TestData
Object of class TestData.
ind_subset
vector
Subset of individual IDs to be predicted.
A new Training with predicted values for each layer.
createMetaTrainData()
Creates a meta training dataset and assigns it to the meta layer.
Training$createMetaTrainData( resampling_method, resampling_arg, use_var_sel, impute = TRUE )
resampling_method
function
Function for internal validation.
resampling_arg
list
List of arguments to be passed to the function.
use_var_sel
boolean
If TRUE, selected variables available at each layer are used.
impute
boolean
If TRUE, mode or median based imputation is performed on the modality-specific predictions.
The current object is returned, with a meta training dataset assigned to the meta layer.
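The meta training data are built from predictions that each layer's learner makes on individuals it was not trained on. The base-R sketch below shows that generic idea for a single layer with 5-fold cross-validation, using logistic regression as a stand-in learner. The simulated data, the fold assignment and the learner are assumptions for illustration; fuseMLR's own resampling (by default caret-based 10-fold cross-validation, according to the train() documentation) is not reproduced here.

```r
set.seed(123)
n <- 100
# Simulated layer data: one predictor and a binary target
layer_df <- data.frame(x1 = rnorm(n))
layer_df$y <- rbinom(n, size = 1, prob = plogis(1.5 * layer_df$x1))

k <- 5
# Randomly assign each individual to one of k equally sized folds
folds <- sample(rep(seq_len(k), length.out = n))

oof_pred <- numeric(n)
for (fold in seq_len(k)) {
  test_idx <- which(folds == fold)
  # Fit the stand-in learner on the other k - 1 folds
  fit <- glm(y ~ x1, data = layer_df[-test_idx, ], family = binomial())
  # Predict only the held-out fold
  oof_pred[test_idx] <- predict(fit, newdata = layer_df[test_idx, ], type = "response")
}
# oof_pred would form this layer's column of the meta training data
```

Using out-of-fold rather than in-sample predictions keeps the meta learner from rewarding layers that merely overfit the training data.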
train()
Trains the current object. All learners and the meta learner are trained.
Training$train( ind_subset = NULL, use_var_sel = FALSE, resampling_method = NULL, resampling_arg = list(), seed = NULL )
ind_subset
vector
ID subset to be used for training.
use_var_sel
boolean
If TRUE, variable selection is performed before training.
resampling_method
function
Function for internal validation. If not specified, the resampling function from the package caret is used for 10-fold cross-validation.
resampling_arg
list
List of arguments to be passed to the function.
seed
integer
Random seed. Default is NULL, in which case the seed is generated by R.
The current object is returned, with each learner trained on each layer.
predict()
Compute predictions for a testing object.
Training$predict(testing, ind_subset = NULL)
testing
Testing
A new testing object to be predicted.
ind_subset
vector
Vector of IDs to be predicted.
The predicted object. All layers and the meta layer are predicted. This is the final predicted object.
varSelection()
Variable selection on the current training object.
Training$varSelection(ind_subset = NULL, verbose = TRUE)
ind_subset
vector
ID subset of individuals to be used for variable selection.
verbose
boolean
Warning messages will be displayed if set to TRUE.
The current layer is returned with the resulting model.
getTargetValues()
Gather target values from all layers.
Training$getTargetValues()
A data.frame containing individual IDs and corresponding target values.
getIndIDs()
Gather individual IDs from all layers.
Training$getIndIDs()
A data.frame containing individual IDs.
getLayer()
Get a layer of a given ID.
Training$getLayer(id)
id
character
The ID of the layer to be returned.
The TrainLayer object is returned for the given ID.
getTrainMetaLayer()
Getter of the meta layer.
Training$getTrainMetaLayer()
Object from class TrainMetaLayer
getModel()
Retrieve models from all layers.
Training$getModel()
A list containing all (base and meta) models.
getData()
Retrieve modality-specific predictions.
Training$getData()
A list containing all (base and meta) models.
removeLayer()
Remove a layer of a given ID.
Training$removeLayer(id)
id
character
The ID of the layer to be removed.
The TrainLayer object is returned for the given ID.
removeTrainMetaLayer()
Remove the meta layer from the current Training object.
Training$removeTrainMetaLayer()
getIndCol()
Getter of the individual column name.
Training$getIndCol()
getTarget()
Getter of the target variable name.
Training$getTarget()
getVerbose()
Getter of the verbose setting.
Training$getVerbose()
getUseVarSel()
Getter of the use_var_sel field.
Training$getUseVarSel()
getVarSelDone()
Getter of the flag indicating whether variable selection has already been performed.
Training$getVarSelDone()
increaseNbTrainedLayer()
Increase the number of trained layers.
Training$increaseNbTrainedLayer()
checkTargetExist()
Check whether a target object has already been stored.
Training$checkTargetExist()
Boolean value
getTargetObj()
Getter of the target object.
Training$getTargetObj()
getProblemTyp()
Getter of the problem type.
Training$getProblemTyp()
setImpute()
Set imputation action na.action.
Training$setImpute(impute)
impute
character
How to handle missing values.
testOverlap()
Test that individuals overlap across layers. At least five individuals must overlap.
Training$testOverlap()
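The overlap requirement can be illustrated with a small base-R check. This sketch assumes that "overlap" means individuals present in every layer; the exact criterion applied by testOverlap() may differ, and the helper check_overlap() and the IDs are hypothetical.

```r
# Hypothetical helper: count individuals present in all layers and
# stop if fewer than min_overlap are shared
check_overlap <- function(layer_ids, min_overlap = 5L) {
  shared <- Reduce(intersect, layer_ids)
  if (length(shared) < min_overlap) {
    stop(sprintf("Only %d individuals overlap across layers; at least %d required.",
                 length(shared), min_overlap))
  }
  invisible(shared)
}

# Hypothetical IDs: i4 to i10 are present in both layers
layer_ids <- list(
  geneexpr    = paste0("i", 1:10),
  proteinexpr = paste0("i", 4:12)
)
shared <- check_overlap(layer_ids)
```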
upset()
UpSet plot to show an overview of the overlap of individuals across various layers.
Training$upset(...)
...
any
Further parameters to be passed to the upset function from package UpSetR.
summary()
Generate training summary
Training$summary()
Testing and Predicting
This class implements a training layer. A TrainLayer object can only exist as a component of a Training object.
A training layer is structured as follows:
TrainData: Data to be used to train the learner.
Lrner: Includes a learning function and the package implementing the function.
Model: The result of training the learner on the training data.
VarSel: Includes a variable selection function and the package implementing the function.
A training layer can train its learner on its training data and store the resulting model. See the public function TrainLayer$train() below.
A training layer can make predictions for a new layer passed as argument to its predict function. See the public function TrainLayer$predict() below.
fuseMLR::HashTable
-> TrainLayer
new()
constructor
TrainLayer$new(id, training)
id
character
Training layer id.
training
Training
print()
Printer
TrainLayer$print(...)
...
any
getTraining()
Getter of the current training object.
TrainLayer$getTraining()
The current training object is returned.
getTargetObj()
Getter of the target object.
TrainLayer$getTargetObj()
train()
Trains the current layer.
TrainLayer$train(ind_subset = NULL, use_var_sel = FALSE, verbose = TRUE)
ind_subset
vector
ID subset of individuals to be used for training.
use_var_sel
boolean
If TRUE, variable selection is performed before training.
verbose
boolean
Warning messages will be displayed if set to TRUE.
The current layer is returned with the resulting model.
varSelection()
Variable selection on the current layer.
TrainLayer$varSelection(ind_subset = NULL, verbose = TRUE)
ind_subset
vector
ID subset of individuals to be used for variable selection.
verbose
boolean
Warning messages will be displayed if set to TRUE.
The current layer is returned with the resulting model.
predict()
Predicts values for the new layer passed as argument.
TrainLayer$predict(new_layer, use_var_sel, ind_subset = NULL)
new_layer
TrainLayer
use_var_sel
boolean
If TRUE, selected variables available at each layer are used.
ind_subset
vector
A new PredictLayer object with the predicted data is returned.
getTrainData()
Getter of the training dataset stored on the current layer.
TrainLayer$getTrainData()
The stored TrainData object is returned.
getTargetValues()
Getter of target values from the current layer.
TrainLayer$getTargetValues()
A data.frame containing individual IDs and corresponding target values.
getIndIDs()
Getter of individual IDs from the current layer.
TrainLayer$getIndIDs()
A data.frame containing individual ID values.
getTestData()
Getter of the new data.
TrainLayer$getTestData()
The stored TestData object is returned.
getLrner()
Getter of the learner.
TrainLayer$getLrner()
The stored Lrner object is returned.
getVarSel()
Getter of the variable selector.
TrainLayer$getVarSel()
The stored VarSel object is returned.
getModel()
Getter of the model.
TrainLayer$getModel()
The stored Model object is returned.
checkLrnerExist()
Check whether a learner has already been stored.
TrainLayer$checkLrnerExist()
Boolean value
checkModelExist()
Check whether a model has already been stored.
TrainLayer$checkModelExist()
Boolean value
checkVarSelExist()
Check whether a variable selection tool has already been stored.
TrainLayer$checkVarSelExist()
Boolean value
checkTrainDataExist()
Check whether training data have already been stored.
TrainLayer$checkTrainDataExist()
Boolean value
summary()
Generate summary.
TrainLayer$summary()
Training, Lrner, TrainData, TestData and Model
This class implements a training meta layer. A TrainMetaLayer can only exist as the unique meta layer of a Training object.
A meta layer is structured as follows:
Lrner: It is set by the user to be trained on the meta training data.
TrainData: Modality-specific prediction data, automatically created by the internal cross-validation.
Model: The meta model, result of training the learner on the training data, and therefore, not to be set by the user.
TestData: The new meta data to be predicted, consisting of predictions obtained from each layer.
A meta layer can train its meta learner on the meta training data and store the resulting meta model. The meta layer can predict values given a new meta layer.
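The meta-layer workflow amounts to fitting a second-level model on modality-specific predictions and applying it to the predictions for new individuals. A base-R sketch with logistic regression as a stand-in meta learner on simulated data; fuseMLR's own meta learners, such as weightedMeanLearner, aggregate differently, so this only illustrates the train-then-predict pattern.

```r
set.seed(42)
n <- 80
# Simulated meta training data: predictions from two layers
meta_train <- data.frame(
  geneexpr    = runif(n),
  proteinexpr = runif(n)
)
meta_train$y <- rbinom(n, 1, prob = (meta_train$geneexpr + meta_train$proteinexpr) / 2)

# Train the stand-in meta learner on the layer predictions
meta_model <- glm(y ~ geneexpr + proteinexpr, data = meta_train, family = binomial())

# Meta test data: layer predictions for two new individuals
meta_test <- data.frame(geneexpr = c(0.9, 0.1), proteinexpr = c(0.8, 0.2))
meta_pred <- predict(meta_model, newdata = meta_test, type = "response")
```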
fuseMLR::HashTable
-> TrainMetaLayer
new()
constructor
TrainMetaLayer$new(id, training)
id
character
Id of training meta-layer.
training
Training
print()
Printer
TrainMetaLayer$print(...)
...
any
getTraining()
Getter of the current training object.
TrainMetaLayer$getTraining()
The current training object is returned.
getTargetObj()
Getter of the target object.
TrainMetaLayer$getTargetObj()
train()
Trains the current layer.
TrainMetaLayer$train(ind_subset = NULL, verbose = TRUE)
ind_subset
vector
ID subset of individuals to be used for training.
verbose
boolean
Warning messages will be displayed if set to TRUE.
The current layer is returned with the resulting model.
predict()
Predicts values for the new layer passed as argument.
TrainMetaLayer$predict(new_layer, ind_subset = NULL)
new_layer
TrainLayer
A trained TrainLayer object.
ind_subset
vector
Index subset.
A new object with the predicted values is returned.
impute()
Imputes missing values in modality-specific predictions. Only mode and median based imputations are currently supported.
TrainMetaLayer$impute(impute_fct = NULL, impute_param = NULL)
impute_fct
character
An imputation function to use instead of median or mode imputation. This parameter is currently not used; the default (NULL) corresponds to median or mode based imputation.
impute_param
list
The list of parameters to call the imputation function. Not yet implemented!
A new object with the predicted values is returned.
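Median imputation for numeric prediction columns and mode imputation for categorical ones can be sketched in base R as follows. The helper impute_median_mode() is hypothetical, not the package's implementation, and ties in the mode are broken by taking the first most frequent value.

```r
# Hypothetical helper: median for numeric columns, mode otherwise
impute_median_mode <- function(df) {
  for (col in names(df)) {
    miss <- is.na(df[[col]])
    if (!any(miss)) next
    if (is.numeric(df[[col]])) {
      # Numeric predictions (e.g. probabilities): use the median
      df[[col]][miss] <- median(df[[col]], na.rm = TRUE)
    } else {
      # Categorical predictions: use the most frequent observed value
      counts <- table(df[[col]], useNA = "no")
      df[[col]][miss] <- names(counts)[which.max(counts)]
    }
  }
  df
}

meta_df <- data.frame(
  geneexpr    = c(0.8, NA, 0.6, 0.2),
  proteinexpr = c("1", "0", NA, "1")
)
imputed <- impute_median_mode(meta_df)
```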
getTrainData()
Getter of the training dataset stored on the current layer.
TrainMetaLayer$getTrainData()
The stored TrainData object is returned.
getLrner()
Getter of the learner.
TrainMetaLayer$getLrner()
The stored Lrner object is returned.
getModel()
Getter of the model.
TrainMetaLayer$getModel()
The stored Model object is returned.
openAccess()
Open access to the meta layer. A meta learner is only modifiable if the access is opened.
TrainMetaLayer$openAccess()
closeAccess()
Close access to the meta layer to avoid accidental modification.
TrainMetaLayer$closeAccess()
getAccess()
Getter of the current access to the meta layer.
TrainMetaLayer$getAccess()
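The open/close access mechanism guards the meta layer against accidental modification. A minimal base-R sketch of that pattern using a closure; make_meta_layer() and its setLrner()/getLrner() methods are hypothetical and only mimic the documented openAccess(), closeAccess() and getAccess() behaviour. Starting in the closed state is an assumption.

```r
make_meta_layer <- function() {
  access <- FALSE   # assumption: access is closed by default
  lrner  <- NULL
  list(
    openAccess  = function() access <<- TRUE,
    closeAccess = function() access <<- FALSE,
    getAccess   = function() access,
    setLrner    = function(x) {
      # Refuse modifications while access is closed
      if (!access) stop("Meta layer is closed; call openAccess() first.")
      lrner <<- x
      invisible(TRUE)
    },
    getLrner    = function() lrner
  )
}

ml <- make_meta_layer()
ml$openAccess()
ml$setLrner("weightedMeanLearner")
ml$closeAccess()
```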
setTrainData()
Create a TrainData object and set it on the current meta layer.
TrainMetaLayer$setTrainData(id, ind_col, data_frame)
id
character
ID of the TrainData object to be instantiated.
ind_col
character
Name of individual column IDs.
data_frame
data.frame
A data.frame of layer-specific predictions.
checkLrnerExist()
Check whether a learner has already been stored.
TrainMetaLayer$checkLrnerExist()
Boolean value
checkModelExist()
Check whether a model has already been stored.
TrainMetaLayer$checkModelExist()
Boolean value
checkTrainDataExist()
Check whether training data have already been stored.
TrainMetaLayer$checkTrainDataExist()
Boolean value
set2NotTrained()
Only useful to reset the training status to FALSE after cross-validation.
TrainMetaLayer$set2NotTrained()
summary()
Generate summary.
TrainMetaLayer$summary()
An upset plot of overlapping individuals.
upsetplot(object, ...)
object
...
This class implements a variable selection tool. A VarSel object can only exist as a component of a TrainLayer or a TrainMetaLayer object.
new()
VarSel$new( id, package = NULL, varsel_fct, varsel_param, train_layer, na_action = "na.rm" )
id
character
ID of the variable selection tool.
package
character
Package that implements the variable selection function. If NULL, the variable selection function is called from the current environment.
varsel_fct
character
Variable selection function name. Note: variable selection functions, except Boruta, must return a vector of selected variables.
varsel_param
list
Variable selection parameter list.
train_layer
TrainLayer
The training layer on which to store the variable selection tool.
na_action
character
Handling of missing values in meta-data. Set to "na.keep" to keep missing values, "na.rm" to remove individuals with missing values, or "na.impute" (only applicable to meta-data) to impute missing values in meta-data. Only median and mode based imputations are currently supported. With the "na.keep" option, ensure that the provided variable selection function can handle missing values.
If TRUE, the individuals with missing predictor values will be removed from the training dataset.
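The difference between "na.rm" and "na.keep" can be shown on a small data.frame. A base-R sketch with a hypothetical helper apply_na_action(); "na.impute" is left out because it only applies to meta-data.

```r
# Hypothetical helper mimicking the two layer-level options
apply_na_action <- function(df, na_action = c("na.rm", "na.keep")) {
  na_action <- match.arg(na_action)
  if (na_action == "na.rm") {
    # Drop every individual (row) with at least one missing value
    df[complete.cases(df), , drop = FALSE]
  } else {
    # Keep rows as they are; the downstream function must handle NA
    df
  }
}

df <- data.frame(IDS = c("i1", "i2", "i3"), x1 = c(1.5, NA, 0.3))
kept    <- apply_na_action(df, "na.keep")
removed <- apply_na_action(df, "na.rm")
```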
print()
Printer
VarSel$print(...)
...
any
summary()
Summary
VarSel$summary(...)
...
any
interface()
Variable selection parameter interface. Use this function to specify how the following parameters are named in the variable selection function (varsel_fct) you provided when creating the VarSel object, or in the predicting function.
VarSel$interface( x = "x", y = "y", object = "object", data = "data", extract_var_fct = NULL )
x
string
Name of the argument to pass the matrix of independent variables in the original learning function.
y
string
Name of the argument to pass the response variable in the original learning function.
object
string
Name of the argument to pass the model in the original predicting function.
data
character
Name of the argument to pass new data in the original predicting function.
extract_var_fct
character or function
If the variable selection function does not return a vector, use this argument to specify a function (or the name of a function) that extracts the vector of selected variables from its output. The default, NULL, assumes that the selected variables are returned as a vector.
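The interface lets a variable selection function be called even when its arguments are not literally named x and y, and extract_var_fct post-processes a non-vector result. A base-R sketch of both ideas: my_selector() (with arguments named X and Y), the list it returns, extract_selected() and run_selection() are all hypothetical, and the generic call simply uses do.call() with the argument names supplied through the interface.

```r
# Hypothetical selector whose arguments are named X and Y and which
# returns a list rather than a plain vector of variable names
my_selector <- function(X, Y) {
  cors <- sapply(X, function(col) abs(cor(col, Y)))
  list(decision = ifelse(cors > 0.5, "Confirmed", "Rejected"))
}

# Hypothetical extractor: turn the selector's output into a character vector
extract_selected <- function(res) names(res$decision)[res$decision == "Confirmed"]

# Generic call: map x/y onto the selector's own argument names
run_selection <- function(fct, x, y, x_name = "x", y_name = "y",
                          extract_var_fct = NULL) {
  args <- setNames(list(x, y), c(x_name, y_name))
  res <- do.call(fct, args)
  if (is.null(extract_var_fct)) res else extract_var_fct(res)
}

# Variable a is perfectly correlated with y, b is nearly uncorrelated
x <- data.frame(a = 1:20, b = rep(c(1, -1), 10))
y <- 2 * x$a
selected <- run_selection(my_selector, x, y, x_name = "X", y_name = "Y",
                          extract_var_fct = extract_selected)
```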
varSelection()
Performs variable selection on the current training data (from class TrainData).
VarSel$varSelection(ind_subset = NULL)
ind_subset
vector
Individual ID subset on which variable selection will be performed.
The resulting model, from class Model, is returned.
getTrainLayer()
The current layer is returned.
VarSel$getTrainLayer()
TrainLayer object.
getId()
Getter of the current learner ID.
VarSel$getId()
The current learner ID.
getPackage()
Getter of the variable selection package implementing the variable selection function.
VarSel$getPackage()
The name of the package implementing the variable selection function.
getVarSubSet()
Getter of the list of selected variables.
VarSel$getVarSubSet()
List of selected variables.
getParamInterface()
The current parameter interface is returned.
VarSel$getParamInterface()
A data.frame of interface.
getNaAction()
The current missing-value handling setting (na_action) is returned.
VarSel$getNaAction()
getExtractVar()
The function to extract selected variables is returned.
VarSel$getExtractVar()
The function (or function name) used to extract selected variables.
Variable selection on the training object passed as argument.
varSelection(training, ind_subset = NULL)
training
ind_subset
A data.frame with two columns: layer and selected variables.
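The returned data.frame is in long format, with one row per selected variable. A base-R sketch of how per-layer selections could be stacked into that shape; the layer names, variable names and the column names layer and variable are hypothetical.

```r
# Hypothetical selected variables per layer
selected <- list(
  geneexpr    = c("ACTB", "GAPDH"),
  methylation = c("cg0001")
)

# Stack into a two-column data.frame: layer, selected variable
var_sel_df <- data.frame(
  layer    = rep(names(selected), lengths(selected)),
  variable = unlist(selected, use.names = FALSE)
)
```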
Fouodo C.J.K, Bleskina M. and Szymczak (2024). fuseMLR: An R package for integrative prediction modeling of multi-omics data, paper submitted.
Modality-specific learners are assessed and weighted based on their predictions. This function is intended to be (internally) used as meta-learner in fuseMLR.
weightedMeanLearner(x, y, weighted = TRUE, perf = NULL, na_rm = FALSE)
x
y
weighted
perf
na_rm
Object of class weightedMeanLearner with the vector of estimated weights per layer.
library(fuseMLR)
set.seed(20240624L)
x = data.frame(x1 = runif(n = 50L, min = 0, max = 1),
               x2 = runif(n = 50L, min = 0, max = 1))
y = sample(x = 0L:1L, size = 50L, replace = TRUE)
my_model = weightedMeanLearner(x = x, y = y)
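The aggregation step of a weighted-mean meta learner can be sketched in base R. For illustration only, this assumes that each layer's weight is proportional to the inverse of its Brier score on the training data and that weights are normalised to sum to one; the performance measure actually used by weightedMeanLearner (configurable through perf) may differ. fit_weights() and predict_weighted() are hypothetical helpers.

```r
# Hypothetical helper: weights proportional to 1 / Brier score per layer
fit_weights <- function(x, y) {
  brier <- sapply(x, function(p) mean((p - y)^2, na.rm = TRUE))
  w <- 1 / brier
  w / sum(w)
}

# Weighted mean of layer predictions; with na_rm = TRUE, missing layers
# get weight 0 and the remaining weights are renormalised per individual
predict_weighted <- function(weights, newx, na_rm = FALSE) {
  p <- as.matrix(newx)
  w <- matrix(weights, nrow = nrow(p), ncol = ncol(p), byrow = TRUE)
  if (na_rm) {
    w[is.na(p)] <- 0
    p[is.na(p)] <- 0
  }
  rowSums(p * w) / rowSums(w)
}

y <- c(1, 0, 1, 0)
x <- data.frame(x1 = c(0.9, 0.1, 0.8, 0.2),   # accurate layer
                x2 = c(0.6, 0.5, 0.4, 0.5))   # weak layer
weights <- fit_weights(x, y)
newx <- data.frame(x1 = c(0.7, NA), x2 = c(0.3, 0.6))
pred <- predict_weighted(weights, newx, na_rm = TRUE)
```

With na_rm = FALSE, any individual missing a layer prediction would receive NA, which is why the renormalisation step matters when modalities are missing.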