Xuefeng Xi, Victor S. Sheng, Binqi Sun, Lei Wang and Fuyuan Hu
Abstract: Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables. It has received relatively little attention from the Machine Learning community. However, multi-target regression arises in many real-world applications. In this paper we conduct extensive experiments to investigate the performance of three representative multi-target regression learning algorithms (i.e. Multi-Target Stacking (MTS), Random Linear Target Combination (RLTC), and Multi-Objective Random Forest (MORF)), comparing them with the baseline single-target learning. Our experimental results show that all three multi-target regression learning algorithms do improve on the performance of single-target learning. Among them, MTS performs the best, followed by RLTC, followed by MORF. However, single-target learning sometimes still performs very well, even the best. This analysis sheds light on multi-target regression learning and indicates that single-target learning is a competitive baseline for multi-target regression learning on multi-target domains.
Keywords: Multi-target regression, multi-label classification, multi-target stacking.
Multi-target regression, also known as multivariate or multi-output regression, is an instance of the more general learning task of multi-target prediction. Here the prediction targets are real-valued, as opposed to the closely related task of multi-label classification, where the target variables are binary. Multi-output regression has recently emerged and has been extensively studied for many computer vision tasks, e.g., head pose estimation [Hara and Chellappa (2014)], human body pose estimation [Toshev and Szegedy (2014)] and viewpoint estimation [Torki and Elgammal (2011)]. Moreover, many researchers have found that their applications, e.g. camera re-localization [Shotton, Glocker, Zach et al. (2013)] and cardiac volume estimation [Zhen, Wang, Islam et al. (2014)], can be elegantly solved by transforming the original problem into a multi-output regression task, which not only substantially outperforms conventional approaches but also offers a more compact mathematical formulation that circumvents the difficulties of conventional approaches, e.g. inverse problems [Guzman-Rivera, Kohli, Glocker et al. (2014)].
Multi-target regression and multi-label classification are supervised machine learning tasks. They make predictions based on a set of examples. For example, historical stock prices can be used to predict future prices. Each example used for training can be labeled with the value of interest, for example, the stock price. Both multi-target regression and multi-label classification algorithms look for patterns in those values, but each algorithm looks for different types of patterns. After an algorithm has found the best pattern, it can use the pattern to make predictions for unlabeled testing data, e.g. the future price [Microsoft (2017)]. Multi-target regression and multi-label classification are closely related to each other, although multi-target regression is a little more general. Multi-label learning is often treated as a special case of multi-target regression in statistics. However, we could more precisely state that both are instances of learning to predict multiple targets, which could be real-valued, binary, ordinal, categorical, or even of mixed type.
Current multi-target regression learning algorithms are developed based on two basic approaches: algorithm adaptation and problem transformation. Problem transformation is easy to understand. It transforms a multi-target regression problem into multiple traditional single-target regression problems. After a multi-target regression problem is transformed into multiple single-target regression ones, all the traditional regression learning algorithms can be applied directly to build a regressor for each single-target dataset and make predictions for its corresponding test instances. The prediction for a multi-target instance is made by aggregating the outputs from the autonomous regressors, as the sketch below illustrates. Multi-Target Stacking (MTS) [Spyromitros-Xioufis, Tsoumakas, Groves et al. (2016)] is chosen as the representative of this group.
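To make the transformation concrete, the following sketch shows the baseline single-target decomposition in scikit-learn style Python. This is an illustrative sketch only (the experiments in this paper were run with the Mulan/Weka implementations, not this code), and the choice of GradientBoostingRegressor as the base learner is an assumption for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_single_target(X, Y):
    """Problem transformation baseline: fit one independent regressor per target column."""
    models = []
    for i in range(Y.shape[1]):
        model = GradientBoostingRegressor()
        model.fit(X, Y[:, i])  # each target is learned in isolation
        models.append(model)
    return models

def predict_single_target(models, X):
    """Aggregate the per-target outputs into a single multi-target prediction."""
    return np.column_stack([model.predict(X) for model in models])
```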
The second approach used in multi-target regression is algorithm adaptation. It extends existing traditional regression algorithms to perform multi-target regression directly, for example, Random Linear Target Combination (RLTC) and Multi-Objective Random Forest (MORF) [Kocev, Vens, Struyf et al. (2007)]. Algorithm adaptation completely differs from problem transformation: the algorithm directly learns the structure and correlations that exist among the multiple targets. Thus, it is very useful to investigate the performance of multi-target regression learning algorithms developed based on the two approaches. The findings will guide data mining researchers in their future research on developing better multi-target regression learning algorithms.
The rest of the paper is organized as follows. Section 2 introduces the three representative multi-target regression learning algorithms which we compare empirically. In Section 3, we describe the experiments we have conducted, consisting of the experimental setting, the experimental results, and the analysis of the results. Section 4 concludes with a summary of our work and a discussion of future work.
In this section, we first provide a brief description of multi-target regression learning, and then briefly review the three representative multi-target learning algorithms (i.e. Multi-Target Stacking (MTS), Random Linear Target Combination (RLTC), and Multi-Objective Random Forest (MORF)) used in our experiments in Section 3.
Multi-target regression is a statistical process for estimating the relationships among variables. Let us consider a training dataset $D$ with $N$ instances containing a value assignment for each variable $X_1, \ldots, X_m, Y_1, \ldots, Y_d$, i.e. $D = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})\}$. Each instance is characterized by an input vector of $m$ descriptive or predictive variables $x^{(l)} = (x^{(l)}_1, \ldots, x^{(l)}_m)$ and an output vector of $d$ target variables $y^{(l)} = (y^{(l)}_1, \ldots, y^{(l)}_d)$. The task is to learn a multi-target regression model from $D$, consisting of finding a function $h$ that assigns to each instance, given by the vector $x$, a vector $y$ of $d$ target values:

$$h: \Omega_{X_1} \times \cdots \times \Omega_{X_m} \rightarrow \Omega_{Y_1} \times \cdots \times \Omega_{Y_d}, \qquad x = (x_1, \ldots, x_m) \mapsto y = (y_1, \ldots, y_d),$$

where $\Omega_{X_j}$ and $\Omega_{Y_i}$ denote the sample spaces of each predictive variable $X_j$, for all $j \in \{1, \ldots, m\}$, and each target variable $Y_i$, for all $i \in \{1, \ldots, d\}$, respectively. Note that all target variables are considered to be continuous here. The learned multi-target model is then used to simultaneously predict the values of all target variables of new incoming unlabeled instances [Hanen, Gherardo, Concha et al. (2015)].
In this way, the dependencies among the target attributes are implicitly modeled as well, producing better predictive performance. Another advantage of such a multi-target model is that its size and complexity are smaller than the combined size of the corresponding single-target models.
Multi-Target Stacking (MTS) is a representative multi-target regression learning algorithm developed via problem transformation. A brief introduction follows.
Stacking (also called meta-ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model [Gorman (2016)]. The multi-target stacking algorithm is inspired by the work in which stacked generalization was used to deal with multi-label classification [Godbole and Sarawagi (2004)]. Multi-target stacking training is a two-stage process. First, $d$ single-target models are learned respectively in a single-target learning mode. However, instead of directly using these models for prediction, multi-target stacking includes an additional training stage where a second set of $d$ meta-models are learned, one for each target $Y_i$, $i \in \{1, \ldots, d\}$.
Each meta-model is learned on a transformed training set $D^*_i = \{(x^{*(1)}, y^{(1)}_i), \ldots, (x^{*(N)}, y^{(N)}_i)\}$, where $x^{*(l)} = (x^{(l)}_1, \ldots, x^{(l)}_m, \hat{y}^{(l)}_1, \ldots, \hat{y}^{(l)}_d)$ is a transformed input vector consisting of the original input vector of the training set augmented by predictions (or estimates) of the target variables yielded by the first-stage models. In fact, MTS is based on the idea that a second-stage model is able to correct the predictions of the first-stage models by using the predictions of the other variables obtained in the first stage.
The predictions for a new instance $x^{(N+1)}$ are obtained by first applying the first-stage models to induce the estimated output vector $\hat{y}^{(N+1)} = (\hat{y}^{(N+1)}_1, \ldots, \hat{y}^{(N+1)}_d)$; MTS then applies the second-stage models on the transformed input vector $x^{*(N+1)} = (x^{(N+1)}_1, \ldots, x^{(N+1)}_m, \hat{y}^{(N+1)}_1, \ldots, \hat{y}^{(N+1)}_d)$.
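A minimal sketch of this two-stage scheme is given below, again in scikit-learn style Python and only for illustration (the actual experiments use the Mulan implementation). One simplification to note: here the second stage is trained on in-sample first-stage predictions, whereas published MTS variants may instead use internal cross-validation estimates to reduce overfitting:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

class MultiTargetStacking:
    """Two-stage MTS sketch: stage-2 inputs = original features + stage-1 predictions."""

    def __init__(self, base=None):
        self.base = base if base is not None else GradientBoostingRegressor()

    def fit(self, X, Y):
        d = Y.shape[1]
        # Stage 1: one independent single-target model per target.
        self.stage1 = [clone(self.base).fit(X, Y[:, i]) for i in range(d)]
        # Stage 2: augment the inputs with the first-stage estimates of all d targets.
        Y_hat = np.column_stack([m.predict(X) for m in self.stage1])
        X_aug = np.hstack([X, Y_hat])
        self.stage2 = [clone(self.base).fit(X_aug, Y[:, i]) for i in range(d)]
        return self

    def predict(self, X):
        Y_hat = np.column_stack([m.predict(X) for m in self.stage1])
        return np.column_stack([m.predict(np.hstack([X, Y_hat])) for m in self.stage2])
```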
Random Linear Target Combinations (RLTC) is a representative multi-target regression learning algorithm [Tsoumakas, Spyromitros-Xioufis, Vrekou et al. (2014)] developed via algorithm adaptation. A brief introduction follows.
Consider a set of $m$ input variables $x \in \mathbb{R}^m$ and a set of $d$ target variables $y \in \mathbb{R}^d$. There is a set of $N$ training examples $D = (X, Y)$, where $X$ and $Y$ are matrices of size $N \times m$ and $N \times d$, respectively. RLTC constructs $r \gg d$ new target variables via corresponding random linear combinations of $y$. To achieve this, the approach defines a coefficient matrix $C$ of size $d \times r$ filled with random values uniformly chosen from $[0, 1]$. Each column of this matrix contains the coefficients of a linear combination of the target variables. Multiplying $Y$ with $C$ leads to a transformed multi-target training set $D' = (X, Z)$, where $Z = YC$ is a matrix of size $N \times r$ with the values of the new target variables. A user-specified multi-target regression learning algorithm is then applied to $D'$ in order to build a corresponding model.
Note that RLTC expects the original target variables to take values from the same domain, as otherwise their linear combinations could be dominated by the values of targets with a much wider domain than the others. To ensure this, it applies 0-1 normalization in order to bring the values of all targets into the range $[0, 1]$.
This algorithm considers an additional parameter $k \in \{2, \ldots, d\}$ for specifying the number of original target variables involved in each random linear combination, setting the coefficients of the rest of the target variables to zero. A higher $k$ means that potential correlations among more targets are being considered. However, at the same time, it means that the new targets are more difficult to predict, especially in the absence of actual correlations among the targets. Therefore, RLTC hypothesizes that low $k$ values will lead to the best results. In practice, when $k < d$, for each linear combination RLTC selects $k$ targets at random, but with priority given to the targets with the lowest frequency of participation in previously considered linear combinations. This ensures that all targets participate in $C$ as evenly (i.e. with similar frequency) as possible.
Given a new test instance $x'$, the multi-target regression model is first invoked to obtain a vector $z'$ with $r$ predictions. The estimates for the original target variables are then obtained by solving the following overdetermined (as $r \gg d$) system of linear equations:

$$C^{\mathsf{T}} y' = z'.$$
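The following sketch summarizes the RLTC training and prediction procedure described above. It is an illustrative approximation, not the Mulan implementation: for brevity it selects the $k$ targets of each combination uniformly at random, omitting the frequency-balancing priority, and it assumes the targets have already been 0-1 normalized:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

class RLTCSketch:
    """Train on r random linear combinations Z = Y C, recover y by least squares."""

    def __init__(self, r=100, k=2, base=None, seed=0):
        self.r, self.k = r, k
        self.base = base if base is not None else GradientBoostingRegressor()
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        d = Y.shape[1]
        # Coefficient matrix C (d x r): each column combines k randomly chosen targets.
        self.C = np.zeros((d, self.r))
        for j in range(self.r):
            idx = self.rng.choice(d, size=self.k, replace=False)
            self.C[idx, j] = self.rng.uniform(0.0, 1.0, size=self.k)
        Z = Y @ self.C  # transformed targets, shape (N, r)
        self.models = [clone(self.base).fit(X, Z[:, j]) for j in range(self.r)]
        return self

    def predict(self, X):
        Z_hat = np.column_stack([m.predict(X) for m in self.models])
        # Solve the overdetermined system C^T y' = z' for each instance (least squares).
        Y_hat, *_ = np.linalg.lstsq(self.C.T, Z_hat.T, rcond=None)
        return Y_hat.T
```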
Multi-Objective Random Forest (MORF) is a direct multi-output learning approach [Kocev, Vens, Struyf et al. (2007)]. Again, it is another representative multi-target regression learning algorithm developed via algorithm adaptation. It integrates one of the most popular ensemble meta-learning approaches, bagging, with random forests. For multi-target learning, instead of building single-target random forests, MORF builds multi-target random forests. Specifically, it builds multi-target random forests based on multi-objective decision trees (MODTs) using randomly selected feature subsets.
Multi-objective decision trees (MODTs) [Blockeel, De Raedt and Ramon (1998)] are decision trees capable of predicting multiple target attributes at once. The main difference between MODTs and standard single-target decision trees is that MODTs treat the variance function and the prototype function that computes a label for each leaf as parameters. For multi-label classification, the variance function is computed as the sum of the entropies of the target variables, i.e. $Var(E) = \sum_{i=1}^{d} Entropy(E, Y_i)$, and the prototype function returns a vector containing the majority class for each target over the corresponding training examples $E$. For multi-objective regression trees, the sum of the variances of the targets is used, i.e. $Var(E) = \sum_{i=1}^{d} Var(E, Y_i)$, and the prototype of each leaf is the mean of the target vectors of the corresponding training examples $E$.
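For the regression case, the variance heuristic and the leaf prototype are simple to state in code. The sketch below shows only the split scoring and prototype computation that a multi-objective regression tree would use; a full MORF implementation (tree induction, random feature subsets, bagging) is omitted, so treat this as a sketch of the heuristic rather than the algorithm itself:

```python
import numpy as np

def multi_target_variance(Y):
    """Var(E) = sum_i Var(E, Y_i): the split heuristic of multi-objective regression trees."""
    return float(np.sum(np.var(Y, axis=0)))

def leaf_prototype(Y):
    """Prototype of a leaf: the mean of the target vectors of the examples it holds."""
    return Y.mean(axis=0)

def variance_reduction(Y, left_mask):
    """Score a candidate binary split by the weighted reduction in multi-target variance."""
    n = len(Y)
    left, right = Y[left_mask], Y[~left_mask]
    return multi_target_variance(Y) - (
        len(left) / n * multi_target_variance(left)
        + len(right) / n * multi_target_variance(right)
    )
```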
We conducted extensive experiments to investigate the performance of three popular and representative multi-target learning algorithms (i.e. MTS, RLTC, and MORF), comparing them with the baseline (single-target learning). Before presenting our experimental results, we first discuss the implementations and parameter settings of these algorithms, and then provide a brief description of each dataset used in our experiments.
RLTC learns a single independent regression model for each new target variable and solves the overdetermined system of linear equations during prediction. Each regression model is built using gradient boosting [Friedman (2001)] with a 4-terminal-node regression tree as the base learner, a learning rate of 0.1, and 100 boosting iterations. The system of linear equations is solved by the unregularized least squares approach. In our experiments, we generate $r = 100$ new target variables by combining $k = 2$ of the original target variables.
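In scikit-learn terms, a rough analogue of this base learner configuration might look as follows. This is an approximation for illustration only; the experiments use the Weka/Mulan gradient boosting implementation, and `max_leaf_nodes=4` stands in for the 4-terminal-node trees:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Approximate analogue of the base learner settings described above.
base_learner = GradientBoostingRegressor(
    max_leaf_nodes=4,   # ~4-terminal-node regression trees
    learning_rate=0.1,  # learning rate of 0.1
    n_estimators=100,   # 100 boosting iterations
)
```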
In addition to the parameter settings of MTS and RLTC described above, for MORF we use an ensemble size of 100 trees and the values suggested by Kocev et al. [Kocev, Vens, Struyf et al. (2007)] for the rest of its parameters.
All of the algorithms are implemented within Mulan. Mulan is an open-source Java library for learning from multi-label datasets, which is built on top of Weka and includes implementations of bagging, gradient boosting, and regression trees. MTS and RLTC are already integrated in Mulan [Mulan (2010)].
Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors), and it is the measure we use to compare the performance of the multi-target learning algorithms. We chose RMSE because it is commonly used in climatology, forecasting, and regression analysis to verify experimental results.
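For reference, RMSE is computed as $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{l=1}^{N}(y^{(l)} - \hat{y}^{(l)})^2}$. A minimal computation sketch follows; note that the relative variant reported in our tables normalizes RMSE by a baseline, and the normalization by the mean predictor shown here is one common choice, assumed for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: the standard deviation of the residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_rmse(y_true, y_pred):
    """RMSE relative to always predicting the mean of y_true (one common normalization)."""
    y_true = np.asarray(y_true, float)
    return rmse(y_true, y_pred) / rmse(y_true, np.full_like(y_true, y_true.mean()))
```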
Although multi-target regression has many interesting applications, only a few multi-target regression datasets are publicly available. We conduct experiments on 10 multi-target regression datasets, which can be downloaded from the website of Mulan [Tsoumakas, Katakis and Vlahavas (2010)]. A brief description of each dataset used in our experiments is provided as follows.
Table 1: The characteristics of the 10 datasets used in our experiments
The characteristics of each dataset are shown in Tab. 1, where the first column shows the name of each dataset, the second column shows its abbreviation, the third column shows its number of instances, the fourth column shows its number of input variables $p$, and the fifth column shows its number of output variables $d$.
The electrical discharge machining dataset (EDM) [Karalic and Bratko (1997)] was used to study shortening the machining time by reproducing the behavior of a human operator controlling the values of two target variables (i.e. gap control and flow control). Each of the target variables takes three distinct numeric values (-1, 0, 1). There are 16 continuous input variables, representing mean values and deviations of the observed quantities of the considered machining parameters.
The solar flare dataset was used for predicting how often three potential types of solar flare (i.e. common, moderate, and severe) occur in a 24-hour period [Lichman (2013)]. That is, each target variable counts the number of solar flares of the corresponding type. There are ten input feature variables, which describe active regions on the sun. There are two versions of this dataset (i.e. sf1 and sf2 in Tab. 1): sf1 contains data from the year 1969, and sf2 contains data from the year 1978.
The water quality dataset (wq) was obtained from the Hydrometeorological Institute of Slovenia, which monitors the water quality of Slovenian rivers and maintains a database of water quality samples covering the six-year period from 1990 to 1995 [Dzeroski, Demsar and Grbovic (2000)]. There are 16 different measured chemical parameters and 14 target variables representing bioindicator taxa.
The energy building dataset (enb) [Tsanas and Xifara (2012)] was used to study the effect of eight input variables (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area distribution) on two output variables, namely heating load (HL) and cooling load (CL), of residential buildings.
The Concrete Slump dataset (slump) [Yeh (2007)] was used to make predictions on three properties (i.e. slump, flow, and compressive strength) of concrete as a function of the content of seven concrete ingredients (i.e. cement, fly ash, blast furnace slag, water, superplasticizer, coarse aggregate, and fine aggregate).
The Andromeda dataset (andro) [Hatzikos, Tsoumakas, Tzanis et al. (2008)] was used to predict the future values of six water quality variables (temperature, pH, conductivity, salinity, oxygen, turbidity) in the Thermaikos Gulf of Thessaloniki, Greece. Measurements of the target variables are taken from under-water sensors with a sampling interval of nine seconds and then averaged to get a single measurement for each variable over each day.
The Jura dataset [Goovaerts (1997)] contains the measurements of concentrations of seven heavy metals (cadmium, cobalt, chromium, copper, nickel, lead, and zinc), recorded at 359 locations in the topsoil of a region of the Swiss Jura. The type of land use (Forest, Pasture, Meadow, and Tillage) and rock type (Argovian, Kimmeridgian, Sequanian, Portlandian, and Quaternary) were also recorded for each location. The concentrations of three of the metals (i.e. cadmium, copper, and lead) are more expensive to measure than those of the others. Therefore, these three metals are treated as target variables, while the remaining metals along with land use type, rock type, and the coordinates of each location (15 features in total) are used as input features.
The online product sales dataset (osales) [Kaggle (2012)] was used to predict the monthly online sales of consumer products. Each row in this dataset represents a different consumer product that is described by various product features as well as features of an advertising campaign (413 input features in total). There are 12 target variables corresponding to the monthly sales for the first 12 months after the product launch.
The “See Click Predict Fix” dataset (scpf) [Kaggle (2013)] is used to quantify and predict the number of views, votes, and comments that a specific issue has received to date, in terms of 23 input features, such as the number of days that an issue stayed online, the source from which the issue was created (e.g. android, iphone, remote api, etc.), the type of the issue (e.g. graffiti, pothole, trash, etc.), and the geographical coordinates of the issue. The issues have been collected from four cities (Oakland, Richmond, New Haven, Chicago) in the U.S. and span a period of 12 months (01/2012-12/2012).
We have conducted experiments on the ten datasets, comparing the performance of the three multi-target learning methods (i.e. MORF, MTS, and RLTC) with single-target learning, in terms of the relative root mean squared error (RMSE). The experimental results of the four learning algorithms on the ten datasets are shown in Tab. 2.
We further analyzed our experimental results in Tab. 2. We first ranked the performance of the four algorithms on each dataset and report the number of times each algorithm ranks #1, #2, #3, and #4 in Tab. 3.
Table 2: Comparisons among the four algorithms in terms of RMSE on the ten datasets. The lowest RMSE is in bold, and the second lowest is in italic
From the above Tab. 2 and the following Tab. 3, we can see that MTS performs the best on six out of the ten datasets, MORF performs the best on three out of the ten datasets, and Single-Target performs the best on one of the ten datasets. According to the average performance in terms of RMSE (shown in the last row of Tab. 2) and the average ranking (shown in the last row of Tab. 3), which support each other, we can see that MTS performs the best in general, followed by Single-Target. Although Single-Target only performs the best on one of the ten datasets, it ranks second on four out of the ten datasets, which makes it the second best among the four learning algorithms. MORF performs the worst, followed by RLTC.
The results shown in Tabs. 2 and 3 are not consistent with our expectations for multi-target learning. In particular, Single-Target learning ranks second, which is better than MORF and RLTC. We usually expect that multi-target learning can improve on the performance of single-target learning in multi-target domains. Therefore, we further investigated the detailed performance of each learning algorithm on each target of each dataset, instead of investigating the performance of the four algorithms on each dataset. Our detailed experimental results are shown in Tab. 4.
Table 3: Comparisons among the four algorithms in terms of the ranks of RMSE on the ten datasets
We further analyzed our detailed experimental results in Tab. 4. We first ranked the performance of the four algorithms on each target of each dataset and report the number of times each algorithm ranks #1, #2, #3, and #4 in Tab. 5.
From Tabs. 4 and 5, we can see that Single-Target takes the first rank seven times out of the 51 targets of the ten datasets, MORF takes the first rank 13 times, MTS takes the first rank 17 times, and RLTC takes the first rank 15 times. Note that there are 51 targets in total across the ten datasets, which is the sum of the last column in Tab. 1. This analysis shows that Single-Target learning can perform better than all three multi-target learning algorithms (i.e. MORF, MTS, and RLTC) for some targets of some domains. However, in general, MTS performs the best, followed by RLTC, followed by MORF. Single-Target performs the worst. This analysis sheds light on multi-target learning and also indicates that single-target learning is a competitive baseline for multi-target learning on multi-target domains.
Table 4: Our detailed experimental results of the four comparison algorithms in terms of RMSE on each target of each dataset. Again, the lowest RMSE is in bold, and the second lowest is in italic
Dataset  Target         MORF    Single-Target  MTS     RLTC
…        …              …       …              …       …
andro    Target_5       0.8088  0.6320         0.6271  0.7454
andro    Target_6       0.8268  0.6222         0.7867  0.7466
jura     Cd             0.7108  0.6943         0.6862  0.7017
jura     Co             0.5428  0.5661         0.5503  0.5579
jura     Cu             0.5137  0.5301         0.5217  0.5296
osales   Outcome_M1     0.6528  0.6759         0.6547  0.6567
osales   Outcome_M2     0.7539  0.7195         0.7517  0.7496
osales   Outcome_M3     0.7856  0.7782         0.7787  0.7719
osales   Outcome_M4     0.6889  0.7361         0.6825  0.6761
osales   Outcome_M5     0.7363  0.7380         0.7199  0.7035
osales   Outcome_M6     0.6964  0.7528         0.7032  0.7100
osales   Outcome_M7     0.7427  0.7682         0.7403  0.7378
osales   Outcome_M8     0.7641  0.7887         0.7614  0.7587
osales   Outcome_M9     0.8119  0.7461         0.7325  0.7931
osales   Outcome_M10    0.7725  0.7697         0.7643  0.7760
osales   Outcome_M11    0.7490  0.7604         0.7423  0.7356
osales   Outcome_M12    0.8205  0.8063         0.8193  0.8181
scpf     num_views      0.8153  0.8085         0.8048  0.8144
scpf     num_votes      0.7200  0.7036         0.7021  0.7242
scpf     num_comments   0.9760  0.9883         0.9710  0.9659
average                 0.7885  0.8018         0.7025  0.7878
We further analyzed why the comparison conclusions in Tabs. 3 and 5 differ. This is because the granularity of the comparisons is different. Tabs. 2 and 3 are based on datasets: each dataset is the basic unit of comparison. However, Tabs. 4 and 5 are based on targets: each target is the basic unit of comparison. Since different datasets have different numbers of targets, Tab. 5 is a weighted comparison summary, where the number of targets in each dataset works as the corresponding weight. When the number of targets is large, multi-target learning is preferred. This is reasonable because when there exist a great number of targets in a multi-target domain, there exist some targets that could improve the learning performance on others.
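A tiny, purely hypothetical example (all numbers invented for illustration) shows how the two granularities can disagree: two algorithms can tie under per-dataset averaging while one clearly wins under per-target averaging, because the dataset with more targets dominates the latter:

```python
import numpy as np

# Hypothetical ranks (1 = best) of algorithms A and B on two datasets:
# ds1 has a single target, ds2 has five targets.
ranks_A = {"ds1": [1], "ds2": [2, 2, 2, 2, 2]}
ranks_B = {"ds1": [2], "ds2": [1, 1, 1, 1, 1]}

def per_dataset_avg(ranks):
    """Each dataset counts once (the granularity of Tabs. 2 and 3)."""
    return np.mean([np.mean(r) for r in ranks.values()])

def per_target_avg(ranks):
    """Each target counts once (the granularity of Tabs. 4 and 5)."""
    return np.mean(np.concatenate([np.asarray(r, float) for r in ranks.values()]))

print(per_dataset_avg(ranks_A), per_target_avg(ranks_A))  # 1.5, ~1.83
print(per_dataset_avg(ranks_B), per_target_avg(ranks_B))  # 1.5, ~1.17
```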
In this paper, we conducted extensive experiments to investigate the performance of four multi-target regression learning algorithms (i.e. Single-Target, MTS, RLTC, and MORF). Our experimental results in terms of RMSE showed that in general MTS performs the best, followed by RLTC, followed by MORF, while Single-Target performs the worst. However, Single-Target performs the best on one of the ten datasets, and the second best on four out of the ten datasets. This analysis sheds light on multi-target learning and also indicates that single-target learning is a competitive baseline for multi-target learning on multi-target domains.
All of the algorithms used above, including Single-Target, MTS, RLTC, and MORF, are categorized as problem transformation methods in multi-target learning. All of them first transform a multi-output regression problem into multiple single-target regression problems, then build a model for each target, and finally concatenate all predictions. The main drawback of Single-Target learning is that the relationships among the targets are ignored and the targets are predicted independently, which may affect the overall quality of the predictions [Hanen, Gherardo, Concha et al. (2015)]. However, Single-Target learning is the simplest approach to learning from multi-output regression domains. Both MTS and RLTC employ the correlations among targets to improve the performance of multi-target regression learning.
Considering the potential real-world applications of multi-target regression, we will continue to evaluate the performance of existing multi-target regression learning algorithms. At the same time, we are going to design novel algorithms for multi-target regression based on the insights gained from these experiments.
Acknowledgments: This research has been supported by the US National Science Foundation under grant IIS-1115417, the National Natural Science Foundation of China under grants 61728205 and 61472267, and the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant SZS201609.