MOD:multi class label
scharoun committed May 19, 2017
1 parent e06affe commit 412dcb9
Showing 4 changed files with 33 additions and 11 deletions.
16 changes: 12 additions & 4 deletions docs/data_format.md
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ Ytk-learn supports weighted training and sample weight scaling, so each line is
- regression: real number, e.g. 122.23
- binary classification: 0 or 1
- binary cross_entropy: real number in [0, 1], e.g. 0.245
- multiclass classification: one-hot coding, the length of labels equals the class number, e.g. 0,0,1,0 means the target is class_2 of 4 classes (class_0, class_1, class_2, class_3)
- multiclass classification: an integer from 0 to K-1, where K is the class number.
- multiclass cross_entropy: the length of labels equals the class number and the labels sum to 1, e.g. 0.2,0.1,0.4,0.3 (4 classes in total)
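The sum-to-1 constraint on multiclass cross_entropy labels can be checked when parsing the label field. A minimal sketch, assuming K classes; `parseSoftLabel` and the class name are hypothetical helpers, not part of Ytk-learn:

```java
// Hypothetical helper: parse a multiclass cross_entropy label field
// like "0.2,0.1,0.4,0.3" and verify the probabilities sum to 1.0.
public class CrossEntropyLabelDemo {
    static float[] parseSoftLabel(String field, int K) {
        String[] parts = field.split(",");
        if (parts.length != K) {
            throw new IllegalArgumentException("expected " + K + " labels, got " + parts.length);
        }
        float[] label = new float[K];
        float sum = 0f;
        for (int i = 0; i < K; i++) {
            label[i] = Float.parseFloat(parts[i]);
            sum += label[i];
        }
        // allow a small tolerance for float rounding
        if (Math.abs(sum - 1.0f) > 1e-4f) {
            throw new IllegalArgumentException("labels must sum to 1.0, got " + sum);
        }
        return label;
    }

    public static void main(String[] args) {
        float[] label = parseSoftLabel("0.2,0.1,0.4,0.3", 4);
        System.out.println(label.length); // prints 4
    }
}
```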


@@ -52,7 +52,11 @@ then the data format becomes
1###0###height:2.0,weight:50.0,size:80
```

There are two samples. The first '10' in the first line is the sample weight, and the '1' in ``10###1###`` is the sample label. Similarly, the second sample has weight '1' and label '0'. In binary classification, label '1' stands for a positive sample while '0' stands for a negative sample. You can also provide probability values in [0,1] as the label, indicating the probability that the sample is positive, e.g.
There are two samples. The first '10' in the first line is the sample weight, and the '1' in ``10###1###`` is the sample label. Similarly, the second sample has weight '1' and label '0'. In binary classification, label '1' stands for a positive sample while '0' stands for a negative sample.

- **binary cross-entropy**

You can also provide probability values in [0,1] as the label, indicating the probability that the sample is positive, e.g.

```
10###0.9###height:1.6,weight:56.0,size:102
@@ -68,10 +72,14 @@ then the data format becomes
- **multi-class classification**

```
1###0,0,0,1,0,0###height:1.6,weight:56.0,size:102
1###3###height:1.6,weight:56.0,size:102
```

The first '1' is the sample weight, and '0,0,0,1,0,0' is the sample label, which means the target is the fourth class of the six classes. Ytk-learn also supports probability values in [0,1] as the label, to indicate the probability that the sample belongs to this class, e.g.
A label of '3' means the target is the fourth class of the six classes.
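Internally, the integer label maps to the same one-hot target as the old format. A minimal sketch of that conversion; `LabelDemo` and `toOneHot` are hypothetical names, not from the Ytk-learn code:

```java
// Hypothetical sketch: convert the new integer label format ("3")
// into the one-hot float array the trainer uses, for K classes.
public class LabelDemo {
    static float[] toOneHot(String labelField, int K) {
        float[] label = new float[K]; // all zeros by default
        int clazz = Integer.parseInt(labelField.trim());
        if (clazz < 0 || clazz >= K) {
            throw new IllegalArgumentException("label must be in range [0, K-1]: " + clazz);
        }
        label[clazz] = 1.0f;
        return label;
    }

    public static void main(String[] args) {
        // class index 3 is the fourth of six classes
        System.out.println(java.util.Arrays.toString(toOneHot("3", 6)));
        // prints [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    }
}
```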

- **multiclass cross-entropy**

Ytk-learn also supports probability values in [0,1] as labels, indicating the probability that the sample belongs to each class, e.g.

```
1###0.01,0.02,0.01,0.75,0.01,0.2###height:1.6,weight:56.0,size:102
2 changes: 1 addition & 1 deletion docs/gbdt.config.md
@@ -67,7 +67,7 @@ data {
# regression : weight###label###f1name:f1value,f2name:f2value,...(###init_prediction)
# binary classification : weight###label(0 or 1)###f1name:f1value,f2name:f2value,...(###init_prediction)
# binary cross_entropy : weight###label(0~1, positive)###f1name:f1value,f2name:f2value,...(###init_prediction)
# multi-class classification : weight###0,0,1,0(total 4 classes, exactly one is 1, the others are 0, this belongs to the 3rd class)###f1name:f1value,f2name:f2value,...(###init_prediction)
# multi classification : weight###2(this belongs to the 3rd class, the label must be in range [0,K-1], K is the class number)###f1name:f1value,f2name:f2value,...
# multi cross_entropy : weight###0.2,0.1,0.4,0.3(total 4 classes, the sum must equal 1.0)###f1name:f1value,f2name:f2value,...(###init_prediction)
# (###init_prediction) is optional. If you provide initial prediction(s) for each sample, set optimization.sample_dependent_base_prediction to true
delim {
2 changes: 1 addition & 1 deletion docs/multiclass_linear.config.md
@@ -34,7 +34,7 @@ data {
},
# delimiters, see data_format.md for more details
# multi classification : weight###0,0,1,0(total 4 classes, exactly one is 1, the others are 0, this belongs to the 3rd class)###f1name:f1value,f2name:f2value,...
# multi classification : weight###2(this belongs to the 3rd class, the label must be in range [0,K-1], K is the class number)###f1name:f1value,f2name:f2value,...
# multi cross_entropy : weight###0.2,0.1,0.4,0.3(total 4 classes, the sum must equal 1.0)###f1name:f1value,f2name:f2value,...
# if your model is tree model and
delim {
@@ -23,6 +23,7 @@

package com.fenbi.ytklearn.dataflow;

import com.fenbi.ytklearn.exception.YtkLearnException;
import com.fenbi.ytklearn.feature.FeatureHash;
import com.fenbi.ytklearn.fs.IFileSystem;
import com.fenbi.mp4j.exception.Mp4jException;
@@ -103,13 +104,26 @@ protected void updateY() throws Exception {
protected boolean yExtract(String line, String[] info) throws Exception {
String []linfo = info[1].split(coreParams.y_delim);

if (linfo.length != K) {
throw new Exception("label num must equal:" + K + ", line:" + line);
if (linfo.length != K && linfo.length != 1) {
throw new Exception("label num must = " + K + ", or = 1, line:" + line);
}
for (int i = 0; i < K; i++) {
label[i] = Float.parseFloat(linfo[i]);

if (linfo.length == 1) {
for (int i = 0; i < K; i++) {
label[i] = 0;
}
int clazz = Integer.parseInt(linfo[0]);
if (clazz >= K) {
throw new YtkLearnException("multi classification label must be in range [0,K-1]!\n" + line);
}
label[clazz] = 1.0f;
} else {
for (int i = 0; i < K; i++) {
label[i] = Float.parseFloat(linfo[i]);
}
}


if (coreParams.needYStat) {
labelIdx = -1;
for (int i = 0; i < K; i++) {
@@ -125,7 +139,7 @@ protected boolean yExtract(String line, String[] info) throws Exception {


if (labelIdx == -1) {
throw new Exception("label error! line:" + line);
throw new Exception("label error for y sampling! line:" + line);
}
float rate = coreParams.ySampling[labelIdx];
if (rate <= 1.0f) {
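The core change to `yExtract` above can be read as a standalone rule: the label field is either a single integer class index or K comma-separated floats. A self-contained sketch of just that branch, with the sample-weight handling and y-sampling of the real method omitted (`YExtractDemo` and `extract` are hypothetical names):

```java
// Sketch of the dual label format this commit introduces:
// "2"             -> one-hot array with a 1.0 at index 2
// "0.2,0.1,0.4,0.3" -> the K floats taken as-is
public class YExtractDemo {
    static float[] extract(String field, int K, String yDelim) {
        String[] linfo = field.split(yDelim);
        if (linfo.length != K && linfo.length != 1) {
            throw new IllegalArgumentException("label num must = " + K + ", or = 1");
        }
        float[] label = new float[K];
        if (linfo.length == 1) {
            // integer class index -> one-hot
            int clazz = Integer.parseInt(linfo[0]);
            if (clazz < 0 || clazz >= K) {
                throw new IllegalArgumentException("label must be in range [0,K-1]");
            }
            label[clazz] = 1.0f;
        } else {
            // K explicit per-class values
            for (int i = 0; i < K; i++) {
                label[i] = Float.parseFloat(linfo[i]);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(extract("2", 4, ",")));
        System.out.println(java.util.Arrays.toString(extract("0.2,0.1,0.4,0.3", 4, ",")));
    }
}
```

Note the sketch also rejects negative indices; the committed code only checks `clazz >= K`, so a negative label there would surface later as an `ArrayIndexOutOfBoundsException`.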
