This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[HIVEMALL-220] Implement Cofactor #167

Open
wants to merge 264 commits into
base: master
264 commits
d0d4821
Copied and pasted from OnlineMatrixFactorizationUDTF
Sep 26, 2018
f0d13b3
Implemented part of process()
Sep 26, 2018
686fd19
WIP: implementing process
Sep 26, 2018
0b0e924
Removing zero features in input Feature[], Added test for nnz array c…
Sep 26, 2018
ec814f4
Assert non-zero entries in Features when updating cooccurrence matrix
Sep 26, 2018
c2bc3c8
fix: use feature index instead of index of modified array for co-occu…
Sep 26, 2018
824e409
Removed SPPMI matrix as it will be supplied by the user
jaxony Sep 27, 2018
c696f9d
Rename weight variable names to be the same as cofacto.py
jaxony Sep 27, 2018
d6e129e
Change Feature#parseFeature method to public
jaxony Oct 11, 2018
1c2f770
WIP: changed input argument format to process(...)
jaxony Oct 11, 2018
2a5ff40
Refactor: less code duplication
jaxony Oct 11, 2018
7882e59
Better implementation of minibatch data structure, updated writing da…
jaxony Oct 11, 2018
7463e43
Remove RatingInitializer interface
jaxony Oct 11, 2018
e0162d4
Replace setBetaBias with flexible implementation
jaxony Oct 11, 2018
8ad08f3
Reformatting
jaxony Oct 11, 2018
2078cf8
More reformatting
jaxony Oct 11, 2018
0e8e616
record items and users that are trainable
jaxony Oct 15, 2018
ce60cbd
directly initialize factors and biases for parents
jaxony Oct 15, 2018
4ac308a
add lambda for each weight matrix
jaxony Oct 15, 2018
270f4b6
implemented updateTheta logic
jaxony Oct 15, 2018
9b3a126
Refactor CofactorModel and added precomputeWTW test (passing)
jaxony Oct 15, 2018
a38e0e1
Make calculateDelta static method
jaxony Oct 15, 2018
d8b7c7e
Refactored and added test for CofactorModel#initIdentity
jaxony Oct 15, 2018
2ff4ac5
Refactor and added test for CofactorModel#calculateA
jaxony Oct 16, 2018
27d7166
Extract method for filtering trainable features
jaxony Oct 16, 2018
0466fe3
Fix and test CofactorModel#calculateDelta
jaxony Oct 16, 2018
3518d5a
Refactor, fixes and added test for solve_implicitFeedback
jaxony Oct 16, 2018
781dba8
Capitalize the matrix B
jaxony Oct 16, 2018
a5fa774
Update factors in using Model instance
jaxony Oct 16, 2018
41a28b4
Refactor WTWpR calculation, removed redundant test for identity matri…
jaxony Oct 16, 2018
a31297d
Rename meanRating to globalBias so same as cofacto.py
jaxony Oct 16, 2018
d50c2e2
refactor: rename meanRating to globalBias
jaxony Oct 16, 2018
7c9c416
CofactorModel#calculateA returns RealVector
jaxony Oct 16, 2018
965da0a
Change signature of filterTrainableFeatures method
jaxony Oct 16, 2018
37086e2
Remove unused method
jaxony Oct 16, 2018
954f522
Implemented and tested CofactorModel#calculateRSD
jaxony Oct 16, 2018
194dec7
implemented and tested Cofactor#updateBeta
jaxony Oct 16, 2018
3148d78
Autoformat
jaxony Oct 16, 2018
55b1fc9
Skip update in updateTheta if no items are trainable
jaxony Oct 16, 2018
c2ecc0a
fix comment
jaxony Oct 17, 2018
9656574
Rename variables to make CofactorModel#calculateRSD more generic for …
jaxony Oct 17, 2018
ed463f1
Rename test function name
jaxony Oct 17, 2018
c7702a9
Implemented and tested CofactorModel#updateGamma
jaxony Oct 17, 2018
d5cb24a
Refactor vector updates for better testing
jaxony Oct 17, 2018
60c667d
fix: deleted wrong test
jaxony Oct 17, 2018
da282c2
refactor test and method of beta vector calculation
jaxony Oct 17, 2018
1cd75f7
refactor test and method for calculating new theta vector
jaxony Oct 17, 2018
90a72ac
uncomment
jaxony Oct 17, 2018
96320c6
redundant initialization of biases
jaxony Oct 17, 2018
64d58ce
test CofactorModel#recordAsParent
jaxony Oct 17, 2018
0d2263f
comments
jaxony Oct 17, 2018
17a1ee2
CofactorModel: rename calculateDelta to calculateWTWSubset
jaxony Oct 17, 2018
2207278
Implemented and tested updateBetaBias method
jaxony Oct 17, 2018
def8904
Implemented updateGammaBias method
jaxony Oct 17, 2018
1ddee6a
Refactor CofactorModel's public API, cleaner now
jaxony Oct 17, 2018
713cf0a
rename method
jaxony Oct 17, 2018
442f2ad
rename variables to context, features, sppmi
jaxony Oct 17, 2018
0d43fd2
fix process and num args
jaxony Oct 17, 2018
f099715
mostly renaming
jaxony Oct 17, 2018
690e4ae
logic for UDTF training: read minibatches from file
jaxony Oct 18, 2018
c63a310
renaming variable
jaxony Oct 18, 2018
44e01d3
Implemented loss methods and tests
jaxony Oct 18, 2018
3106bbb
format: move methods
jaxony Oct 18, 2018
28653e9
test calculateEmbedLoss
jaxony Oct 18, 2018
bf91d43
fix: bug with dimension param of RealVector in calculateA
jaxony Oct 18, 2018
990e1f7
test decreasing training loss on toy data, fixed bugs in CofactorMode…
jaxony Oct 18, 2018
cb293b2
Added public weight getters for debugging
jaxony Oct 18, 2018
ffbb5d6
fix test data for toy example
jaxony Oct 18, 2018
06ef254
Change test sppmi
jaxony Oct 18, 2018
9408da9
Add predict method
jaxony Oct 18, 2018
64de491
revert change
jaxony Oct 18, 2018
8f1d555
Change Map to Object2DoubleMap for biases
jaxony Oct 18, 2018
392516b
IDE warning fixes
jaxony Oct 18, 2018
49bb7e0
refactor: RealVector to double[], RealMatrix to double[][], remove id…
jaxony Oct 19, 2018
10227d5
fix efficiency: remove redundant hash lookups when getting bias value
jaxony Oct 19, 2018
b2d4f82
Add @VisibleForTesting annotation on protected methods
jaxony Oct 19, 2018
7cf0127
change protected to private methods
jaxony Oct 19, 2018
a86e797
Fix regression bug: unintended side effect of changing precomputed ma…
jaxony Oct 19, 2018
d1680d7
minor fixes: style, imports
jaxony Oct 19, 2018
a140512
test: Add strict test for exact match of prediction results
jaxony Oct 19, 2018
54b7143
refactor test
jaxony Oct 19, 2018
c9c653f
test: explicit feedback training
jaxony Oct 19, 2018
8365443
context is string, not feature
jaxony Oct 19, 2018
64dadc6
remove @Before
jaxony Oct 19, 2018
d34963f
make fields private
jaxony Oct 19, 2018
08cf7c8
added cofactor udtf tests, and fixes for these tests
jaxony Oct 19, 2018
5dc17bd
add new test for writing + reading one item and one user to/from disk…
jaxony Oct 19, 2018
190d2d8
use static method
jaxony Oct 19, 2018
b8ff1e4
WIP: adding test for sample data from MovieLens
jaxony Oct 22, 2018
17301b0
Feature.java: do not rely on overriding equals()
jaxony Oct 22, 2018
d0b40e2
Removing unnecessary input args
jaxony Oct 22, 2018
f650f1f
Fix bug: do not clear input buffer at the start of every minibatch re…
jaxony Oct 22, 2018
b8f97ff
WIP: optimized CofactorModel and removed loss calculation from traini…
jaxony Oct 22, 2018
7d7cb6d
style: Added annotations and final
jaxony Oct 23, 2018
785c6d9
return annotations, remove unused method
jaxony Oct 23, 2018
d4fc47e
more annotations
jaxony Oct 23, 2018
086a158
implemented and tested global bias
jaxony Oct 23, 2018
cec4007
fix test: fix read from disk
jaxony Oct 23, 2018
1ada21f
added global bias to CofactorModel constructor
jaxony Oct 23, 2018
a8e4bd1
added Hive function description
jaxony Oct 23, 2018
7ffed7d
fixed output fields
jaxony Oct 23, 2018
f854d41
implemented forward model
jaxony Oct 23, 2018
7d23eee
added isValidation flag to UDTF and refactored tests, comments
jaxony Oct 26, 2018
e019f54
added validation samples to minibatch
jaxony Oct 26, 2018
4020222
feature: added validation metric as option to UDTF
jaxony Oct 26, 2018
18ffd57
refactor processOptions
jaxony Oct 26, 2018
f6934ea
Replace HashMap with Weights class
jaxony Oct 26, 2018
16c029e
Implemented validation feature and convergence (AUC)
jaxony Oct 29, 2018
82b5fae
fix model forwarding bug, cofactor UDTF description
jaxony Oct 30, 2018
a41d56e
WIP: debugging
jaxony Nov 1, 2018
eec0242
WIP: changing training data format
jaxony Nov 1, 2018
b9949c1
passing most tests
jaxony Nov 2, 2018
9864926
WIP
jaxony Nov 8, 2018
d847ff9
feat: implementing CofactorModel as subclass of FactorizedModel
Sep 20, 2018
3d5e682
feat: implemented getter and setter for contextBias
Sep 20, 2018
fcfb017
feat: added co-occurrence matrix accumulation
Sep 20, 2018
7680cd1
CofactorModel: add hyperparameters c0 and c1
Sep 20, 2018
5a568f7
CofactorModel: c hyperparameters are final
Sep 20, 2018
f409674
CofactorModel: Change c0 and c1 to float
Sep 20, 2018
6102151
WIP: Implementing cofactor UDTF
Sep 20, 2018
f1d852b
CofactorizationUDTF: rename scaling parameters
Sep 20, 2018
890a3bc
CofactorizationUDTF: Implement option parsing for cofactorization opt…
Sep 20, 2018
e09cb53
make Cofactor standalone class: copied code from FactorizedModel
Sep 26, 2018
7b9f6b1
Remove user bias because cofactor paper does not use it
Sep 26, 2018
4e2e104
Added numItems to getOptions
Sep 26, 2018
d7dbcfe
Implementing RatingInitializer
Sep 26, 2018
6af05a9
Added batch training class
Sep 26, 2018
cd89332
Copied and pasted from OnlineMatrixFactorizationUDTF
Sep 26, 2018
3ed5626
Implemented part of process()
Sep 26, 2018
2d21e3c
WIP: implementing process
Sep 26, 2018
1447751
Removing zero features in input Feature[], Added test for nnz array c…
Sep 26, 2018
a249c02
Assert non-zero entries in Features when updating cooccurrence matrix
Sep 26, 2018
9c7c87d
fix: use feature index instead of index of modified array for co-occu…
Sep 26, 2018
1b38f9f
Removed SPPMI matrix as it will be supplied by the user
jaxony Sep 27, 2018
a12f4c2
Rename weight variable names to be the same as cofacto.py
jaxony Sep 27, 2018
628886d
Change Feature#parseFeature method to public
jaxony Oct 11, 2018
798edcb
WIP: changed input argument format to process(...)
jaxony Oct 11, 2018
dd3f5e9
Refactor: less code duplication
jaxony Oct 11, 2018
61fa84a
Better implementation of minibatch data structure, updated writing da…
jaxony Oct 11, 2018
e04a773
Remove RatingInitializer interface
jaxony Oct 11, 2018
249fb1c
Replace setBetaBias with flexible implementation
jaxony Oct 11, 2018
a8c7ebd
Reformatting
jaxony Oct 11, 2018
7f10838
More reformatting
jaxony Oct 11, 2018
af7d45c
record items and users that are trainable
jaxony Oct 15, 2018
648eb6a
directly initialize factors and biases for parents
jaxony Oct 15, 2018
d4ed275
add lambda for each weight matrix
jaxony Oct 15, 2018
2f9e5a5
implemented updateTheta logic
jaxony Oct 15, 2018
266c92a
Refactor CofactorModel and added precomputeWTW test (passing)
jaxony Oct 15, 2018
5a2599f
Make calculateDelta static method
jaxony Oct 15, 2018
f7a5b7e
Refactored and added test for CofactorModel#initIdentity
jaxony Oct 15, 2018
9572166
Refactor and added test for CofactorModel#calculateA
jaxony Oct 16, 2018
ee216f1
Extract method for filtering trainable features
jaxony Oct 16, 2018
92e7b8f
Fix and test CofactorModel#calculateDelta
jaxony Oct 16, 2018
ce53fcc
Refactor, fixes and added test for solve_implicitFeedback
jaxony Oct 16, 2018
72b2cbf
Capitalize the matrix B
jaxony Oct 16, 2018
070584d
Update factors in using Model instance
jaxony Oct 16, 2018
294e204
Refactor WTWpR calculation, removed redundant test for identity matri…
jaxony Oct 16, 2018
0fc1fa9
Rename meanRating to globalBias so same as cofacto.py
jaxony Oct 16, 2018
91f66f9
refactor: rename meanRating to globalBias
jaxony Oct 16, 2018
bb6be12
CofactorModel#calculateA returns RealVector
jaxony Oct 16, 2018
b85c1e7
Change signature of filterTrainableFeatures method
jaxony Oct 16, 2018
9e72b8e
Remove unused method
jaxony Oct 16, 2018
7562b3f
Implemented and tested CofactorModel#calculateRSD
jaxony Oct 16, 2018
f82ae91
implemented and tested Cofactor#updateBeta
jaxony Oct 16, 2018
ed682c8
Autoformat
jaxony Oct 16, 2018
26362bc
Skip update in updateTheta if no items are trainable
jaxony Oct 16, 2018
e5b6083
fix comment
jaxony Oct 17, 2018
88c6327
Rename variables to make CofactorModel#calculateRSD more generic for …
jaxony Oct 17, 2018
2cf08ea
Rename test function name
jaxony Oct 17, 2018
39d8897
Implemented and tested CofactorModel#updateGamma
jaxony Oct 17, 2018
3764b4f
Refactor vector updates for better testing
jaxony Oct 17, 2018
f77c3cf
fix: deleted wrong test
jaxony Oct 17, 2018
1e1d8bc
refactor test and method of beta vector calculation
jaxony Oct 17, 2018
c181715
refactor test and method for calculating new theta vector
jaxony Oct 17, 2018
44f4a74
uncomment
jaxony Oct 17, 2018
6ba9c99
redundant initialization of biases
jaxony Oct 17, 2018
552ec0d
test CofactorModel#recordAsParent
jaxony Oct 17, 2018
d9b262f
comments
jaxony Oct 17, 2018
b13b47b
CofactorModel: rename calculateDelta to calculateWTWSubset
jaxony Oct 17, 2018
8fcbbdc
Implemented and tested updateBetaBias method
jaxony Oct 17, 2018
2c1752a
Implemented updateGammaBias method
jaxony Oct 17, 2018
6c42eb4
Refactor CofactorModel's public API, cleaner now
jaxony Oct 17, 2018
b1ef959
rename method
jaxony Oct 17, 2018
8e04afd
rename variables to context, features, sppmi
jaxony Oct 17, 2018
3290ee1
fix process and num args
jaxony Oct 17, 2018
af42cbd
mostly renaming
jaxony Oct 17, 2018
d1487ae
logic for UDTF training: read minibatches from file
jaxony Oct 18, 2018
f72b5ad
renaming variable
jaxony Oct 18, 2018
c258941
Implemented loss methods and tests
jaxony Oct 18, 2018
a43d4d6
format: move methods
jaxony Oct 18, 2018
4169a23
test calculateEmbedLoss
jaxony Oct 18, 2018
223d57a
fix: bug with dimension param of RealVector in calculateA
jaxony Oct 18, 2018
6d30ce7
test decreasing training loss on toy data, fixed bugs in CofactorMode…
jaxony Oct 18, 2018
708e914
Added public weight getters for debugging
jaxony Oct 18, 2018
867ce62
fix test data for toy example
jaxony Oct 18, 2018
0466227
Change test sppmi
jaxony Oct 18, 2018
367ce95
Add predict method
jaxony Oct 18, 2018
844eff6
revert change
jaxony Oct 18, 2018
8ee95e8
Change Map to Object2DoubleMap for biases
jaxony Oct 18, 2018
53c5171
IDE warning fixes
jaxony Oct 18, 2018
ddcbade
refactor: RealVector to double[], RealMatrix to double[][], remove id…
jaxony Oct 19, 2018
642263d
fix efficiency: remove redundant hash lookups when getting bias value
jaxony Oct 19, 2018
44bbd3a
Add @VisibleForTesting annotation on protected methods
jaxony Oct 19, 2018
91be916
change protected to private methods
jaxony Oct 19, 2018
11fadbe
Fix regression bug: unintended side effect of changing precomputed ma…
jaxony Oct 19, 2018
032bc77
minor fixes: style, imports
jaxony Oct 19, 2018
75dbd47
test: Add strict test for exact match of prediction results
jaxony Oct 19, 2018
0989ef4
refactor test
jaxony Oct 19, 2018
dc7a05d
test: explicit feedback training
jaxony Oct 19, 2018
b9fbc76
context is string, not feature
jaxony Oct 19, 2018
22a0309
remove @Before
jaxony Oct 19, 2018
a4b7404
make fields private
jaxony Oct 19, 2018
efd311d
added cofactor udtf tests, and fixes for these tests
jaxony Oct 19, 2018
5f783aa
add new test for writing + reading one item and one user to/from disk…
jaxony Oct 19, 2018
9d8c4e7
use static method
jaxony Oct 19, 2018
69f8237
WIP: adding test for sample data from MovieLens
jaxony Oct 22, 2018
4335c5e
Feature.java: do not rely on overriding equals()
jaxony Oct 22, 2018
0200114
Removing unnecessary input args
jaxony Oct 22, 2018
2fccf23
Fix bug: do not clear input buffer at the start of every minibatch re…
jaxony Oct 22, 2018
7e11095
WIP: optimized CofactorModel and removed loss calculation from traini…
jaxony Oct 22, 2018
780efe5
style: Added annotations and final
jaxony Oct 23, 2018
7496be8
return annotations, remove unused method
jaxony Oct 23, 2018
d00aef4
more annotations
jaxony Oct 23, 2018
9420d7d
implemented and tested global bias
jaxony Oct 23, 2018
bbb450f
fix test: fix read from disk
jaxony Oct 23, 2018
a55885b
added global bias to CofactorModel constructor
jaxony Oct 23, 2018
b1a2dfb
added Hive function description
jaxony Oct 23, 2018
cbd531b
fixed output fields
jaxony Oct 23, 2018
d152fd2
implemented forward model
jaxony Oct 23, 2018
a0f80ac
added isValidation flag to UDTF and refactored tests, comments
jaxony Oct 26, 2018
1c6a36f
added validation samples to minibatch
jaxony Oct 26, 2018
780a717
feature: added validation metric as option to UDTF
jaxony Oct 26, 2018
1b5b781
refactor processOptions
jaxony Oct 26, 2018
566b118
Replace HashMap with Weights class
jaxony Oct 26, 2018
074ef59
Implemented validation feature and convergence (AUC)
jaxony Oct 29, 2018
1c5d38d
fix model forwarding bug, cofactor UDTF description
jaxony Oct 30, 2018
15eed73
WIP: debugging
jaxony Nov 1, 2018
d9fcd14
WIP: changing training data format
jaxony Nov 1, 2018
04d32e3
passing most tests
jaxony Nov 2, 2018
08bc526
WIP
jaxony Nov 8, 2018
a8bb232
Removed unnecessary imports and applied formatter
myui Nov 14, 2018
146b54c
Reverted unnecessary change
myui Nov 14, 2018
34bab16
Resolved IDE warnings
myui Nov 14, 2018
cee7ed1
Moved package to hivemall.factorization.cofactor
myui Nov 14, 2018
43c79d6
Applied formatter
myui Nov 14, 2018
547054a
Applied refactoring to cofactor_predict UDF
myui Nov 14, 2018
6d8dd6d
Revised scope and annotations
myui Nov 14, 2018
1a46e85
Applied refactoring
myui Nov 14, 2018
ecd30db
Merge branch 'feature/cofactor-feature-array' of https://github.com/j…
jaxony Nov 26, 2018
ca56dd1
WIP: Testing on Hive
jaxony Nov 30, 2018
1,139 changes: 1,139 additions & 0 deletions core/src/main/java/hivemall/factorization/cofactor/CofactorModel.java

Large diffs are not rendered by default.

@@ -0,0 +1,78 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package hivemall.factorization.cofactor;

import java.util.List;

import javax.annotation.Nonnull;
import javax.annotation.Nullable;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;

@Description(name = "cofactor_predict",
        value = "_FUNC_(array<float> theta, array<float> beta) - Returns the prediction value")
@UDFType(deterministic = true, stateful = false)
public final class CofactorizationPredictUDF extends UDF {

    private static final DoubleWritable ZERO = new DoubleWritable(0.d);

    // reused result variable
    private final DoubleWritable result = new DoubleWritable();

    @Nonnull
    public DoubleWritable evaluate(@Nullable List<FloatWritable> Pu,
            @Nullable List<FloatWritable> Qi) throws HiveException {
        if (Pu == null || Qi == null) {
            return ZERO;
        }

        final int PuSize = Pu.size();
        final int QiSize = Qi.size();
        // workaround for TD
        if (PuSize == 0) {
            return ZERO;
        } else if (QiSize == 0) {
            return ZERO;
        }

        if (QiSize != PuSize) {
            throw new HiveException("|Pu| " + PuSize + " was not equal to |Qi| " + QiSize);
        }

        double ret = 0.d;
        for (int k = 0; k < PuSize; k++) {
            FloatWritable Pu_k = Pu.get(k);
            if (Pu_k == null) {
                continue;
            }
            FloatWritable Qi_k = Qi.get(k);
            if (Qi_k == null) {
                continue;
            }
            ret += Pu_k.get() * Qi_k.get();
        }
        result.set(ret);
        return result;
    }
}
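
For readers who want to check the prediction logic without a Hive or Hadoop classpath, the following is a minimal sketch of the same dot product using plain `java.util.List<Float>` inputs. The class name `CofactorPredictSketch` and the method name `predict` are hypothetical and not part of this pull request, and `IllegalArgumentException` stands in for `HiveException`. The sketch only aims to mirror the checks in `evaluate` above: null or empty inputs yield 0, mismatched lengths are rejected, and null elements are skipped.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the UDF's evaluate() logic, using plain Java types. */
final class CofactorPredictSketch {

    /**
     * Dot product of two factor vectors. Returns 0 for null or empty
     * inputs and skips null elements, mirroring the checks in the UDF.
     */
    static double predict(List<Float> theta, List<Float> beta) {
        if (theta == null || beta == null) {
            return 0.d;
        }
        if (theta.isEmpty() || beta.isEmpty()) {
            return 0.d;
        }
        if (theta.size() != beta.size()) {
            // The UDF throws HiveException here; this sketch avoids the Hive dependency.
            throw new IllegalArgumentException(
                "|theta| " + theta.size() + " was not equal to |beta| " + beta.size());
        }
        double ret = 0.d;
        for (int k = 0; k < theta.size(); k++) {
            Float t = theta.get(k);
            Float b = beta.get(k);
            if (t == null || b == null) {
                continue; // a missing factor contributes nothing to the sum
            }
            ret += t * b; // float product widened to double, as in the UDF
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(predict(Arrays.asList(1f, 2f), Arrays.asList(3f, 4f)));   // 1*3 + 2*4
        System.out.println(predict(null, Arrays.asList(3f, 4f)));                    // null input
        System.out.println(predict(Arrays.asList(1f, null), Arrays.asList(3f, 4f))); // null element skipped
    }
}
```

One difference from the UDF is deliberate: the UDF reuses a single `DoubleWritable` result object across calls, while this sketch returns a primitive `double`, so callers do not need to copy the value before the next call.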