Commit

move distributed xgboost to wormhole
tqchen committed Apr 6, 2015
1 parent 421f5c6 commit 65abc26
Showing 10 changed files with 9 additions and 239 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -27,7 +27,7 @@ Learning about the model: [Introduction to Boosted Trees](http://homes.cs.washin
What's New
==========
* XGBoost now support HDFS and S3
* [Distributed XGBoost now runs on YARN](multi-node/hadoop)!
* [Distributed XGBoost now runs on YARN](https://github.com/dmlc/wormhole/learn/xgboost)!
* [xgboost user group](https://groups.google.com/forum/#!forum/xgboost-user/) for tracking changes, sharing your experience on xgboost
* [Distributed XGBoost](multi-node) is now available!!
* New features in the lastest changes :)
@@ -37,7 +37,6 @@ What's New
* XGBoost wins [Tradeshift Text Classification](https://kaggle2.blob.core.windows.net/forum-message-attachments/60041/1813/TradeshiftTextClassification.pdf?sv=2012-02-12&se=2015-01-02T13%3A55%3A16Z&sr=b&sp=r&sig=5MHvyjCLESLexYcvbSRFumGQXCS7MVmfdBIY3y01tMk%3D)
* XGBoost wins [HEP meets ML Award in Higgs Boson Challenge](http://atlas.ch/news/2014/machine-learning-wins-the-higgs-challenge.html)


Features
========
* Sparse feature format:
25 changes: 8 additions & 17 deletions multi-node/README.md
@@ -1,17 +1,10 @@
Distributed XGBoost
======
This folder contains information of Distributed XGBoost (Distributed GBDT).

Distributed XGBoost is now part of [Wormhole](https://github.com/dmlc/wormhole/learn/xgboost).
See the [Wormhole](https://github.com/dmlc/wormhole/learn/xgboost) for usage examples, build and job submissions.
* The distributed version is built on Rabit:[Reliable Allreduce and Broadcast Library](https://github.com/dmlc/rabit)
- Rabit is a portable library that provides fault-tolerance for Allreduce calls for distributed machine learning
- This makes xgboost portable and fault-tolerant against node failures
* You can run Distributed XGBoost on platforms including Hadoop(see [hadoop folder](hadoop)) and MPI
- Rabit only replies a platform to start the programs, so it should be easy to port xgboost to most platforms

Build
=====
* In the root folder, type ```make```
- If you have C++11 compiler, it is recommended to use ```make cxx11=1```

Notes
====
@@ -27,11 +20,9 @@ Notes

Solvers
=====
There are two solvers in distributed xgboost. You can check for local demo of the two solvers, see [row-split](row-split) and [col-split](col-split)
* Column-based solver split data by column, each node work on subset of columns,
it uses exactly the same algorithm as single node version.
* Row-based solver split data by row, each node work on subset of rows,
it uses an approximate histogram count algorithm, and will only examine subset of
potential split points as opposed to all split points.
- This is the mode used by current hadoop version, since usually data was stored by rows in many industry system

* Column-based solver split data by column, each node work on subset of columns,
it uses exactly the same algorithm as single node version.
* Row-based solver split data by row, each node work on subset of rows,
it uses an approximate histogram count algorithm, and will only examine subset of
potential split points as opposed to all split points.
- This is the mode used by current hadoop version, since usually data was stored by rows in many industry system
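
To make the row-based description above concrete, here is a minimal Python sketch of the general idea: each node builds a histogram over its own rows, the histograms are summed across nodes, and only the bin edges are examined as split points. This is not xgboost's or Rabit's code. The function names, the fixed shared `edges`, the simplified gain score, and the toy data are all assumptions made for illustration, and `allreduce_sum` is a plain in-process stand-in for an Allreduce call.

```python
from bisect import bisect_left


def local_histogram(values, grads, edges):
    """Per-node gradient sums and row counts per bin.

    Bin i holds values v with edges[i-1] < v <= edges[i]; the last bin
    holds everything above the final edge.
    """
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for v, g in zip(values, grads):
        b = bisect_left(edges, v)  # index of the first edge >= v
        sums[b] += g
        counts[b] += 1
    return sums, counts


def allreduce_sum(per_node):
    """Stand-in for Allreduce: element-wise sum of per-node vectors."""
    return [sum(column) for column in zip(*per_node)]


def best_split(edges, sums, counts):
    """Score only the candidate edges, using a simplified gain
    G_left^2 / n_left + G_right^2 / n_right (an assumed toy objective)."""
    total_g, total_n = sum(sums), sum(counts)
    best = (float("-inf"), None)
    g_left, n_left = 0.0, 0
    for i, edge in enumerate(edges):
        g_left += sums[i]
        n_left += counts[i]
        n_right = total_n - n_left
        if n_left == 0 or n_right == 0:
            continue  # a split must leave rows on both sides
        g_right = total_g - g_left
        score = g_left ** 2 / n_left + g_right ** 2 / n_right
        if score > best[0]:
            best = (score, edge)
    return best


# Two hypothetical nodes, each holding a subset of rows of one feature.
edges = [2.5, 5.0, 7.5]  # candidate split points shared by all nodes
node_a = local_histogram([1.0, 2.0, 3.0], [-1, -1, -1], edges)
node_b = local_histogram([7.0, 8.0, 9.0], [1, 1, 1], edges)

global_sums = allreduce_sum([node_a[0], node_b[0]])
global_counts = allreduce_sum([node_a[1], node_b[1]])
gain, edge = best_split(edges, global_sums, global_counts)
print(gain, edge)
```

Only the per-bin totals cross node boundaries here, never the rows themselves, which is why the candidate set is limited to the bin edges rather than every observed feature value.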
40 changes: 0 additions & 40 deletions multi-node/hadoop/README.md

This file was deleted.

36 changes: 0 additions & 36 deletions multi-node/hadoop/mushroom.hadoop.conf

This file was deleted.

28 changes: 0 additions & 28 deletions multi-node/hadoop/run_mushroom.sh

This file was deleted.

18 changes: 0 additions & 18 deletions multi-node/row-split/README.md

This file was deleted.

20 changes: 0 additions & 20 deletions multi-node/row-split/machine-row-rabit-mock.sh

This file was deleted.

24 changes: 0 additions & 24 deletions multi-node/row-split/machine-row-rabit.sh

This file was deleted.

30 changes: 0 additions & 30 deletions multi-node/row-split/machine-row.conf

This file was deleted.

24 changes: 0 additions & 24 deletions multi-node/row-split/splitrows.py

This file was deleted.
