-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathREADME
108 lines (75 loc) · 3.37 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
Miyamoto Task Queue
Miyamoto is a fast, clusterable task queue inspired by Google App Engine's task
queue. This means it speaks HTTP with a RESTful API for enqueuing tasks and
uses HTTP callbacks (webhooks) for processing tasks. Worker daemons can be any
web daemon.
End result? Super easy asynchronous processing.
Like the App Engine task queue, Miyamoto features task scheduling and rate
limiting. Unlike the App Engine task queue, it provides idempotency semantics
and helps you avoid duplicate execution of tasks.
Miyamoto is designed for:
- High throughput (fast)
- Fault tolerance (highly available)
- Data replication (minimal lost messages)
- No major dependencies or client libs (easy to set up and use)
- Simple design/implementation (easy to hack!)
Miyamoto is bad if you want:
- Slow disk persistence (but we may add it)
- Traditional task polling
- Smart clients
- AMQP, JMS, STOMP or other interfaces*
Not out of the box, but you can still achieve:
- Synchronous tasks
- Return values / result storage
- Workers behind firewall / NAT
- Dynamically adding to the worker pool
* = Excluding ZeroMQ, which is used internally, so there are optional ZMQ interfaces
GETTING STARTED
Make sure you have Python 2.6+ installed with gevent. Run on port 8088 with:
INTERFACE=127.0.0.1 python scripts/start.py
USAGE
Once it's running (currently hardcoded to 8088), set up a pretend web server with netcat:
nc -l 9099
Now let's post a task to it scheduled to run in 5 seconds:
curl -d "task.url=http://localhost:9099/worker&task.countdown=5&foo=bar" http://localhost:8088/queue
You should get back some JSON:
{"status": "scheduled", "id": "7e998f34-bed7-44f4-80cd-49620b0ab562"}
And in 5 seconds, you should see on your listening netcat:
POST /worker HTTP/1.1
Accept-Encoding: identity
Content-Length: 7
Host: localhost:9099
Content-Type: application/x-www-form-urlencoded
X-Task-Id: 7e998f34-bed7-44f4-80cd-49620b0ab562
Connection: close
User-Agent: miyamoto/0.1
foo=bar
You can also add tasks as JSON objects. Here's one without a delay:
curl -H "Content-Type: applicatoin/json" -d '{"url": "http://localhost:9099/worker", "params": {"foo": "bar"}}' http://localhost:8088/queue
WHAT'S GOING ON?
In theory, you'd run a cluster of Miyamoto nodes. Each one is exactly the same,
except that one will be the leader. You seed the leader with the first node, but
otherwise, leaders are elected if, say, the leader dies. Internally there is a list
of hosts in the cluster that form a ring. Whenever you add tasks to any node, it
will send the task round robin to another node in the cluster for basic load
balancing. But it will do this N times where N is a replication factor. Each replicant
has an added delay to execution. If any of them run, the others are canceled.
In this way you can schedule tasks and not worry that hosts might go down, achieving
HA of service and data.
ADVERTISED FEATURES NOT YET DONE
- rate limiting
- idempotency safeguards
- error handling
INSPIRATION
App Engine - API, features
Celery - features, anti-features
Memcache - ideas around consistent hashing
Membase - replication, clustering
ZooKeeper - realtime group membership, leader election
Kestrel - speed, simplicity
ZeroMQ - oh, the possibilities
CONTRIBUTORS
Evan Cooke - offset replica scheduling idea
JJ Behrens - code review, feedback
LICENSE
MIT