---
layout: homepage
title: Features and Capabilities
---
<div class="features-subheader">
<div class="container">
<div class="row quarter-row-spacing mobile-padding">
<div class="col-sm-12">
<h1>{{ page.title }}</h1>
</div>
</div>
<div class="row half-row-spacing vertical-align mobile-padding">
<div class="col-md-6">
<h3>Programmatically Build and Manage Your Training Data</h3>
<p>
Snorkel is a system for programmatically building and managing training datasets.
In Snorkel, users can <i>develop</i> training datasets in hours or days rather than hand-labeling them over
weeks or months.
</p>
<p>
Snorkel currently exposes three key programmatic operations: <i>labeling</i> data, for example using heuristic
rules or distant supervision techniques; <i>transforming</i> data, for example rotating or stretching images
to perform data augmentation; and <i>slicing</i> data into different critical subsets.
Snorkel then automatically models, cleans, and integrates the resulting training data using <a href="#">novel,
theoretically grounded techniques</a>.
</p>
<p>
Snorkel has been deployed in industry, medicine, science, and government to build new ML applications in a
fraction of the time; for more, see <a href="/use-cases/">tutorials</a> and other <a href="/resources/">
resources</a>.
</p>
</div>
<div class="col-md-1"></div>
<div class="col-md-5">
<img class="features-hero-image" src="/doks-theme/assets/images/layout/Overview.png"
alt="Training Data Operations">
</div>
</div>
<br>
<div class="row half-row-spacing mobile-padding">
<h1>Key Advantages</h1>
<br>
<div class="light-blue-card-container">
<div class="border-card">
<p class="subheadline">Speed</p>
<h3>
Training Data in Hours, Not Months
</h3>
<p>
Label and manage training datasets by writing code to quickly leverage ML for new applications.
</p>
</div>
<div class="border-card">
<p class="subheadline">Flexibility</p>
<h3>Easily Adapt to Changing Settings</h3>
<p>
Adapt training sets to changing conditions or problem specifications by modifying code, rather than
through expensive re-labeling.
</p>
</div>
<div class="border-card">
<p class="subheadline">Privacy</p>
<h3>Label Without Eyes On Data</h3>
<p>
Programmatic labeling strategies can be completely decoupled from sensitive data.
</p>
</div>
</div>
</div>
<br>
<div class="mobile-padding">
<h1>Core Operations</h1>
</div>
<div class="row half-row-spacing vertical-align mobile-padding">
<br>
<div class="col-sm-5">
<p class="purple-numbers">01</p>
<h4>Labeling</h4>
<p>
Write <i>labeling functions (LFs)</i> to heuristically or noisily label some subset of the training examples.
Snorkel then models the quality and correlations of these LFs using novel, theoretically grounded statistical
modeling techniques.
</p>
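<p>
As a rough illustration of the idea, the sketch below writes three labeling functions in plain Python and
combines their votes. It does not use the Snorkel library: the function names, the spam heuristics, and the
example comments are all hypothetical, and the simple majority vote is only a stand-in for the statistical
model of LF quality and correlations described above.
</p>

```python
from collections import Counter

# Label values for this sketch (assumed, not taken from the page).
ABSTAIN, HAM, SPAM = -1, 0, 1
MAX_SHORT_WORDS = 4  # hypothetical cutoff for a "short" comment


def lf_contains_link(text):
    # Heuristic: comments that contain a URL are often spam.
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_check_out(text):
    # Heuristic: "check out ..." is a common self-promotion phrase.
    return SPAM if "check out" in text.lower() else ABSTAIN


def lf_short_comment(text):
    # Heuristic: very short comments are usually harmless.
    return HAM if MAX_SHORT_WORDS >= len(text.split()) else ABSTAIN


LFS = [lf_contains_link, lf_check_out, lf_short_comment]


def apply_lfs(texts, lfs=LFS):
    """Return one row of LF votes per example (a label matrix)."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstains are ignored, ties stay unlabeled."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


comments = [
    "Check out my channel http://example.com",
    "love this song",
    "This video changed how I think about music production entirely",
]
label_matrix = apply_lfs(comments)
labels = [majority_vote(row) for row in label_matrix]
```

<p>
Each LF either votes for a label or abstains, so examples that no LF covers remain unlabeled rather than
receiving a guess.
</p>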
<a href="/blog/snorkel-programming/" class="btn btn--rounded btn--dark" target="_blank">Blog</a>
<a href="/use-cases/01-spam-tutorial" class="btn btn--rounded btn--dark" target="_blank">Tutorial</a>
<a href="https://arxiv.org/abs/1711.10160" class="btn btn--rounded btn--dark" target="_blank">VLDB'18 Paper</a>
<a href="https://arxiv.org/abs/1605.07723" class="btn btn--rounded btn--dark" target="_blank">NeurIPS'16
Paper</a>
<a href="https://arxiv.org/abs/1810.02840" class="btn btn--rounded btn--dark" target="_blank">AAAI'19 Paper</a>
<a href="https://arxiv.org/abs/1810.02840" class="btn btn--rounded btn--dark" target="_blank">ICML'19 Paper</a>
</div>
<div class="col-sm-1 hidden-xs"></div>
<div class="col-sm-6">
<img class="features-image" src="/doks-theme/assets/images/layout/Labeling.png" alt="Labeling" />
</div>
</div>
<div class="row half-row-spacing vertical-align mobile-padding">
<div class="col-sm-6 hidden-xs">
<img src="/doks-theme/assets/images/layout/Transforming.png" alt="Transforming" />
</div>
<div class="col-sm-1"></div>
<div class="col-sm-5">
<p class="purple-numbers">02</p>
<h4>Transforming</h4>
<p>
Write <i>transformation functions (TFs)</i> to heuristically generate new, modified training examples by
transforming existing ones, a strategy often referred to as <i>data augmentation</i>.
Rather than requiring users to tune these data augmentation strategies by hand, Snorkel uses data augmentation
policies that can be learned automatically.
</p>
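<p>
A minimal plain-Python sketch of transformation functions follows. It does not use the Snorkel library: the
synonym table, function names, and example sentences are hypothetical, and the uniformly random choice of TF
is only a placeholder for an augmentation policy that would be learned automatically.
</p>

```python
import random

# Hypothetical synonym table used only for this illustration.
SYNONYMS = {"great": "excellent", "song": "track", "video": "clip"}


def tf_replace_with_synonym(text, rng):
    """Replace one known word with its synonym; return None if no word matches."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    if not candidates:
        return None
    i = rng.choice(candidates)
    words[i] = SYNONYMS[words[i].lower()]
    return " ".join(words)


def tf_swap_adjacent_words(text, rng):
    """Swap one random pair of neighboring words; return None for one-word texts."""
    words = text.split()
    if 2 > len(words):
        return None
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


TFS = [tf_replace_with_synonym, tf_swap_adjacent_words]


def augment(texts, tfs=TFS, copies_per_example=2, seed=0):
    """Keep the originals and append transformed copies of each example."""
    rng = random.Random(seed)
    augmented = list(texts)
    for text in texts:
        for _ in range(copies_per_example):
            # Placeholder policy: pick a TF uniformly at random.
            new_text = rng.choice(tfs)(text, rng)
            if new_text is not None and new_text != text:
                augmented.append(new_text)
    return augmented


train = ["great song", "I watched this video twice"]
augmented_train = augment(train)
```

<p>
A TF returns <code>None</code> when it does not apply, so the augmented set only grows with examples that
were actually modified.
</p>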
<a href="/blog/tanda/" class="btn btn--rounded btn--dark" target="_blank">Blog</a>
<a href="/use-cases/02-spam-data-augmentation-tutorial" class="btn btn--rounded btn--dark"
target="_blank">Tutorial</a>
<a href="https://arxiv.org/abs/1709.01643" class="btn btn--rounded btn--dark" target="_blank">NeurIPS'17
Paper</a>
<a href="https://arxiv.org/abs/1803.06084" class="btn btn--rounded btn--dark" target="_blank">ICML'19 Paper</a>
</div>
<div class="col-sm-12 visible-xs-block">
<img class="features-image" src="/doks-theme/assets/images/layout/Transforming.png" alt="Transforming" />
</div>
</div>
<div class="row half-row-spacing vertical-align mobile-padding">
<div class="col-sm-5">
<p class="purple-numbers">03</p>
<h4>Slicing</h4>
<p>
Write <i>slicing functions (SFs)</i> to heuristically identify subsets of the data that the model should
particularly care about, for example by devoting extra representational capacity to them, because they are
difficult, important, or both.
Snorkel models slices in the style of multi-task learning, and then learns an attention mechanism over
these heads.
</p>
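<p>
The plain-Python sketch below shows only the first half of this idea: writing slicing functions and using
them to monitor per-slice performance. It does not use the Snorkel library and does not implement the
multi-task or attention-based modeling; the function names, example comments, and labels are hypothetical.
</p>

```python
def sf_short_comment(text):
    # Slice of very short comments (hypothetical cutoff of fewer than 5 words).
    return 5 > len(text.split())


def sf_has_link(text):
    # Slice of comments that contain a URL.
    return "http" in text.lower()


SFS = {"short_comment": sf_short_comment, "has_link": sf_has_link}


def accuracy(gold, predicted):
    """Fraction of matching labels, or None for an empty slice."""
    if not gold:
        return None
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def slice_membership(texts, sfs=SFS):
    """Map each slice name to a boolean mask over the examples."""
    return {name: [sf(text) for text in texts] for name, sf in sfs.items()}


def accuracy_by_slice(texts, gold, predicted, sfs=SFS):
    """Report overall accuracy plus accuracy restricted to each slice."""
    report = {"overall": accuracy(gold, predicted)}
    for name, mask in slice_membership(texts, sfs).items():
        gold_in_slice = [y for y, m in zip(gold, mask) if m]
        pred_in_slice = [y for y, m in zip(predicted, mask) if m]
        report[name] = accuracy(gold_in_slice, pred_in_slice)
    return report


texts = [
    "love this song",
    "Check out my channel http://example.com",
    "subscribe http://spam.example",
    "This video changed how I think about music",
]
gold = [0, 1, 1, 0]       # hypothetical ground-truth labels
predicted = [0, 1, 0, 0]  # hypothetical model predictions
report = accuracy_by_slice(texts, gold, predicted)
```

<p>
Slices may overlap, and a model that looks acceptable overall can still underperform on a slice that matters.
</p>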
<a href="/blog/slicing/" class="btn btn--rounded btn--dark" target="_blank">Blog</a>
<a href="/use-cases/03-spam-data-slicing-tutorial" class="btn btn--rounded btn--dark"
target="_blank">Tutorial</a>
<a href="https://arxiv.org/abs/1909.06349" class="btn btn--rounded btn--dark" target="_blank">NeurIPS'19
Paper</a>
</div>
<div class="col-sm-1"></div>
<div class="col-sm-6">
<img class="features-image" src="/doks-theme/assets/images/layout/Slicing.png" alt="Slicing" />
</div>
</div>
</div>
</div>