2014-09-10 --- 7.0.0:
- Switching from Picard/Samtools to HTSJDK
- First release to OSS Sonatype
- Renaming of packages
- Changes to VariantContextCodec: encoding of genotypes generated
by means other than VCF import (thanks to Joel Thibault)
- Change of JDK version requirements (>= 1.6)
2014-04-04 --- 6.2:
- Bugfix: fixed a bug in VCFInputFormat, introduced by the update of
Picard/Tribble, that affected input files loaded from HDFS that are
larger than one split
- Update to pom.xml:
* SNAPSHOT versions are now two digits, like releases
* Added two missing dependencies for CLI tool
2014-03-17 --- 6.1:
- Update of Picard from 1.93 to 1.107
- Moving from ant to maven build
- Hadoop version 2 compatibility
- Bugfixes for BAM reading
2013-07-08 --- 6.0:
- Input and output formats for VCF and BCF. There are no format-specific
InputFormat or OutputFormat classes; instead, AnySAM-like support for both
in one class is provided.
Both compressed and uncompressed BCF can be read, but only compressed BCF
can be output.
This required adding three .jar files to the Hadoop-BAM distribution, two
from Picard and one from Apache Commons: variant-<version>.jar,
tribble-<version>.jar, and commons-jexl-<version>.jar. All of these need
to be provided in the Hadoop classpath or in '-libjars' when using CLI
plugins that require the VCF/BCF functionality.
- Added new 'fixmate' plugin, akin to the samtools fixmate command or
Picard's FixMateInformation, i.e. recomputing mate information in the
input SAM/BAM files. Like 'sort', it can also merge multiple SAM/BAM files
together.
- Added new 'vcf-sort' plugin, which sorts a single VCF or BCF input file
while possibly performing format conversion, as it can also output either
VCF or BCF.
- Updated provided Picard from 1.76 to 1.93. Note that a breaking change
concerning the SeekableStream class occurred in Picard 1.84, so a version
older than that may not be used together with this version of Hadoop-BAM.
- The FASTQ and QSEQ input formats can now skip records that have failed
filtering: use the CONF_FILTER_FAILED_QC and CONF_INPUT_FILTER_FAILED_QC
properties.
- The FASTQ input format now accepts Illumina identifiers with a blank index
sequence.
- Fixed BAM records sometimes confusing the reference and mate reference
indices, and not always updating the reference names appropriately.
- Fixed various small misdecodings and misencodings in QSEQ I/O.
- Fixed 'premature EOF' crashes on some BAM inputs.
- Fixed crash on headerless SAM inputs.
- Fixed CLI crash on startup in newer Hadoop versions (at least CDH 4.2.0).
- Other minor fixes.
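
The FASTQ changes in 6.0 (skipping records that failed filtering, and accepting a blank index sequence) can be illustrated with a small standalone sketch. It assumes the Casava 1.8-style identifier layout, where the part after the space is <read>:<is filtered>:<control number>:<index sequence> and 'Y' means the read failed filtering. The class, the method names, and the sample identifiers are hypothetical; this is not Hadoop-BAM's parser, and the boolean argument only stands in for the filtering properties named above.

```java
public class IlluminaIdSketch {
    /** Hypothetical holder for the fields after the space in the identifier. */
    static final class ReadInfo {
        final int read;
        final boolean failedQc;
        final int controlNumber;
        final String indexSequence; // may be empty

        ReadInfo(int read, boolean failedQc, int controlNumber, String indexSequence) {
            this.read = read;
            this.failedQc = failedQc;
            this.controlNumber = controlNumber;
            this.indexSequence = indexSequence;
        }
    }

    /** Returns null when the line has no four-field second part. */
    static ReadInfo parse(String idLine) {
        int space = idLine.indexOf(' ');
        if (space < 0) return null;
        // A limit of -1 keeps a trailing empty field, so a blank index is accepted.
        String[] f = idLine.substring(space + 1).split(":", -1);
        if (f.length != 4) return null;
        if (!f[1].equals("Y") && !f[1].equals("N")) return null;
        try {
            return new ReadInfo(Integer.parseInt(f[0]), f[1].equals("Y"),
                                Integer.parseInt(f[2]), f[3]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Decides whether a record is kept; unparseable identifiers are kept. */
    static boolean keep(String idLine, boolean filterFailedQc) {
        ReadInfo info = parse(idLine);
        return !(filterFailedQc && info != null && info.failedQc);
    }

    public static void main(String[] args) {
        String failed = "@M1:7:FC123:1:1101:15:20 1:Y:18:";        // blank index
        String passed = "@M1:7:FC123:1:1101:15:21 1:N:18:ATCACG";
        System.out.println(keep(failed, true));   // false
        System.out.println(keep(failed, false));  // true
        System.out.println(keep(passed, true));   // true
    }
}
```

Splitting with a negative limit is the detail that matters for the blank index case: a plain split(":") would drop the trailing empty field and the identifier would look malformed.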
2012-11-26 --- 5.1:
- Removed the fi.tkk.ics.hadoop.bam.util.hadoop.BAMReader and
fi.tkk.ics.hadoop.bam.util.hadoop.BAMSort classes, which were deprecated
back in 3.0.
- MAJOR CHANGE: The command line plugins 'sort', 'summarize', and
'summarysort' now default to 1 reduce task. The number can be customized
with the -r/--reducers command line argument. This bumps up the versions
of the plugins to 4.0, 3.0, and 2.0 respectively.
- Fix: BAMRecordReader.getKey now hashes unmapped keys instead of
randomizing them, to ensure consistent results.
- For compatibility with Hadoop 2.0 and any future Hadoop releases, custom
Hadoop classes are now only built and used when using a Hadoop release
that does not provide them. This means that bugs MAPREDUCE-1987 and
MAPREDUCE-2538, which were previously fixed internally, may cause problems
when using the MapReduce-using command line plugins with certain reducer
counts.
- Fixed crash on some BAM inputs caused by a bug in
fi.tkk.ics.hadoop.bam.BAMSplitGuesser.
- Fixed some Illumina identifier scanning issues in the FASTQ input format.
- Added FASTA input format.
- The command line plugins 'sort' and 'summarize' now use RandomSamplers for
input partitioning, as they probably should have all along.
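
The 5.1 change from randomized to hashed keys for unmapped reads can be illustrated as follows. This is a minimal sketch, not the BAMRecordReader.getKey implementation: the class and method names are hypothetical, CRC32 is a stand-in hash chosen only because it ships with the JDK, and hashing the read name plus bases is an assumption about what might identify a record.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.CRC32;

public class UnmappedKeySketch {
    /**
     * Deterministic key: the same record always yields the same key, so a
     * re-executed task sends it to the same reducer and sort position.
     */
    static long hashedKey(String readName, String bases) {
        CRC32 crc = new CRC32();
        crc.update((readName + "\n" + bases).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Randomized key: spreads reads out, but differs from run to run. */
    static long randomKey(Random rng) {
        return rng.nextLong();
    }

    public static void main(String[] args) {
        long first = hashedKey("read/1", "ACGT");
        long second = hashedKey("read/1", "ACGT");
        System.out.println(first == second); // true
    }
}
```

Both approaches still spread unmapped reads across keys, which is what 5.0 introduced for performance; only the hashed variant gives repeatable output.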
2012-08-31 --- 5.0:
- MAJOR CHANGE: Hadoop-BAM no longer depends on, or even provides,
fi.tkk.ics.hadoop.bam.custom.samtools. In other words, users should now
import Picard classes from Picard itself, i.e. net.sf.samtools.
- Fix data loss/duplication and crash-on-valid issues in SAM input.
- Fix FASTQ record writer to also write the flow cell ID and to emit null
fields correctly.
- Fix crash on some inputs caused by a bug in
fi.tkk.ics.hadoop.bam.custom.hadoop.InputSampler. (Not the same bug as was
fixed in 4.0.)
- Updated provided Picard from 1.56 to 1.76.
- BAMRecordReader.getKey now randomizes the order of unmapped reads instead
of giving them all the same key, improving performance since they can now
be sent to different reduce tasks.
- AnySAMInputFormat now has a nullary constructor, allowing it to be used
directly in Job.setInputFormatClass.
- FASTQ and QSEQ input formats now report isSplitable correctly for
compressed files.
- QSEQ output format and record writer now use a Text instead of a
NullWritable key.
2012-05-03 --- 4.0:
- SAM input and output support. AnySAMInputFormat handles transparent
support of both SAM and BAM inputs even in the same Hadoop job. For
output, there is no SAMOutputFormat; only AnySAMOutputFormat, which can be
used to output either SAM or BAM. BAMOutputFormat will be deprecated in
the future.
- Fix longstanding regression in the embedded Picard library causing
end-of-file markers to be written into BAM files by every reduce task. For
this reason e.g. 'samtools view' refused to show the contents of BAM files
output by Hadoop-BAM.
- Fix crash on some inputs caused by a bug in
fi.tkk.ics.hadoop.bam.custom.hadoop.InputSampler.
- Fix possible crash-on-valid situations in heuristic BAM splitting.
- Various I/O classes from the Seal project are now incorporated. This
includes input formats for FASTQ and QSEQ and an output format for QSEQ.
Thanks to Luca Pireddu!
- Unmapped reads are now ordered after, not before, all other reads.
- Allow using Hadoop's "-libjars" command line argument instead of
HADOOP_CLASSPATH to specify the Picard .jars. This ended up being
fiendishly complicated and somewhat fragile.
- Partitioning files are now saved in the output, not input, directory.
- 'sort' plugin version 3.0:
* Important bug fix for merging: conflicting IDs from different files
weren't being properly corrected.
* SAM input and output support. Can input SAM and BAM files at the same
time and output to either format.
* When not using -o, each reducer now outputs headers into the BAM files.
- 'view' plugin version 1.1, with SAM input support.
- Add new 'cat' plugin version 1.0, for concatenating SAM/BAM files. The
main intended use case is joining the output of 'sort' when it is used
without -o.
- 'summarize' plugin version 2.0, with SAM input support.
- SplittingBAMIndexer can now be used from within the library as well as a
command line tool and can index files directly in HDFS. Thanks to Thomas
Robinson!
- Various minor bug fixes.
- Lots of documentation updates.
- Various clarifications in the README.
- Much quieter error messages when plugin loading fails.
- build.xml now looks in the HADOOP_HOME environment variable for Hadoop
.jars. As a result, the required minimum version of Ant is now 1.7.1.
- fi.tkk.ics.hadoop.bam.custom is now compiled with warnings off, for less
noisy builds.
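
The end-of-file marker problem fixed in 4.0 can be checked for with a small standalone sketch. It assumes the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker, and it counts copies of that block that appear anywhere except at the very end of the data. The class and method names are hypothetical, and the plain byte scan is a simplification: a stricter check would walk the BGZF blocks by their recorded sizes.

```java
import java.util.Arrays;

public class BgzfEofSketch {
    /** The empty BGZF block used as the end-of-file marker (28 bytes). */
    static final byte[] EOF_MARKER = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
        0x00, (byte) 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
        0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    };

    /** Counts markers that are followed by more data, i.e. not the final one. */
    static int countInteriorMarkers(byte[] data) {
        int n = EOF_MARKER.length;
        int count = 0;
        // "i + n < data.length" skips a marker that ends exactly at end of data.
        for (int i = 0; i + n < data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + n), EOF_MARKER)) {
                count++;
                i += n - 1; // continue scanning after this marker
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two concatenated parts, each ending in its own marker.
        byte[] twoParts = new byte[2 * EOF_MARKER.length];
        System.arraycopy(EOF_MARKER, 0, twoParts, 0, EOF_MARKER.length);
        System.arraycopy(EOF_MARKER, 0, twoParts, EOF_MARKER.length, EOF_MARKER.length);
        System.out.println(countInteriorMarkers(twoParts));   // 1
        System.out.println(countInteriorMarkers(EOF_MARKER)); // 0
    }
}
```

A nonzero count on a file assembled from several reducer outputs would suggest that each part carried its own marker.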
2012-01-18 --- 3.3:
- Fix embedded Picard to not have an accidentally leftover dependency on
Picard 1.47.
- Clarify some .jar dependencies in the README.
2011-12-07 --- 3.2:
- Important bug fix to avoid looping infinitely on some BAM files.
2011-12-05 --- 3.1:
- Important data loss bug fixes!
- 'sort' plugin updated to 2.0: it can now take multiple input files,
merging them together.
- New 'summarize' and 'summarysort' command line plugins, respectively for
creating and sorting Chipster summary files. Not very generically useful;
intended more as example code.
- Some minor command line argument handling bug fixes.
- Updated embedded and provided Picard from 1.47 to 1.56.
- As Hadoop-BAM now depends on Picard proper as well as the SAM-JDK, a
compatible JAR, currently picard-1.56.jar, is distributed together with
Hadoop-BAM.
2011-08-19 --- 3.0:
- Plugin-extensible command line interface.
- The 'view', 'sort', and 'index' command line plugins. These supersede the
fi.tkk.ics.hadoop.bam.util.hadoop.BAMReader and
fi.tkk.ics.hadoop.bam.util.hadoop.BAMSort classes, which have much less
functionality than the new plugins and are considered deprecated.
- Embedded Picard SAM-JDK parts updated from version 1.27 to 1.47.
- A compatible Picard SAM-JDK JAR, currently sam-1.47.jar, is now
distributed together with Hadoop-BAM.
2011-06-01 --- 2.0:
- Heuristic splitting of BAM and BGZF files: indexing is no longer required.
- build.xml now defaults to making a .jar file; an explicit 'ant jar' is no
longer needed.
2010-12-10 --- 1.0:
- Initial release.