This library provides support for the NAACCR XML format.
Information about the format and the Task Force that developed it can be found on this website: http://naaccrxml.org/.
The library will soon be available on Maven Central.
To include it in your Maven or Gradle project, use the group ID com.imsweb and the artifact ID naaccr-xml.
You can check out the release page for a list of the releases and their changes.
There are four ways to use this library:
- Using the stream classes
- Using the NAACCR XML Utility class (NaaccrXmlUtils)
- Using the Graphical User Interface (Standalone)
- Using the no-GUI batch class (BatchProcessor)
Using the stream classes is the recommended way to use the library; four streams are provided, a reader and a writer for each of the two formats (XML and flat file).
The readers provide a readPatient() method that returns the next patient available, or null if the end of the stream is reached. The writers provide a writePatient(patient) method.
Transforming a flat file into the corresponding XML file, and vice versa, becomes very simple with these streams: create the reader and the writer, then write every patient you read.
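The read-until-null loop described above can be sketched as follows. This is only an illustration of the pattern, not the library's API: the `PatientSource` and `PatientSink` interfaces are hypothetical stand-ins that mirror the `readPatient()` and `writePatient(patient)` methods, and plain strings stand in for patients. The real stream classes may also declare checked exceptions, which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamCopySketch {

    /** Hypothetical stand-in for a reader: returns null once the stream is exhausted. */
    public interface PatientSource<P> {
        P readPatient();
    }

    /** Hypothetical stand-in for a writer. */
    public interface PatientSink<P> {
        void writePatient(P patient);
    }

    /** Writes every patient that can be read; returns how many were copied. */
    public static <P> int copyAll(PatientSource<P> reader, PatientSink<P> writer) {
        int count = 0;
        P patient = reader.readPatient();
        while (patient != null) {
            writer.writePatient(patient);
            count++;
            patient = reader.readPatient();
        }
        return count;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for a flat-file reader and an XML writer.
        Iterator<String> input = Arrays.asList("patient-1", "patient-2").iterator();
        List<String> output = new ArrayList<>();
        PatientSource<String> reader = () -> input.hasNext() ? input.next() : null;
        PatientSink<String> writer = output::add;

        int copied = copyAll(reader, writer);
        System.out.println(copied + " patients copied: " + output);
    }
}
```

With the real streams, the same loop would sit between opening a reader on the source file and closing the writer on the target file.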
A few higher-level utility methods have been defined in the NaaccrXmlUtils class (only the required parameters are shown for clarity):
Reading methods
- NaaccrData readXmlFile (File xmlFile, ...)
- NaaccrData readFlatFile (File flatFile, ...)
Writing methods
- void writeXmlFile (NaaccrData data, File xmlFile, ...)
- void writeFlatFile (NaaccrData data, File flatFile, ...)
Translation methods
- void flatToXml (File flatFile, File xmlFile, ...)
- void xmlToFlat (File xmlFile, File flatFile, ...)
There are other utility methods, but those are the main ones.
All those methods accept the following optional parameters (optional in the sense that null can be passed to the method):
- NaaccrXmlOptions - options for customizing the read/write and error-reporting operations
- NaaccrDictionary - a user-defined dictionary (if none is provided, the default user-defined dictionary will be used)
- NaaccrObserver - an observer that reports progress as the files are being processed
The library contains an experimental GUI that wraps some of the utility methods and provides a more user-friendly environment for processing files.
To start the GUI, just double-click the JAR file created from this project; it will invoke the main GUI class (Standalone).
You can also type the following in a DOS prompt, after navigating to the folder containing the JAR file:
java -jar naaccr-xml-X.X.jar
where X.X is the downloaded version.
The library also contains an experimental no-GUI class that can be used to process files in batch (BatchProcessor).
Here is an example of how to start it:
java -cp naaccr-xml-X.X.jar BatchProcessor options.properties
where X.X is the downloaded version.
This assumes the options file is in the same folder as the JAR file (but it can be anywhere and a full path can be provided on the command line).
See the BatchProcessor class for a description of each individual option.
The project contains two dictionaries for each supported NAACCR version: the main dictionary and the default user-defined dictionary; here are the ones for NAACCR 15:
In addition, the project also contains a utility class (NaaccrXmlDictionaryUtils) to read, write and validate a given dictionary file. Note that there are no syntax differences between a base dictionary and a user-defined one.
That utility class also contains a method to create a NAACCR ID (used for the "naaccrId" attribute) from a given item name using the following rules:
- Spaces, dashes, slashes, periods and underscores are considered word separators and are replaced by a single space
- Anything in parentheses is removed (along with the parentheses)
- Any non-digit and non-letter character is removed
- The result is split by spaces
- The first part is un-capitalized, the other parts are capitalized
- All the parts are concatenated back together
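The rules above can be sketched as a small standalone method. This is not the code from NaaccrXmlDictionaryUtils; the class and method names (`NaaccrIdSketch`, `createNaaccrId`) are hypothetical, and two readings of the rules are assumptions: spaces survive the "non-digit and non-letter" step so that the split still works, and "un-capitalized"/"capitalized" change only the first character of each part.

```java
public class NaaccrIdSketch {

    /** Hypothetical helper that applies the listed rules in order. */
    public static String createNaaccrId(String itemName) {
        if (itemName == null)
            return "";

        // Rule 1: runs of spaces, dashes, slashes, periods and underscores become one space.
        String s = itemName.replaceAll("[\\s\\-/._]+", " ");
        // Rule 2: drop anything in parentheses, along with the parentheses.
        s = s.replaceAll("\\([^)]*\\)", "");
        // Rule 3: drop every remaining character that is not a letter or a digit
        // (assumption: spaces are kept here because rule 4 still needs them).
        s = s.replaceAll("[^A-Za-z0-9 ]", "");

        // Rule 4: split on spaces.
        String[] parts = s.trim().split(" +");

        // Rules 5 and 6: lower-case the first letter of the first part,
        // upper-case the first letter of the others, then concatenate.
        StringBuilder id = new StringBuilder();
        for (String part : parts) {
            if (part.isEmpty())
                continue;
            char first = id.length() == 0
                    ? Character.toLowerCase(part.charAt(0))
                    : Character.toUpperCase(part.charAt(0));
            id.append(first).append(part.substring(1));
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(createNaaccrId("Date of Diagnosis"));
        System.out.println(createNaaccrId("Age at Diagnosis (years)"));
    }
}
```

Under these assumptions, an item named "Date of Diagnosis" yields `dateOfDiagnosis`. The library's own method may treat the remaining letters of each part differently, so its output should be taken as authoritative.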
The following schemas are available in the project:
- naaccr_data.xsd - W3C Schema for the data files
- naaccr_dictionary.xsd - W3C Schema for the dictionary files