
Attempt from the NAACCR XML Task Force to map the NAACCR flat-file format to an XML one.


isaackcr/naaccr-xml

 
 


NAACCR XML


This library provides support for the NAACCR XML format.

Information about the format and the Task Force that developed it can be found on this website: http://naaccrxml.org/.

Download

The library will soon be available on Maven Central.

To include it in your Maven or Gradle project, use the group ID com.imsweb and the artifact ID naaccr-xml.

You can check out the release page for a list of the releases and their changes.

Usage

There are four ways to use this library:

  1. Using the stream classes
  2. Using the NAACCR XML Utility class (NaaccrXmlUtils)
  3. Using the Graphical User Interface (Standlone)
  4. Using the no-GUI batch class (BatchProcessor)

Using the stream classes

This is the recommended way to use the library. Four streams are provided: a reader and a writer for each of the two formats (XML and flat file).

The readers provide a readPatient() method that returns the next patient available, or null if the end of the stream is reached. The writers provide a writePatient(patient) method.

Transforming a flat file into the corresponding XML file, and vice versa, is straightforward with these streams: create a reader and a writer, then write every patient you read.
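
The read-until-null loop described above can be sketched as follows. This is only an illustration of the pattern, not the library's actual classes: Patient, PatientSource, PatientSink and copyAll are hypothetical stand-ins defined here so the example is self-contained, and the real stream classes will also involve file I/O and error handling that this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class PatientCopySketch {

    private PatientCopySketch() {}

    /** Hypothetical stand-in for a patient record (not the library's class). */
    static final class Patient {
        final String id;

        Patient(String id) {
            this.id = id;
        }
    }

    /** Hypothetical reader contract: returns the next patient, or null at the end of the stream. */
    interface PatientSource {
        Patient readPatient();
    }

    /** Hypothetical writer contract: writes a single patient. */
    interface PatientSink {
        void writePatient(Patient patient);
    }

    /** Writes every patient that can be read and returns how many were copied. */
    static int copyAll(PatientSource reader, PatientSink writer) {
        int count = 0;
        Patient patient = reader.readPatient();
        while (patient != null) {
            writer.writePatient(patient);
            count++;
            patient = reader.readPatient();
        }
        return count;
    }

    /** In-memory demonstration: a queue plays the reader, a list plays the writer. */
    static List<String> demo() {
        Queue<Patient> input = new ArrayDeque<>(Arrays.asList(new Patient("A"), new Patient("B")));
        List<String> written = new ArrayList<>();
        // Queue.poll() returns null once the queue is empty, which matches the reader contract.
        copyAll(input::poll, p -> written.add(p.id));
        return written;
    }
}
```

With the real streams, the same loop would sit between a flat-file reader and an XML writer (or the reverse), with both streams closed once the reader returns null.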

Using the NAACCR XML Utility class (NaaccrXmlUtils)

A few higher-level utility methods have been defined in the NaaccrXmlUtils class (only the required parameters are shown for clarity):

Reading methods

  • NaaccrData readXmlFile (File xmlFile, ...)
  • NaaccrData readFlatFile (File flatFile, ...)

Writing methods

  • void writeXmlFile (NaaccrData data, File xmlFile, ...)
  • void writeFlatFile (NaaccrData data, File flatFile, ...)

Translation methods

  • void flatToXml (File flatFile, File xmlFile, ...)
  • void xmlToFlat (File xmlFile, File flatFile, ...)

There are other utility methods, but those are the main ones.

All those methods accept the following optional parameters (optional in the sense that null can be passed to the method):

  • NaaccrXmlOptions - options for customizing the read/write and error-reporting operations
  • NaaccrDictionary - a user-defined dictionary (if none is provided, the default user-defined dictionary will be used)
  • NaaccrObserver - an observer that reports progress as the files are being processed

Using the Graphical User Interface (Standlone)

The library contains an experimental GUI that wraps some of the utility methods and provides a more user-friendly environment for processing files.

To start the GUI, just double-click the JAR file created from this project; it will invoke the main GUI class (Standlone).

You can also type the following at a command prompt, after navigating to the folder containing the JAR file:

java -jar naaccr-xml-X.X.jar

where X.X is the downloaded version.

Using the no-GUI batch class (BatchProcessor)

The library also contains an experimental no-GUI class that can be used to process files in batch (BatchProcessor).

Here is an example of how to start it:

java -cp naaccr-xml-X.X.jar BatchProcessor options.properties

where X.X is the downloaded version.

This assumes the options file is in the same folder as the JAR file (but it can be anywhere and a full path can be provided on the command line).

See the BatchProcessor class for a description of each individual option.

Dealing with dictionaries

The project contains two dictionaries for each supported NAACCR version (NAACCR 15, for example): the main dictionary and the default user-defined dictionary.

In addition, the project contains a utility class (NaaccrXmlDictionaryUtils) to read, write and validate a given dictionary file. Note that there is no syntactic difference between a base dictionary and a user-defined one.

That utility class also contains a method to create a NAACCR ID (used for the "naaccrId" attribute) from a given item name using the following rules:

  1. Spaces, dashes, slashes, periods and underscores are considered word separators and are replaced by a single space
  2. Anything in parentheses is removed (along with the parentheses)
  3. Any character that is neither a digit nor a letter is removed
  4. The result is split by spaces
  5. The first part is un-capitalized, the other parts are capitalized
  6. All the parts are concatenated back together
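
As a rough illustration, the rules above could be implemented along these lines. This is a sketch under stated assumptions, not the code from NaaccrXmlDictionaryUtils: the class and method names (NaaccrIdSketch, createId) are hypothetical, rule 3 is assumed to keep the spaces that rule 4 needs for splitting, and "capitalized" / "un-capitalized" are assumed to change only the first character of each part.

```java
final class NaaccrIdSketch {

    private NaaccrIdSketch() {}

    /** Builds an ID from an item name by applying the six rules in order. */
    static String createId(String itemName) {
        if (itemName == null)
            return "";

        String cleaned = itemName
                .replaceAll("[ \\-/._]+", " ")      // rule 1: separators become a single space
                .replaceAll("\\([^)]*\\)", "")      // rule 2: drop parenthesized text
                .replaceAll("[^\\p{Alnum} ]", "")   // rule 3: drop other characters (spaces kept, by assumption)
                .trim();
        if (cleaned.isEmpty())
            return "";

        String[] parts = cleaned.split(" +");       // rule 4: split by spaces

        StringBuilder id = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            char first = part.charAt(0);
            // rule 5: lower-case the first letter of the first part, upper-case it for the others
            id.append(i == 0 ? Character.toLowerCase(first) : Character.toUpperCase(first));
            id.append(part, 1, part.length());
        }
        return id.toString();                       // rule 6: parts concatenated
    }
}
```

Under these assumptions, createId("Follow-up Source (central)") yields "followUpSource". Names containing fully upper-case words may come out differently from the library's own method, since only the first character of each part is changed here.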

XML Schemas

The following schemas are available in the project:
