Reading XML in Node.js

Published on December 27, 2018

Reading XML from a File

Node.js has no built-in module for parsing XML, but the popular xml2js module can be used to read it.

Install xml2js with:

npm install xml2js --save

The installed module exposes a Parser class, an instance of which is used to read XML.

const xml2js = require('xml2js');
const fs = require('fs');
const parser = new xml2js.Parser({ attrkey: "ATTR" });

// this example reads the file synchronously;
// fs.readFile can be used to read it asynchronously instead
let xml_string = fs.readFileSync("data.xml", "utf8");

parser.parseString(xml_string, function(error, result) {
    if(error === null) {
        console.log(result);
    }
    else {
        console.log(error);
    }
});

It is important to note that, by default, xml2js lists a node's attributes under the $ key. The attrkey option passed when creating the parser overrides this default, so in this example attributes are stored under the ATTR key instead. This is done purely for readability.

For the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<FORMULA1>
    <USER year="2018">
        <NAME>VETTEL</NAME>
        <TEAM>FERRARI</TEAM>
    </USER>
    <USER year="2018">
        <NAME>RAIKKONEN</NAME>
        <TEAM>FERRARI</TEAM>
    </USER>
    <USER year="2018">
        <NAME>HAMILTON</NAME>
        <TEAM>MERCEDES</TEAM>
    </USER>
    <USER year="2018">
        <NAME>BOTTAS</NAME>
        <TEAM>MERCEDES</TEAM>
    </USER>
    <USER year="2018">
        <NAME>RICCIARDO</NAME>
        <TEAM>REDBULL</TEAM>
    </USER>
    <USER year="2018">
        <NAME>VERSTAPPEN</NAME>
        <TEAM>REDBULL</TEAM>
    </USER>
</FORMULA1>

The parsed result object has the following structure:

{ 
    FORMULA1: { 
        USER: [ 
            { 
                ATTR: { year: '2018' },
                NAME: [ 'VETTEL' ],
                TEAM: [ 'FERRARI' ] 
            },
            { 
                ATTR: { year: '2018' },
                NAME: [ 'RAIKKONEN' ],
                TEAM: [ 'FERRARI' ] 
            },
            { 
                ATTR: { year: '2018' },
                NAME: [ 'HAMILTON' ],
                TEAM: [ 'MERCEDES' ] 
            },
            { 
                ATTR: { year: '2018' },
                NAME: [ 'BOTTAS' ],
                TEAM: [ 'MERCEDES' ] 
            },
            { 
                ATTR: { year: '2018' },
                NAME: [ 'RICCIARDO' ],
                TEAM: [ 'REDBULL' ] 
            },
            { 
                ATTR: { year: '2018' },
                NAME: [ 'VERSTAPPEN' ],
                TEAM: [ 'REDBULL' ] 
            }
        ] 
    } 
}
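One caveat when reproducing this: console.log(result) prints nested objects only two levels deep by default (the default depth of Node's util.inspect), so each USER entry appears as [Object]. The sketch below uses a hand-written object shaped like the result above (one USER only, for brevity) and compares the default printout with util.inspect using depth: null and with JSON.stringify, both of which ship with Node.

```javascript
const util = require('util');

// Hand-written object shaped like the xml2js result above (one USER only).
const result = {
  FORMULA1: {
    USER: [
      { ATTR: { year: '2018' }, NAME: ['VETTEL'], TEAM: ['FERRARI'] },
    ],
  },
};

// Default inspection (what console.log uses) stops at depth 2,
// so the USER entry is abbreviated to [Object].
const shallow = util.inspect(result);

// depth: null removes the limit and prints every nested level.
const full = util.inspect(result, { depth: null });

// JSON.stringify also prints the whole tree, indented by 2 spaces.
const json = JSON.stringify(result, null, 2);

console.log(shallow);
console.log(full);
console.log(json);
```

In the file example, printing util.inspect(result, { depth: null }) or JSON.stringify(result, null, 2) instead of result should show every USER entry in full.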

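Note that the value of each child element is returned as an array (for example, NAME: [ 'VETTEL' ]), even when the element appears only once. This is xml2js's default behavior; passing explicitArray: false to the Parser makes it create an array only when an element appears more than once.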
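Because of these arrays, reading a value means indexing into them, for example user.NAME[0]. The hypothetical helper below (not part of xml2js) flattens a result like the one above into one plain record per USER. It assumes the parser was created with attrkey: "ATTR" and keeps xml2js's default array behavior, as in the example.

```javascript
// Hypothetical helper: converts the xml2js result for the FORMULA1 document
// into flat { name, team, year } records.
// Assumes attrkey: "ATTR" and the default explicitArray: true, so every
// child element's value is an array and attributes live under ATTR.
function toDrivers(result) {
  return result.FORMULA1.USER.map((user) => ({
    name: user.NAME[0],       // first (and only) NAME element
    team: user.TEAM[0],       // first (and only) TEAM element
    year: user.ATTR.year,     // attribute values are plain strings
  }));
}
```

Called inside the parseString callback, toDrivers(result) would return records such as { name: 'VETTEL', team: 'FERRARI', year: '2018' }.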

Reading XML from a URL

The approach is similar to parsing XML from a file. The difference is that, instead of being read from a file, the XML content is retrieved by sending an HTTP GET request to the URL. The XML in the response is then parsed by xml2js.

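const http = require('http');
const xml2js = require('xml2js');
const parser = new xml2js.Parser({ attrkey: "ATTR" });

// use the https module instead of http for https:// URLs
let req = http.get("http://site.com/data.xml", function(res) {
    let data = '';
    // decode the body as UTF-8 so multi-byte characters split across chunks stay intact
    res.setEncoding('utf8');
    res.on('data', function(chunk) {
        data += chunk;
    });
    res.on('end', function(){
        parser.parseString(data, function(error, result) {
            if(error === null) {
                console.log(result);
            }
            else {
                console.log(error);
            }
        });
    });
});

// network errors (DNS failure, refused connection, ...) are reported here
req.on('error', function(error) {
    console.log(error);
});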
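The callback version above is enough for a quick script. When the downloaded XML is needed elsewhere, one option is to wrap the request in a Promise. The sketch below is an illustration, not part of xml2js: fetchText is a hypothetical helper name, it picks http or https from the URL, treats any non-2xx status as an error, and does not follow redirects.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: downloads a URL and resolves with the body as a string.
// Rejects on network errors and on non-2xx status codes (no redirect handling).
function fetchText(url) {
  const client = new URL(url).protocol === 'https:' ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.get(url, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Request failed with status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8'); // keep multi-byte characters intact across chunks
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
      res.on('error', reject);
    });
    req.on('error', reject); // DNS failures, refused connections, etc.
  });
}
```

Inside an async function, const xml = await fetchText("http://site.com/data.xml"); would then give a string that can be passed to parser.parseString as before.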

Reading Large XML Files using Streams

XML files can sometimes be huge, on the order of hundreds of megabytes or even gigabytes. Loading such a file into memory all at once can exhaust the available memory or slow the whole system down.

The solution is to read large files as streams and process the XML piece by piece. xml2js does not support this, so a different module, node-xml-stream, is used instead.
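To see why streaming keeps memory use bounded, the sketch below uses only Node's built-in fs module: fs.createReadStream delivers a file as a series of chunks of at most highWaterMark bytes each (64 KiB by default for file streams), and each chunk can be processed and then discarded. measureChunks is a hypothetical name used for illustration; it only records how the data arrives.

```javascript
const fs = require('fs');

// Hypothetical helper: streams a file and reports how it was delivered.
// No chunk is larger than highWaterMark bytes, regardless of the file size.
function measureChunks(path, highWaterMark = 64 * 1024) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    let totalBytes = 0;
    let largestChunk = 0;
    fs.createReadStream(path, { highWaterMark })
      .on('data', (chunk) => {
        chunks += 1;
        totalBytes += chunk.length;
        largestChunk = Math.max(largestChunk, chunk.length);
      })
      .on('end', () => resolve({ chunks, totalBytes, largestChunk }))
      .on('error', reject);
  });
}
```

For a 1 GB file the largest chunk would still be at most 64 KiB with the default setting; a streaming XML parser consumes its input in chunks like these instead of one large string.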

Install node-xml-stream:

npm install node-xml-stream --save

Import the module and create a parser object as shown below:

const node_xml_stream = require('node-xml-stream');
const parser = new node_xml_stream();

The node-xml-stream parser object emits the following events:

  • opentag: fired when an opening tag is encountered
  • closetag: fired when a closing tag is encountered
  • text: fired when text is encountered inside a node
  • cdata: fired when a CDATA (character data) section is encountered
  • instructions: fired when a processing instruction or XML declaration is encountered
  • error: fired when an error occurs
  • finish: fired when the stream has finished

The example below reads the same XML file as before, but this time as a stream. It uses the opentag, closetag, text and finish events to build the result.

const node_xml_stream = require('node-xml-stream');
const parser = new node_xml_stream();
const fs = require('fs');

// temporary variables used to build the final object
let user = { 'USER': [] };
let driver, team, attr, t_name;

// callback receives the name of the node and its attributes
parser.on('opentag', function(name, attrs) {
    if(name === 'USER') {
        attr = attrs;
    }
    t_name = name;
});

// callback receives the name of the node
parser.on('closetag', function(name) {
    if(name === 'USER') {
        user['USER'].push({ "name": driver, "team": team, "year": attr.year });
    }
    // forget the current tag so that text after </NAME> or </TEAM>
    // (such as whitespace before the next tag) is not stored as a name or team
    t_name = null;
});

// callback contains the text within the node.
parser.on('text', function(text) {
    if(t_name === 'NAME') {
        driver = text;
    }

    if(t_name === 'TEAM') {
        team = text;
    }
});

// callback to do something after stream has finished
parser.on('finish', function() {
    console.log(user);
});

let stream = fs.createReadStream('data.xml', 'UTF-8');
stream.pipe(parser);

The final output is:

{
    USER: [ 
            { name: 'VETTEL', team: 'FERRARI', year: '2018' },
            { name: 'RAIKKONEN', team: 'FERRARI', year: '2018' },
            { name: 'HAMILTON', team: 'MERCEDES', year: '2018' },
            { name: 'BOTTAS', team: 'MERCEDES', year: '2018' },
            { name: 'RICCIARDO', team: 'REDBULL', year: '2018' },
            { name: 'VERSTAPPEN', team: 'REDBULL', year: '2018' } 
    ]
}
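The handlers above keep their state in module-level variables. A possible alternative, sketched below, moves that state into a function so it can be reused and tested without a file. collectDrivers is a hypothetical name, and the code assumes the event signatures described in this tutorial: opentag(name, attrs), closetag(name) and text(text). It appends text rather than overwriting it, in case a parser delivers one element's text in several pieces, and ignores any text outside NAME and TEAM.

```javascript
// Hypothetical refactor of the streaming handlers into a reusable function.
// Works with any EventEmitter that emits opentag/closetag/text as described above.
function collectDrivers(parser) {
  const drivers = [];
  let current = null; // record for the USER element being read
  let field = null;   // 'name' or 'team' while inside NAME or TEAM

  parser.on('opentag', (name, attrs) => {
    if (name === 'USER') {
      current = { name: '', team: '', year: attrs.year };
      field = null;
    } else if (name === 'NAME') {
      field = 'name';
    } else if (name === 'TEAM') {
      field = 'team';
    } else {
      field = null;
    }
  });

  parser.on('text', (text) => {
    if (current && field) {
      current[field] += text; // append in case text arrives in pieces
    }
  });

  parser.on('closetag', (name) => {
    field = null; // text after a closing tag is ignored
    if (name === 'USER' && current) {
      drivers.push(current);
      current = null;
    }
  });

  return drivers;
}
```

With node-xml-stream, calling const drivers = collectDrivers(parser); before stream.pipe(parser) would replace the three handlers, and drivers should hold the same records once finish fires. Because the function only relies on .on(), it can also be exercised by emitting events on a plain EventEmitter.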