Reading Large Files Using Node.js

April 04, 2020

I was given a task to analyse a massive dataset of log files. Opening the file in Excel simply froze my laptop. Given the limited tools available, I decided to parse the file with a Node.js script.

Problem: To read a small file, you may use the script below:

    var fs = require('fs');

    fs.readFile('path/mySmallFile.txt', 'utf-8', (err, data) => {
      if (err) {
        throw err;
      }
      console.log(data);
    })

This works for a small file. However, when the file is large, you will run into a buffer error such as RangeError: Attempt to allocate Buffer larger than maximum size. Execution stops with an error like this:

    Error: "toString" failed
      at stringSlice (buffer.js)
      at Buffer.toString(buffer.js)
      at FSReqWrap.readFileAfterClose [as oncomplete]

Solution: To read a large file, use the built-in readline module together with a read stream, so the file is processed line by line instead of being loaded into memory all at once:

    var fs = require('fs');
    var readline = require('readline');

    const rl = readline.createInterface({
      input: fs.createReadStream('path/largeFile.csv'),
      output: process.stdout,
      terminal:false
    })

    rl.on('line', (line) => {
      console.log(line);
    })

    rl.on('close', () => {
      console.log('Done!');
    })

Replace the file path with the path to your own large file. You can process the file line by line inside the on('line') handler, for example by parsing each line as JSON and incrementing a counter. The final sum can be displayed in the on('close') handler, which runs once the whole file has been read.
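As one way to picture this per-line processing, the sketch below parses each line as JSON and counts the entries with a given level. It assumes a log file with one JSON object per line. The function name countByLevel and the level field are made up for this illustration and are not taken from the original dataset.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: counts lines whose JSON has the given `level` value.
// Assumes one JSON object per line; lines that fail to parse are skipped.
function countByLevel(filePath, level) {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(filePath);
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    let count = 0;

    rl.on('line', (line) => {
      try {
        if (JSON.parse(line).level === level) {
          count += 1;
        }
      } catch (err) {
        // Not valid JSON (for example a blank line): ignore it.
      }
    });

    // 'close' fires after the input stream ends, so the total is final here.
    rl.on('close', () => resolve(count));
    // Surface stream problems such as a missing file.
    input.on('error', reject);
  });
}
```

Calling countByLevel with your own file path and a level such as 'error' returns a promise, so the total can be printed with .then(console.log) once the whole file has been read. Only one line is held in memory at a time, which is why this approach scales to files that fs.readFile cannot handle.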

Now you should be able to process massive datasets with Node.js. For more information, please read the official documentation: https://nodejs.org/api/readline.html


© 2022, @victorleungtw