Importing CSV from URL that has line breaks within one of the fields

Solution 1:

  • You want to put the CSV data to a Spreadsheet.
  • CSV data includes the values which have the line breaks.
    • The values which have the line breaks are enclosed by the double quotes.
    • Such values can be seen at the column "G".
  • You want to put the values including the line breaks.
  • You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer? Please think of this as just one of several answers.

Issue:

Unfortunately, in the current stage, it seems that Utilities.parseCsv() cannot be directly used for the data which has the cell values including the line breaks. In order to avoid this issue, I would like to propose the following 2 patterns.

Pattern 1:

In this pattern, the following flow is used.

Flow:

  1. In order to parse the data by Utilities.parseCsv(), at first, The values enclosed by the double quotes are replaced by the temporal string.
  2. The values are parsed by Utilities.parseCsv().
  3. Replace the temporal strings to the original values.
  4. Put the data to Spreadsheet.

Modified script:

When your script is modified, please modify as follows.

From:
var csvData = Utilities.parseCsv(csvContent);
To:
// The values enclosed by the double quotes are replaced by the temporal string.
var temp = [];
var tempStr = "###temp###";
var t = csvContent.replace(/\"[\w\s\S]+?\"/g, function(m) {
  temp.push(m.replace(/\"/g, ""));
  return tempStr;
});

// Parse data as CSV data.
var csvData = Utilities.parseCsv(t);

// eplace the temporal strings to the original values.
var h = 0;
csvData.forEach(function(e, i) {
  var j = e.indexOf(tempStr);
  if (j > -1) {
    csvData[i][j] = temp[h];
    h++;
  }
});

Pattern 2:

In this pattern, the following flow is used.

Flow:

  1. Convert the CSV file to Google Spreadsheet.
    • This converted Spreadsheet is used as the temporal Spreadsheet.
  2. Retrieve the values from the converted Spreadsheet.
  3. Delete the temporal Spreadsheet.
  4. Put the values to the destination Spreadsheet.

Sample script:

Before you use this script, please enable Drive API at Advanced Google services.

function importCampaignReport() {
  var csvUrl = "https://s3.amazonaws.com/redacted.csv";
  var blob = UrlFetchApp.fetch(csvUrl).getBlob(); // Modified


  // I added below script.
  var id = Drive.Files.insert({title: "temporalSpreadsheet", mimeType: MimeType.GOOGLE_SHEETS}, blob).id;
  var csvData = SpreadsheetApp.openById(id).getSheets()[0].getDataRange().getValues();
  Drive.Files.remove(id);


  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Perm URL Campaign Dashboard Report');
  sheet.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
}

Note:

  • In this modified script, I could confirm that the script worked using your shared CSV data.
  • When the CSV file is manually imported to Spreadsheet, I could confirm that the result was the same with the above modified script.

References:

  • replace()
  • parseCsv()
  • Advanced Google services

Added 1:

I could notice that when Sheets API is used, the values retrieved from the URL are directly parsed and put to the Spreadsheet. So as one more pattern, I would like to propose this. The sample script is as follows. In this case, no temporal file is used and the process cost will be able to be lower than those of above 2 patterns.

Sample script:

Before you use this script, please enable Sheets API at Advanced Google services.

function importCampaignReport() {
  var csvUrl = "https://s3.amazonaws.com/redacted.csv";
  var csvContent = UrlFetchApp.fetch(csvUrl).getContentText();

  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = spreadsheet.getSheetByName('Perm URL Campaign Dashboard Report');
  var resource = {requests: [{pasteData: {data: csvContent, coordinate: {sheetId: sheet.getSheetId()}, delimiter: ","}}]};
  Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}

Reference:

  • PasteDataRequest

Added 2:

  • When the value is put to the Spreadsheet, you want to remove the line breaks from the values.

I could understand like above. If my understanding is correct, how about the following sample script?

Before you use this script, please enable Sheets API at Advanced Google services.

Sample script:

function importCampaignReport() {
  var csvUrl = "https://s3.amazonaws.com/redacted.csv";
  var data = UrlFetchApp.fetch(csvUrl).getContentText();

  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = spreadsheet.getSheetByName('Perm URL Campaign Dashboard Report');
  var sheetId = sheet.getSheetId();
  var resource = {requests: [
    {pasteData: {data: data, coordinate: {sheetId: sheetId}, delimiter: ","}},
    {findReplace: {range: {sheetId: sheetId, startColumnIndex: 6, endColumnIndex: 7}, find: "\n", replacement: " ", searchByRegex: true}}
  ]};
  Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}
  • In your comment, about the 5th column (G), I think that the 5th column is the column "E". But you say the column "G". I confuse about this. So in the current script, the column "G" can be processed. If you want to do this for the column "E", please set startColumnIndex: 4, endColumnIndex: 5.
  • If an error occurs, please provide your script for replicating the issue. By this, I would like to confirm it.