how to find out if csv file fields are tab delimited or comma delimited

Solution 1:

It's too late to answer this question but hope it will help someone.

Here's a simple function that will return a delimiter of a file.

function getFileDelimiter($file, $checkLines = 2){
        $file = new SplFileObject($file);
        $delimiters = array(
          ',',
          '\t',
          ';',
          '|',
          ':'
        );
        $results = array();
        $i = 0;
         while($file->valid() && $i <= $checkLines){
            $line = $file->fgets();
            foreach ($delimiters as $delimiter){
                $regExp = '/['.$delimiter.']/';
                $fields = preg_split($regExp, $line);
                if(count($fields) > 1){
                    if(!empty($results[$delimiter])){
                        $results[$delimiter]++;
                    } else {
                        $results[$delimiter] = 1;
                    }   
                }
            }
           $i++;
        }
        $results = array_keys($results, max($results));
        return $results[0];
    }

Use this function as shown below:

$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter

P.S I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results.

UPDATE: Thanks for @RichardEB pointing out a bug in the code. I have updated this now.

Solution 2:

Here's what I do.

  1. Parse the first 5 lines of a CSV file
  2. Count the number of delimiters [commas, tabs, semicolons and colons] in each line
  3. Compare the number of delimiters in each line. If you have a properly formatted CSV, then one of the delimiter counts will match in each row.

This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).

/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab'       => "\t",
                'comma'     => ",",
                'semicolon' => ";"
                );

$handle = file( $file );    # Grabs the CSV file, loads into array

$line = array();            # Stores the count of delimiters in each row

$valid_delimiter = array(); # Stores Valid Delimiters

# Count the number of Delimiters in Each Row
for ( $i = 1; $i < 6; $i++ ){
foreach ( $delimiters as $key => $value ){
    $line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}


# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){

# Check that the first two values are not 0
if ( $count[1] > 0 and $count[2] > 0 ){
    $match = true;

    $prev_value = '';
    foreach ( $count as $value ){

        if ( $prev_value != '' )
            $match = ( $prev_value == $value and $match == true ) ? true : false;

        $prev_value = $value;
    }

} else { 
    $match = false;
}

if ( $match == true )    $valid_delimiter[] = $delimiter;

}//foreach

# Set Default delimiter to comma
$delimiter = ( $valid_delimiter[0] != '' ) ? $valid_delimiter[0] : "comma";


/*  !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have to user select from the delimiters in $valid_delimiter
!!!! if multiple dilimiter counts match
*/

# The Delimiter for the CSV
echo $delimiters[$delimiter]; 

Solution 3:

There is no 100% reliable way to detemine this. What you can do is

  • If you have a method to validate the fields you read, try to read a few fields using either separator and validate against your method. If it breaks, use another one.
  • Count the occurrence of tabs or commas in the file. Usually one is significantly higher than the other
  • Last but not least: Ask the user, and allow him to override your guesses.

Solution 4:

I'm just counting the occurrences of the different delimiters in the CSV file, the one with the most should probably be the correct delimiter:

//The delimiters array to look through
$delimiters = array(
    'semicolon' => ";",
    'tab'       => "\t",
    'comma'     => ",",
);

//Load the csv file into a string
$csv = file_get_contents($file);
foreach ($delimiters as $key => $delim) {
    $res[$key] = substr_count($csv, $delim);
}

//reverse sort the values, so the [0] element has the most occured delimiter
arsort($res);

reset($res);
$first_key = key($res);

return $delimiters[$first_key]; 

Solution 5:

In my situation users supply csv files which are then entered into an SQL database. They may save an Excel Spreadsheet as comma or tab delimited files. A program converting the spreadsheet to SQL needs to automatically identify whether fields are tab separated or comma

Many Excel csv export have field headings as the first line. The heading test is unlikely to contain commas except as a delimiter. For my situation I counted the commas and tabs of the first line and use that with the greater number to determine if it is csv or tab