parsing CSV
milkmit
January 24th, 2010, 03:12 AM
I've searched everywhere and I'm not sure why it's so hard to find a simple, reliable answer...
I'm simply looking to convert the data in a CSV file into a 2-dimensional array, indexed by row and then column, like so: data[row][column].
However, the parsing must take into account double quotes with commas inside.
For instance, here is a sample of the data, 3 rows of 3 columns:
"Test, 1",100,150
"Test, 2",200,250
"Test, 3",300,350
It has to correctly drop the quotes around the data in the first column, but also understand that the comma inside them is *not* a delimiter. Nothing I've found so far seems to be able to do this.
Anyone have any suggestions? I've been going at this for far too long with little success... :(
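The requirement above comes down to one pass over each line with a single flag that records whether we are inside quotes: a comma only ends a field while the flag is off, and the quote characters themselves are not copied. Here is a minimal sketch of that idea in TypeScript, whose syntax is close to ActionScript 3. The function name `splitCsvLine` is hypothetical, and the sketch assumes fields contain no escaped quotes (`""`) and no line breaks.

```typescript
// Hypothetical helper (not from the thread): splits one CSV line into fields.
// Commas inside double quotes stay in the field; the quotes are dropped.
// Assumes no escaped quotes ("") and no line breaks inside fields.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes; // enter or leave a quoted section; quote not kept
    } else if (ch === "," && !inQuotes) {
      fields.push(current); // an unquoted comma ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // the last field is not followed by a comma
  return fields;
}

// The three sample rows from the question, indexed as table[row][column].
const table: string[][] = [
  '"Test, 1",100,150',
  '"Test, 2",200,250',
  '"Test, 3",300,350',
].map(splitCsvLine);

console.log(table[1][2]); // 250
console.log(table[0][0]); // Test, 1
```

Because the flag simply toggles on every quote, a doubled quote `""` is read as leaving and immediately re-entering quotes, so escaped quotes are silently dropped. Handling them needs a look-ahead at the next character.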
mathew.er
January 24th, 2010, 11:13 AM
I guess you'll have to write your own parser... but what the hell, I'll do it :D
here ya go:
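```actionscript
package
{
    public class CSVParser
    {
        // Splits the text into rows on newLine (pass "\r\n" for Windows files),
        // then splits each row into fields.
        public static function parse ( data : String, newLine : String = "\n" ) : Array
        {
            var arr : Array = data.split ( newLine );
            var len : uint = arr.length;
            for ( var i : uint = 0; i < len; i++ ) {
                arr[i] = parseRow ( arr[i] );
            }
            return arr;
        }

        // Splits one row into fields. Commas inside double quotes are not
        // treated as delimiters, and the surrounding quotes are stripped.
        public static function parseRow ( data : String ) : Array
        {
            var arr : Array = [];
            var arrLen : uint = 0;
            var len : uint = data.length;
            var isLast : Boolean = false;
            var isComplex : Boolean = false;   // currently inside a quoted field
            var wasComplex : Boolean = false;  // the field just closed was quoted
            var char : String;
            var lastIndex : int = -1;          // index of the previous delimiter
            for ( var i : uint = 0; i < len; i++ ) {
                char = data.charAt ( i );
                isLast = i == ( len - 1 );
                if ( char == '"' && !isLast ) {
                    // Opening or closing quote of a field
                    wasComplex = isComplex;
                    isComplex = !isComplex;
                } else if ( char == ',' && !isComplex && !isLast ) {
                    // Delimiter: cut the field, skipping its quotes if it had any
                    arr[arrLen] = data.substring ( lastIndex + ( wasComplex ? 2 : 1 ), i + ( wasComplex ? -1 : 0 ) );
                    arrLen++;
                    lastIndex = i;
                    wasComplex = false; // so an empty field after a quoted one stays empty
                } else if ( isLast && char == ',' && !isComplex ) {
                    // Trailing comma: close the current field and add an empty last field
                    arr[arrLen] = data.substring ( lastIndex + ( wasComplex ? 2 : 1 ), i + ( wasComplex ? -1 : 0 ) );
                    arr[arrLen + 1] = "";
                } else if ( isLast ) {
                    // Last character: cut the final field, excluding its closing quote if any
                    arr[arrLen] = data.substring ( lastIndex + ( isComplex ? 2 : 1 ), i + ( isComplex ? 0 : 1 ) );
                } else {
                    wasComplex = false;
                }
            }
            return arr;
        }
    }
}
```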
Usage:
```actionscript
// data is a String holding the CSV text
var parsed : Array = CSVParser.parse ( data );
trace ( parsed[1][2] ); // traces 250 for your data
```
I didn't test it much, but it should do the job. :)
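One practical caveat: `parse` splits on `newLine`, which defaults to `"\n"`. Files saved on Windows usually end lines with `"\r\n"`, which leaves a stray `"\r"` at the end of each row's last field, and a file that ends with a line break produces an empty final row. Passing `"\r\n"` as `newLine` covers files you know are Windows-formatted; a more forgiving option is to normalise line endings first. Below is a small sketch of that pre-processing step in TypeScript (close to ActionScript 3 in syntax). The name `splitCsvRows` is hypothetical.

```typescript
// Hypothetical pre-processing step (not from the thread): normalise Windows
// ("\r\n") and old Mac ("\r") line endings to "\n", split into rows, and
// drop the empty row a final line break would create.
// Like CSVParser.parse, this assumes no line breaks inside quoted fields.
function splitCsvRows(text: string): string[] {
  const rows = text.replace(/\r\n?/g, "\n").split("\n");
  if (rows.length > 0 && rows[rows.length - 1] === "") {
    rows.pop();
  }
  return rows;
}

const lines = splitCsvRows('"Test, 1",100,150\r\n"Test, 2",200,250\r\n');
console.log(lines.length); // 2
console.log(lines[1]); // "Test, 2",200,250
```

Each resulting row can then be handed to a per-row parser such as `parseRow`.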
milkmit
January 24th, 2010, 12:10 PM
Wow, I cannot thank you enough. It works perfectly. Seriously, I should have given up much earlier and come straight here at the first sign of trouble, rather than spending more than 4 hours on it last night. :)
I really, really, really appreciate it.
And for anyone else who might come across this thread in the future (I came across quite a few, none of which were clear or satisfactory enough -- so frustrating!), this is *the goods*! It will take a CSV with any number of rows/columns, and give you a nice clean 2-dimensional array mirroring that data, with no hang-ups on the double quotes and commas within.
THANKS!!!
mathew.er
January 24th, 2010, 12:49 PM
Actually, no, you did it right. Spending endless nights in frustration over "impossible" problems is just part of the learning process. Stop thinking in terms of code snippets and tutorials (keep looking them up for reference), and look at how the code works instead. Work out how you would do it by hand, then turn that algorithm into code.
And about the code, a little warning: I've looked at the "specification" of the CSV format, and this implementation doesn't handle escaped double quotes inside fields:
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
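Handling that rule means a quote inside a quoted field has to look at the next character: `""` is one literal quote, while a single `"` closes the quoted section. Working on the whole text at once, instead of splitting on line breaks first, also copes with line breaks inside quoted fields, which RFC 4180 permits. Below is a sketch of such a character-by-character parser in TypeScript (close to ActionScript 3 in syntax). It assumes a `,` delimiter and `\n` or `\r\n` record separators, and it is lenient rather than strict on malformed input (an unclosed quote simply runs to the end of the text). The name `parseCsv` is hypothetical.

```typescript
// Hypothetical RFC 4180-style parser (not from the thread): commas and line
// breaks inside quoted fields are kept, and "" inside quotes becomes one ".
// Assumptions: ',' delimiter; "\n" or "\r\n" record separators; lenient on
// malformed input (an unclosed quote runs to the end of the text).
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let recordOpen = false; // true while a record has unflushed content
  let i = 0;

  while (i < text.length) {
    const ch = text.charAt(i);
    recordOpen = true; // any character starts or continues a record...
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // escaped quote
          i += 2;
          continue;
        }
        inQuotes = false; // a lone quote closes the quoted section
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      recordOpen = false; // ...except an unquoted line break, which ends it
      if (ch === "\r" && text.charAt(i + 1) === "\n") {
        i++; // treat "\r\n" as a single separator
      }
    } else {
      field += ch;
    }
    i++;
  }

  if (recordOpen) {
    row.push(field); // last record had no trailing line break
    rows.push(row);
  }
  return rows;
}

const parsed = parseCsv('1997,Ford,E350,"Super ""luxurious"" truck"\n"Test, 1",100,150\n');
console.log(parsed[0][3]); // Super "luxurious" truck
console.log(parsed[1][0]); // Test, 1
console.log(parsed.length); // 2
```

Because records are only cut at line breaks outside quotes, a quoted field such as `"line1\r\nline2"` comes back as a single field with its line break intact.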