csvrange

This module parses CSV files, and similar files that use other delimiters, on the fly. Memory usage is O(1) if the file is processed as it is read from disk. It attempts to correctly handle newlines and delimiters inside quoted fields, as well as escaped quotes.

This library recognizes two ways of escaping a quote: the backslash method (\") and the double-quote method (""). According to Wikipedia, the double-quote method is the standard. In my observations, however, the backslash method occurs more often in practice.
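The two escaping conventions can be illustrated with a minimal sketch. The helper `unescapeQuoted` is hypothetical and is not part of this module; it only assumes that a field arrives with its surrounding quotes intact and shows how both \" and "" reduce to a single quote character.

```d
import std.array : appender;

// Hypothetical helper, not part of csvrange: strips the surrounding quotes
// from one quoted field and resolves both \" and "" to a single quote.
string unescapeQuoted(string field)
{
    assert(field.length >= 2 && field[0] == '"' && field[$ - 1] == '"');
    auto inner = field[1 .. $ - 1];
    auto result = appender!string();

    for (size_t i = 0; i < inner.length; ++i)
    {
        immutable c = inner[i];
        immutable hasNext = i + 1 < inner.length;

        if ((c == '\\' || c == '"') && hasNext && inner[i + 1] == '"')
        {
            // Either \" or "": emit one quote and skip the second character.
            result.put('"');
            ++i;
        }
        else
        {
            result.put(c);
        }
    }
    return result.data;
}

void main()
{
    assert(unescapeQuoted(`"say \"hi\""`) == `say "hi"`);  // backslash method
    assert(unescapeQuoted(`"say ""hi"""`) == `say "hi"`);  // double-quote method
    assert(unescapeQuoted(`"a,b"`) == "a,b");              // embedded delimiter
}
```

Accepting both forms in one pass is a design choice of this sketch; how the module itself resolves ambiguous input is not stated here.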

References:


http://en.wikipedia.org/wiki/Comma-separated_values

Copyright (C) 2011 David Simcha

License:
Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

template isCharRange (R)
Tests whether a type is a range of characters.

template isStringMatrixLike (R)
Tests whether a range is a range of ranges of character ranges.
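As a rough illustration of what these two tests check, the sketch below re-creates them with Phobos traits. The names `isCharRangeSketch` and `isStringMatrixLikeSketch` are hypothetical, and the module's real definitions may differ in detail.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

// Hypothetical re-implementations for illustration only.

// True if R is an input range whose elements are characters.
template isCharRangeSketch(R)
{
    enum isCharRangeSketch = isInputRange!R && isSomeChar!(ElementType!R);
}

// True if R is a range (rows) of ranges (cells) of character ranges.
template isStringMatrixLikeSketch(R)
{
    static if (isInputRange!R && isInputRange!(ElementType!R))
        enum isStringMatrixLikeSketch =
            isCharRangeSketch!(ElementType!(ElementType!R));
    else
        enum isStringMatrixLikeSketch = false;
}

static assert(isCharRangeSketch!string);
static assert(isCharRangeSketch!(dchar[]));
static assert(!isCharRangeSketch!(int[]));
static assert(isStringMatrixLikeSketch!(string[][]));
static assert(!isStringMatrixLikeSketch!(string[]));

void main() {}
```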

class CsvLine (R) if (isCharRange!(R));
This class is an input range that iterates over the columns of a single line of a CSV file. It is meant to be instantiated only by the CsvFile object, and may be recycled when popFront() is called on the CsvFile object. Likewise, its buffer is recycled when popFront() is called on this object, so copy any column you need to keep. For usage details see csvFile().

struct CsvFile (R);


CsvFile!(R) csvFile (R)(R range, dchar delim);
Returns a struct that iterates over the rows of a CSV file (or more generally a similar file with any delimiter character). It is an input range of input ranges. Each call to front() produces a CsvLine object that can be used to iterate over the columns of the current row. Note that the CsvLine object is recycled with every call to popFront().

Examples:
import std.algorithm : map;
import std.array : array;

// charRange must be a range of characters representing a CSV file.
auto charRange = getCharRange();
auto csvIter = csvFile(charRange, ',');

// Iterate over the lines of the CSV file, excluding line breaks embedded in
// quotes.
string[][] rowArrays;  // One string[] per row.
foreach(row; csvIter)
{
    // Convert the row to an array of strings.
    rowArrays ~= array(
        map!"a.idup"(  // Necessary because CsvLine recycles buffers.
            row
        )
    );

    // row is now empty.  The next call to csvIter.popFront() will recycle
    // the object.
}
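The reason for the idup call above can be shown with a small stand-in. `RecyclingCells` is a hypothetical range written for this illustration; it is not the library's CsvLine and only mimics the reuse of a single buffer across elements.

```d
import std.algorithm.iteration : map;
import std.array : array;

// Hypothetical stand-in for a buffer-recycling input range.
struct RecyclingCells
{
    private const(string)[] cells;
    private char[] buffer;  // Reused for every element.

    this(const(string)[] cells)
    {
        this.cells = cells;
        buffer = new char[64];
        load();
    }

    @property bool empty() const { return cells.length == 0; }

    // Returns a slice of the shared buffer, not a fresh copy.
    @property char[] front() { return buffer[0 .. cells[0].length]; }

    void popFront()
    {
        cells = cells[1 .. $];
        load();
    }

    // Overwrites the buffer with the current cell, growing it if needed.
    private void load()
    {
        if (cells.length == 0) return;
        immutable n = cells[0].length;
        if (buffer.length < n) buffer.length = n;
        buffer[0 .. n] = cells[0][];
    }
}

void main()
{
    // Saving the slices without copying: each one aliases the same buffer.
    char[][] aliased;
    foreach (cell; RecyclingCells(["aa", "bb", "cc"]))
        aliased ~= cell;
    assert(aliased[0] == "cc");  // Overwritten by the later elements.

    // Copying each element with idup preserves the values.
    auto copied = RecyclingCells(["aa", "bb", "cc"]).map!(a => a.idup).array;
    assert(copied == ["aa", "bb", "cc"]);
}
```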


class CsvStructRange (S,R);


enum Malformed;
Determines whether CsvStructRange will ignore malformed lines or throw an exception.

ignore
Malformed lines are skipped.


throwException
An exception is thrown on a malformed line. This is the default.


CsvStructRange!(S,R) csvStructRange (S, R)(R csvRange, string[] colHeaders, Malformed malformed = Malformed.throwException);
Given a struct type S, a range of characters representing a CSV file, and an array of strings representing relevant column headers, read the CSV file into a range of structs, one for each row. The order of fields in colHeaders must correspond to the order of the fields in the struct. However, colHeaders may name only a subset of the columns in the CSV file, and may cover only the first colHeaders.length fields of the struct; the remaining fields are not populated.

If a specified column header is not found, an exception is thrown.
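The header lookup described here can be sketched as follows. The function `columnIndices` is hypothetical and only illustrates the idea of mapping each requested header to its column position and failing when one is missing; it is not the module's implementation.

```d
import std.exception : assertThrown, enforce;

// Hypothetical helper: for each wanted header, find its column index in the
// file's header row, throwing an Exception if any header is absent.
size_t[] columnIndices(const string[] fileHeaders, const string[] wanted)
{
    // Map each header name in the file to its column position.
    size_t[string] position;
    foreach (i, name; fileHeaders)
        position[name] = i;

    size_t[] indices;
    foreach (name; wanted)
    {
        auto found = name in position;
        enforce(found !is null, "Column header not found: " ~ name);
        indices ~= *found;
    }
    return indices;
}

void main()
{
    auto fileHeaders = ["Name", "Sleep Hours", "Programming Skill",
                        "Favorite Language"];

    // The requested order is independent of the order in the file.
    size_t[] expected = [2, 1];
    assert(columnIndices(fileHeaders,
        ["Programming Skill", "Sleep Hours"]) == expected);

    // A header that is not in the file causes an exception.
    assertThrown(columnIndices(fileHeaders, ["Shoe Size"]));
}
```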

All type conversions are done automatically. If a cell is to be converted to a numeric type (an integer or floating point number), leading and trailing whitespace is removed from the cell before the conversion is attempted. The Malformed enum controls whether an exception is thrown when a type conversion fails, or whether the line is simply ignored. The default is to throw an exception.
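Stripping matters because Phobos's std.conv.to rejects surrounding whitespace when parsing numbers. The sketch below shows this, along with one way a caller could choose between throwing and skipping. The helper `tryParseCell` is hypothetical and is not how the module is known to implement Malformed.

```d
import std.conv : ConvException, to;
import std.exception : assertThrown;
import std.string : strip;
import std.typecons : Nullable, nullable;

// Hypothetical helper: converts one cell to T after stripping whitespace,
// returning an empty Nullable instead of throwing when conversion fails.
Nullable!T tryParseCell(T)(string cell)
{
    try
    {
        return nullable(to!T(strip(cell)));
    }
    catch (ConvException)
    {
        return Nullable!T.init;
    }
}

void main()
{
    // Without stripping, surrounding whitespace makes the conversion fail.
    assertThrown!ConvException(to!uint(" 9 "));
    assert(to!uint(strip(" 9 ")) == 9);

    // "Ignore" style handling: a failed conversion yields a null result
    // that the caller can use to skip the line.
    assert(tryParseCell!float("\t7.5 ").get == 7.5f);
    assert(tryParseCell!uint("n/a").isNull);
}
```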

Examples:
struct Person
{
    uint programmingSkill;
    float sleepHours;
    string favoriteLanguage;
    float reserved;
}

auto csvRange = csvFile(getCharRange(), '\t');
auto personRange = csvStructRange!Person(csvRange,
    ["Programming Skill", "Sleep Hours", "Favorite Language"]
);

foreach(person; personRange)
{
    if(person.sleepHours > 7 && person.programmingSkill > 8)
    {
        assert(person.favoriteLanguage == "D");
    }

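    // No header for it, so it's not populated.  float.init is NaN, which
    // never compares equal with ==, so test with isNaN instead.
    import std.math : isNaN;
    assert(isNaN(person.reserved));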
}


Page was generated on Sat Jan 29 18:35:02 2011