csvrange
This module parses CSV files, and similar files that use other delimiter
characters, on the fly. Memory usage is O(1) if the file is processed as it is
read from disk. It attempts to correctly handle newlines and delimiters inside
quoted fields, as well as escaped quotes.
The library recognizes two ways of escaping a quote: the backslash method (\")
and the double-quote method (""). According to Wikipedia, the double-quote
method is the standard. In my observations, however, the backslash method
occurs more often in practice.
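To make the two escaping conventions concrete, here is a minimal sketch of how a quoted field could be unescaped. The function unescapeQuoted is hypothetical and is not part of csvrange; the library's own parser works on ranges and may resolve escapes differently.

```d
import std.array : appender;

/// Hypothetical helper, not part of csvrange: strips the surrounding quotes
/// from a quoted field and resolves both escape styles (\" and "").
string unescapeQuoted(string field)
{
    // Fields that are not wrapped in quotes are returned unchanged.
    if (field.length < 2 || field[0] != '"' || field[$ - 1] != '"')
        return field;

    auto inner = field[1 .. $ - 1];
    auto result = appender!string();

    for (size_t i = 0; i < inner.length; ++i)
    {
        immutable c = inner[i];
        immutable hasNext = i + 1 < inner.length;

        if ((c == '\\' || c == '"') && hasNext && inner[i + 1] == '"')
        {
            // Either \" or "": emit one literal quote and skip the escape.
            result.put('"');
            ++i;
        }
        else
        {
            result.put(c);
        }
    }
    return result.data;
}

unittest
{
    assert(unescapeQuoted(`"say \"hi\""`) == `say "hi"`); // backslash method
    assert(unescapeQuoted(`"say ""hi"""`) == `say "hi"`); // double-quote method
}
```

Accepting both styles in one pass is a design choice of this sketch; it means a literal backslash directly before an escaped quote cannot be told apart from the backslash escape.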
References:
http://en.wikipedia.org/wiki/Comma-separated_values
Copyright (C) 2011 David Simcha
License:
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:
The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
- template isCharRange(R)
- Tests whether a type is a range of characters.
- template isStringMatrixLike(R)
- Tests whether a range is a range of ranges of character ranges.
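As a rough illustration of what these two traits check, the sketch below builds equivalent tests from Phobos primitives. The names isCharRangeSketch and isStringMatrixLikeSketch are hypothetical, and the module's actual definitions may differ in detail.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

// Hypothetical stand-in for isCharRange: an input range whose elements
// are characters.
template isCharRangeSketch(R)
{
    enum bool isCharRangeSketch =
        isInputRange!R && isSomeChar!(ElementType!R);
}

// Hypothetical stand-in for isStringMatrixLike: a range of rows, where each
// row is a range of cells and each cell is a range of characters.
template isStringMatrixLikeSketch(R)
{
    static if (isInputRange!R && isInputRange!(ElementType!R))
        enum bool isStringMatrixLikeSketch =
            isCharRangeSketch!(ElementType!(ElementType!R));
    else
        enum bool isStringMatrixLikeSketch = false;
}

static assert(isCharRangeSketch!string);
static assert(!isCharRangeSketch!(int[]));
static assert(isStringMatrixLikeSketch!(string[][]));
static assert(!isStringMatrixLikeSketch!(string[]));
```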
- class CsvLine(R) if (isCharRange!(R));
- This class is an input range that iterates over the columns of a single line of
a CSV file. It is meant to be instantiated only by the CsvFile object,
and may be recycled when popFront() is called on the CsvFile object. Also,
when popFront() is called on this object, the buffer is recycled. For
usage details see csvFile().
- struct CsvFile(R);
- CsvFile!(R) csvFile(R)(R range, dchar delim);
- Returns a struct that iterates over the rows of a CSV file (or more generally a
similar file with any delimiter character). It is an input range of input
ranges. Each call to front() produces a CsvLine object that can be used to
iterate over the columns of the current row. Note that the CsvLine object is
recycled with every call to popFront().
Examples:
import std.algorithm : map;
import std.array : array;

// charRange must be a range of characters representing a CSV file.
auto charRange = getCharRange();
auto csvIter = csvFile(charRange, ',');

// Iterate over the rows of the CSV file. Line breaks embedded in quoted
// fields do not start a new row.
string[][] rowArrays;
foreach(row; csvIter)
{
    // Convert the row to an array of strings.
    rowArrays ~= array(
        map!"a.idup"( // Necessary because CsvLine recycles buffers.
            row
        )
    );
    // row is now empty. The next call to csvIter.popFront() will recycle
    // the object.
}
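The reason for the idup call in the example can be shown without the library. The sketch below uses a plain char[] as a stand-in for a recycled cell buffer; recycleDemo is a hypothetical name, and the way csvrange actually manages its buffers is not shown here.

```d
// Returns [view, copy] as they read after the buffer has been reused.
// Hypothetical demonstration; not part of csvrange.
string[2] recycleDemo()
{
    char[] buffer = "alpha".dup;   // stand-in for the parser's reusable buffer
    const(char)[] view = buffer;   // what a cell looks like without idup
    string copy = buffer.idup;     // what a cell looks like with idup

    buffer[] = "omega";            // the next cell overwrites the same memory

    // The view silently changed with the buffer; the copy did not.
    return [view.idup, copy];
}
```

Here the first element reads "omega" and the second still reads "alpha", which is why any cell that must outlive the current iteration should be copied.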
- class CsvStructRange(S,R);
- enum Malformed;
- Determines whether CsvStructRange will ignore malformed lines or throw an
exception.
- ignore
- throwException
- CsvStructRange!(S,R) csvStructRange(S, R)(R csvRange, string[] colHeaders, Malformed malformed = Malformed.throwException);
- Given a struct type S, a CSV range (as returned by csvFile()), and an array
of strings naming the relevant column headers, returns a range that reads the
CSV file into structs, one for each row. The order of the entries in
colHeaders must correspond to the order of the fields in the struct.
However, colHeaders may name only a subset of the columns in the CSV file,
and may cover only the first colHeaders.length fields of the struct; the
remaining fields are not populated.
If a specified column header is not found, an exception is thrown.
All type conversions are done automatically. If a cell is to be converted to
a numeric type (an integer or floating point number), leading and trailing
whitespace is removed from the cell before the conversion is attempted. The
Malformed enum controls whether an exception is thrown when a type conversion
fails, or whether the line is simply ignored. The default is to throw an
exception.
Examples:
import std.math : isNaN;

struct Person
{
    uint programmingSkill;
    float sleepHours;
    string favoriteLanguage;
    float reserved;
}

auto csvRange = csvFile(getCharRange(), '\t');
auto personRange = csvStructRange!Person(csvRange,
    ["Programming Skill", "Sleep Hours", "Favorite Language"]
);

foreach(person; personRange)
{
    if(person.sleepHours > 7 && person.programmingSkill > 8)
    {
        assert(person.favoriteLanguage == "D");
    }

    // No header for reserved, so it's not populated and keeps float.init,
    // which is NaN. NaN never compares equal to anything, hence isNaN.
    assert(isNaN(person.reserved));
}
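The conversion rules described for csvStructRange (whitespace stripped before numeric conversion, failures either thrown or skipped) can be sketched per cell as follows. OnMalformed and convertCell are hypothetical names that only mirror the documented behavior; this is not the library's implementation.

```d
import std.conv : ConvException, to;
import std.string : strip;
import std.traits : isNumeric;

// Hypothetical mirror of the Malformed enum, for illustration only.
enum OnMalformed { ignore, throwException }

/// Converts one cell to T, stripping whitespace first for numeric targets.
/// Returns false if the cell is malformed and policy is OnMalformed.ignore,
/// in which case the caller would skip the whole line.
bool convertCell(T)(string cell, out T value, OnMalformed policy)
{
    static if (isNumeric!T)
        cell = cell.strip();   // " 42 " becomes "42"

    try
    {
        value = to!T(cell);
        return true;
    }
    catch (ConvException e)
    {
        if (policy == OnMalformed.throwException)
            throw e;
        return false;
    }
}
```

With this sketch, convertCell!uint("  9 ", skill, OnMalformed.ignore) succeeds and sets skill to 9, while a cell such as "nine" is either reported as malformed or rethrown as a ConvException, depending on the policy.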