BSD-3-Clause licensed by Ozgun Ataman
Maintained by Daniel Vianna
This version can be pinned in stack with:csv-conduit-1.0.1.0@sha256:15a6a98f4aae9185001ca4e10a5dbb21fd57cc7191a20c4f6bf9b568f01c9039,3374

Module documentation for 1.0.1.0

README

cabal build stack build

CSV Files and Haskell

CSV files are the de-facto standard in many cases of data transfer, particularly when dealing with enterprise application or disparate database systems.

While there are a number of csv libraries in Haskell, at the time of this project’s start, there wasn’t one that provided all of the following:

  • Full flexibility in quote characters, separators, input/output
  • Constant space operation
  • Robust parsing and error resiliency
  • Battle-tested reliability in real-world datasets
  • Fast operation
  • Convenient interface that supports a variety of use cases

Over time, people created other plausible CSV packages like cassava. The major benefit from this library remains to be:

  • Direct participation in the conduit ecosystem, which is now quite large, and all the benefits that come with it.
  • Flexibility in CSV format definition.
  • Resiliency to errors in the input data.

This package

csv-conduit is a conduit-based CSV parsing library that is easy to use, flexible and fast. It leverages the conduit infrastructure to provide constant-space operation, which is quite critical in many real world use cases.

For example, you can use http-conduit to download a CSV file from the internet and plug its Source into intoCSV to stream-convert the download into the Row data type and do something with it as the data streams, that is without having to download the entire file to disk first.

Author & Contributors

  • Ozgun Ataman (@ozataman)
  • Daniel Bergey (@bergey)
  • BJTerry (@BJTerry)
  • Mike Craig (@mkscrg)
  • Daniel Corson (@dancor)
  • Dmitry Dzhus (@dzhus)
  • Niklas Hambüchen (@nh2)
  • Facundo Domínguez (@facundominguez)
  • Daniel Vianna (@dmvianna)

Introduction

  • The CSVeable typeclass implements the key operations.
  • CSVeable is parameterized on both a stream type and a target CSV row type.
  • There are 2 basic row types and they implement exactly the same operations, so you can chose the right one for the job at hand:
    • type MapRow t = Map t t
    • type Row t = [t]
  • You basically use the Conduits defined in this library to do the parsing from a CSV stream and rendering back into a CSV stream.
  • Use the full flexibility and modularity of conduits for sources and sinks.

Speed

While fast operation is of concern, I have so far cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.

Usage Examples

Example #1: Basics Using Convenience API

{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)

-- Just reverse te columns
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map reverse

test :: IO ()
test = runResourceT $
  transformCSV defCSVSettings
               (sourceFile "input.csv")
               myProcessor
               (sinkFile "output.csv")

Example #2: Basics Using Conduit API

{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import Data.CSV.Conduit
import Data.Text (Text)

myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = awaitForever $ yield

-- Let's simply stream from a file, parse the CSV, reserialize it
-- and push back into another file.
test :: IO ()
test = runResourceT $
  sourceFile "test/BigFile.csv" $=
  intoCSV defCSVSettings $=
  myProcessor $=
  fromCSV defCSVSettings $$
  sinkFile "test/BigFileOut.csv"

Changes

1.0.1.0

  • Use ConduitT instead of ConduitM (prettier type inference with newer conduit imports)

1.0.0.2

  • Fixed #17, where CSV created with Excel in Mac OS failed to parse due to its newline characters.

1.0.0.1

  • Removed dependencies: mmorph, monad-control, mtl, unordered-containers, primitive

1.0.0.0

  • Removed return from the Monad instance for Parser, and transfered its definition to pure in the Applicative instance of Parser. This was necessary to support GHC 9.6.4.
  • Create new API to choose whether to handle empty CSV cells as empty strings or NULLs.
  • Added imports that were removed from Prelude in GHC 9.6.4.
  • Bumped the default Stack resolver to LTS-22.20.

0.7.3.0

  • Add ordered versions of named records for consistent, controllable header column ordering. PR 44
  • Add support for GHC 9.0.1

0.7.2.0

  • Remove some dependency upper bounds for forward compatibility.

0.7.1.0

  • Add MonadFail instance for Parser. PR 38

0.7.0.0

  • BREAKING: Switch from partial Monoid instance on Parser to total Semigroup instance.
  • Compatibility with GHC 8.4.x/base-4.11.1.0

0.6.8.1

  • Fix documentation mistake in FromNamedRecord/ToNamedRecord examples.

0.6.8

  • Haddocks improvements
  • Fix inlining and specialization rules around formatDecimal
  • Updates to permit newest conduit/resourcet packages

0.6.7

  • Fix build for GHC 8.0.1