cassava

A CSV parsing and encoding library

Version on this page:	0.5.2.0@rev:7
LTS Haskell 22.39:	0.5.3.2
Stackage Nightly 2024-10-31:	0.5.3.2
Latest on Hackage:	0.5.3.2


BSD-3-Clause licensed by Johan Tibell

Maintained by [email protected]

This version can be pinned in stack with: cassava-0.5.2.0@sha256:5f2fb613e07a2318fccf5f994cd40bfdaeec2dcca9929737c6e3120e043461fd,6051

Module documentation for 0.5.2.0

Data
- Data.Csv

Depends on 14 packages:

array, attoparsec, base, bytestring, containers, deepseq, hashable, Only, scientific, text, text-short, transformers, unordered-containers, vector

Used by 18 packages in nightly-2022-02-01:

bnb-staking-csvs, cassava-conduit, cassava-megaparsec, closed, cointracking-imports, columnar, criterion, datasets, DBFunctor, detour-via-sci, graphite, hledger-lib, lapack-ffi-tools, lens-csv, pipes-csv, servant-cassava, solana-staking-csvs, streaming-cassava

`cassava`: A CSV parsing and encoding library

Please refer to the package description for an overview of cassava.

Usage example

Here’s a two-second crash course in using the library. Given a CSV file with this content:

John Doe,50000
Jane Doe,60000

here’s how you’d process it record-by-record:

{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    -- NoHeader: every line is a data record; each decodes to a (String, Int) pair
    case decode NoHeader csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ (name, salary :: Int) ->
            putStrLn $ name ++ " earns " ++ show salary ++ " dollars"
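To try the same decoding without creating salaries.csv, the input can be kept in memory. This is only an illustrative sketch: sampleCsv and parseSalaries are hypothetical names, not part of cassava, and the only library calls used are decode and NoHeader from Data.Csv, as in the example above.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (..), decode)
import qualified Data.Vector as V

-- Hypothetical in-memory stand-in for salaries.csv, so the sketch
-- runs without a file on disk.
sampleCsv :: BL.ByteString
sampleCsv = "John Doe,50000\r\nJane Doe,60000\r\n"

-- Decode headerless rows into (name, salary) pairs. A row with the wrong
-- number of fields, or a salary that is not an Int, makes the whole
-- result a Left.
parseSalaries :: BL.ByteString -> Either String (V.Vector (String, Int))
parseSalaries = decode NoHeader

main :: IO ()
main = case parseSalaries sampleCsv of
    Left err -> putStrLn err
    Right v -> V.forM_ v $ \(name, salary) ->
        putStrLn $ name ++ " earns " ++ show salary ++ " dollars"
```

Because decode is all-or-nothing, a single malformed row is enough to turn the result into a Left with an error message.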

If you want to parse a file that includes a header, like this one

name,salary
John Doe,50000
Jane Doe,60000

use decodeByName:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

data Person = Person
    { name   :: !String
    , salary :: !Int
    }

-- Look up each field by its column name in the header row
instance FromNamedRecord Person where
    parseNamedRecord r = Person <$> r .: "name" <*> r .: "salary"

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    -- decodeByName reads the first line as the header and returns it with the records
    case decodeByName csvData of
        Left err -> putStrLn err
        Right (_, v) -> V.forM_ v $ \ p ->
            putStrLn $ name p ++ " earns " ++ show (salary p) ++ " dollars"
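cassava also encodes. A minimal sketch of the reverse direction for a Person type like the one above, assuming you write the ToNamedRecord and DefaultOrdered instances by hand (people is a hypothetical sample list): namedRecord and (.=) build each named record, headerOrder fixes the column order, and encodeDefaultOrderedByName writes the header row followed by one line per value.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Csv
    ( DefaultOrdered (..)
    , ToNamedRecord (..)
    , encodeDefaultOrderedByName
    , header
    , namedRecord
    , (.=)
    )

data Person = Person
    { name   :: !String
    , salary :: !Int
    }

-- Map each field to its column name (the inverse of parseNamedRecord).
instance ToNamedRecord Person where
    toNamedRecord p = namedRecord ["name" .= name p, "salary" .= salary p]

-- Column order used for the header row and for every data row.
instance DefaultOrdered Person where
    headerOrder _ = header ["name", "salary"]

-- Hypothetical sample data.
people :: [Person]
people = [Person "John Doe" 50000, Person "Jane Doe" 60000]

main :: IO ()
main = BL8.putStr (encodeDefaultOrderedByName people)
```

With the default encode options the output uses CRLF line endings, and a field such as John Doe, which contains no delimiter, quote, or line break, is written unquoted.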

You can find more code examples in the examples/ folder as well as smaller usage examples in the Data.Csv module documentation.

Project Goals for `cassava`

There’s no end to what people consider CSV data. Most programs don’t follow RFC 4180, so one has to make a judgment call about which contributions to accept. Not everything gets accepted, because otherwise we’d end up with a (slow) general-purpose parsing library, and there are plenty of those. The goal is to accept roughly what the Python csv module accepts.

The Python csv module (which is implemented in C) is also considered the baseline for performance. Adding options (e.g. the above-mentioned parsing “flexibility”) has to be traded off against performance. There have been complaints about performance in the past; therefore, when in doubt, performance wins over features.
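One option cassava does expose is the field delimiter. A minimal sketch, assuming tab-separated input and the hypothetical names tsvOptions and parseTsv: decodeWith takes a DecodeOptions value, and decDelimiter is set from the tab character.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Char (ord)
import Data.Csv
    ( DecodeOptions (..), HasHeader (..), decodeWith, defaultDecodeOptions )
import qualified Data.Vector as V

-- Decoding options for tab-separated input. decDelimiter is a single
-- byte (Word8), so multi-byte characters cannot serve as delimiters.
tsvOptions :: DecodeOptions
tsvOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }

-- Decode headerless TSV rows into (name, salary) pairs.
parseTsv :: BL8.ByteString -> Either String (V.Vector (String, Int))
parseTsv = decodeWith tsvOptions NoHeader

main :: IO ()
main = print (parseTsv (BL8.pack "John Doe\t50000\r\nJane Doe\t60000\r\n"))
```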

Last but not least, it’s important to keep the dependency footprint light, as each additional dependency incurs costs and risks in terms of maintenance overhead and loss of flexibility. A new package dependency should therefore only be added if it is known to be reliable and there’s a clear benefit that outweighs the cost.

Changes

Version 0.5.2.0

Add FromField/ToField instances for Identity and Const (#158)
New typeclass-less decoding functions decodeWithP and decodeByNameWithP (#67,#167)
Support for the final phase of the MonadFail Proposal (MFP) / base-4.13
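decodeWithP and decodeByNameWithP take the parser as an ordinary function argument instead of resolving it through FromRecord or FromNamedRecord. A sketch mirroring the Person type from the usage example, assuming the 0.5.2.0 signature decodeByNameWithP :: (NamedRecord -> Parser a) -> DecodeOptions -> ByteString -> Either String (Header, Vector a); personParser and parsePeople are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv
    ( Header, NamedRecord, Parser
    , decodeByNameWithP, defaultDecodeOptions, (.:) )
import qualified Data.Vector as V

data Person = Person
    { name   :: !String
    , salary :: !Int
    } deriving (Eq, Show)

-- An ordinary function does the job a FromNamedRecord instance would;
-- no instance for Person is defined anywhere.
personParser :: NamedRecord -> Parser Person
personParser r = Person <$> r .: "name" <*> r .: "salary"

-- The first line is treated as the header and returned alongside the rows.
parsePeople :: BL.ByteString -> Either String (Header, V.Vector Person)
parsePeople = decodeByNameWithP personParser defaultDecodeOptions

main :: IO ()
main = case parsePeople "name,salary\r\nJohn Doe,50000\r\nJane Doe,60000\r\n" of
    Left err -> putStrLn err
    Right (_, v) -> V.forM_ v $ \p ->
        putStrLn $ name p ++ " earns " ++ show (salary p) ++ " dollars"
```

Passing the parser explicitly can help when a single type needs more than one CSV mapping, since a type can have only one FromNamedRecord instance.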

Version 0.5.1.0

Add FromField/ToField instance for Natural (#141,#142)
Add FromField/ToField instances for Scientific (#143,#144)
Add support for modifying Generics-based instances (adding Options, defaultOptions, fieldLabelModifier, genericParseRecord, genericToRecord, genericToNamedRecord, genericHeaderOrder) (#139,#140)
Documentation improvements
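The Options added in 0.5.1.0 let Generics-based instances rename columns. A sketch, assuming a hypothetical Person record whose field names carry a person prefix: fieldLabelModifier strips the prefix and lower-cases the rest, so the CSV columns become name and salary.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Char (toLower)
import Data.Csv
    ( DefaultOrdered (..), Options, ToNamedRecord (..)
    , defaultOptions, encodeDefaultOrderedByName
    , fieldLabelModifier, genericHeaderOrder, genericToNamedRecord )
import GHC.Generics (Generic)

-- Hypothetical record whose field names carry a "person" prefix.
data Person = Person
    { personName   :: !String
    , personSalary :: !Int
    } deriving (Generic)

-- Drop the "person" prefix and lower-case what remains:
-- personName becomes "name", personSalary becomes "salary".
csvOptions :: Options
csvOptions = defaultOptions { fieldLabelModifier = map toLower . drop (length "person") }

instance ToNamedRecord Person where
    toNamedRecord = genericToNamedRecord csvOptions

instance DefaultOrdered Person where
    headerOrder = genericHeaderOrder csvOptions

main :: IO ()
main = BL8.putStr (encodeDefaultOrderedByName [Person "John Doe" 50000])
```

Using the same Options value for both instances keeps the header row and the record keys consistent.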

Version 0.5.0.0

Semantic changes

Don’t unnecessarily quote spaces with QuoteMinimal (#118,#122,#86)
Fix semantics of foldl' (#102)
Fix field error diagnostics being mapped to endOfInput in the Parser monad (#99)
Honor encIncludeHeader in incremental API (#136)
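The quoting policy is chosen through EncodeOptions. A minimal sketch with hypothetical rows (rows and encodeQuoted are illustrative names) comparing QuoteMinimal, the default, with QuoteAll: under QuoteMinimal a field that only contains a space stays unquoted, while a field containing the delimiter is quoted.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL8
import Data.Csv
    ( EncodeOptions (..), Quoting (..), defaultEncodeOptions, encodeWith )

-- Hypothetical rows: one field with a space, one with an embedded comma.
rows :: [(String, Int)]
rows = [("John Doe", 50000), ("Doe, Jane", 60000)]

-- Encode with a chosen quoting policy, keeping the other defaults
-- (comma delimiter, CRLF line endings).
encodeQuoted :: Quoting -> BL8.ByteString
encodeQuoted q = encodeWith defaultEncodeOptions { encQuoting = q } rows

main :: IO ()
main = do
    BL8.putStr (encodeQuoted QuoteMinimal)
    BL8.putStr (encodeQuoted QuoteAll)
```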

Other changes

Support GHC 8.2.1
Use factored-out Only package
Add FromField/ToField instance for ShortText
Add MonadFail and Semigroup instance for Parser
Add Semigroup instance for incremental CSV API Builder & NamedBuilder
Port to ByteString builder & drop dependency on blaze-builder

Version 0.4.5.1

Restore GHC 7.4 support (#124)

Version 0.4.5.0

Support for GHC 8.0 added; support for GHC 7.4 dropped
Fix defect in Foldable(foldr) implementation failing to skip unconvertible records (#102)
Documentation fixes
Maintainer changed

Version 0.4.4.0

Added record instances for larger tuples.
Support attoparsec 0.13.
Add field instances for short bytestrings.

Version 0.4.3.0

Documentation overhaul with more examples.
Add Data.Csv.Builder, a low-level bytestring builder API.
Add a high-level builder API to Data.Csv.Incremental.
Generalize the default FromNamedRecord/ToNamedRecord instances.
Improved support for deriving instances using GHC.Generics.
Added some control over quoting.
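The high-level builder in Data.Csv.Incremental encodes one record at a time. A sketch with hypothetical rows, assuming encodeRecord :: ToRecord a => a -> Builder a and encode :: Builder a -> ByteString from that module; because Builder is a Monoid, foldMap concatenates the per-row builders.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL8
import qualified Data.Csv.Incremental as Inc

-- Hypothetical rows to encode.
rows :: [(String, Int)]
rows = [("John Doe", 50000), ("Jane Doe", 60000)]

-- Each encodeRecord call yields a Builder for one row; foldMap combines
-- them with the Monoid instance, and encode renders the result.
encodeRows :: [(String, Int)] -> BL8.ByteString
encodeRows = Inc.encode . foldMap Inc.encodeRecord

main :: IO ()
main = BL8.putStr (encodeRows rows)
```

Since 0.5.0.0 Builder also has a Semigroup instance, so two builders can be joined directly with <>.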

Version 0.4.2.4

Support attoparsec 0.13.

Version 0.4.2.3

Support GHC 7.10.

Version 0.4.2.2

Support blaze-builder 0.4.
Make sure inlining doesn’t prevent rules from firing.
Fix incorrect INLINE pragmas.

Version 0.4.2.1

Support deepseq-1.4.

Version 0.4.2.0

Minor performance improvements.
Add 8 and 9 tuple instances for From/ToRecord.
Support text-1.2.

Version 0.4.1.0

Ignore whitespace when converting numeric fields.
Accept \r as a line terminator.
Support attoparsec-0.12.