What is a regular expression to parse csv files?

MSVS 16.9.2

I am using regex to parse input lines in a Comma Separated Value (csv) file and can't seem to get a correct RE. I am using the ECMAScript grammar (https://docs.microsoft.com/en-us/cpp/standard-library/regular-expressions-cpp?view=msvc-160).

I am currently trying: (field separator is ':')

static const regex pattern(".*[^[:$]]");

Unfortunately, Microsoft Visual Studio crashes with this. Using

static const regex pattern(".*[^:]");
regex_search(line, matches, pattern);

doesn't crash, but with an input line like "Name:0" it returns matches[0] = "Name:0".

I've tried other REs but I can't seem to get it right.
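On the regex itself: in ".*[^:]" the ".*" is greedy and '.' also matches ':', so the pattern runs to the last character that is not ':' and matches[0] ends up being the whole line. If a regex is still wanted, one option is to match the separator instead of the fields and let std::sregex_token_iterator hand back the text in between. A minimal sketch, assuming ':' never appears inside a field (regex_split is just an illustrative name, not something from the original code):

```cpp
#include <regex>
#include <string>
#include <vector>

// Illustrative helper (hypothetical name): splits `line` on every ':'.
// Assumes ':' never occurs inside a field.
std::vector<std::string> regex_split(const std::string &line){
    static const std::regex separator(":");
    // Submatch index -1 asks the iterator for the text between matches,
    // which is exactly the fields.
    std::sregex_token_iterator first(line.begin(), line.end(), separator, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With "Name:0" this should give the two fields "Name" and "0". As far as I know, an empty field at the very end of a line is not reported by the iterator, so check that case if it matters for your data.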
If all you need is to split a string by a character, a regex is overkill.
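#include <string>
#include <utility>
#include <vector>

// Splits s on every occurrence of separator; empty fields are kept.
std::vector<std::string> split(const std::string &s, char separator){
    std::vector<std::string> ret;
    std::string accum;
    for (auto c : s){
        if (c == separator){
            ret.emplace_back(std::move(accum));
            accum.clear(); // a moved-from string is valid but unspecified
            continue;
        }
        accum += c;
    }
    // Whatever follows the last separator is the final field.
    ret.emplace_back(std::move(accum));
    return ret;
}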
CSV files are actually well and truly evil. They look simple, but there are some pretty hard caveats in there that’ll mess up your algorithm.

You said : was the separator.
• What is your data?
• Can your data include the : character? If yes, how?
• Can your data span lines? If yes, how is the newline embedded?
• Any special weirdness in formatting numbers or anything? (For example, did the data come from Excel?)
• How big is the data file? How big do you expect it to grow?

The best way to read a CSV is, sadly, a DFA tailored to your expected input.
If the answers to all my questions above are 'no' and 'small', then helios’s solution will suffice.
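To give an idea of what such a DFA can look like, here is a rough sketch for a single line. The quoting rules are my assumptions, not something known about the OP's file: a field may be wrapped in double quotes, a doubled quote inside a quoted field stands for one literal quote, and fields never span lines. parse_line and the state names are made up for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch of a three-state DFA for one line of input.
// Assumed format (hypothetical): optional double-quoted fields, "" inside
// quotes means a literal quote, no embedded newlines.
std::vector<std::string> parse_line(const std::string &line, char separator = ':'){
    enum class State { Unquoted, Quoted, QuoteSeen };
    std::vector<std::string> fields;
    std::string field;
    State state = State::Unquoted;
    for (char c : line){
        switch (state){
        case State::Unquoted:
            if (c == separator){
                fields.emplace_back(std::move(field));
                field.clear();
            } else if (c == '"' && field.empty()){
                state = State::Quoted;      // opening quote at field start
            } else {
                field += c;
            }
            break;
        case State::Quoted:
            if (c == '"') state = State::QuoteSeen;
            else field += c;                // separators are plain data here
            break;
        case State::QuoteSeen:
            if (c == '"'){                  // doubled quote: literal quote
                field += '"';
                state = State::Quoted;
            } else if (c == separator){     // the quote closed the field
                fields.emplace_back(std::move(field));
                field.clear();
                state = State::Unquoted;
            } else {                        // lenient: keep text after the closing quote
                field += c;
                state = State::Unquoted;
            }
            break;
        }
    }
    fields.emplace_back(std::move(field));  // last field has no separator after it
    return fields;
}
```

The point of the states is that a ':' inside quotes is treated as data, which a plain split cannot do. A real file may need different rules (backslash escapes, embedded newlines), which is why the DFA has to be tailored to the input.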
And what about ignoring any white space on either side of the delimiter? Can you provide a sample from the file?
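If white space around the delimiter does need to be ignored, one simple approach is to trim each field after splitting. A minimal sketch, assuming only spaces and tabs count as white space (trim is an illustrative name):

```cpp
#include <string>

// Illustrative helper (hypothetical name): returns `field` without leading
// or trailing spaces and tabs. Interior white space is left alone.
std::string trim(const std::string &field){
    const char *whitespace = " \t";
    const auto first = field.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";   // field is all white space
    const auto last = field.find_last_not_of(whitespace);
    return field.substr(first, last - first + 1);
}
```

Applied to each field of "Name : 0" after a split on ':', this would turn "Name " and " 0" into "Name" and "0".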
I don't think that regex is the right tool.
Consider using a library.
https://github.com/d99kris/rapidcsv/
If the actual format of the csv is known, is simple, doesn't change and can't have variants, then something like helios's code above will be the simplest option. However, if you have to parse a file that is 'csv format' and its content is not under your control, then don't try to parse it yourself. It's very tricky to parse a general csv file correctly. Use a 3rd party library. Even if you write code that correctly parses a 3rd party csv file now, if the code isn't general and the 3rd party changes their format slightly (e.g. spaces around the delimiter), then you'll probably have to change your code.

Splitting a string on a delimiter is easy - fully parsing a general csv file is not.
It turns out that my csv output is dead simple. It contains only a ':' delimiter (wisely chosen, I might add) and ends each line with nothing extra (or with \r if it's DOS output). I have been convinced that the best way to proceed is to just write a simple piece of code to 'split' the input. To the convincers' credit, it worked. And so I proceed gracefully into my future.

I do understand the uncertain world of csv input, and am glad to avoid it.

I might also add that I'm using Visual Studio (because NetBeans has a lack of manpower to provide C/C++ support). I find VS riddled with IDE and compiler errors, and VS seems to ignore some language constructs. Pity. I really like NetBeans.

Thanks to all.