I'm trying to parse a 13,000-line file line by line. I tried parsing it with std::string::find and std::string::substr, but that takes about 15 seconds. With sscanf it takes about 1.3 seconds. However, I've run into an issue with sscanf. For example:
/*
Line format:
Column1[TAB]Column2[TAB]Column3[TAB]...
Example:
Value1 12 Value3 ...
*/
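std::string format;
format += "%[^\t]";   // column 1 (string): everything up to the next tab
format += "%u\t";     // column 2 (unsigned); "\t" skips ANY run of whitespace
format += "%[^\t]";   // column 3 (string): everything up to the next tab
// ..
std::string line;
std::getline(file, line);
// value1 and value3 are char buffers, value2 is an unsigned
sscanf(line.c_str(), format.c_str(), value1, &value2, value3 /*, ..*/);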
The problem is that some of the values can be empty. When that happens, sscanf skips over all consecutive tabs until it reaches the next non-whitespace character, so the columns shift and the values end up in the wrong variables.
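A minimal sketch that reproduces this, assuming three columns shaped like the snippet above (string, unsigned, string). scan_three is a hypothetical helper, and the %15 field widths are an addition to keep the 16-byte buffers from overflowing. Two standard scanf rules explain the behavior: %u (like a whitespace character in the format) skips any run of whitespace, so an empty numeric column is silently jumped over; and %[ must match at least one character, so an empty string column stops the scan.

```cpp
#include <cstdio>

// Hypothetical helper: applies a three-column format shaped like the one in
// the question (string, unsigned, string) and returns sscanf's result, i.e.
// how many fields were actually assigned. v1 and v3 must hold >= 16 chars.
int scan_three(const char* line, char* v1, unsigned& v2, char* v3) {
    // %u skips leading whitespace, including several tabs in a row, and
    // %15[^\t] fails outright if the field is empty.
    return std::sscanf(line, "%15[^\t]%u\t%15[^\t]", v1, &v2, v3);
}

// Usage:
//   char v1[16], v3[16];
//   unsigned v2 = 0;
//   scan_three("Cap\t12\tWarrior", v1, v2, v3); // 3: all columns read
//   scan_three("Cap\t\t7", v1, v2, v3);         // 2: v2 == 7, taken from column 3
//   scan_three("\t12\tWarrior", v1, v2, v3);    // 0: empty first column
```

Checking the return value catches the empty-string case, but not the shifted numeric case, which "succeeds" with the wrong data.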
I can't comment on scanf; it's not something I've used much. Still, rather than std::string::find and std::string::substr, have you considered std::istringstream with std::getline() and a tab delimiter? There might be a difference in speed.
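A minimal sketch of that suggestion (split_tabs is a hypothetical name). It assumes each field is converted afterwards as needed, for example with std::stoul for numeric columns. The key difference from scanf is that std::getline stops at every tab, so an empty column comes back as an empty string and the later columns keep their positions.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line into tab-separated fields. "a\t\tb" yields
// {"a", "", "b"}: the empty middle column is preserved.
// Caveat: a trailing tab does NOT produce a final empty field, because
// getline fails when it hits end-of-input without extracting anything.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}
```

If the last column can be empty, compare fields.size() against the expected column count and treat missing trailing fields as empty.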
It might help to see a few example lines from the file, to know which columns may be empty (any or all of them?), and whether you need only specific columns or all of them.
@Chervil Thanks for the reply. I switched to std::istringstream with the tab delimiter, and that reduced the time to 8 seconds. Here are the header and a sample line:
// Header
TID ItemName Level SuitableLevelMin SuitableLevelMax Class Race Gender Type SubType ExtraType BoundType EventItem PCBang Only Villain Only GroupID DropRate DropRank font color Spirit Type NPC Price PC Price NPC Charisma Price NPC Shrine Coin PC Charisma Price PC Shrine Coin Stack Hand Rank DUR MinSocket MaxSocket MinOption MaxOption ATK Range ATK Speed Physical_Min_Damage Physical_Max_Damage Physical_Defense_Point Physical_ATKRate PDR BlockRate Magic_Defense_Point Water_Defense_Point Fire_Defense_Point Earth_Defense_Point Air_Defense_Point CON_Bonus_Point STR_Bonus_Point DEX_Bonus_Point INT_Bonus_Point Wis_Bonus_Point Apply Effect Count Apply Effect Time Use Interval HP SP MP Rune Attribute Use skill id Use skill level Polymorph Id Polymorph Dur FirstCategory FirstCategoryName SecondCategory SecondCategoryName PhysicalRank Hp_Buff Mp_Buff Attack_Buff Defense_Buff Run_Buff Cash Destination RemainTime ExpireTime classify_id CanStopUsingItem CashItemUseType EnableOnRide OptionTID PotionType2 Link_id Skill_plus Gambling QuestItem Gacha_Type_Numer GachaRank RemainPetStamina GachaMinLv GachaMaxLv Item_section_num Heroic_Min_Damage Heroic_Max_Damage Heroic_Defense_Point
// Entry
50405 Absolute Cap of Antonio 75 2 1 1 3517399 1 10000 3 40 2 2 15 15 12000;5540
Another thing I tested: with a release build, the time drops from seconds to about a hundred milliseconds (8 s -> 120 ms). But since I run the application in debug mode most of the time (and this file has to be loaded every time I run it), 8 seconds is really annoying.
@cire Thanks for the reply. Some fields are strings, so I ended up calling std::getline(iss, field, '\t') on the std::istringstream manually for each field. Using std::istringstream cut the loading time by about 50%, but it's still really annoying to wait 8 seconds every time I launch the application.
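One more option, which is not something suggested in the thread: read the whole file in one go and split it with std::string_view, so no std::string or stream object is created per field. This is a sketch assuming C++17 (std::string_view, std::from_chars); read_file, split_view, to_unsigned and for_each_row are hypothetical names, and it has not been measured against this file. Doing less allocation and stream work per field is the usual way to make debug builds hurt less, but the actual gain depends on the compiler and its debug settings.

```cpp
#include <charconv>
#include <fstream>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Reads the whole file into one string with a single stream operation.
std::string read_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

// Splits `text` on `sep` into views pointing into `text` (no copies).
// Consecutive separators yield empty views and a trailing separator yields
// a trailing empty view, so column positions are always preserved.
std::vector<std::string_view> split_view(std::string_view text, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Parses an unsigned column; returns `fallback` if the field is empty,
// does not start with a digit, or is out of range.
unsigned to_unsigned(std::string_view field, unsigned fallback = 0) {
    unsigned value = 0;
    auto result = std::from_chars(field.data(), field.data() + field.size(), value);
    return result.ec == std::errc() ? value : fallback;
}

// Calls on_row once per non-empty line with that line's tab-separated fields.
template <typename OnRow>
void for_each_row(std::string_view text, OnRow on_row) {
    for (std::string_view line : split_view(text, '\n')) {
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);  // CRLF files
        if (line.empty()) continue;
        on_row(split_view(line, '\t'));
    }
}

// Usage (file name is hypothetical):
//   std::string text = read_file("items.txt");
//   for_each_row(text, [](const std::vector<std::string_view>& f) {
//       unsigned tid = to_unsigned(f[0]);  // TID column
//       std::string name(f[1]);            // copy only the strings you keep
//   });
```

The views are only valid while the string returned by read_file is alive, so copy any text you need to keep into your own structures.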