Issue with GetLine and Large files?

Is there a known issue (or perhaps an unintended design limitation) with using getline on large files? I'm writing my own platform-independent graphics engine. I wrote my own model exporter plugin for 3ds Max, and the model file format more or less stays the same (the only differences are which sections are included and how many, but each section starts with a unique identifier and ends with a "}").

Anyhow, the problem I'm having is reading the model files, more specifically reading in the faces of the model. The vertexes in the file come physically before the indexes (as they really should). On a simple model like a cube there are 12 faces and it reads everything fine (vertexes and all), so I know my algorithms (parsing for "}"s, etc.) are correct. On large models (roughly 12 MB files) where the vertex count is in the 18k range, it reads all the vertexes fine, but when it reads the faces (figure face count is 1/3 of vertex count) it reads for a while and then throws an "Access Violation" at roughly face number 5248. Looking at the file in a text editor, there's nothing out of the ordinary (there really shouldn't be anyhow, given that small models load right and the file was automatically generated by a plugin). Here's the "for-loop" in question:

ModelFileIn is a std::ifstream
LineBuffer is a std::string

for (i = 0; i < FaceCount * 3; i+=3)
	{
		getline(ModelFileIn, LineBuffer);
			SeperatorPosition1 = LineBuffer.find(":", 0); // ex: 0
			SeperatorPosition1 = LineBuffer.find(":", SeperatorPosition1 + 1); //A:
			SeperatorPosition2 = LineBuffer.find(":", SeperatorPosition1 + 1); //B:
			SeperatorPosition3 = LineBuffer.find(":", SeperatorPosition2 + 1); //C:
			SeperatorPosition4 = LineBuffer.find("AB", SeperatorPosition3 + 1);
	
		TempString2 = LineBuffer.substr(SeperatorPosition1 + 1, SeperatorPosition2 - (SeperatorPosition1 + 3));
			trim(TempString2);
			VertexIndex = boost::lexical_cast<short>(TempString2);
			indicies[i] = VertexIndex;
			NumOfIndexes += 1;

		TempString2 = LineBuffer.substr(SeperatorPosition2 + 1, SeperatorPosition3 - (SeperatorPosition2 + 3));
			trim(TempString2);
			VertexIndex = boost::lexical_cast<short>(TempString2);
			indicies[i + 1] = VertexIndex;
			NumOfIndexes += 1;


		TempString2 = LineBuffer.substr(SeperatorPosition3 + 1, SeperatorPosition4 - (SeperatorPosition3 + 2));
			trim(TempString2);
			VertexIndex = boost::lexical_cast<short>(TempString2);
			indicies[i + 2] = VertexIndex;
			NumOfIndexes += 1;

	};


Just a side note:
At a quick glance the for loop might look wrong, but it isn't: a face is made up of 3 vertex indexes (referred to as A, B, and C), and they are stored linearly one after another (D3D's design, not mine lol), so indicies[2] is treated as vertex C of the first face, and indicies[3] is treated as vertex A of the next one.
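To spell out that layout, here is a minimal sketch. IndexSlot is a made-up helper name for illustration, not something from the engine; it just maps a face number and a corner (0 = A, 1 = B, 2 = C) to a position in the flat array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original code): position of one corner
// of a face inside the flat index array. corner is 0 for A, 1 for B, 2 for C.
inline std::size_t IndexSlot(std::size_t face, std::size_t corner)
{
    assert(corner < 3); // a triangle only has three corners
    return face * 3 + corner;
}
```

With this numbering, face 0 occupies slots 0 to 2 and face 1 starts at slot 3, which is why the loop steps i by 3 and runs up to FaceCount * 3.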

Anyhow, when it loads large files (~12 MB, 6000 faces and 18,000 vertices) it crashes and breaks into the "string" file (I'm guessing part of the std library), breaking on the "return (getline..." line in this snippet:

template<class _Elem,
	class _Traits,
	class _Alloc> inline
	basic_istream<_Elem, _Traits>& __CLRCALL_OR_CDECL getline(
		basic_istream<_Elem, _Traits>& _Istr,
		basic_string<_Elem, _Traits, _Alloc>& _Str)
	{	// get characters into string, discard newline
	return (getline(_Istr, _Str, _Istr.widen('\n')));
	}

It almost seems like, as the file position moves further and further into the file, it keeps streaming everything into _Istr, but I dunno.

Anyone have any thoughts? Has anyone heard of something like this? Thanks in advance!

-- StakFallT
Many of the file functions in C and C++ break when the file position is at or beyond the 2 GB mark (on platforms where file offsets are 32-bit). Since your file is nowhere near that, I'd look for other errors, particularly buffer overflows.
hmm.. interesting.. What's the max length of string? ... (Nah, that can't be it... that would mean the thousands of lines before it were all under the max string length..) hmm.. Definitely puzzling, thanks for that bit of info though, at least I can dismiss the file size possibility.
Thought I'd post an update for anyone else who is/was having a problem with GetLine (I hate when I find search results and the person didn't reply back with a solution :P ).

Ok so basically I rewrote the routine and tightened it all up quite a bit.

Originally indicies was declared as this:
short indicies[15000]

so I made it [30000], and that seemed to help, so I made it [90000], which was more than enough but didn't seem to make much difference (and I hated to see such an enormous number for an array, especially if half of it would be empty/wasted elements). But because it got better when I made it [30000], I figured it HAD to be the buffer overflow you mentioned or some form of memory corruption. So I took a page out of my MaterialManager book. Here's what I did:

declared in the class: short *indicies;

indicies = new short[FaceCount * 3];
	for (i = 0; i < (FaceCount - 1) * 3; i+=3)
	{
		getline(ModelFileIn, LineBuffer);
			SeperatorPosition1 = LineBuffer.find(":", 0); // ex: 0
			SeperatorPosition1 = LineBuffer.find(":", SeperatorPosition1 + 1); //A:
			SeperatorPosition2 = LineBuffer.find(":", SeperatorPosition1 + 1); //B:
			SeperatorPosition3 = LineBuffer.find(":", SeperatorPosition2 + 1); //C:
			SeperatorPosition4 = LineBuffer.find("AB", SeperatorPosition3 + 1);

		TempString2 = LineBuffer.substr(SeperatorPosition1 + 1, SeperatorPosition2 - (SeperatorPosition1 + 3));
			TempString2 = trim(TempString2);
			indicies[i] = boost::lexical_cast<short>(TempString2);
			NumOfIndexes += 1;

		TempString2 = LineBuffer.substr(SeperatorPosition2 + 1, SeperatorPosition3 - (SeperatorPosition2 + 3));
			TempString2 = trim(TempString2);
			indicies[i + 1] = boost::lexical_cast<short>(TempString2);
			NumOfIndexes += 1;


		TempString2 = LineBuffer.substr(SeperatorPosition3 + 1, SeperatorPosition4 - (SeperatorPosition3 + 2));
			TempString2 = trim(TempString2);
			indicies[i + 2] = boost::lexical_cast<short>(TempString2);
			NumOfIndexes += 1;
		
		TempString2 = "";
	};


In my Material Manager, I basically wrote a function called AddMaterialSlot, and all it does is add a slot to a dynamic array with data preservation; I read about it in a PDF somewhere on the net (can't recall the URL). I started using this for the indicies, but because FaceCount is not a number that increases through a loop, I couldn't implement an AddSlot type of technique at the beginning of each for-loop iteration. I could have potentially done something like that for the for loop's i variable, but then it would get really complicated adjusting for 3 elements per face plus whether elements are 0-indexed or not. blah blah so forth and so on lol
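For what it's worth, the numbers quoted earlier in the thread would explain the crash on their own: 6000 faces times 3 indexes is 18,000 entries, more than short indicies[15000] can hold, so writes would run past the end of the array from face 5000 onward, and a crash somewhere around face 5248 would be consistent with that. The grow-with-preservation idea is also what std::vector already provides. Below is a minimal sketch of that alternative; FitsFixedArray, MakeIndexBuffer and StoreIndex are hypothetical names invented for this example, not part of the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// True if a fixed-size array with `capacity` slots can hold 3 indexes per face.
bool FitsFixedArray(std::size_t faceCount, std::size_t capacity)
{
    return faceCount * 3 <= capacity;
}

// Hypothetical replacement for the raw array: sized from the face count read
// from the file, zero-initialised, and released automatically.
std::vector<short> MakeIndexBuffer(std::size_t faceCount)
{
    assert(faceCount <= static_cast<std::size_t>(-1) / 3); // no overflow in * 3
    return std::vector<short>(faceCount * 3);
}

// Checked store: at() throws std::out_of_range instead of silently writing
// past the end of the buffer.
void StoreIndex(std::vector<short>& indices, std::size_t slot, short value)
{
    indices.at(slot) = value;
}
```

A vector also removes the need for a matching delete[], which matters if the loader can exit early on a parse error.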

Basically the solution boiled down to:

1) I made indicies dynamically allocated; this has the benefit that each model instance uses only the minimum amount of memory it requires.

2) Instead of storing each boost::lexical_cast result in VertexIndex first, I assign it straight to indicies.

3) The for loop needed to be changed to (FaceCount - 1) * 3. I'm not really sure why. FaceCount, I believe, is 1-indexed, as it comes directly from the model file. Either way, subtracting 1 seemed to help. Without it, I was getting boost bad_cast errors, which tipped me off that it was casting data the routine wasn't designed for, meaning it was at a different spot in the file than anticipated.
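One guess about point 3 (only a guess, since the file itself isn't shown): if the loop asks for one more line than there are face lines, the last getline picks up something like the closing "}", the find calls return npos, and the substr/lexical_cast step then chokes on text that isn't a number. A loop that checks each step instead of trusting FaceCount would stop cleanly in that case. The sketch below assumes a line layout of "<n>: A: <a> B: <b> C: <c> AB: ...", which is inferred from the find calls above and may not match the real exporter output; ReadLabelledInt, ParseFaceLine and ReadFaces are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Reads the integer that follows `label`, searching from position `from`.
// Returns false if the label is missing or no digits follow it.
bool ReadLabelledInt(const std::string& line, const std::string& label,
                     std::size_t from, long& value, std::size_t& next)
{
    const std::size_t pos = line.find(label, from);
    if (pos == std::string::npos)
        return false;
    const char* start = line.c_str() + pos + label.size();
    char* end = nullptr;
    value = std::strtol(start, &end, 10); // skips leading whitespace itself
    if (end == start)
        return false;
    next = static_cast<std::size_t>(end - line.c_str());
    return true;
}

// Assumed layout: "<n>: A: <a> B: <b> C: <c> AB: ..."
bool ParseFaceLine(const std::string& line, long (&abc)[3])
{
    std::size_t pos = 0;
    return ReadLabelledInt(line, "A:", pos, abc[0], pos)
        && ReadLabelledInt(line, "B:", pos, abc[1], pos)
        && ReadLabelledInt(line, "C:", pos, abc[2], pos);
}

// Reads up to faceCount faces; stops early at end of file, at the first line
// that is not a face, or at an index that does not fit in a short.
std::vector<short> ReadFaces(std::istream& in, std::size_t faceCount)
{
    std::vector<short> indices;
    std::string line;
    std::size_t facesRead = 0;
    while (facesRead < faceCount && std::getline(in, line)) {
        long abc[3];
        if (!ParseFaceLine(line, abc))
            break; // for example the closing "}" of the section
        bool inRange = true;
        for (long v : abc)
            inRange = inRange && v >= 0 && v <= std::numeric_limits<short>::max();
        if (!inRange)
            break;
        for (long v : abc)
            indices.push_back(static_cast<short>(v));
        ++facesRead;
    }
    assert(indices.size() == facesRead * 3);
    return indices;
}
```

Comparing the returned size against FaceCount * 3 afterwards would show whether the header's count and the actual number of face lines really disagree by one.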

Probably a few other changes, but it's fixed, yay! Hope this helps others!
To make sure you don't have any more latent bugs, I'd recommend you run the program through a memory debugger. Valgrind gave me great results and it's free, but it runs on Linux and a few other Unix-like systems, not Windows, and it costs a lot of extra CPU and RAM.
Yeah, I definitely plan to; the engine, so far, leaks like a stab victim, if you pardon the somewhat graphic yet decent, imho, metaphor. Thanks for the product recommendation :) The engine is designed to be cross-platform. It currently uses a function-pointer abstract factory with a std::map mapping std::strings to each graphics API driver class, all written in a Meyers singleton pattern (so someone at runtime could type in what driver they want to use, or select it from a drop-down list, and it'll instance that). So even though Valgrind is Linux, it theoretically should still apply to the code I'm writing; basically, with the exception of the engine's d3d driver, if it's not cross-platform I avoid it like the plague :P
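For readers unfamiliar with that arrangement, here is a rough sketch of the general pattern: a Meyers singleton holding a std::map from driver name to a creation function pointer. Every name in it (GraphicsDriver, NullDriver, DriverFactory) is invented for illustration and is not the engine's actual code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical driver interface; the real engine's interface is not shown.
struct GraphicsDriver {
    virtual ~GraphicsDriver() = default;
    virtual std::string Name() const = 0;
};

// Stand-in driver used only to exercise the factory.
struct NullDriver : GraphicsDriver {
    std::string Name() const override { return "null"; }
    static std::unique_ptr<GraphicsDriver> Create()
    {
        return std::unique_ptr<GraphicsDriver>(new NullDriver);
    }
};

class DriverFactory {
public:
    using CreateFn = std::unique_ptr<GraphicsDriver> (*)();

    // Meyers singleton: the function-local static is constructed on first use.
    static DriverFactory& Instance()
    {
        static DriverFactory factory;
        return factory;
    }

    void Register(const std::string& name, CreateFn fn)
    {
        assert(fn != nullptr);
        creators_[name] = fn;
    }

    // Returns an empty pointer if no driver was registered under `name`.
    std::unique_ptr<GraphicsDriver> Create(const std::string& name) const
    {
        const auto it = creators_.find(name);
        if (it == creators_.end())
            return std::unique_ptr<GraphicsDriver>();
        return it->second();
    }

private:
    DriverFactory() = default;
    std::map<std::string, CreateFn> creators_;
};
```

Returning std::unique_ptr from the creation functions is a choice made for this sketch; it hands ownership to the caller, which also helps with the leak hunting mentioned above.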

I probably should try to clean up some of the leaks before going too far, but I'm still debugging out model loading and texture issues, I need to get these at least working pretty stable (And powerful; multi-textures; shader support) before I do any of that, as I need something to keep my motivation going lol

Right now it's just been problem after problem, and task branching after task branching only to return to an original problem I had that caused me to branch out to about 3 other task levels heh.


EDIT: Oh I see. I just looked up Valgrind. I didn't realize it was such an "integrated" solution. I didn't know it actually acted as an inline debugger. I just thought it was like an intelligent code parser that kept track of whether objects were being destroyed and which typecasts were more dangerous than others. This is actually pretty impressive.