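So you're trying to preserve the whitespace? I would suggest using getline(in, data); which reads a whole line at a time and keeps its spaces and tabs (formatted extraction with >> skips them), plus a small adjustment to your search function so that it actively searches for tags within the data it is given. Hopefully you don't mind too much that the newlines get dropped, since getline discards the line delimiter. If you do mind, either add the '\n' back yourself after each line, or try reading the whole file in character by character (for example with in.get(c)) and searching for tags while this happens. :)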
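Here is a rough sketch of the getline approach, in case it helps. I don't know what your tags look like, so the "<...>" syntax is an assumption, and findTags and scanStream are made-up names for illustration, not your actual functions. The sketch puts the '\n' back by hand after every line, so the rebuilt text keeps its line breaks.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream is handy for trying this out
#include <string>
#include <vector>

// Hypothetical helper: collect every "<...>" tag found in one line.
// The '<' ... '>' tag syntax is an assumption, not taken from the question.
std::vector<std::string> findTags(const std::string& line) {
    std::vector<std::string> tags;
    std::string::size_type pos = 0;
    while ((pos = line.find('<', pos)) != std::string::npos) {
        std::string::size_type end = line.find('>', pos + 1);
        if (end == std::string::npos) break;  // unterminated tag: stop scanning this line
        tags.push_back(line.substr(pos, end - pos + 1));
        pos = end + 1;
    }
    return tags;
}

// Read the stream line by line. std::getline keeps spaces and tabs but
// discards the '\n', so the newline is appended back by hand here.
// Note: this also adds a final '\n' if the input did not end with one,
// and a tag split across two lines will not be found.
std::vector<std::string> scanStream(std::istream& in, std::string& text) {
    std::vector<std::string> allTags;
    std::string data;
    while (std::getline(in, data)) {
        for (const std::string& tag : findTags(data)) {
            allTags.push_back(tag);
        }
        text += data;
        text += '\n';
    }
    return allTags;
}
```

You would call it as scanStream(in, text) with your already opened stream and an empty std::string. If tags can span several lines, the character-by-character route is probably the safer one.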