String Extraction

I have been working on this for two full weeks and still can't get my code to do what I want. I have 380 emails converted to text files, and I need to extract the team name from each one. It always appears in the same location.

Here is my text file:
Delivered-To: dan.panzarella@internationalquidditch.org
Received: by 10.152.5.162 with SMTP id t2cs54741lat;
        Mon, 24 Oct 2011 21:40:13 -0700 (PDT)
Received: by 10.236.180.101 with SMTP id i65mr38857762yhm.21.1319517612999;
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Return-Path: <webmaster+bncCND6l6uOFBCs-5j1BBoEIjHcPQ@internationalquidditch.org>
Received: from mail-yw0-f69.google.com (mail-yw0-f69.google.com [209.85.213.69])
        by mx.google.com with ESMTPS id z29si8081928yhn.26.2011.10.24.21.40.12
        (version=TLSv1/SSLv3 cipher=OTHER);
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Received-SPF: neutral (google.com: 209.85.213.69 is neither permitted nor denied by best guess record for domain of webmaster+bncCND6l6uOFBCs-5j1BBoEIjHcPQ@internationalquidditch.org) client-ip=209.85.213.69;
Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.213.69 is neither permitted nor denied by best guess record for domain of webmaster+bncCND6l6uOFBCs-5j1BBoEIjHcPQ@internationalquidditch.org) smtp.mail=webmaster+bncCND6l6uOFBCs-5j1BBoEIjHcPQ@internationalquidditch.org
Received: by ywt32 with SMTP id 32sf159999ywt.4
        for <multiple recipients>; Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Received: by 10.100.26.13 with SMTP id 13mr7581072anz.5.1319517612482;
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
X-BeenThere: webmaster@internationalquidditch.org
Received: by 10.150.37.13 with SMTP id k13ls17677131ybk.0.gmail; Mon, 24 Oct
 2011 21:40:12 -0700 (PDT)
Received: by 10.236.155.74 with SMTP id i50mr38629000yhk.23.1319517612323;
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Received: by 10.236.155.74 with SMTP id i50mr38628998yhk.23.1319517612306;
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Received: from smtp.webfaction.com (mail6.webfaction.com. [74.55.86.74])
        by mx.google.com with ESMTP id d47si22408792yhn.147.2011.10.24.21.40.12;
        Mon, 24 Oct 2011 21:40:12 -0700 (PDT)
Received-SPF: pass (google.com: domain of iqa@web197.webfaction.com designates 74.55.86.74 as permitted sender) client-ip=74.55.86.74;
Received: from localhost (web197.webfaction.com [184.172.207.69])
	by smtp.webfaction.com (Postfix) with ESMTP id 15471207F874;
	Mon, 24 Oct 2011 23:40:12 -0500 (CDT)
To: gina.loiacono@ymail.com
Subject: Your IQA membership application
From: membership@internationalquidditch.org
CC: harrison.homel@internationalquidditch.org, webmaster@internationalquidditch.org
MIME-Version: 1.0
Message-Id: <20111025044012.15471207F874@smtp.webfaction.com>
Date: Mon, 24 Oct 2011 23:40:12 -0500 (CDT)
X-Original-Sender: membership@internationalquidditch.org
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain
 of iqa@web197.webfaction.com designates 74.55.86.74 as permitted sender) smtp.mail=iqa@web197.webfaction.com
Precedence: list
Mailing-list: list webmaster@internationalquidditch.org; contact webmaster+owners@internationalquidditch.org
List-ID: <webmaster.internationalquidditch.org>
X-Google-Group-Id: 184551246056
List-Help: <http://www.google.com/support/a/internationalquidditch.org/bin/static.py?hl=en_US&page=groups.cs>,
 <mailto:webmaster+help@internationalquidditch.org>
Content-Type: text/html; charset=ISO-8859-1


<p>Hello!</p><p>Thank you for registering your team, Gladstone High School,  with the International Quidditch Association.</p><p>The IQA manages the different regions of the world through regional representatives. This regional representative is the individual with whom you will be maintaining the greatest portion of your contact as a member of the IQA. If you encounter any issues, please contact your regional representative and they will be happy to assist you. The contact information for your representative(s) is below:</p><p>Name: Harrison Homel<br>Email: <a href='mailto:harrison.homel@internationalquidditch.org'>harrison.homel@internationalquidditch.org</a></p><p>Again, thank you for registering your team with us. You have a great adventure ahead of you!</p><p>With warm wishes,<br>The IQA Team</p>


I have used getline() and it always gives me the paragraph the team name is in, but never just the team name. I'm sure there is an easy way to get it, since the name always follows a comma and ends at the next comma, but I can't seem to find it. Any pointers or links I should research would be very helpful. Thanks!

P.S. Here's my code:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

using namespace std;

int main()
{
    // The paragraph containing the team name is on line 50 of each file.
    const int LINE_TO_FIND = 50;

    ifstream testFile("E:\\emails\\IQA039.txt");
    // Opened for appending results later; nothing is written to it yet.
    ofstream outputFile("E:\\outputIQA039TEST.txt", std::ios::app);

    if (!testFile.is_open())
    {
        cout << "Error opening file to read from";
        return 1;
    }

    string line;
    // Read lines one at a time until `line` holds line LINE_TO_FIND.
    for (int i = 0; i < LINE_TO_FIND; i++)
    {
        if (!getline(testFile, line))
        {
            cout << "File has fewer than " << LINE_TO_FIND << " lines" << endl;
            return 1;
        }
    }

    // Prints the whole paragraph, not just the team name.
    cout << line << endl;
    return 0;
}
http://cplusplus.com/faq/sequences/strings/split/

I usually use stringstreams and getline (with the third parameter that specifies a char delimiter) for this.

The trick is that you can use getline to read a line from the file, put that line into a stringstream and use getline with a delimiter, then put that result into another stringstream and use getline with a different delimiter, and so on. You can grind down into the data as far as you need to.
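
As a rough sketch of that idea applied to the team name: once the paragraph line has been read from the file, it can be put into a stringstream and split on commas. The helper name `between_delims` is made up for illustration, and the sketch assumes the team name is always the text between the first and second comma on the line, which would not hold for a team name that itself contains a comma.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the text between the first and second
// occurrence of `delim` in `line`, with leading and trailing spaces/tabs
// removed. Returns an empty string if `line` has fewer than two delimiters.
std::string between_delims(const std::string& line, char delim = ',')
{
    std::istringstream in(line);
    std::string skipped, field;

    // First getline consumes everything up to and including the first
    // delimiter. If eof is set, no delimiter was found.
    if (!std::getline(in, skipped, delim) || in.eof())
        return "";

    // Second getline reads up to the next delimiter, i.e. the field we want.
    if (!std::getline(in, field, delim) || in.eof())
        return "";

    // Trim surrounding whitespace (the name follows ", " in the email).
    const std::string::size_type first = field.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = field.find_last_not_of(" \t");
    return field.substr(first, last - first + 1);
}
```

Called on the paragraph line from the email above, `between_delims(line)` should give `Gladstone High School`, which could then be written to the output file. The same two-step pattern can be repeated with another stringstream and a different delimiter if the field needs further splitting.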