Parsing strategy for HTML files

Hi all!
Sorry for my poor English.
I have an HTML file like this:



<tr class="from" id="n1" >

 			<td>
		           String Zero
			</td>
 			<td>
  			 String One
			</td>
			<td>
                         String Two 
 			</td>
			<td>
			 String Three
			</td>
			<td>
		         String Four
			</td>

</tr>

<tr class="from" id="n2" >

 			<td>
		           String Zero
			</td>
 			<td>
  			 String One
			</td>
			<td>
                         String Two 
 			</td>
			<td>
			 String Three
			</td>
			<td>
		         String Four
			</td>

</tr>
			
<tr class="from" id="n3" >

 			<td>
		           String Zero
			</td>
 			<td>
  			 String One
			</td>
			<td>
                         String Two 
 			</td>
			<td>
			 String Three
			</td>
			<td>
		         String Four
			</td>

</tr>
	


And so on.
For every table row, I need to extract only String Two and String Three.
Is it better to use regex, libxml++, or some other library for this task?
Can someone give me some ideas on how to do this?
Thanks!
I have found it easier to do the parsing yourself if the data is in a VERY simple format. When the format becomes nested or complicated, you should use a library.
This looks simple enough to handle with regex or even just a find/substring loop, something like: find "<td>", extract String Zero, find "<td>" a few more times, extract String Three, find "</tr>", repeat...
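As a rough illustration of the regex route, here is a minimal sketch using std::regex. It assumes the layout from the question: every cell is a plain <td> with no attributes, cells contain no nested tags, and String Two and String Three are always the third and fourth cell of a row. The function name extract_two_and_three is made up for this example. It is not a general HTML parser and will break on anything less regular.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (third cell, fourth cell) for every <tr> row.
// Assumption: flat rows, attribute-free <td> tags, no nested markup in cells.
std::vector<std::pair<std::string, std::string>>
extract_two_and_three(const std::string& html) {
    // [\s\S]*? matches any characters, newlines included, as few as possible.
    static const std::regex row(R"(<tr\b[^>]*>([\s\S]*?)</tr>)");
    // The \s* on both sides drops the whitespace around the cell text.
    static const std::regex cell(R"(<td>\s*([\s\S]*?)\s*</td>)");

    std::vector<std::pair<std::string, std::string>> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator r(html.begin(), html.end(), row); r != end; ++r) {
        const std::string body = (*r)[1].str();  // everything inside one row
        std::vector<std::string> cells;
        for (std::sregex_iterator c(body.begin(), body.end(), cell); c != end; ++c) {
            cells.push_back((*c)[1].str());
        }
        // Skip rows that do not have at least four cells.
        if (cells.size() > 3) {
            result.emplace_back(cells[2], cells[3]);
        }
    }
    return result;
}
```

Read the whole file into a std::string, pass it in, and each returned pair should hold the trimmed String Two and String Three of one row.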

About regex, please read: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Yes, you can use libxml (libxml++, libxml2, whatever) if you think the task justifies a library. Otherwise, as jonnin said, if the HTML is simple enough, just find <td> and extract the characters that follow until you reach </td>.
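The find-and-extract idea could look roughly like the sketch below, which uses only std::string::find and substr. It assumes the same regular layout as the sample in the question (plain <td> tags without attributes, no nested tables, the wanted strings in the third and fourth cell of each row). The names trim and scan_rows are invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Strip leading and trailing spaces, tabs and line breaks.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: walks the text row by row and returns the third and
// fourth cell of each row. Assumption: "<tr" only ever starts a table row.
std::vector<std::pair<std::string, std::string>> scan_rows(const std::string& html) {
    const std::string open = "<td>", close = "</td>", row_close = "</tr>";
    std::vector<std::pair<std::string, std::string>> result;
    std::size_t pos = 0;
    while ((pos = html.find("<tr", pos)) != std::string::npos) {
        const std::size_t row_end = html.find(row_close, pos);
        if (row_end == std::string::npos) break;  // unterminated row: stop
        std::vector<std::string> cells;
        std::size_t cur = pos;
        while (true) {
            const std::size_t start = html.find(open, cur);
            if (start == std::string::npos || start >= row_end) break;
            const std::size_t text_begin = start + open.size();
            const std::size_t stop = html.find(close, text_begin);
            if (stop == std::string::npos || stop > row_end) break;
            // Characters between <td> and </td>, without surrounding whitespace.
            cells.push_back(trim(html.substr(text_begin, stop - text_begin)));
            cur = stop + close.size();
        }
        if (cells.size() > 3) result.emplace_back(cells[2], cells[3]);
        pos = row_end + row_close.size();  // continue after this row
    }
    return result;
}
```

The trim step matters here because the cells in the sample are padded with tabs, spaces and line breaks. If the real file ever adds attributes to <td> or nests tags inside cells, this scan stops being reliable and a real parser such as libxml2 is the safer choice.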