Get Count Of An Occurrence

I'm using printf and I would like to get a count of how many times the word "file" occurs in the data part. How would I do this in C++ (Windows)?

Code:
void printRawData(unsigned char *data, int length, int more)
{
	int i, c=0;
	printf("     -------------One Data Begins-------------\n");
	for (i=0; i<length; i++)
	{
		if ((data[i]>30 && data[i]<122) ||
			(((data[i]==10) || (data[i]==13) || (data[i]==123) || (data[i]==125))
			&& (more>0)))
		{
			printf("%c", data[i]);
			c+=1;
		}
		else
		{
			printf("[%i]", data[i]);
			c+=3;
			if (data[i]>9) c++;
			if (data[i]>99) c++;
		}
		if (c>=47)
		{
			printf("\n");
			c=0;
		}
	}
}
This isn't possible using the standard C library alone. Try something like this:
#include <stdio.h>
#include <string.h>
#include <stdarg.h>


int cprintf(const char *format, const char *word, ...) // count-printf
{
	va_list args;
	int count = 0, word_len;
	char buf[1024]; // It's not very good to use 1024 bytes,
	                // you can modify it by printing output, divided in some parts,
	                // to have a chance to cprint larger output.

	word_len = strlen(word);

	va_start(args, word);
	vsnprintf(buf, sizeof buf, format, args); // bounded: long output is truncated instead of overflowing buf
	va_end(args);

	printf("%s", buf);

	for(format = buf; format = strstr(format, word); )
	{
		format += word_len;
		count++;
	}

	return count;
}


int main(void)
{
	const char *w = "run";
	printf("%d\n", cprintf("run %s 123run\n", w, w)); // output: 3

	return 0;
}
What about this?
for(i=0; i < length-3; i++)
{
    if( data[i] == 'f' && data[i+1] == 'i' && data[i+2] == 'l' && data[i+3]=='e' )
    {
        //Found "file" in data on position i
        count++;
    }
}
Maybe I don't understand your question, DSTR3A.
I'm a bit confused by this thread!

bujon's post solves the first part of the original post, though it would be better to make the solution more general.

But I don't get the relevance of printRawData() or va_lists?
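
For what it's worth, a more general form of the hard-coded "file" loop might look something like the sketch below. The name count_word and its parameters are made up for illustration (nothing in the thread defines them), and it assumes overlapping matches should be counted, as the original loop does.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the thread): counts how many times the
// wordlen-byte sequence `word` appears in the first `length` bytes of `data`.
// Overlapping matches are counted, like the hard-coded "file" loop.
std::size_t count_word(const unsigned char* data, std::size_t length,
                       const char* word, std::size_t wordlen)
{
    if (!data || !word || wordlen == 0 || length < wordlen)
        return 0;

    std::size_t count = 0;
    // Stop at the last position where a full word still fits.
    for (std::size_t i = 0; i + wordlen <= length; ++i)
    {
        if (std::memcmp(data + i, word, wordlen) == 0)
            ++count;
    }
    return count;
}
```

Inside printRawData this could be called as count_word(data, length, "file", 4). Because it compares bytes by length, it does not care whether the buffer contains zeros.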

I'm trying to pull a number of words from this.....

-------------IP Data Begins-------------
HTTP/1.1 200 OKP3P: policyref="http://g
oogleads.g.doubleclick.net/pagead/gcn_p3p_.xml"
, CP="CURa ADMa DEVa TAIo PSAo PSDo OUR IND UNI
PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniffContent-
Encoding: gipDate: Mon, 29 Aug 201
1 21:19:36 GMTServer: cafeCache
-Control: privateContent-Length: 4863
X-XSS-Protection: 1; mode=block
▼[
w8W7@lC

If you look at my original post this line is where the count is to come from. Once I get the count I want to run logic from it.

printf("%c", data[i]);
c+=1;

I tried the first solution but I couldn't use it because this is in a header file. The second one gave me a problem with the count word. I'm new to this so please excuse the ignorance.
You need to search data.
You are using C, not C++, stuff, so...

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
size_t count_subblocks( void* haystack, size_t haylen, void* needle, size_t needlelen, int is_overlap_ok )
  {
  /* Make sure all the input is usable */
  if (!haystack || !needle || !needlelen || haylen < needlelen) return 0;

  size_t result = 0;

  /* These will help us keep track of what is to be searched */
  /* (They are char pointers so we can do arithmetic on them.) */
  unsigned char* haybegin     = haystack;
  unsigned char* lasthaybegin = haystack;
  haylen -= needlelen - 1;

  /* While we find any results... */
  for (haybegin = memchr( haybegin, c, haylen );  /* (Find potential match) */
       haybegin != NULL;
       haybegin = memchr( haybegin, c, haylen ))
    {
    /* If we have found an actual match */
    if (memcmp( haybegin, needle, needlelen ) == 0)
      {
      result++;
      haybegin += is_overlap_ok ? 1 : needlelen;
      }
    /* Adjust for the next search */
    haylen -= haybegin - lasthaybegin;
    lasthaybegin = haybegin;
    }

  return result;
  }

Not tested... (I've a baby on my arm so I won't bother at the moment), but a cast or two might be needed on lines 10 and 11.

Hope this helps.
Thank you, I'll give it a try. I remember when I had little ones! :)))
All grown now!
I'm unsure of which is the first solution. And the second.

If you're processing an HTTP header and packet, you might be able to use a line-based approach, as the header lines all end with \r\n.

This should also allow you to format the output of printRawData a bit better, assuming this is the output of your function (based on the style of the "banner")?

-------------IP Data Begins-------------
HTTP/1.1 200 OKP3P: policyref="http://g
oogleads.g.doubleclick.net/pagead/gcn_p3p_.xml"
, CP="CURa ADMa DEVa TAIo PSAo PSDo OUR IND UNI
PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniffContent-
Encoding: gipDate: Mon, 29 Aug 201
1 21:19:36 GMTServer: cafeCache
-Control: privateContent-Length: 4863
X-XSS-Protection: 1; mode=block
▼[
w8W7@lC


Maybe like

-------------IP Data Begins-------------
HTTP/1.1 200 OK
P3P: policyref="http://googleads.g.doubleclick.net/pagead/gcn_p3p_.xml",
  CP="CURa ADMa DEVa TAIo PSAo PSDo OUR IND UNI PUR INT DEM STA
  PRE COM NAV OTC NOI DSP COR"
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Content-Encoding: gip
Date: Mon, 29 Aug 2011 21:19:36 GMT
Server: cafe
Cache-Control: private
Content-Length: 4863
X-XSS-Protection: 1; mode=block
▼[
w8W7@lC


(I assume the last two lines are the start of the packet data?)
Yeah, you're right, it is part of a packet. It's the beginning of the data part, the HTTP header to be exact. How would I do this line-based approach? Thank you.
Well, you know the start of the buffer. Then you find the first \r, or the end of the packet, and process that stretch/line. Then restart the process after the \n, if there is more data available, up until the double \r\n which marks the end of the header.

I've coded it with a while loop and pointers, wrapped up in a little class. When the end of a line is found, it calls a "decode line" function passing the stretch of data to process. In test apps I just copy the data into a std::string, as this makes the processing code simpler. But you can process it with raw pointers if you need the speed. If you're careful!

If you're not too worried about overall speed, you could also use a stringstream. Use the stringstream's str() member to feed in the packet data and then use getline() to extract it line by line. But if you do this you need to be careful to only feed the stringstream with the HTTP header, not the body.

Andy

P.S. Are you looking for just the word "file", or are you wanting to count how many files are being pulled down?
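
A rough sketch of the stringstream route described above, under a few assumptions: split_header_lines is a made-up name, the packet starts with the HTTP header, and the header ends at the first blank line ("\r\n\r\n"). It feeds the stream through its constructor rather than str(), which should amount to the same thing here.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the header lines found
// at the start of a packet, without their trailing "\r\n".
std::vector<std::string> split_header_lines(const unsigned char* data, std::size_t length)
{
    // Copy the packet, then keep only the part before the first blank line,
    // so the (possibly binary) body never reaches the stream.
    std::string packet(reinterpret_cast<const char*>(data), length);
    std::string::size_type end = packet.find("\r\n\r\n");
    std::istringstream in(packet.substr(0, end));   // npos means "take everything"

    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))                  // getline splits on '\n'
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                        // drop the '\r' left behind
        lines.push_back(line);
    }
    return lines;
}
```

Each returned string is one header line, so a per-line decode function (or a search for a particular word) can then work on ordinary std::string values.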
Doing the word.
Well, on reflection, I'd do more or less what Duoas is doing -- one-handed -- above. (c on line 15 isn't defined, so I swapped it for needle[0], which required an unsigned char* rather than a void*, etc. I also switched to a while loop for reasons of laziness.)

#include <cstring>   // memchr, memcmp, strlen
#include <iostream>

unsigned char data[] = 
"HTTP/1.1 200 OK\r\n"
"P3P: policyref=\"http://googleads.g.doubleclick.net/pagead/gcn_p3p_.xml\","
" CP=\"CURa ADMa DEVa TAIo PSAo PSDo OUR IND UNI PUR INT DEM STA"
" PRE COM NAV OTC NOI DSP COR\"\r\n"
"Content-Type: text/html; charset=UTF-8\r\n"
"X-Content-Type-Options: nosniff\r\n"
"Content-Encoding: gip\r\n"
"Date: Mon, 29 Aug 2011 21:19:36 GMT\r\n"
"Server: cafe\r\n"
"Cache-Control: private\r\n"
"Content-Length: 4863\r\n"
"X-XSS-Protection: 1; mode=block\r\n"
"["
"w8W7@lC";

size_t count_subblocks( unsigned char* haystack, size_t haylen,
                        unsigned char* needle  , size_t needlelen, int is_overlap_ok )
  {
  /* Make sure all the input is usable */
  if (!haystack || !needle || !needlelen || haylen < needlelen)
    return 0;

  size_t result = 0;

  /* These will help us keep track of what is to be searched */
  /* (They are char pointers so we can do arithmetic on them.) */
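  unsigned char* haybegin = haystack;
  /* One past the last position where a whole needle could still start */
  unsigned char* hayend = haystack + (haylen - needlelen + 1);

  /* While we find any results... */
  while( haybegin < hayend &&
         (haybegin = (unsigned char*)memchr(haybegin, needle[0], hayend - haybegin)) )
    {
      size_t nudge = 1;

      /* If we have found an actual match */
      if ( memcmp( haybegin, needle, needlelen ) == 0 )
        {
        result++;
        if(!is_overlap_ok)
          nudge = needlelen;
        }

        /* Adjust for the next search; the remaining length is always */
        /* hayend - haybegin, so bytes skipped by memchr are not searched twice */
        haybegin += nudge;
    }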

    return result;
}

void Test()
  {
  unsigned char field[] = "Content";

  /* just using strlen for test purposes */
  /* counting "Content" as I can see them above */
  /* note this approach is case sensitive */
  size_t m = count_subblocks(data , strlen((char*)data),
                             field, strlen((char*)field), false);

  std::cout << m << std::endl;
  }

int main()
  {
  Test();

  return 0;
  }

closed account (DSLq5Di1)
Maybe I'm missing something here, but wouldn't it be simpler to use strstr() for this?

#include <stdio.h>
#include <string.h> 

char data[] = ...

const char* substr = "Content";
size_t count = 0;

for (char* ptr = data; ptr = strstr(ptr, substr); ptr++)
    count++;

printf("\"%s\" occurred in [data] %zu times.", substr, count); // %zu matches size_t
Binary data may contain zeros -- and strstr() stops on zeros. Alas. Too bad there isn't a memmem() function in the C standard library.
Also, strstr would have problems if there was no null terminator. The above algorithm has to work on a buffer of unsigned char values -- of size haylen -- even if there is no null terminator.
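
To make the embedded-zero problem concrete, here is a small sketch. The function name count_with_strstr is invented for this example; the loop is essentially the strstr() one posted earlier.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the thread): counts matches the strstr() way.
// strstr() treats `text` as a C string, so it never looks past the first zero byte.
std::size_t count_with_strstr(const char* text, const char* word)
{
    if (*word == '\0')
        return 0;                 // an empty word would match at every position

    std::size_t count = 0;
    for (const char* p = text; (p = std::strstr(p, word)) != nullptr; ++p)
        ++count;
    return count;
}
```

Given the 9-byte buffer "file\0file", this reports one match even though "file" is present twice, because the second copy sits behind the zero byte.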

I have found strnstr() using Google, but it is not regularly available.
The strnstr() function also stops at the null character, even if encountered at some index < n.

In any case, despite being a fairly obvious function, it has not just portability concerns: certain implementations are positively broken. Alas.
http://www.mikeash.com/pyblog/dont-use-strnstr.html
closed account (DSLq5Di1)
Thank you guys, that's what I was missing. I should have realised that from the first bit of code "printRawData"..

It is a shame there is no binary strstr function! Other than the memchr/memcmp implementation, one might consider using the C++ std::search algorithm.
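
A minimal sketch of the std::search idea, assuming non-overlapping matches are wanted. The name count_with_search and its signature are made up for illustration; only std::search itself comes from the standard library.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not from the thread): counts non-overlapping occurrences
// of the needlelen-byte needle in the haylen-byte haystack. Works on raw bytes,
// so zeros in the data and a missing null terminator are both fine.
std::size_t count_with_search(const unsigned char* haystack, std::size_t haylen,
                              const unsigned char* needle, std::size_t needlelen)
{
    if (!haystack || !needle || needlelen == 0)
        return 0;

    const unsigned char* const end = haystack + haylen;
    const unsigned char* pos = haystack;
    std::size_t count = 0;

    // std::search returns `end` when the needle is not found in [pos, end).
    while ((pos = std::search(pos, end, needle, needle + needlelen)) != end)
    {
        ++count;
        pos += needlelen;   // continue after this match (no overlaps)
    }
    return count;
}
```

To count overlapping matches instead, advancing pos by 1 rather than needlelen should do it.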
Thank you, AndyWestKen. I'm trying your code. However, I'm getting errors saying that these are all undefined:

/* Make sure all the input is usable */
if (!haystack || !needle || !needlelen || haylen < needlelen)
return 0;
As they are all parameters to the function, I do not see how that is possible.

What exactly is the error?