Get webpage source code from BHO

Hello,

I'm working on a BHO for IE9 with VS2010.
My goal is to get the source code of the webpages I visit.


I have so far mostly used the following articles:

Building Browser Helper Objects with Visual Studio 2005
http://msdn.microsoft.com/en-us/library/bb250489(v=vs.85).aspx

Browser Helper Objects: The Browser the Way You Want It
http://msdn.microsoft.com/en-us/library/bb250436.aspx


I managed to do what's described in the first article above. So I currently have a BHO that pops up a message box when a page is loaded and removes images from the visited webpages.

What I want to do is what's explained in the 2nd article, but I'm struggling with the following part:

// Enable changes to the text
HWND hwnd = m_dlgCode.GetDlgItem(IDC_TEXT);
EnableWindow(hwnd, true);
hwnd = m_dlgCode.GetDlgItem(IDC_APPLY);
EnableWindow(hwnd, true);


m_dlgCode is undefined
>> I think I have to add it to the CViewSource class, but I'm not sure what type this identifier should be. Probably a class that has a GetDlgItem() method.

IDC_TEXT and IDC_APPLY are undefined too
>> Same for those. I have no idea where they come from, and I'm not sure what values (const int?) they should be assigned.


Or if you have other solutions to get the source code from visited webpages with a BHO, I'd be happy to see them. :)

Thanks in advance!
Chris.
I don't know if this is what you are looking for, but try right-clicking the webpage, then going down (or up) to "View Source". Good luck!

- Kyle
Hi Kyle,

Indeed, that's the easiest way. :-)
But I want to develop my own source code reader for automation purposes.

Basically, I want to get that source without having to right-click the webpage.

Thanks anyway,
Chris.
Maybe try something like...
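#include <iostream>
#include <windows.h>
#include <tchar.h>
#include <urlmon.h>
#pragma comment(lib, "urlmon.lib")
#pragma comment(lib, "wininet.lib")

int main()
{
	// Page to fetch, and the local file its source will be saved to
	LPCTSTR Url = _T("http://cplusplus.com/index.html"),
		File = _T("C:\\Users\\Sean\\Downloads\\cplusplus.txt");

	// Download the raw HTML of the page straight to disk
	HRESULT hr = URLDownloadToFile(NULL, Url, File, 0, NULL);
	if (FAILED(hr))
	{
		std::cerr << "Download failed, HRESULT 0x" << std::hex
			<< static_cast<unsigned long>(hr) << std::endl;
		return 1;
	}
	return 0;
}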


Note that the page source is saved as a .txt file. I adapted this from Pluto is a planet's post here: http://www.cplusplus.com/forum/windows/53113/
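URLDownloadToFile only writes the page to disk. To work with the source inside the program afterwards, the saved file can be read back into a std::string. A minimal sketch using only the standard library; ReadSource is a hypothetical helper name, and it takes a narrow (char) path, so in a Unicode build you would pass the same path as a plain string:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file written by URLDownloadToFile
// into a string. Returns an empty string if the file cannot be opened.
std::string ReadSource(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);  // binary: keep the bytes exactly as downloaded
    if (!in)
        return std::string();
    std::ostringstream buffer;
    buffer << in.rdbuf();                       // copy the entire file in one go
    return buffer.str();
}
```

Usage: `std::string html = ReadSource("C:\\Users\\Sean\\Downloads\\cplusplus.txt");`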
The identifiers that are undefined for you are most likely resource identifiers for a user interface built with a resource editor. Looking at the screenshot of the UI, I'd say IDC_TEXT probably identifies the text box that shows the HTML, and IDC_APPLY probably identifies the Apply Changes button.

I think that code uses MFC classes to build the UI, which is probably why the article shows so little of it: it assumes you already know that standard MFC code.
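To make the role of those identifiers concrete, here is a portable toy model, not the Win32 or MFC API: resource IDs are plain integers (the resource editor writes #define lines for them into resource.h), and a dialog object like m_dlgCode looks up its child controls by those integers. The ID values and the Control/ToyDialog types are hypothetical:

```cpp
#include <map>
#include <string>

// Hypothetical IDs: in a real project the resource editor writes
// lines like these into resource.h; the actual numbers will differ.
#define IDC_TEXT  1001
#define IDC_APPLY 1002

// Toy stand-in for a child control (the HTML text box or the Apply button).
struct Control
{
    std::string text;
    bool enabled = true;
};

// Toy stand-in for a dialog object such as m_dlgCode: child controls are
// found by their integer resource ID, which is the role GetDlgItem and
// SetDlgItemText play in the real code.
class ToyDialog
{
public:
    // Returns the control with this ID, creating it on first use.
    Control& Item(int id) { return controls_[id]; }
    void SetItemText(int id, const std::string& text) { controls_[id].text = text; }

private:
    std::map<int, Control> controls_;
};
```

In the real project the IDs come from the generated resource.h rather than being typed by hand, so they should not need to be defined manually once the dialog resource exists.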
Hello,

Thanks very much Naraku! It seems that I can do something with that function. I just have to put the website in the Trusted Sites zone or another security zone without Protected Mode (or run as administrator). I will do more tests to see if that meets my needs.

Thanks Jose for your insight. Actually, I don't need a UI to view the source, but I understand Microsoft's example better now.

Cheers!


Edit:
In fact, I figured out that I was trying to get a piece of code working that I didn't actually need. ^^
I just got rid of the following code from the MS article, and I can still work with the source code:

// Enable changes to the text
    HWND hwnd = m_dlgCode.GetDlgItem(IDC_TEXT);
    EnableWindow(hwnd, true);
    hwnd = m_dlgCode.GetDlgItem(IDC_APPLY);
    EnableWindow(hwnd, true);
	
    // Set the text in the Code Window
    m_dlgCode.SetDlgItemText(IDC_TEXT, psz); 
    delete [] psz;
   }
  else   // The document isn't a HTML page
  {
    m_dlgCode.SetDlgItemText(IDC_TEXT, ""); 
    HWND hwnd = m_dlgCode.GetDlgItem(IDC_TEXT);
    EnableWindow(hwnd, false);
    hwnd = m_dlgCode.GetDlgItem(IDC_APPLY);
    EnableWindow(hwnd, false);
  }
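With the dialog code gone, the page source is still in psz, but the excerpt frees it with delete [] psz right after displaying it, so anything you want to keep has to be copied first. A minimal sketch, assuming psz is a null-terminated char buffer allocated with new[] (as the delete [] in the excerpt implies) and that a null pointer stands for the non-HTML case; TakeSource is a hypothetical helper:

```cpp
#include <string>

// Hypothetical helper: copies the article's heap buffer into a std::string
// and then frees it, replacing the SetDlgItemText / delete [] pair.
// Assumes psz was allocated with new[] and is null-terminated;
// nullptr is treated as "the document isn't a HTML page".
std::string TakeSource(char* psz)
{
    if (psz == nullptr)
        return std::string();   // no source to keep
    std::string source(psz);    // copy up to the terminating '\0'
    delete[] psz;               // same cleanup the excerpt performs
    return source;
}
```

Calling `std::string source = TakeSource(psz);` leaves the page source in a std::string that manages its own memory, ready for whatever automation step comes next.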


Thanks for your replies guys, anyway!
Problem solved. :-)