I'm looking at an HTML parser, but it looks like it uses something called MFC. I googled it, and it turns out that many people do not favor using MFC. Why is that? And does anyone know a good C++ HTML text extractor?
Well, as soon as you use a little bit of MFC, you usually end up dragging in loads of stuff you don't want. Microsoft have done some work on reducing the dependencies, so this is no longer totally true: there are some basic elements of MFC that can be used without the rest of the library (these are mostly the bits which are now shared with ATL). But I would only consider using it to write a fully featured app. Usually I would use WTL/ATL.
In addition, MFC is only available with the paid-for versions of Visual C++. So avoiding it means your code should be buildable with the Express editions.
And then there are the "compromises" its designers made, to make it more attractive to WIN32 C programmers who were switching to MFC, which led to a poor O-O design.
I have also heard that Microsoft is slowly but surely abandoning MFC. I have also heard from a number of programmers that their companies would be out of business today had they staked their code on MFC when Microsoft first began touting it.
My advice is to run from MFC as fast as you possibly can.
The reason I shied away from MFC is that a lot of the things you learned how to do in Win32 go out the window (no pun intended...get it, window *waits for stale humor response*). Like Winsock sockets are not supported. And while it only took me all of 2 days to get it working somewhat correctly (more or less, depending on your perspective), it didn't really rub me the right way that I would have to drop a platform-independent socket API for the less universally adopted socket API that MFC is built around.
I've struggled mightily to keep from replying to this post, but I simply didn't have enough strength to resist.
I'm an 'old timer', and I've always been offended by things like MFC in particular and class frameworks in general, because I think they are dishonest. I pride myself on crafting code carefully and fully understanding every line of code in my apps. As for the many who bring into their apps thousands of lines of wizard or class framework code written by Microsoft, well, I think it is somewhat dishonest. For one thing, it forces a whole architecture upon one that one might not like (e.g., document / view architecture). For another, it creates bloated executables. For yet another, the code is ugly. Of course, that's subjective.
Nonetheless, this is a popular route taken by many C++ coders. And I do understand that in many contexts it does make sense. For example, class frameworks 'wrap' technology that would take beginning coders months or years to master, such as COM.
However, I personally believe that if a person wants to just have some complex functionality without fully understanding its inner workings, it would be more prudent (and honest) to just use Visual Basic or something like that, where all the code to create the various objects is hidden within the runtime or external binaries. Then there wouldn't be piles of dependent headers, macros, template instantiations, etc., within the source code whose workings the user is clueless about.
So in other words, rather than achieving the benefits of code reuse by adding wizard or template generated source code to my apps, I'd personally rather achieve code reuse by adding binary components to my projects, e.g., COM objects, custom controls in DLLs, etc. In that manner I don't have any code in my apps I don't fully understand. All such functionality is wrapped or encapsulated away within binary black boxes that are someone else's creation and responsibility. So the end result is code I've written and fully understand, and then, sharply demarcated from that, functionality I've brought in from outside, for which only header/interface definitions are in my apps. Just my opinion.
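To make the "black box" idea a bit more concrete, here is a rough sketch in plain C++. It is not real COM (no reference counting, no GUIDs), and the names ITextExtractor, CreateTextExtractor and TagStripper are made up for illustration. The app would see only the interface and the factory declaration; the implementation is shown in the same file only so the sketch compiles on its own, where in practice it would sit in someone else's DLL.

```cpp
#include <memory>
#include <string>

// --- What the app sees: interface and factory declaration only ---
// (hypothetical names, not from any real library)
struct ITextExtractor {
    virtual ~ITextExtractor() = default;
    // Returns the text found outside of <...> tags.
    virtual std::string Extract(const std::string& html) const = 0;
};

std::unique_ptr<ITextExtractor> CreateTextExtractor();

// --- What would live inside the separate binary ---
namespace {

class TagStripper final : public ITextExtractor {
public:
    std::string Extract(const std::string& html) const override {
        std::string out;
        bool in_tag = false;
        for (char c : html) {
            if (c == '<') {
                in_tag = true;        // start skipping markup
            } else if (c == '>') {
                in_tag = false;       // markup ended
            } else if (!in_tag) {
                out += c;             // keep text between tags
            }
        }
        return out;
    }
};

} // namespace

std::unique_ptr<ITextExtractor> CreateTextExtractor() {
    return std::unique_ptr<ITextExtractor>(new TagStripper);
}
```

The calling code only ever writes something like `CreateTextExtractor()->Extract(html)` and never includes the implementation. Passing standard library types across a real DLL boundary has its own compiler-compatibility caveats, which is part of why COM sticks to plain interfaces.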
I was counting on you replying to this, and you did.
Thanks everyone, point well taken. The thing is that I'm not that into software, but I like to understand the code I'm writing when I write it.
My thing with downloading HTML is turning into a nightmare (someone suggested building my own web browser, but I've seen an example from Microsoft that I'm going to look into; it involves IWebBrowser2).
I'll post something in case I have trouble (and I will).
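On the text-extraction half of the original question: once the HTML is in a string, pulling the text out doesn't need MFC or any Windows API. Below is a minimal sketch in standard C++, assuming the page is already downloaded and is simple enough that dropping everything between `<` and `>` is acceptable. The name `extract_text` is made up for illustration; the sketch does not decode entities like `&amp;`, does not skip `<script>` or `<style>` contents, and treats every tag as a word break.

```cpp
#include <cctype>
#include <string>

// Naive HTML-to-text: drops everything inside <...> and collapses
// runs of whitespace (and tags) into a single space.
// Assumptions: no entity decoding, no <script>/<style> handling,
// and no '>' characters inside attribute values or comments.
std::string extract_text(const std::string& html) {
    std::string out;
    bool in_tag = false;         // currently between '<' and '>'
    bool pending_space = false;  // a separator is owed before the next character

    for (char c : html) {
        if (c == '<') {
            in_tag = true;
            pending_space = true;  // treat a tag as a word break
            continue;
        }
        if (in_tag) {
            if (c == '>') in_tag = false;
            continue;              // skip markup
        }
        if (std::isspace(static_cast<unsigned char>(c))) {
            pending_space = true;
            continue;
        }
        if (pending_space && !out.empty()) out += ' ';
        pending_space = false;
        out += c;
    }
    return out;
}
```

For example, `extract_text("<p>plain <b>text</b></p>")` returns `plain text`. For real-world pages a proper HTML parser is the safer route; this only shows that the extraction step itself is ordinary string handling.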