Parse PDF file

Jun 20, 2015 at 10:47pm
Dear Forum,

Is there any library or other way to parse a PDF file with C++?

I have tried to use the PoDoFo library, but I never succeeded in installing it: there are too many dependencies, and I get a lot of errors during compilation with MinGW.

Thank you for your time and help!!!
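For a sense of what "parsing" involves before picking a library: per the PDF specification, a file starts with a `%PDF-x.y` header and ends with a `startxref` keyword followed by the byte offset of the cross-reference section, which a reader uses to locate the file's objects. The minimal sketch below only extracts those two pieces from an in-memory buffer. The names `inspect_pdf` and `PdfInfo` are hypothetical, and the sketch does not follow the cross-reference data, decompress streams (that is what zlib is for), or do any of the other work a real library does.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type: the header version and the byte offset
// stored after the last "startxref" keyword.
struct PdfInfo {
    std::string version;          // e.g. "1.4"
    std::size_t xref_offset = 0;  // offset of the cross-reference section
};

// Returns std::nullopt if the buffer doesn't look like a PDF file.
std::optional<PdfInfo> inspect_pdf(const std::string& data) {
    const std::string header = "%PDF-";
    if (data.compare(0, header.size(), header) != 0)
        return std::nullopt;

    PdfInfo info;
    // The version runs from after "%PDF-" to the end of the first line.
    std::size_t eol = data.find_first_of("\r\n", header.size());
    if (eol == std::string::npos) return std::nullopt;
    info.version = data.substr(header.size(), eol - header.size());

    // The last "startxref" in the file is the one a reader uses.
    const std::string keyword = "startxref";
    std::size_t pos = data.rfind(keyword);
    if (pos == std::string::npos) return std::nullopt;
    pos += keyword.size();

    // Skip the line break(s), then read the decimal offset.
    while (pos < data.size() && std::isspace(static_cast<unsigned char>(data[pos])))
        ++pos;
    std::size_t start = pos;
    std::size_t value = 0;
    while (pos < data.size() && std::isdigit(static_cast<unsigned char>(data[pos]))) {
        value = value * 10 + static_cast<std::size_t>(data[pos] - '0');
        ++pos;
    }
    if (pos == start) return std::nullopt;  // no digits after "startxref"
    info.xref_offset = value;
    return info;
}
```

To try it on a real file, read the whole file into a `std::string` through a `std::ifstream` opened with `std::ios::binary`, since PDFs contain binary stream data.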

Jun 21, 2015 at 12:51am
closed account (z05DSL3A)
I have recently been looking into the possibility of using PDFs as the basis of a project. I have been looking around at libraries and keep coming back to the Adobe PDF Library[1], but I have yet to ask what it costs.

As the project is off the books (read: work-related but not work-sanctioned) and I think the Adobe library will not be cheap, I thought I would start with a book:
Developing with PDF: Dive Into the Portable Document
by Leonard Rosenthol[2]

Edit:
As you don't give much info about what you are doing, I'll just mention that there is also an Acrobat SDK[3] that may be of interest (if Acrobat is something you use).

__________________________________
[1] http://www.datalogics.com/products/pdf/pdflibrary/
[2] http://shop.oreilly.com/product/0636920025269.do
[3] http://www.adobe.com/devnet/acrobat.html
Last edited on Jun 21, 2015 at 1:05am
Jun 21, 2015 at 2:25am
You're not the first person to wonder about this...

Googling "recommend pdf library c++", the first hit I get is:

Open source PDF library for C/C++ application? [closed]
http://stackoverflow.com/questions/58730/open-source-pdf-library-for-c-c-application

which mentions PoDoFo, but also LibHaru (more popular than PoDoFo), Hummus, ...

And

Open Source PDF Libraries and Tools
http://pdf-house.blogspot.co.uk/

mentions

QPDF
http://qpdf.sourceforge.net/

And

A list of open source C++ libraries
http://en.cppreference.com/w/cpp/links/libs

lists

HARU
PoDoFo
JagPDF

...

Andy
Jun 21, 2015 at 8:57am
closed account (z05DSL3A)
Finding a list of open-source libraries is the first step; finding a good one that does everything you require is (unfortunately) quite a different matter.
Jun 25, 2015 at 6:38pm
Thank you for your reply! I really appreciate your help.

Well, I want to create a GUI (VS2008 + Qt) that can read a PDF line by line, detect the presence of checkboxes in the PDF, and store the state of each checkbox (true or false) in a database. I proceeded as follows, under VS2008 and with CMake.exe (I dropped MinGW):
1- Built the zlib library
2- Built the FreeType library
3- Built the JPEG library
4- Built the PNG library
5- Built PoDoFo (after setting the appropriate debug and release paths to the libraries above)

When it came to building the PoDoFo library, I got these errors:

5>Linking...
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Done_FreeType referenced in function "public: __cdecl PoDoFo::PdfFontCache::~PdfFontCache(void)" (??1PdfFontCache@PoDoFo@@QEAA@XZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Init_FreeType referenced in function "protected: void __cdecl PoDoFo::PdfFontCache::Init(void)" (?Init@PdfFontCache@PoDoFo@@IEAAXXZ)
5>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
4>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
2>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
3>podofo.lib(PdfFontCache.obj) : error LNK2019: unresolved external symbol FT_Get_Postscript_Name referenced in function "public: class PoDoFo::PdfFont * __cdecl PoDoFo::PdfFontCache::GetFont(struct FT_FaceRec_ *,bool,bool,class PoDoFo::PdfEncoding const * const)" (?GetFont@PdfFontCache@PoDoFo@@QEAAPEAVPdfFont@2@PEAUFT_FaceRec_@@_N1QEBVPdfEncoding@2@@Z)
5>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name
4>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name
2>podofo.lib(PdfFontMetricsFreetype.obj) : error LNK2001: unresolved external symbol FT_Get_Postscript_Name

Jun 26, 2015 at 3:26pm
Well, those three functions are all supposed to come from the FreeType library.

You mention you'd set the path for the libraries, so that's covered.

Open a Visual Studio command prompt, cd to the directory where the FreeType .lib file is, and run

link /dump /linkermember:1 freetypeXXX.lib

(where XXX is the version number)

and check that the function names look the same.

When building 32-bit code with MSVC I would expect names like

_FT_Init_FreeType
_FT_Done_FreeType
_FT_Get_Postscript_Name

i.e. with a leading underscore.

The names without the underscore look more like what I would expect from GCC (or from a 64-bit MSVC build, which doesn't add the underscore).
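The check above can be expressed as a small helper. This is a minimal sketch that assumes you have already copied the symbol names from the `link /dump /linkermember:1` output into a list (parsing that output is left out); `has_c_symbol` and `Decoration` are hypothetical names. It looks for a C function name either undecorated (as GCC and 64-bit MSVC emit it) or with the single leading underscore that 32-bit MSVC adds to `__cdecl` C functions.

```cpp
#include <string>
#include <vector>

// How a required C symbol was found in the library's symbol list.
enum class Decoration { Missing, Plain, Underscore };

// Hypothetical helper: looks for `name` in `symbols`, first exactly
// (e.g. "FT_Init_FreeType"), then with the leading underscore that
// 32-bit MSVC adds to __cdecl C functions (e.g. "_FT_Init_FreeType").
Decoration has_c_symbol(const std::vector<std::string>& symbols,
                        const std::string& name) {
    bool underscore = false;
    for (const std::string& s : symbols) {
        if (s == name) return Decoration::Plain;
        if (s == "_" + name) underscore = true;
    }
    return underscore ? Decoration::Underscore : Decoration::Missing;
}
```

If the FreeType names come back `Missing`, the .lib is not the one PoDoFo needs; if they are present but decorated differently from what the linker asks for, the two libraries were probably built for different targets or with different compilers.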

Andy
Last edited on Jun 26, 2015 at 3:29pm
Jun 29, 2015 at 10:21am
closed account (48T7M4Gy)
http://libharu.sourceforge.net/
Jun 29, 2015 at 4:04pm
btw, just curious, why do you want to parse PDF files? Is it for encryption?
Jun 29, 2015 at 4:54pm
closed account (z05DSL3A)
chipp wrote:
btw, just curious, why do you want to parse PDF files?
Massi wrote:
...that can read a PDF line by line, detect the presence of checkboxes in the PDF, and store the state of each checkbox (true or false) in a database.
Jul 4, 2015 at 7:40pm
Hello chipp, sorry for the late response. I have to create a GUI that can read a PDF file line by line and get the status of its checkboxes (checked or not).
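For the checkbox part, the PDF specification (PDF 1.7 / ISO 32000-1, interactive forms) stores a check box as a button field (`/FT /Btn`) that has neither the Radio flag (bit position 16) nor the Pushbutton flag (bit position 17) set in `/Ff`. Its value `/V`, like the widget's appearance state `/AS`, is the name `/Off` when unchecked and some other name (conventionally `/Yes`) when checked. Assuming a library such as PoDoFo or QPDF has already handed you the flags and that name as plain values, the decision is small; `is_checkbox` and `is_checked` are hypothetical helpers, not a library API, and treating a missing value as unchecked is an assumption of this sketch.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Button-field flags from the /Ff entry; the spec numbers bits from 1,
// so bit position 16 is 1 << 15 and bit position 17 is 1 << 16.
constexpr std::uint32_t kFlagRadio      = 1u << 15;
constexpr std::uint32_t kFlagPushbutton = 1u << 16;

// Hypothetical: true if a /Btn field with these flags is a check box
// (neither a radio button nor a push button).
bool is_checkbox(std::uint32_t field_flags) {
    return (field_flags & (kFlagRadio | kFlagPushbutton)) == 0;
}

// Hypothetical: interprets a check box's /V (or /AS) name, given without
// the leading slash. "Off" means unchecked; any other on-state name
// (often "Yes") means checked. Assumption: a missing value is unchecked.
bool is_checked(const std::optional<std::string>& state_name) {
    return state_name.has_value() && *state_name != "Off";
}
```

Each result could then be stored in the database as a boolean next to the field's `/T` (partial field name) entry.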