Hello,
I am new to this forum, and here is my problem.
Topic: accountancy program
I am currently writing a C++ program that will compare a large number of PDF files. It should compare the numbers and dates in them and find matches. The problem is that the dates and numbers vary from PDF to PDF.
My idea so far is:
Folder 1 - one row of pdf files
is compared with
Folder 2 - 2nd row of pdf files
But I am struggling to get it to compare dates AND numbers.
Could anyone help with ideas on how to code it?
Here the program needs to report when the two folders contain an unequal number of PDF files.
2nd.
The PDF files will have the date at a random place in the file, and there are a lot of filler words. I need the program to pick out only the largest number in PDF 1 of folder 1 and compare it to the largest number in PDF 1 of folder 2.
So it should come up
PDF 1 Date & Largest Amount = PDF 2 Date & largest Amount
The program has now read the PDF file and ignored the "filler" data.
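A rough sketch of how the date and the largest number might be picked out of the filler text, assuming the text has already been extracted from the PDF by some library (that part is not shown). The function names are hypothetical. It also assumes dates look like DD.MM.YYYY, DD/MM/YYYY or DD-MM-YYYY, and that amounts are plain decimals such as 1499.99 with no thousands separators; real documents may need different patterns:

```cpp
#include <optional>
#include <regex>
#include <string>

// Assumed date layout: DD.MM.YYYY, DD/MM/YYYY or DD-MM-YYYY.
static const std::regex kDateRe(R"(\b(\d{2})[./-](\d{2})[./-](\d{4})\b)");

// Return the first date found, normalized to YYYY-MM-DD so that
// two dates can later be compared as plain strings.
std::optional<std::string> find_date(const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, kDateRe)) return std::nullopt;
    return m[3].str() + "-" + m[2].str() + "-" + m[1].str();
}

// Return the largest number in the text, ignoring everything else.
// Dates are blanked out first so that a year like 2019 is not
// mistaken for an amount.
std::optional<double> find_largest_number(const std::string& text) {
    static const std::regex num_re(R"(\d+(?:\.\d+)?)");
    const std::string without_dates = std::regex_replace(text, kDateRe, " ");
    std::optional<double> best;
    for (std::sregex_iterator it(without_dates.begin(), without_dates.end(), num_re), end;
         it != end; ++it) {
        const double value = std::stod(it->str());
        if (!best || value > *best) best = value;
    }
    return best;
}
```

Both functions return an empty optional when nothing is found, so a PDF without a date or without any number can be reported instead of silently compared.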
3rd.
The program will not care about the initial name of each PDF file, since the names can vary.
My problem is that I can't make the program read and compare two pieces of data (date and amount) for each PDF file.
I think it has nothing to do with getting paid.
It's a rather complicated task in C++ and not so many people here have experience with pdf libraries.
I could only do it in C# or maybe Java, but this might not be an option for you.
What is this project about? Homework?
You can do it in Java if you feel that is easier.
It is not homework. It is meant to show high school kids how quickly digitalization can change stable jobs such as accounting and controller jobs, and that they should keep up with the digitalization trend, since it will change most jobs.
I already have a deal with a GUI programmer who will put it in a format so it does not look so "boring"/advanced.
The files are compared on the data inside them, regardless of bytes or file name.
It does not matter to me which programming language is used; I just thought C++ would be the best match for this.
But if you think that Java or C# is better, that is fine with me.
Comparing the data in PDFs is going to be tricky: you have to extract the text and images and compare those. Java and C++ are more or less interchangeable for this. Java has some really weird "you can't do that" limitations that prevent doing some things cleanly (math code, for example, because there is no operator overloading) and it has no pointer work (which is falling by the wayside in C++, but you CAN do it where you need to). But both languages can solve this problem. Java is more portable; C++ is usually a little faster (often by too little to even measure).
I would find a pdf library that you like and use a language that can interact with it.
All major languages can do the rest of the work around the library.
As with any complicated problem, break it down into parts:
- iterate over PDF files in a directory
- open a PDF file
- extract the text from a PDF file (presumably, using some kind of existing library, as others have said)
- identify the date in the text from a PDF file
- identify the largest number in the text from a PDF file
- compare dates
- compare numbers
Once you have those building blocks in place, you can hopefully put them together to get the functionality you want.
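To give an idea of the last two steps (compare dates, compare numbers), here is a minimal sketch that assumes the date and largest amount have already been extracted from every PDF. The Record struct and the unmatched function are hypothetical names. Amounts are kept as whole cents, an assumption made to avoid comparing floating-point values for equality, and records are matched regardless of file name or order:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record: one per PDF file.
struct Record {
    std::string date;        // normalized, e.g. "2019-03-05"
    long long amount_cents;  // largest amount in the PDF, in cents
};

// Order by date first, then by amount.
bool operator<(const Record& a, const Record& b) {
    return std::tie(a.date, a.amount_cents) < std::tie(b.date, b.amount_cents);
}

// Two records match only when BOTH date and amount are equal.
bool operator==(const Record& a, const Record& b) {
    return a.date == b.date && a.amount_cents == b.amount_cents;
}

// Return the records from folder1 that have no partner in folder2.
// Both lists are sorted, so the original file order does not matter;
// duplicates are handled pairwise (two equal records need two partners).
std::vector<Record> unmatched(std::vector<Record> folder1, std::vector<Record> folder2) {
    std::sort(folder1.begin(), folder1.end());
    std::sort(folder2.begin(), folder2.end());
    std::vector<Record> result;
    std::set_difference(folder1.begin(), folder1.end(),
                        folder2.begin(), folder2.end(),
                        std::back_inserter(result));
    return result;
}
```

Calling unmatched(folder1, folder2) and then unmatched(folder2, folder1) gives the leftovers on each side; if both are empty, every PDF in one folder has a date-and-amount match in the other.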