I need help - urgently

Nov 29, 2017 at 4:35pm
Hello,
I am new to this forum, and here is my problem.
Topic: accountancy program

I am currently writing a C++ program that will compare a large number of PDF files. It should compare the numbers and dates in them and find matches. The problem is that the dates and numbers vary from PDF to PDF.

My idea so far is:
Folder 1 - one set of PDF files
gets compared with
Folder 2 - a second set of PDF files

But I am struggling to get it to compare dates AND numbers.


Could anyone help with ideas on how to code it?
Nov 29, 2017 at 5:23pm
This is for showing high school students the benefits of digitalization.
I hope someone responds since I need this to be done.
Nov 30, 2017 at 10:07am
Please?
Nov 30, 2017 at 10:11am
What are these numbers/dates? How do you obtain them?

What is the difficulty when you try to compare?
Nov 30, 2017 at 11:14am
I have 2 folders:

Folder 1
Contains maybe 1000 PDF files

Folder 2
Contains maybe 1040 PDF files

Here the program needs to report that the two folders contain an unequal number of PDF files.

2nd.
Each PDF file will have the date at a random place in the file, surrounded by a lot of filler words. I need the program to pick out only the largest number in PDF 1 of folder 1 and compare it to the largest number in PDF 1 of folder 2.

So it should come up with:
PDF 1 Date & Largest Amount = PDF 2 Date & Largest Amount

At that point the program has read the PDF file and ignored the "filler" data.

3rd.

The program will not care about the initial name of each PDF file, since the names can vary.



My problem is that I can't make the program read and compare two pieces of data (date and amount) for each PDF file.
Nov 30, 2017 at 11:23am
I am not sure if it is an impossible task :(

I know it is really advanced, but I hope some of you might know what could be done.
Nov 30, 2017 at 12:20pm
It certainly is not impossible.
Can you upload one sample document so we can see the structure?
Here is one article that might be helpful.
https://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file

Nov 30, 2017 at 12:46pm

Folder 1
http://www.responsive.co.nz/ledgerplus/images/printcustomeraccount.gif

Folder 2 will just contain PDF with
Date - Amount.

I don't know if this is any help.


Edit:

I am unsure how to upload PDF files to this forum.
Last edited on Nov 30, 2017 at 12:48pm
Nov 30, 2017 at 1:14pm
You can't upload files here in the forum.
You could upload it to Dropbox, Google Drive, OneDrive...
Nov 30, 2017 at 1:30pm
I can send you a shared folder by PM if you enable that option.
Dec 1, 2017 at 7:50am
I can pay if that is what I have to do.
Dec 1, 2017 at 9:42am
I think it has nothing to do with getting paid.
It's a rather complicated task in C++, and not many people here have experience with PDF libraries.
I could only do it in C# or maybe Java, but this might not be an option for you.
What is this project about - homework?
Dec 1, 2017 at 9:46am
You can do it in Java if you feel that is easier.
It is not homework. It is to show high school kids how quickly digitalization can change stable jobs such as accounting and controller jobs, and that they should keep up with the digitalization trend since it will change most jobs.

I already have a deal with a GUI programmer who will put it in a format that does not look so "boring"/advanced.
Dec 1, 2017 at 2:14pm
Are you comparing the files (byte by byte), the OS-level data (OS timestamp and file name), the data in the files, or something else?

Dec 1, 2017 at 2:30pm
Comparing the files by the data in them, regardless of bytes or file name.

It does not matter to me which programming language is used; I just thought C++ would be the best match for this.
But if you think that Java or C# is better, that is fine with me.
Dec 1, 2017 at 3:32pm
Comparing the data in a PDF is going to be tricky: you have to extract the text and images and compare those. Java and C++ are more or less interchangeable for this. Java has some really weird "you can't do that" limitations that get in the way of some things, such as math code (without operator overloading you can't write some math work cleanly) and pointer work (which is falling by the wayside in C++, but you CAN do it where you need to). But both languages can solve this problem. Java is more portable; C++ is usually a little faster (often by too little to even measure).

I would find a pdf library that you like and use a language that can interact with it.
All major languages can do the rest of the work around the library.
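
If linking a PDF library from C++ feels like too much at first, one alternative is to let an external tool do the extraction and read its output. This is only a rough sketch, not a library recommendation: it assumes the pdftotext command-line tool (shipped with Poppler and Xpdf) is installed and on the PATH, and that the platform has POSIX popen. The function names run_command, pdftotext_command and extract_text are made up for this example.

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Runs a shell command and returns everything it wrote to standard output.
// popen/pclose are POSIX; on Windows the equivalents are _popen/_pclose.
std::string run_command(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) {
        throw std::runtime_error("could not start: " + command);
    }
    std::string output;
    std::array<char, 4096> buffer;
    std::size_t n;
    while ((n = fread(buffer.data(), 1, buffer.size(), pipe)) > 0) {
        output.append(buffer.data(), n);
    }
    pclose(pipe);
    return output;
}

// Builds the pdftotext call; the trailing "-" sends the text to stdout.
// Assumption: the path contains no double quotes or other shell
// metacharacters, so simple quoting is enough for this sketch.
std::string pdftotext_command(const std::string& pdf_path) {
    return "pdftotext \"" + pdf_path + "\" -";
}

// Returns the plain text of one PDF (empty if pdftotext printed nothing).
std::string extract_text(const std::string& pdf_path) {
    return run_command(pdftotext_command(pdf_path));
}
```

The rest of the program then only deals with ordinary strings, so the extraction step can be swapped for a real library later without touching the comparison code.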
Last edited on Dec 1, 2017 at 3:33pm
Dec 1, 2017 at 4:03pm
I don't know how to make it.
Could anyone help?
Dec 1, 2017 at 4:13pm
In .NET iTextSharp would be a good option.
https://sourceforge.net/projects/itextsharp/
or
http://www.pdfsharp.net/

In what language would the GUI be written?
Dec 1, 2017 at 4:21pm
As with any complicated problem, break it down into parts:

- iterate over PDF files in a directory
- open a PDF file
- extract the text from a PDF file (presumably, using some kind of existing library, as others have said)
- identify the date in the text from a PDF file
- identify the largest number in the text from a PDF file
- compare dates
- compare numbers

Once you have those building blocks in place, you can hopefully put them together to get the functionality you want.
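
Apart from the PDF text extraction itself, the building blocks listed above could be sketched in C++17 roughly like this. Everything specific here is an assumption for illustration: the names Record, list_pdfs, extract_record and unmatched are made up, dates are assumed to look like DD/MM/YYYY, and amounts are assumed to have exactly two decimals with optional comma thousands separators (e.g. 1,234.56). The real documents may well need different patterns.

```cpp
#include <algorithm>
#include <filesystem>
#include <iterator>
#include <optional>
#include <regex>
#include <string>
#include <tuple>
#include <vector>

namespace fs = std::filesystem;

// What gets compared for each PDF: its date and its largest amount.
struct Record {
    std::string date;  // kept as text, assumed format DD/MM/YYYY
    long long cents;   // largest amount, stored in cents to avoid rounding
};

bool operator<(const Record& a, const Record& b) {
    return std::tie(a.date, a.cents) < std::tie(b.date, b.cents);
}

// Collects every file ending in ".pdf" (case-sensitive) in a folder.
// The rest of the file name is never looked at.
std::vector<fs::path> list_pdfs(const fs::path& folder) {
    std::vector<fs::path> pdfs;
    for (const auto& entry : fs::directory_iterator(folder)) {
        if (entry.is_regular_file() && entry.path().extension() == ".pdf") {
            pdfs.push_back(entry.path());
        }
    }
    return pdfs;
}

// Finds the first date and the largest amount in already-extracted text.
// Returns std::nullopt if either one is missing.
std::optional<Record> extract_record(const std::string& text) {
    static const std::regex date_re(R"(\b\d{2}/\d{2}/\d{4}\b)");
    static const std::regex amount_re(R"(\d{1,3}(?:,\d{3})+\.\d{2}|\d+\.\d{2})");

    std::smatch date_match;
    if (!std::regex_search(text, date_match, date_re)) {
        return std::nullopt;
    }

    bool found = false;
    long long largest = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), amount_re), end;
         it != end; ++it) {
        // "1,234.56" -> "123456": drop separators to get the value in cents.
        std::string digits = it->str();
        digits.erase(std::remove_if(digits.begin(), digits.end(),
                                    [](char c) { return c == ',' || c == '.'; }),
                     digits.end());
        const long long cents = std::stoll(digits);
        if (!found || cents > largest) {
            largest = cents;
        }
        found = true;
    }
    if (!found) {
        return std::nullopt;
    }
    return Record{date_match.str(), largest};
}

// Records in `a` that have no equal (date, amount) partner in `b`.
// Sorting first means the order of the files does not matter.
std::vector<Record> unmatched(std::vector<Record> a, std::vector<Record> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<Record> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

With these pieces, comparing list_pdfs(folder1).size() against list_pdfs(folder2).size() gives the "unequal number of files" warning, and calling unmatched in both directions lists the records that have no partner in the other folder.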
Dec 1, 2017 at 7:55pm
Has anyone programmed this before?