Convert PDF files to TEXT files (OCR optical character recognition)
$30-250 USD
Paid on delivery
I am looking for a person who has vast experience using OCR software (optical character recognition software) and who can help me choose a solution for my needs.
I have many PDF documents that need to be recognized (OCR) and converted into delimited text/ASCII format so that they can be loaded into a database on a Windows machine, in a Windows Server environment.
The PDF documents are lists of data (I have included an example which is explained below). The PDF files all have essentially the same format, with very small variations.
The final solution will be called from a management system (spawning a DOS box) and run either via an executable or a batch file, for example:
C:\> [login to view URL] [login to view URL] [login to view URL]
or
C:\> [login to view URL] [login to view URL] [login to view URL]
The [login to view URL] file is the PDF that needs to be scanned and converted into text; the results are then written to the [login to view URL] file.
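As a rough illustration of the calling convention above, here is a minimal Python sketch of an entry point that takes the input PDF and the output text file as two positional arguments. The names parse_args, main, input_pdf and output_txt are assumptions made for this example, and the OCR step itself is left as a placeholder.

```python
import argparse
import sys
from pathlib import Path


def parse_args(argv):
    """Parse the two positional arguments: input PDF and output text file."""
    parser = argparse.ArgumentParser(
        description="OCR a scanned PDF and write delimited text."
    )
    parser.add_argument("input_pdf", type=Path, help="scanned PDF to read")
    parser.add_argument("output_txt", type=Path, help="text file to write")
    return parser.parse_args(argv)


def main(argv=None):
    """Validate the arguments and return an exit code for the caller."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if not args.input_pdf.is_file():
        print(f"input file not found: {args.input_pdf}", file=sys.stderr)
        return 2  # non-zero so the calling management system can detect failure
    # Placeholder (assumption): the OCR and parsing steps would run here and
    # write their result to args.output_txt.
    return 0


# A real script would end by calling: raise SystemExit(main())
```

Returning a non-zero exit code on failure lets a batch file or the calling management system check whether the conversion succeeded.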
Included in this project is a 3 page test PDF document which contains test data using the format I need interpreted.
The second page is slightly crooked, and this is intentional. The third page is a copy of the first page, but placed upside down, which is also intentional.
The actual data is printed on paper from an old legacy system and then scanned with a photocopier-scanner to create a PDF document.
Since the pages are manipulated by a human, placing them in the photocopier, some pages can be crooked, and some may be upside down.
This is why your program must recognize these anomalies and correct them.
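One possible way to catch upside-down pages, sketched here under assumptions, is to run OCR on the page at 0 and 180 degrees and keep whichever result contains more of the expected header labels. The ocr_at callable is hypothetical: it stands in for whatever OCR tool is chosen and returns the recognized text for a given rotation. This sketch does not handle slightly crooked pages; those would need a separate deskew step in an image library.

```python
# Labels expected in the page header. LOCK is left out on purpose because it
# is a substring of LOCKERS and would be counted twice.
HEADER_LABELS = ("DATE", "TIME", "ESTABLISHMENT", "LOCKERS", "FILES")


def header_score(text):
    """Count how many expected header labels appear in the OCR text."""
    upper = text.upper()
    return sum(1 for label in HEADER_LABELS if label in upper)


def best_rotation(ocr_at, angles=(0, 180)):
    """Return the rotation angle whose OCR text matches the most labels.

    ocr_at is a hypothetical callable: angle in degrees -> OCR text of the
    page rotated by that angle. Ties go to the first angle listed.
    """
    return max(angles, key=lambda angle: header_score(ocr_at(angle)))
```

For example, if OCR at 0 degrees returns gibberish and OCR at 180 degrees returns text containing the header labels, best_rotation returns 180 and the page can be rotated before the data is extracted.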
There are usually several hundred pages of names, and therefore many pages in the PDF files.
The data to be extracted from the PDF file is primarily the list of names with the various columns of data next to each name.
In the header section of the top of the page, there is some data that needs to be identified:
Date
Time
ESTABLISHMENT:
LOCKERS :
LOCK :
FILES / OPTI.:
The resulting text file should have the data from the top of the page clearly identified so that the program reading the file can tell how to extract it. For example, you could put lines in the text output file like this:
ESTABLISHMENT=A7F
LOCKERS=AV
LOCK=8
FILES/OPTI.=0
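A minimal Python sketch of how such KEY=VALUE lines might be produced from the recognized header text follows. It assumes each label is followed by a colon and a single value with no spaces in it, which may not hold for every page. The names HEADER_PATTERNS and extract_header are made up for this example, the key is spelled ESTABLISHMENT to match the header label, and Date and Time are left out because their printed format is not shown here.

```python
import re

# Output key -> regex for the label as printed in the page header.
# Flexible whitespace is allowed because OCR spacing tends to vary.
HEADER_PATTERNS = {
    "ESTABLISHMENT": r"ESTABLISHMENT\s*:",
    "LOCKERS": r"LOCKERS\s*:",
    "LOCK": r"LOCK\s*:",
    "FILES/OPTI.": r"FILES\s*/\s*OPTI\.\s*:",
}


def extract_header(ocr_text):
    """Return KEY=VALUE lines for each header label found in the OCR text."""
    lines = []
    for key, label in HEADER_PATTERNS.items():
        # Assumption: the value is the first run of non-space characters
        # that follows the label and its colon.
        match = re.search(r"\b" + label + r"\s*(\S+)", ocr_text)
        if match:
            lines.append(f"{key}={match.group(1)}")
    return lines
```

A label that OCR fails to recognize is simply skipped, so the reading program should be prepared for a missing key.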
Then each of the lines of name data can be put like this, with the pipe symbol (|) as the field delimiter:
NAMEINFO=6AAX|FAFORGE|NATASHA|4JA-01A01X-27|DF|A|1|0|1
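To illustrate the line format, here is a small Python sketch that builds a NAMEINFO line from column values that have already been separated, and splits an output line back into its key and values the way a reading program might. The function names are hypothetical, and how the columns are separated from the OCR text in the first place is not covered.

```python
def format_name_row(fields):
    """Join already-separated column values into one NAMEINFO line."""
    cleaned = [str(f).strip() for f in fields]
    # A pipe inside a value would corrupt the delimited output.
    if any("|" in f for f in cleaned):
        raise ValueError("field contains the pipe delimiter")
    return "NAMEINFO=" + "|".join(cleaned)


def parse_output_line(line):
    """Split one output line into (key, list of values)."""
    key, _, value = line.rstrip("\n").partition("=")
    return key, value.split("|")
```

The same parse_output_line call works for the header lines, since a value without a pipe comes back as a one-element list.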
Here is what I want you to do:
1) Look at the test PDF file provided and determine if you can do this work;
2) If you can do the work, then choose the correct OCR tool to do the job (example: Tesseract/Tensorflow/Open CV);
3) Create a program or script to execute the task (via .exe or .bat);
4) Provide the program or script to me to test;
5) If it works, you will then explain in a DETAILED document how you made it work so that I can understand each step;
6) Then you get paid.
I understand that most OCR solutions require you to tweak or configure them so that they can better understand the content of the PDF file. You will need to explain this in your documentation.
PYTHON USERS: If you are using python and installing special packages, all of these packages MUST run on Windows Server. If you are creating an executable from all of them, you must explain all the steps showing how you did it.
Please read and understand the project details carefully. Your bid on this project is your final bid. If you are awarded the project you cannot ask for more money or a tip after the project is awarded. You will be paid what you bid. If you have any questions, please ask them before you bid. Your level of professionalism will determine if I do future work with you, as this is the first phase of a multi-phase project (There are 2 more, more complex, PDF formats I need interpreted, plus more stuff).
Thank you
Open CV
tensorflow
Tesseract
pytorch
caffe
keras
Project ID: #36595615
About the project
Awarded to:
Hi, I understand from your project description that you are looking for a person who has vast experience using OCR software (optical character recognition software) and who can help you choose the best solution for you...
32 freelancers are bidding an average of $201 for this job
I have read the project requirements for Convert PDF files to TEXT files (OCR). Also, if you want to see my past work related to this, I will show you. I am the managing director of a software company and I have a team for developme...
Using Google's OCR API for this task would be perfect. Hey there, I am a developer from the UK with over 9 years of experience in web development. Upon reading your project description this seems like a task which I can star...
Hi there! I run a small business that provides mechanical, electrical, electronic, civil, structural, chemical, industrial and control services. I have extensive experience using optical character recognition softwa...
Warm Greetings! Hope you're doing well. Thanks for posting the project. I am a senior full stack developer with 8+ years of experience, and I have tons of experience in OCR and pdf2text using Python. I've reviewed your po...
I understand that you are looking for someone who has vast experience using OCR software (optical character recognition software) and who can help you choose the right solution for your needs. With 3dCube Agency's vast...
Hello there, Please read this proposal thoroughly; it is not an automated proposal. I'm pleased to submit my proposal for your project Convert PDF files to TEXT files (OCR optical character recognition). I understand...
I possess excellent organizational skills, time management abilities, and a keen eye for identifying errors and inconsistencies. I am a quick learner and can adapt to new tools and software easily. I am confident that...
I am a software engineer with experience in developing and testing OCR software. I have the necessary skills to develop and test this type of software quickly and efficiently.
My name is Tom Racanelli, and I have looked at your job description. I can do this using the following stack: Open CV, tensorflow, pytorch, keras. Feel free to discuss further.
Hello, I am currently a software engineer. I can guarantee that Python is my field; you can check my LinkedIn for references.
I CAN COMPLETE YOUR PROJECT EASILY. Hello, I am a programmer with 10+ years of experience in OCR. C/C++, Qt, and Python are nice programming languages and I have specialized in them for many years, so I am confident I can compl...
I understand that you are looking for someone who has vast experience using OCR software (optical character recognition software) and who can help you choose the solution for your needs. Specifically, you need someone...
Hello, I can do this professionally. Please contact me through private messages so that I can send you a sample and we can discuss this.