Convert PDF files to TEXT files (OCR optical character recognition)

Completed Posted 11 months ago Paid on delivery

I am looking for a person who has vast experience using OCR software (optical character recognition software) and who can help me choose a solution for my needs.

I have many PDF documents that need to be recognized (OCR) and converted into delimited text/ASCII format so that they can be loaded into a database on a Windows machine, in a Windows Server environment.

The PDF documents are lists of data (I have included an example which is explained below). The PDF files all have essentially the same format, with very small variations.

The final solution will be called from a management system (spawning a DOS box) and run either via an executable or a batch file, for example:

C:\> [login to view URL] [login to view URL] [login to view URL]

or

C:\> [login to view URL] [login to view URL] [login to view URL]

The [login to view URL] file is the PDF that needs to be scanned and converted into text; the results are then written to the [login to view URL] file.
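As an illustration of this calling convention, here is a minimal Python sketch of a command-line entry point that takes an input PDF and an output text file. The program name `pdf2text`, the argument names, and the exit codes are assumptions made for the example (the real file names are not shown above), and the OCR step itself is only a placeholder.

```python
import argparse
import sys
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # "pdf2text" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(
        prog="pdf2text",
        description="OCR a scanned PDF and write delimited text.",
    )
    parser.add_argument("input_pdf", type=Path, help="scanned PDF to read")
    parser.add_argument("output_txt", type=Path, help="text file to write")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if not args.input_pdf.is_file():
        # A non-zero exit code lets the calling system detect the failure.
        print(f"input not found: {args.input_pdf}", file=sys.stderr)
        return 2
    # Placeholder: the OCR and formatting steps would produce these lines.
    lines = ["# OCR output placeholder"]
    args.output_txt.write_text("\n".join(lines) + "\n", encoding="ascii")
    return 0
```

In a real script the file would end with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard, so that a batch file or management system can check the exit code after the DOS box closes.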

Included in this project is a 3 page test PDF document which contains test data using the format I need interpreted.

The second page is slightly crooked, and this is intentional. The third page is a copy of the first page, but placed upside down, which is also intentional.

The actual data is printed on paper from an old legacy system and then scanned with a photocopier-scanner to create a PDF document.

Since a person places the pages in the photocopier by hand, some pages can be crooked, and some may be upside down.

This is why your program must recognize these anomalies and correct them.
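One possible way to detect an upside-down page, assuming Tesseract is the chosen tool, is its orientation and script detection (OSD) mode, which reports a suggested rotation in a `Rotate:` line. The sketch below only parses that report; the sample text is an assumed example of the OSD output format, and obtaining it (for instance with `tesseract page.png stdout --psm 0`) is left outside the code. Small skew angles, like the crooked second page, are a separate problem that OSD does not solve and are not covered here.

```python
import re


def rotation_from_osd(osd_text: str) -> int:
    """Return the rotation in degrees that the OSD report suggests applying."""
    match = re.search(r"^Rotate:\s*(\d+)\s*$", osd_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Rotate:' line found in OSD output")
    angle = int(match.group(1)) % 360
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unexpected rotation value: {angle}")
    return angle


# Assumed example of an OSD report for an upside-down page.
SAMPLE_OSD = (
    "Page number: 0\n"
    "Orientation in degrees: 180\n"
    "Rotate: 180\n"
    "Orientation confidence: 14.40\n"
    "Script: Latin\n"
    "Script confidence: 2.22\n"
)

if rotation_from_osd(SAMPLE_OSD) == 180:
    print("page is upside down; rotate it before running OCR")
```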

There are usually several hundred pages of names, and therefore many pages in the PDF files.

The data to be extracted from the PDF file is primarily the list of names with the various columns of data next to each name.

In the header section at the top of each page, there is some data that needs to be identified:

Date

Time

ESTABLISHMENT:

LOCKERS :

LOCK :

FILES / OPTI.:
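A rough sketch of how these labelled header values might be pulled out of the recognized text with regular expressions is shown below. The sample layout (label, colon, value, several labels per line) is an assumption, since the real page layout is only visible in the test PDF. Date and Time are omitted because their printed format is not described here.

```python
import re

# Output key -> regex fragment matching the printed label (spacing may vary).
HEADER_LABELS = {
    "ESTABLISHMENT": r"ESTABLISHMENT",
    "LOCKERS": r"LOCKERS",
    "LOCK": r"LOCK",
    "FILES/OPTI.": r"FILES\s*/\s*OPTI\.",
}


def extract_header(page_text: str) -> dict:
    """Return the header values found in one page of OCR text."""
    found = {}
    for key, label in HEADER_LABELS.items():
        # The colon must follow the label directly (apart from spaces),
        # so "LOCK" cannot match inside "LOCKERS".
        match = re.search(rf"(?<![A-Z]){label}\s*:\s*(\S+)", page_text)
        if match:
            found[key] = match.group(1)
    return found


# Assumed sample of OCR text for a page header.
sample = "ESTABLISHMENT: A7F   LOCKERS : AV\nLOCK : 8   FILES / OPTI.: 0\n"
print(extract_header(sample))
```

A known limitation of this sketch: if a value is blank on the page, `\S+` would capture the next label instead, so real code would need a stricter value pattern.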

The resulting text file should have the data from the top of the page clearly identified, so that the program reading the file can locate each value. For example, you could put lines in the text output file like this:

ESTABLISHMENT=A7F

LOCKERS=AV

LOCK=8

FILES/OPTI.=0
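The KEY=VALUE lines above could be written and read back along these lines. This is a sketch only: the function names are made up for illustration, and the key spelling follows the header labels printed on the page.

```python
HEADER_ORDER = ("ESTABLISHMENT", "LOCKERS", "LOCK", "FILES/OPTI.")


def header_lines(header: dict) -> list:
    """Format header values as KEY=VALUE lines in a fixed order."""
    lines = []
    for key in HEADER_ORDER:
        value = str(header.get(key, ""))
        if "\n" in value or "\r" in value:
            raise ValueError(f"line break in value for {key}")
        lines.append(f"{key}={value}")
    return lines


def parse_key_value(line: str) -> tuple:
    """Split one KEY=VALUE line at the first '=' sign."""
    key, sep, value = line.rstrip("\r\n").partition("=")
    if not sep:
        raise ValueError(f"not a KEY=VALUE line: {line!r}")
    return key, value


for line in header_lines(
    {"ESTABLISHMENT": "A7F", "LOCKERS": "AV", "LOCK": "8", "FILES/OPTI.": "0"}
):
    print(line)
```

Splitting at the first `=` only means a value may itself contain an equals sign without confusing the reader program.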

Then each of the lines of name data can be put like this, with the pipe symbol (|) as the field delimiter:

NAMEINFO=6AAX|FAFORGE|NATASHA|4JA-01A01X-27|DF|A|1|0|1
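For illustration, a record in this format can be built and split again as sketched below. The function names are hypothetical, and the meaning of each column is not assumed; the fields are treated as an ordered list of strings.

```python
FIELD_DELIMITER = "|"
RECORD_PREFIX = "NAMEINFO="


def format_name_record(fields) -> str:
    """Join the column values of one name row into a NAMEINFO line."""
    cleaned = []
    for field in fields:
        text = str(field).strip()
        if FIELD_DELIMITER in text:
            # A stray pipe (for example an OCR misread) would shift columns.
            raise ValueError(f"field contains the delimiter: {text!r}")
        cleaned.append(text)
    return RECORD_PREFIX + FIELD_DELIMITER.join(cleaned)


def parse_name_record(line: str) -> list:
    """Split a NAMEINFO line back into its column values."""
    if not line.startswith(RECORD_PREFIX):
        raise ValueError(f"not a NAMEINFO line: {line!r}")
    return line[len(RECORD_PREFIX):].rstrip("\r\n").split(FIELD_DELIMITER)


row = ["6AAX", "FAFORGE", "NATASHA", "4JA-01A01X-27", "DF", "A", 1, 0, 1]
print(format_name_record(row))
```

Rejecting fields that contain the pipe character matters here because a vertical rule or the letter "l" on the scanned page can be misread as `|`, which would silently add a column.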

Here is what I want you to do:

1) Look at the test PDF file provided and determine if you can do this work;

2) If you can do the work, then choose the correct OCR tool to do the job (example: Tesseract/Tensorflow/Open CV);

3) Create a program or script to execute the task (via .exe or .bat);

4) Provide the program or script to me to test;

5) If it works, you will then explain in a DETAILED document, how you made it work so that I can understand each step;

6) Then you get paid;

I understand that most OCR solutions require you to tweak or configure them so that they can better understand the content of the PDF file. You will need to explain this in your documentation.

PYTHON USERS: If you are using python and installing special packages, all of these packages MUST run on Windows Server. If you are creating an executable from all of them, you must explain all the steps showing how you did it.

Please read and understand the project details carefully. Your bid on this project is your final bid. If you are awarded the project you cannot ask for more money or a tip after the project is awarded. You will be paid what you bid. If you have any questions, please ask them before you bid. Your level of professionalism will determine if I do future work with you, as this is the first phase of a multi-phase project (There are 2 more, more complex, PDF formats I need interpreted, plus more stuff).

Thank you

Open CV

tensorflow

Tesseract

pytorch

caffe

keras

Pytorch Keras Python OCR

Project ID: #36595615

About the project

32 proposals Remote project Active 11 months ago

Awarded to:

prdraco

Feel free to contact me for Convert PDF files to TEXT files (OCR optical character recognition .Shoot me message to discuss further more details .I provide the comments,images,videos,demos and live sessions in order t...

$150 USD in 7 days
(12 reviews)
5.1
syedazainab339

Hi, I understand from your project description that you are looking for a person who has vast experience using OCR software (optical character recognition software) and who can help you choose the best solution for you...

$100 USD in 7 days
(2 reviews)
3.1

32 freelancers are bidding an average of $201 for this job

alamineee

Hi, Dear Employer; I am Al.A from Korea. I have read your job post carefully. I can write clean, validated Python and Machine Learning code and make a device-supported M. File. I have over seven-plus years of experien...

$300 USD in 7 days
(32 reviews)
5.5
vorasiddh4it

I have read project requirements Convert PDF files to TEXT files (OCR). Also, if you want see my past work related to this then I will show you. I am managing director of software company and I have team for developme...

$450 USD in 7 days
(8 reviews)
4.7
dvcontact

Dear Sir, Are you looking for an experienced OCR software user to help you choose the most suitable solution for your needs? I am Smith, a professional with vast experience in using OCR software (optical character reco...

$140 USD in 7 days
(5 reviews)
4.6
QASIM7862001

Using googles ocr api for this task would be perfect. Hey there, i am developer from the UK with over 9 years experience in web development. Upon reading your project description this seems like a task which i can star...

$250 USD in 1 day
(4 reviews)
4.1
arbu1499

Hello! My name is Arbaz and I'm a freelance designer and developer with extensive experience in the industry. I specialize in designing clean, modern layouts that are easy to navigate and visually appealing; I also pro...

$150 USD in 2 days
(4 reviews)
3.2
mohsinali48

Hi there! I run a small business that provides mechanical, electrical, electronic, civil, structural, chemical, industrial and control services. I have extensive experience using optical character recognition softwa...

$220 USD in 7 days
(2 reviews)
2.9
mishaantisma

Warm Greetings! Hope you're doing well. Thanks for posting the project. As a senior full stack developer with 8+ years of experience and I have tons of experience in OCR and pdf2text using python. I've reviewed your po...

$150 USD in 7 days
(3 reviews)
2.5
The3dCubeAgency

I understand that you are looking for someone who has vast experience using OCR software (optical character recognition software) and who can help you choose the right solution for your needs. With 3dCube Agency's vast...

$140 USD in 7 days
(0 reviews)
0.0
V2FSolutions

Hello there, Please read this proposal thoroughly; it is not an automated proposal. I'm pleased to submit my proposal for your project Convert PDF files to TEXT files (OCR optical character recognition). I understand...

$140 USD in 7 days
(0 reviews)
0.0
Nehaarpit15

I possess excellent organizational skills, time management abilities, and a keen eye for identifying errors and inconsistencies. I am a quick learner and can adapt to new tools and software easily. I am confident that...

$50 USD in 7 days
(0 reviews)
0.0
vortm97

Hello there! My name is DANIL and I am a professional programmer with extensive experience in the web, mobile, desktop developing languages. I have used OCR software before and understand how it works to convert PDF fi...

$250 USD in 7 days
(0 reviews)
0.0
Kagamix

I am a software engineer with experience in developing and testing OCR software. I have the necessary skills to develop and test this type of software quickly and efficiently.

$140 USD in 7 days
(0 reviews)
0.0
Fazal213

Hello, my name is Fazal and I'm an experienced graphic designer with experience in video editing and data entry. I've got 3+ years of experience in graphic design, video editing, and data entry. I understand you are l...

$100 USD in 3 days
(0 reviews)
0.0
obsurf

My name is Tom Racanelli, and I have looked at your job description. I can do this using the following stack. Open CV tensorflow pytorch keras Feel free to discuss further.

$250 USD in 2 days
(0 reviews)
0.0
abdeljaliltabit2

Hello, I am currently a software engineer I can guarantee that python is my field you can check up my linkedin for references

$140 USD in 7 days
(0 reviews)
0.0
gavrusha2

How do I send you the test file back? It’s done HI i can get this done for you please message me I am quick and efficient and I am very creative this project requires Quick efficiency and creativity so please message...

$140 USD in 69 days
(0 reviews)
0.0
DragonGril0923

I CAN COMPLETE YOUR PROJECT EASILY Hello, I am programmer with enough experiences on OCR for 10+ years. C/C++, Qt, Python is nice programming language and I am majoring on it for long years so I have confident to compl...

$200 USD in 7 days
(0 reviews)
0.0
Kainatnaseer773

I understand that you are looking for someone who has vast experience using OCR software (optical character recognition software) and who can help you choose the solution for your needs. Specifically, you need someone...

$200 USD in 5 days
(0 reviews)
1.9
officeSP

Hello I can do this professionally Please contact me through the private mails so that I can send you a sample and we will discuss this

$200 USD in 2 days
(0 reviews)
0.0