Shockingly inefficient PERL script on Google n-gram

$30-250 USD

Completed
Posted more than 10 years ago

Paid on delivery
Some colleagues developed a Perl script that compares the similarity of two sentences using Google n-grams. The n-gram files are huge and, although we do not know Perl ourselves, we believe nothing has been done to optimize retrieval from them. Each sentence comparison now takes an average of 7 minutes, and since we have about 500,000 sentence pairs to compare, the task would take almost 7 years to run. We need the speed improved by two orders of magnitude, to an average of 4.2 seconds per comparison. We suspect that a simple initial indexing of the n-gram files at the start of the process may take care of the problem. It would be OK for the system to take up to an hour at startup to do any indexing and storing in memory. Up to 20GB of memory may be used to store the indexed data.
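The figures quoted in the description can be checked with a few lines of arithmetic. The sketch below is written in Python purely for illustration (the script in question is Perl); the variable names are hypothetical, and the inputs are only the numbers stated above.

```python
# Back-of-the-envelope check of the runtimes quoted in the description.
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

pairs = 500_000               # sentence pairs to compare
current_s = 7 * 60            # 7 minutes per comparison today
target_s = current_s / 100    # two orders of magnitude faster

current_total_years = pairs * current_s / SECONDS_PER_YEAR
target_total_days = pairs * target_s / SECONDS_PER_DAY

print(f"target per comparison: {target_s:.1f} s")
print(f"current total: {current_total_years:.2f} years")
print(f"target total:  {target_total_days:.1f} days")
```

Run sequentially, the current script would need roughly 6.7 years for all pairs, while the 4.2-second target would bring the whole job down to a little over 24 days.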
Project ID: 5337779

About the project

10 proposals
Remote project
Active 10 years ago

Awarded to:
My name is Elias Hamaz, a Perl coder based in London, UK. I can load the n-gram files into a reference tree, so that queries are served from RAM. I can then modify the code to query the tree. My initial assessment is that: 1: The 20GB limit means that a file can be in memory only while its data is being queried. 2: A maximum of 2 files will be in memory at one time. 3: The order of the list of comparisons can be optimised so that queries on a particular file are performed sequentially, so as to minimise the number of disk read operations. Please get in touch to discuss the details of the comparison process. Regards, Elias Hamaz
$164 USD in 1 day
5.0 (1 review)
2.7
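The third point of the awarded proposal, ordering lookups so that all queries against one file run together, can be sketched as follows. This is an illustrative Python sketch, not the bidder's Perl implementation: the sharding rule `shard_for`, the `load_shard` callback, and the sample data are all hypothetical stand-ins, since the listing does not say how the n-gram files are split.

```python
from collections import defaultdict

def shard_for(ngram: str) -> str:
    """Hypothetical sharding rule: first letter of the n-gram, lowercased."""
    return ngram[0].lower()

def group_by_shard(ngrams):
    """Map shard id -> sorted list of distinct n-grams needed from that shard."""
    needed = defaultdict(set)
    for ng in ngrams:
        needed[shard_for(ng)].add(ng)
    return {shard: sorted(items) for shard, items in sorted(needed.items())}

def lookup_all(ngrams, load_shard):
    """Look up every n-gram, loading each shard exactly once.

    `load_shard(shard)` is a caller-supplied function returning a dict of
    n-gram -> count for that shard; only one shard is held at a time.
    """
    counts = {}
    for shard, items in group_by_shard(ngrams).items():
        table = load_shard(shard)          # one load per shard
        for ng in items:
            counts[ng] = table.get(ng, 0)  # unseen n-grams count as 0
        del table                          # release before the next shard
    return counts

# Tiny in-memory stand-in for the shard files, so the sketch runs as-is.
FAKE_SHARDS = {
    "t": {"the cat": 120, "the dog": 95},
    "a": {"a cat": 40},
}
loads = []

def fake_load(shard):
    loads.append(shard)
    return FAKE_SHARDS.get(shard, {})

result = lookup_all(
    ["the cat", "a cat", "the dog", "the cat", "zebra crossing"], fake_load
)
print(result)
print(loads)
```

Collecting the n-grams needed by many sentence pairs before touching the disk means each file is read once per batch rather than once per comparison, which is the effect the proposal is after.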
10 freelancers are bidding an average of $181 USD for this job

Definitely an interesting issue; I'd be glad to take on the challenge and work on it :) Thank you. Is it a Linux system you're working on? (PS. Good that you aren't in a Tom Sawyer mood right now: you'd reverse the bid, to award the job to the bidder offering the most :) )
$200 USD in 5 days
4.9 (27 reviews)
5.2

I'm interested in this project. I'm an experienced (15+) Perl developer and Linux administrator. The bid covers just 2-3 hours of work, which may or may not be enough to solve the problem; I cannot guarantee it without seeing the code. Regards.
$77 USD in 3 days
4.8 (17 reviews)
5.5

Hi, I have experience with Perl and have done such string comparisons before. Indexing can save a lot of time, yes.
$222 USD in 3 days
5.0 (1 review)
2.5

I have optimized mime-64bit encryption Perl scripts with one-pass decoding/encoding. This might also need hardware tuning. I can provide a portfolio of work.
$255 USD in 7 days
4.0 (1 review)
0.8

I am new to Freelancer but have extensive experience working with Perl. I have carried out many automation and optimization projects in Perl for my employer. I would like to understand your full requirements and will then provide you with my approach; you need only award me the project if you are satisfied. I assure you I will meet your expectations.
$155 USD in 10 days
0.0 (0 reviews)
0.0

I have extensive knowledge of Perl and of creating indexed data structures that allow for efficient data comparisons and manipulations. Based on the project description, I propose using a nested hash structure to first load the n-gram data (actual implementation details depend on your data files, such as your "n-" number and how many files are being used) before reading in your sentences for comparison. Given sample data (n-gram files and comparison sentence input files) and your output requirements, I am confident I can deliver an efficient solution to help you achieve your goal in a timely manner. I look forward to discussing this in detail at your earliest convenience.
$222 USD in 3 days
0.0 (0 reviews)
0.0
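The nested hash structure proposed in this bid can be illustrated with a minimal sketch. It is written in Python, whose nested dicts play the role of Perl hashes of hashes. The input format is an assumption: one n-gram per line, words separated by single spaces, followed by a tab and an integer count. The actual files may carry extra columns, and the `COUNT` sentinel key and function names are hypothetical.

```python
import io

COUNT = "#count"  # sentinel key holding the count at the end of an n-gram path

def build_index(lines):
    """Build a nested dict: index[w1][w2]...[wn][COUNT] = total count."""
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: "<w1> <w2> ... <wn>\t<count>"
        ngram, _, count = line.rpartition("\t")
        node = index
        for word in ngram.split(" "):
            node = node.setdefault(word, {})
        # Sum repeated entries for the same n-gram.
        node[COUNT] = node.get(COUNT, 0) + int(count)
    return index

def lookup(index, ngram):
    """Return the stored count for an n-gram, or 0 if it is absent."""
    node = index
    for word in ngram.split(" "):
        node = node.get(word)
        if node is None:
            return 0
    return node.get(COUNT, 0)

# Small in-memory sample standing in for an n-gram file.
sample = io.StringIO("the cat\t120\nthe dog\t95\nthe cat sat\t7\nthe cat\t5\n")
index = build_index(sample)
print(lookup(index, "the cat"))
print(lookup(index, "the cat sat"))
print(lookup(index, "the bird"))
```

Each lookup walks one dictionary level per word, so its cost does not grow with the size of the file. Nested hashes carry noticeable per-entry overhead, though, so whether a full index fits in the stated 20GB budget would have to be measured on the real data.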
I have 4 years of experience in Unix and Perl, and I can modify the Perl script. My bid is low only to gain experience on freelancer.com, not because I am inefficient. If you send the Perl script, I can tell you how long it will actually take to modify it. You pay only if the end result is satisfactory. Thanks, Santhanalekshmi
$35 USD in 5 days
0.0 (0 reviews)
0.0

About this client

Boulder, United States
4.9
13
Payment method verified
Member since Jun 29, 2007

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)