Shockingly inefficient Perl script on Google n-grams
$30-250 USD
Completed
Posted over 10 years ago
$30-250 USD
Paid on delivery
Some colleagues developed a Perl script that compares the similarity of two sentences using Google n-grams. The n-gram files are huge, and without knowing Perl, we believe they have done nothing to optimize retrieval from the n-gram files. Each sentence comparison now takes an average of 7 minutes, and since we have about 500,000 sentence pairs to compare, this task would take almost 7 years to run. We need the speed improved by two orders of magnitude, to an average of 4.2 seconds per comparison.
We suspect that a simple initial indexing of the n-gram files at the start of the process may take care of the problem. It would be acceptable for the system to spend up to an hour at startup doing any indexing and loading into memory. Up to 20 GB of memory may be used to store the indexed data.
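A minimal sketch of that startup-indexing idea, written in Python for illustration (the same structure maps directly to a Perl hash; the tab-separated `ngram<TAB>count` file layout is an assumption about the data files):

```python
def build_ngram_index(path):
    """Load an n-gram count file (assumed format: ngram<TAB>count per line)
    into an in-memory dict once at startup, so that every later lookup is a
    single O(1) hash probe instead of a linear scan of the file."""
    index = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            ngram, count = line.rstrip("\n").split("\t")
            index[ngram] = int(count)
    return index

# One-time load at startup, then cheap lookups for every comparison:
# index = build_ngram_index("2gram-shard-a.txt")   # hypothetical filename
# freq = index.get("new york", 0)
```

Paying the file-read cost once and amortising it over all 500,000 comparisons is exactly the kind of change that yields the requested two orders of magnitude.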
My name is Elias Hamaz, a Perl Coder based in London UK.
I can load the n-gram files into a reference tree, so that queries are answered from RAM.
I can then modify the code to query the tree.
My initial assessment is that:
1: The 20GB limit means that a file can be in memory only while its data is being queried.
2: A maximum of 2 files will be in memory at one time.
3: The order of the list of comparisons can be optimised so that queries on a particular file are performed sequentially, so as to minimise the number of disk read operations.
Please get in touch to discuss the details of the comparison process.
Regards,
Elias Hamaz
$164 USD in 1 day
5.0 (1 review)
2.7
10 freelancers are bidding an average of $181 USD for this job
Definitely an interesting issue, I'd be glad to take the challenge and work on it :) Thank you. Is it a Linux system you're working on?
(P.S. Good that you aren't in a Tom Sawyer mood right now: you'd reverse the bid and award the job to the highest bidder :) )
I'm interested in this project. I'm an experienced Perl developer (15+ years) and Linux administrator. The bid is just for 2-3 hours of work, which may or may not be enough to solve the problem; I cannot guarantee it without seeing the code. Regards.
I am new to Freelancer but have extensive experience working with Perl.
I have executed many automation/optimization projects in Perl for my employer.
I want to understand your full requirements and will then present my approach; only if you are satisfied need you award me this project.
I assure you I will meet your expectations.
I have extensive knowledge of Perl and of creating indexed data structures to allow for efficient data comparisons/manipulations; based on the project description, I propose using a nested hash structure to first load the n-gram data (actual implementation details depend on your data files, such as your "n-" number and how many files are being used) before reading in your sentences for comparison.
Given sample data (n-gram files and comparison-sentence input files) and your output requirements, I am confident I can deliver an efficient solution that achieves your goal in a timely manner.
I look forward to discussing this in detail at your earliest convenience.
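A nested-hash index of the kind this bid describes might look like the following sketch (Python used for illustration; it is a direct analogue of a Perl hash-of-hashes, and the two-word n-grams with tab-separated counts are illustrative assumptions about the data):

```python
def build_nested_index(lines):
    """Two-level ('nested hash') index: the outer key is the first word of
    the n-gram, the inner key is the remainder. Mirrors a Perl
    hash-of-hashes such as $index{$first}{$rest} = $count."""
    index = {}
    for line in lines:
        ngram, count = line.split("\t")
        first, _, rest = ngram.partition(" ")
        index.setdefault(first, {})[rest] = int(count)
    return index

def lookup(index, ngram):
    """Return the stored count for an n-gram, or 0 if it is absent."""
    first, _, rest = ngram.partition(" ")
    return index.get(first, {}).get(rest, 0)

idx = build_nested_index(["new york\t42", "new jersey\t13"])
# lookup(idx, "new york") → 42
```

Keying on the first word also makes it easy to load or evict one prefix group at a time, which fits the memory constraints stated in the project description.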
I have 4 years of experience in Unix and Perl.
I can modify the perl script.
My bid is low only to gain experience in freelancer.com , not because I am inefficient.
If you send the Perl script, I can tell you exactly how long it will take to modify.
You pay only if the end result is satisfying.
Thanks,
Santhanalekshmi