
MapReduce Function

$30-250 USD

Closed
Posted about 9 years ago

Paid on delivery
MapReduce on Amazon AWS

Follow the Amazon Web Services Setup Guidelines ([login to view URL]~song/adc/Assignments/[login to view URL]) to create a free account and set up EC2 and S3.

We will use data from the Google Books n-gram viewer corpus. N-grams are fixed-size tuples of items; in this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words. This data set is freely available on Amazon S3 in a Hadoop-friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from [login to view URL]. You can use a subset of the original Google dataset for this assignment, for example a subset of the 4-gram English dataset.

The data is in a simple text file, and each row of the dataset is formatted as:

ngram TAB year TAB match_count TAB page_count TAB volume_count NEWLINE

(note that the file is TAB delimited). For example, two sample lines in the dataset could be:

analysis is often described    1991    10    1    1
analysis is often described    1992    30    2    1

where 'analysis is often described' is a 4-gram, and the lines tell us that it occurred 10 times in the year 1991 in 1 book in the Google Books sample, 30 times in the year 1992, and so on.

Refer to the setup guidelines to see how to set the data as input to your MapReduce job. They also provide a screenshot for configuring the EMR cluster, which demonstrates how to access input data from a given bucket.

Q 1. (50 points) Plot the frequency distribution for the occurrence counts of the k-grams (e.g., k=4), i.e., a plot where the x-axis is the occurrence count (say n) and the y-axis is the number of 4-grams that occur n times. The occurrence count is just the total number of times a particular k-gram has occurred over all the years in the sample.
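The occurrence count in Q 1 amounts to summing match_count per n-gram over all years. The following is a minimal sketch in Python, assuming a Hadoop Streaming-style job in which the reducer receives mapper output sorted by key; the posting does not name a language or framework, and the function names (map_line, reduce_sorted, run_local) are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (ngram, match_count) for one TAB-delimited row; skip malformed rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return
    ngram, _year, match_count, _page_count, _volume_count = fields
    try:
        count = int(match_count)
    except ValueError:
        return
    yield ngram, count


def reduce_sorted(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum counts per ngram; assumes pairs arrive sorted by ngram."""
    for ngram, group in groupby(pairs, key=lambda kv: kv[0]):
        yield ngram, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce on a local sample."""
    mapped = sorted(kv for line in lines for kv in map_line(line))
    return dict(reduce_sorted(mapped))


if __name__ == "__main__":
    sample = [
        "analysis is often described\t1991\t10\t1\t1\n",
        "analysis is often described\t1992\t30\t2\t1\n",
    ]
    for ngram, total in run_local(sample).items():
        print(f"{ngram}\t{total}")
```

On a real cluster the mapper and reducer would read from standard input and write TAB-separated lines to standard output; run_local only stands in for the shuffle and sort step so the logic can be checked on a small sample.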
Hint: It will be easiest if you write a MapReduce job to pull out just the occurrence information from the dataset, download it to your local machine, and then compute and plot the distribution locally.

Q 2. (40 points) Write a MapReduce job to output all 2-grams using the same dataset. Store the output (i.e., the 2-gram dataset) in a bucket in S3.

Q 3. (10 points) Run the same code as for Q 1 on the dataset generated above, and similarly plot the frequency distribution for the occurrence counts of the 2-grams, i.e., a plot where the x-axis is the occurrence count (say n) and the y-axis is the number of 2-grams that occur n times.

Homework Deliverables:
For Q 1: Submit the mapper and reducer files in addition to the plot.
For Q 2: Submit the mapper and reducer files for computing the 2-grams from the dataset.
For Q 3: Submit your plot.
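The local step from the hint and the 2-gram extraction in Q 2 can be sketched as below. This is one possible reading, not the assignment's reference solution: it assumes each 4-gram row is split into its three adjacent word pairs, and that each pair inherits the year and counts of its source row so that the Q 1 job can consume the output unchanged. Copying counts this way can overcount 2-grams shared by overlapping 4-grams. The names bigrams_from_row and frequency_distribution are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator


def bigrams_from_row(line: str) -> Iterator[str]:
    """Emit one TAB-delimited row per adjacent word pair of the input n-gram.

    Assumption: year, match_count, page_count and volume_count are copied
    unchanged from the source row.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return  # skip malformed rows
    words = fields[0].split(" ")
    for first, second in zip(words, words[1:]):
        yield "\t".join([f"{first} {second}"] + fields[1:])


def frequency_distribution(totals: Dict[str, int]) -> Dict[int, int]:
    """Map each occurrence count n to the number of n-grams that occur n times."""
    return dict(Counter(totals.values()))


if __name__ == "__main__":
    row = "analysis is often described\t1991\t10\t1\t1\n"
    for out in bigrams_from_row(row):
        print(out)
    # Totals as they might come back from the Q 1 job (made-up numbers).
    print(frequency_distribution({"a b": 40, "c d": 40, "e f": 7}))
```

The dictionary returned by frequency_distribution gives the (x, y) pairs for the plot directly; because such distributions tend to be heavily skewed, log-scaled axes may be easier to read.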
Project ID: 7425523

About the project

10 proposals
Remote project
Active 9 years ago

10 freelancers are bidding an average of $331 USD for this job
Hello, I am a Java expert and am interested in this project. I have reviewed your requirements and am confident I can handle this project perfectly. Please get in touch to discuss further. Regards, Anshu
$400 USD in 7 days
4.7 (409 reviews)
7.5
Hi, I have been working with Hadoop and NoSQL for the last 2 years; before this I worked with Java for 2 years. I have experience setting up single-node and multi-node clusters and am strong in configuring machines on AWS. Knowledge of Windows/Linux/Unix/SUSE Linux/AWS, etc. Strong in training and project work; for training I use the GoToMeeting tool. I have worked on some Hadoop projects, such as: 1. Redis and PostgreSQL databases to generate a recommender 2. Cassandra, Java AMQ and Mahout 3. Hadoop with HBase and MapReduce 4. Hadoop, Oracle, Sybase replication and Sybase ASE database. I know the most popular databases: NoSQL = HBase, Cassandra, MongoDB, Redis; RDBMS = SQL, MySQL, PostgreSQL, Oracle; in-memory = SAP HANA, Sybase ASE; replication = SAP Sybase Replication, SAP Event Stream Processing. Java skills: core Java, advanced Java, J2EE, Spring, Hibernate, HTML, JS, AJAX, CSS, etc. Please let me know if you feel I am suitable. Skype: dineshrajput007
$166 USD in 3 days
5.0 (8 reviews)
4.1
I have 1.5 years of experience in Hadoop MapReduce and big data technologies. I am currently working as a team lead at a big data analytics software company. I have read your requirements and I can do your task efficiently and with proper documentation. The account creation needs to be discussed: the link you posted requires a username and password, and when I once created a free account on AWS it required credit card info, so I think you need to do that part.
$200 USD in 3 days
5.0 (13 reviews)
3.9
A proposal has not yet been provided
$166 USD in 3 days
5.0 (3 reviews)
0.8
Expert in writing MapReduce jobs, with rich experience in other big data technologies such as Spark, Storm, and Scala.
$333 USD in 3 days
0.0 (0 reviews)
0.0
A proposal has not yet been provided
$250 USD in 10 days
0.0 (0 reviews)
0.0
I write map/reduce jobs for a living. Also have experience working with Amazon Web Services and their Elastic MapReduce service.
$250 USD in 3 days
0.0 (0 reviews)
0.0
A proposal has not yet been provided
$333 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Denton, United States
5.0
9
Payment method verified
Member since Apr 3, 2015

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)