In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks. Problem 6: Assume you have two matrices A and B in a sparse matrix format, where each record is of the form (i, j, value). Design a MapReduce algorithm to compute the matrix product A x B.
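One possible one-pass approach can be sketched in plain Python, simulating the map, shuffle, and reduce steps in memory rather than using any particular MapReduce framework. The sketch makes two assumptions that the problem statement does not give: each record carries a leading tag (`"a"` or `"b"`) naming its matrix, and the output dimensions `a_rows` and `b_cols` are known in advance. The function name `multiply_sparse` is hypothetical.

```python
from collections import defaultdict


def multiply_sparse(records, a_rows, b_cols):
    """Simulate one MapReduce pass that computes A x B from sparse records.

    Assumption: each record is (matrix_tag, i, j, value) with tag "a" or "b".
    """
    # Map + shuffle: key every entry by each output cell it contributes to.
    # A[i][j] is needed by all cells in output row i; B[j][k] by all cells
    # in output column k.
    groups = defaultdict(list)
    for matrix, row, col, value in records:
        if matrix == "a":
            for k in range(b_cols):
                groups[(row, k)].append(("a", col, value))
        else:
            for i in range(a_rows):
                groups[(i, col)].append(("b", row, value))

    # Reduce: for each output cell, multiply the A and B entries that share
    # the inner index and sum the products.
    product = {}
    for cell, entries in groups.items():
        a_vals = {inner: v for tag, inner, v in entries if tag == "a"}
        b_vals = {inner: v for tag, inner, v in entries if tag == "b"}
        total = sum(v * b_vals[inner] for inner, v in a_vals.items() if inner in b_vals)
        if total != 0:
            product[cell] = total  # keep the result sparse
    return product
```

Replicating each entry to every output cell it touches is simple but sends a lot of data through the shuffle; a two-pass design that first joins on the inner index is a common alternative.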

# Posts Tagged with *Introduction to Data Science*

In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks. Problem 5: Consider a set of key-value pairs where each key is a sequence id and each value is a string of nucleotides, e.g., GCTTCCGAAATGCTCGAA.... Write a MapReduce query to remove the last 10 characters from each string of nucleotides, then remove any duplicates generated.
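A minimal sketch of this task, again simulating the phases in memory: the mapper uses the trimmed string itself as the key, so the shuffle brings duplicates together and the reducer emits each key once. The function name `unique_trimmed` and the sample sequences are made up for illustration.

```python
from collections import defaultdict


def unique_trimmed(records):
    """Trim the last 10 characters of each sequence and drop duplicates."""
    # Map + shuffle: the trimmed string is the key; the sequence id is kept
    # as the value only so the grouping mirrors a real shuffle.
    groups = defaultdict(list)
    for seq_id, nucleotides in records:
        groups[nucleotides[:-10]].append(seq_id)

    # Reduce: emit each distinct trimmed string exactly once.
    return list(groups)


sample = [
    ("s1", "AAAACCCCGGGGTT"),
    ("s2", "AAAATTTTTTTTTT"),
    ("s3", "GGGGCCCCCCCCCC"),
]
trimmed = unique_trimmed(sample)
```

Note that a string with 10 or fewer characters trims to the empty string under this slicing.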

In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks. Problem 4: The relationship "friend" is often symmetric, meaning that if I am your friend, you are my friend. Implement a MapReduce algorithm to check whether this property holds. Generate a list of all non-symmetric friend relationships.
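One way to approach the symmetry check, sketched in plain Python: key each record by the unordered pair of people, so both directions of a friendship land in the same group, then report groups that contain only one direction. The record shape `(person, friend)` and the function name `asymmetric_friendships` are assumptions for illustration; the expected output format of the original assignment may differ.

```python
from collections import defaultdict


def asymmetric_friendships(records):
    """Return the (person, friend) pairs whose reverse pair never appears."""
    # Map + shuffle: key by the unordered pair so that (a, b) and (b, a)
    # reach the same reducer.
    directions_by_pair = defaultdict(set)
    for person, friend in records:
        directions_by_pair[frozenset((person, friend))].add((person, friend))

    # Reduce: a pair seen in only one direction is not symmetric.
    # Self-friendships (a, a) are trivially symmetric and are skipped.
    asymmetric = []
    for pair, directions in directions_by_pair.items():
        if len(pair) == 2 and len(directions) == 1:
            asymmetric.extend(directions)
    return sorted(asymmetric)
```

Using a set of directions means repeated records do not make a one-way friendship look symmetric.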

In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks. Problem 3: Consider a simple social network dataset consisting of key-value pairs where each key is a person and each value is a friend of that person. Describe a MapReduce algorithm to count the number of friends each person has.
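The friend count follows the same pattern as a word count. A minimal in-memory sketch, assuming records are `(person, friend)` tuples (the function name `count_friends` is hypothetical):

```python
from collections import defaultdict


def count_friends(records):
    """Count how many friend records each person has."""
    # Map + shuffle: emit (person, 1) for every record and group by person.
    ones_by_person = defaultdict(list)
    for person, _friend in records:
        ones_by_person[person].append(1)

    # Reduce: sum the ones for each person.
    return {person: sum(ones) for person, ones in ones_by_person.items()}
```

This counts records, so a friend listed twice for the same person is counted twice; deduplicating in the reducer would be a small change if the data is not clean.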

Problem 2: Implement a relational join as a MapReduce query. Consider the query:

SELECT * FROM Orders, LineItem WHERE Order.order_id = LineItem.order_id

Your MapReduce query should produce the same information as this SQL query. You can consider the two input tables, Order and LineItem, as one big concatenated bag of records which gets fed into the map function record by record.
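A reduce-side join is one common way to do this. The sketch below simulates it in memory and assumes a record layout that the text above does not specify: each record is a list whose first element names its table (`"order"` or `"line_item"`) and whose second element is the `order_id`. The function name `join_orders_line_items` is hypothetical.

```python
from collections import defaultdict


def join_orders_line_items(records):
    """Join order and line-item records that share an order_id."""
    # Map + shuffle: key every record by its order_id (assumed to be at
    # index 1) so matching rows from both tables meet in one group.
    groups = defaultdict(list)
    for record in records:
        groups[record[1]].append(record)

    # Reduce: within each group, pair every order row with every
    # line-item row and concatenate their fields.
    joined = []
    for rows in groups.values():
        orders = [r for r in rows if r[0] == "order"]
        items = [r for r in rows if r[0] == "line_item"]
        joined.extend(order + item for order in orders for item in items)
    return joined
```

Groups that contain rows from only one table produce nothing, which matches the inner-join semantics of the SQL query.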


Series of programming assignments from "Introduction to Data Science" course - Join the data revolution by University of Washington

Problem 6: Top ten hash tags

Write a Python script, top_ten.py, that computes the ten most frequently occurring hash tags from the data you gathered in Problem 1.
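A minimal sketch of the counting step, assuming the gathered data is a file with one JSON tweet per line and that hash tags sit under `entities` → `hashtags` → `text`, as in Twitter-style tweet objects. The data format from Problem 1 is not shown here, so treat that layout, and the function name `top_hashtags`, as assumptions.

```python
import json
from collections import Counter


def top_hashtags(lines, n=10):
    """Return the n most frequent hash tags as (tag, count) pairs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(tweet, dict):
            continue
        # Assumed layout: {"entities": {"hashtags": [{"text": "..."}]}}
        entities = tweet.get("entities") or {}
        for tag in entities.get("hashtags", []):
            text = tag.get("text")
            if text:
                counts[text] += 1
    return counts.most_common(n)
```

In a script, this could be driven by something like `top_hashtags(open(path))`, printing one tag and count per line.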


Problem 5: Which State is happiest?

Write a Python script, happiest_state.py, that returns the name of the happiest state as a string.


Problem 4: Compute Term Frequency

Problem 3: Derive the sentiment of new terms. In this part you will create a script that computes the sentiment of terms that do not appear in the file AFINN-111.txt.