CSE 17 Fall 18 Final Project: Autocomplete Me
Write a program to implement autocomplete for a given set of N terms, where a term is a query string and an associated nonnegative weight. That is, given a prefix, find all queries that start with the given prefix, in descending order of weight.
Few Notes before we begin:
You should carefully read the Trie data structure documentation and any other Javadoc included in this document For this project, you have only three free submissions. Subsequent submissions will cost you 5 points (up to six submissions total)
Use Eclipse’s code coverage and debugging tool (we will go over them in class)
Include the Big-O of the method in the description (javadoc) when requested (6 points). Do not post your code on Piazza when asking questions
Autocomplete is pervasive in modern applications. As the user types, the program predicts the complete query (typically a word or phrase) that the user intends to type. Autocomplete is most effective when there are a limited number of likely queries. For example, the Internet Movie Database uses it to display the names of movies as the user types; search engines use it to display suggestions as the user enters web search queries; cell phones use it to speed up text input.
In these examples, the application predicts how likely it is that the user is typing each query and presents to the user a list of the top-matching queries, in descending order of weight. These weights are determined by historical data, such as box office revenue for movies, frequencies of search queries from other Google users, or the typing history of a cell phone user. For the purposes of this assignment, you will have access to a set of all possible queries and associated weights (and these queries and weights will not change).
The performance of autocomplete functionality is critical in many systems. For example, consider a search engine which runs an autocomplete application on a server farm. According to one study, the application has only about 50ms to return a list of suggestions for it to be useful to the user. Moreover, in principle, it must perform this computation for every keystroke typed into the search bar and for every user!
In this assignment, you will implement autocomplete by using a Trie data structure.
Read about the Trie data structure before starting the project!
Your program will find all the query strings (suggestions) that start with a given prefix; and sort the matching terms by weight or lexicograpic order.
Task 1: Term
Write a data type Term.java that represents an autocomplete term: a string and an associated integer weight. You must implement the following API, which supports comparing terms by three different orders:
Lexicographic order by query string (the natural order) Descending order by weight (an alternate order)
Lexicographic order by query string but using only the first r characters (a family of alternate orderings). The last order may seem a bit odd, but you will use it in the program to find all query strings that start with a given prefix (of length r)
public class Term implements Comparable<Term> { // Initializes a term with the given query string and weight. public Term(String query, long weight) // Compares the two terms in descending order by weight. public static Comparator<Term> byReverseWeightOrder() // Compares the two terms in lexicographic order but using only the first r characters of each query. public static Comparator<Term> byPrefixOrder(int r) // Compares the two terms in lexicographic order by query. public int compareTo(Term that) // Returns a string representation of this term in the following format: // the weight, followed by a tab character, followed by the query (no space). public String toString() }
Corner cases:
The constructor should throw a java.lang.IllegalArgumentException if query is null and a java.lang.IllegalArgumentException if weight is negative. The byPrefixOrder() method should throw a java.lang.IllegalArgumentException if r is negative.
Your Comparators should only implement the compare(T o1, T o2) method. Take a look at the Inheritance vs Composition,
Exception handling lab.
Task 2: Node
Write a data type Node.java that represents a node in your Trie. Your Node data type encapsulates a Term data field. The node class must also keep track of the number of words (in the subtrie rooted at node) that match the query string, the number of words with prefixes (in the subtrie rooted at node), and references to the next nodes. The node class will have the following fields:
Term: This field will hold the term/word and its weight
words: As we want to know the number of words that match with a given string, every node should have a field to indicate that this vertex represents a complete word or only a prefix (for simplicity, a complete word is considered also a prefix) and how many words in the dictionary are represented by that prefix (there can be repeated words in the dictionary). This task can be done with only one integer field words.
prefixes: As we want to know the number of words that have as prefix a given string, we need another integer field prefixes that indicates how many words have the prefix of the vertex.
References to its children: Each vertex must have references to all his possible children in our case 26 (we are only working with english alphabet, lower case letters). You must initialize all references to null in your constructor. Think about the collection you want to use to store the 26 references.
Accordingly, implement the Node.java with a no-arg constructor, a constructor that takes a word and its weight, and getters and setters.
Task 3: autocomplete (I of II) In this part, you will implement a data type that provides autocomplete functionality for a given set of string and weights, using Term and Node.
Your Autocomplete.java class will implement a Trie data structure. Your class should keep reference to the root of the Trie. Organize your program by creating a data type Autocomplete.java with the following API:
public class Autocomplete { // Initializes the data structure, created the root of the Trie. public Autocomplete() //Adds a new word and its associated weight to the Trie - provide Big-O. public void addWord(String word, int weight) //Returns the root of the subTrie corresponding to the last character of the prefix - provide Big-O. public Node getSubTrie(String prefix) }
Testing. Do not forget to test your code. It is a good idea to draw the expected Trie to have a visual representation of what is happening.
Task 4: autocomplete (II of II). In this part, you will complete the implemention of Autocomplete.java program.
Your Autocomplete.java class will implement a Trie data structure. Your class should keep reference to the root of the Trie. Organize your program by creating a data type Autocomplete.java with the following API:
public class Autocomplete { Task 4 methods // The method returns the number of words that start with prefix. public int countPrefixes(String prefix) // The method returns a List containing all the Terms objects with query starting with prefix - provide Big-O. public List <Term> getSuggestions(String prefix) }
Testing. Do not forget to test your code. It is a good idea to draw the expected Trie to have a visual representation of what is happening.
You are free to create additional methods if you think they will help you in implementing your solution.
Note that the final project is checked for evidence of plagiarism. Academic dishonesty carries tremendous penalty, and a software is used to catch offenders.
Complete project is due Monday, December 10 at 23:59PM.
This assignment was created by Eric Fouh, based on Matthew Drabick and Kevin Wayne nifty assignment. Copyright © 2018.