

Posted: 2018-10-09 08:00 (Tuesday)


INFS7410 ASSIGNMENT 3

 

Semester 2/2018

 

Marks: 10 marks (10%)


Assessment Date: Tutorial Session on 9 October 2018 (No later than 9 October)

Submission Due Date: 11:59 PM, 12 October 2018 (No late submission is allowed)

What to Submit: Zipped source code with detailed comments

Where to Submit: Electronic submission via blackboard

 


 

 

The goal of this project is to gain practical experience in using the vector space model with tf.idf weighting and the cosine similarity measure for document retrieval.

 

You must work on this project individually. The standard academic honesty rules apply.

Dataset: Cranfield

 

Assumptions: This assignment builds on Assignments 1 and 2. It assumes that the corpus has been tokenized and converted to lowercase, that all SGML tags and stopwords have been removed, and that the corpus is indexed with an inverted index.

Task 1 – Building the vector space model representations for the corpus: Write the necessary code to build the vector space model representation of every document in the corpus. In this representation, each term is weighted by its tf.idf weight. Assume that only the 1000 most frequent words in the corpus are used to construct the term dictionary. (2 marks)
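A minimal sketch of how Task 1 could be approached is given below. The assignment does not fix a tf.idf variant, so the weighting used here, (1 + log10 tf) × log10(N / df), is an assumption; so are the function name `build_tfidf_vectors` and the input format (a dictionary from document id to a list of already preprocessed tokens).

```python
import math
from collections import Counter

def build_tfidf_vectors(docs, vocab_size=1000):
    """docs: dict mapping doc id -> list of preprocessed tokens (assumed format)."""
    # Term dictionary: the vocab_size most frequent terms across the whole corpus.
    corpus_counts = Counter(tok for tokens in docs.values() for tok in tokens)
    vocab = [term for term, _ in corpus_counts.most_common(vocab_size)]
    index = {term: i for i, term in enumerate(vocab)}

    # Document frequency: number of documents that contain each dictionary term.
    df = Counter()
    for tokens in docs.values():
        df.update({tok for tok in tokens if tok in index})

    # Assumed idf variant: log10(N / df).
    n_docs = len(docs)
    idf = {term: math.log10(n_docs / df[term]) for term in vocab}

    # One dense vector per document; assumed tf variant: 1 + log10(tf).
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tok for tok in tokens if tok in index)
        vec = [0.0] * len(vocab)
        for term, count in tf.items():
            vec[index[term]] = (1 + math.log10(count)) * idf[term]
        vectors[doc_id] = vec
    return vocab, idf, vectors
```

Dense lists keep the sketch easy to read; with a 1000-term dictionary most entries are zero, so a sparse dictionary per document would be a reasonable alternative.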

Task 2 – Using the vector space model representations to perform search: Write the code to implement search. For each of the following queries, construct the query's vector space model representation and return the top 10 documents, ranked by their cosine similarity to the query vector. The ranking is obtained by comparing the query vector with all the document vectors in the dataset.

 

(1) Query = “method” (0.5 mark)

(2) Query = “transfer equations” (1 mark)

(3) Query = “free problem case” (1 mark)
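The search step for these queries might look like the sketch below. It assumes that a term dictionary `vocab`, an `idf` table and a `doc_vectors` mapping already exist in the shapes shown; these names and shapes are illustrative, not prescribed by the assignment. The query is weighted the same way as the documents, which is also an assumption.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def query_vector(query_tokens, vocab, idf):
    """Build the query vector over the same dictionary as the documents."""
    counts = Counter(query_tokens)  # missing terms count as 0
    return [
        (1 + math.log10(counts[t])) * idf[t] if counts[t] > 0 else 0.0
        for t in vocab
    ]

def search(query_tokens, vocab, idf, doc_vectors, k=10):
    """Compare the query with every document vector and return the top k."""
    q = query_vector(query_tokens, vocab, idf)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in doc_vectors.items()]
    # Highest similarity first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

For example, `search("transfer equations".split(), vocab, idf, doc_vectors)` would return up to ten `(doc_id, score)` pairs. Query words outside the dictionary are simply ignored.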

 

Task 3 – Using the Inverted Index to speed up the search:

 


Write the code to speed up the search process in Task 2 by using the inverted index. The idea is to first use the inverted index to select the documents that contain at least one query word, then compare only the selected documents' vectors with the query vector and rank them by cosine similarity. (2 marks)
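The candidate-selection idea can be sketched as follows. The inverted index is assumed to be a dictionary from term to a list of document ids; the actual structure from Assignment 2 may differ, and all names here are hypothetical. Documents that share no dictionary term with the query have cosine similarity 0, so skipping them does not change the order of the documents with non-zero scores.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def query_vector(query_tokens, vocab, idf):
    """Assumed query weighting: (1 + log10 tf) * idf over the document dictionary."""
    counts = Counter(query_tokens)
    return [
        (1 + math.log10(counts[t])) * idf[t] if counts[t] > 0 else 0.0
        for t in vocab
    ]

def search_with_index(query_tokens, vocab, idf, doc_vectors, inverted_index, k=10):
    """Score only the documents that contain at least one dictionary query term."""
    candidates = set()
    for term in set(query_tokens):
        if term in idf:  # terms outside the dictionary cannot affect the score
            candidates.update(inverted_index.get(term, ()))
    if not candidates:
        return []
    q = query_vector(query_tokens, vocab, idf)
    scored = [(cosine(q, doc_vectors[doc_id]), doc_id) for doc_id in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

Unlike a full scan, this version can return fewer than ten documents when only a few documents contain any query word.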

Code: Your implementation should be written in a general-purpose programming language (e.g., C, Java, Python) without using any external IR packages. Your code should provide a simple console interface with the following functions: (0.5 mark)

· Allow the user to enter the name of the corpus directory (assume that the corpus directory is in the same directory as your executable code)

· Allow the user to enter the keywords of a search query

 

Deliverables: Your submission includes the following components:

 

1) Program: (5 marks in total)

· Source code and its brief description

· Interface for input

2) Output: (2 marks in total)

· Report each query and its results (see Task 2)

3) Performance Bonus: (3 marks in total)

· Efficiency: Report the average query execution time for Task 2 and Task 3 respectively, over 10 executions of the same query.
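One way to measure this is sketched below, using `time.perf_counter` from the Python standard library. The helper name `average_query_time` and the idea of passing the search routine in as a one-argument callable are assumptions made for illustration.

```python
import time

def average_query_time(search_fn, query_tokens, runs=10):
    """Return the mean wall-clock time in seconds of search_fn(query_tokens)."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        search_fn(query_tokens)  # result is discarded; only the time matters
        total += time.perf_counter() - start
    return total / runs
```

Timing the Task 2 and Task 3 routines with the same query and the same `runs` value keeps the two averages comparable.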

· Retrieval Models: Implement two or more retrieval models, including the Vector Space Model (Boolean Retrieval is excluded).
