当前位置 |首页 > CS代写 > 他类 >
分享这个代写网站给同学或者朋友吧!

Word Frequency Table

Finding the most common words in a text file is a famous “programming pearl”, originally posed by Jon Bentley (1984). Several interesting solutions have been proposed by Knuth (an exquisite model of “literate programming”, 1986), McIlroy (an engineering example of combining a timeless set of “UNIX tools”, 1986), Hanson (an alternate efficient solution, 1987).

This assignment requires a simple solution, comparable in style and performance to McIlroy’s pipelined solution (though not as performant as the much more complex solutions of Knuth and Hanson). For this, we will use a simple functional programming style based on .NET LINQ framework.

Problem statement: Given a text file and an integer k, you are to print the words (and their frequencies of occurrence) whose frequencies of occurrence are the k largest (upper limit) in order of decreasing frequency; breaking any ties in increasing words’ alphabetical order. Words are consecutive sequences of ASCII alphabetical letters (A-Za-z), where letter case is ignored, and all other characters are ignored. A short minimalistic example will clarify these specs.

Sample text file, called test1.txt:

.....


Sample result, for k=3: The output must be a text file with two columns. Extra white spaces are permitted but discarded. Please use this format, because your output will be assessed by a script, using whitespace tolerant textual comparisons.

4 AA
2 CCC
2 DD

To understand this, let us consider the following intermediate steps. Word list extracted from this text file:

Canonical word list – upper case versions to ignore the case:

DD DD CCC AA AA AA CCC BB AA

An unsorted frequency table:

4 AA
1 BB
2 CCC
2 DD



A sorted frequency table: 1st by decreasing frequencies, 2nd by increasing alphabetical word order:

4 AA
2 CCC
2 DD
1 BB



Taking the first k=3 rows only, we obtain the above shown result.

 

 

Programs: (1) a C# solution (required); (2) a Node JS solution (required); and

(3) an F# solution (optionally, bonus). Essentially, each program must be totally contained in one single file and use only the standard libraries extant in the labs.

The C# solution must use the .NET LINQ framework and System.IO.

The Node JS solution must only use linq-es2015 (external) and fs (internal). The F# solution must only use the F# core modules and System.IO.

These programs must be named after your upi, e.g. jbon007.cs, jbon007.js, jbon007.fs. The dropbox will reject programs that do not follow this convention.

Compilation: Assuming the C# and F# compilers are on the current PATH, the required C# and the optional F# programs must be compilable in the labs with these simple commands (no external libraries are permitted):

csc jbon007.cs
fsc jbon007.fs



The required Node JS program will not be compiled, but executed directly by Node.


Execution – C# and F#

After the above compilation, the C# and F# programs will have a name like jbon007.exe. This program will be given (1) a mandatory parameter, the text file name (path); and (2) an optional parameter, the integer k, with the default value k=3. If errors occur, an error line must appear. Sample runs and outputs:

>jbon007.exe test1.txt 2
4 AA
2 CCC
>jbon007.exe test1.txt 10
4 AA
2 CCC
2 DD
1 BB
>jbon007.exe test1.txt
4 AA
2 CCC
2 DD
>jbon007.exe test1.txt x
*** Error: ...
>jbon007.exe testx.txt
*** Error: ...
>jbon007.exe
*** Error: ...


In the error texts, in place of ellipses (...), display either system created messages, or your custom messages (presumably more adequate). However, we do not prescribe these texts. The important part is that the programs must catch and display something meaningful for all possible errors.


Execution – Node JS

The JS program will be executed under Node, so the command lines will be longer and errors may look differently - but still all errors must be caught.

>node jbon007.js test1.txt 2
4 AA
2 CCC
 
 
>node jbon007.js test1.txt 10
4 AA
2 CCC
2 DD
1 BB
 
 
>node jbon007.js test1.txt
4 AA
2 CCC
2 DD
 
 
>node jbon007.js test1.txt x
*** Error: ...
 
 
>node jbon007.js testx.txt
*** Error: ...
 
 
>node jbon007.js
*** Error: ...


Functional style: Besides the results’ accuracy, your submission will also be assessed on its high-level functional style. For this assignment, we will be specifically looking at a consistent formatting, a reasonable size, and at the number of low-level explicit loops used, i.e. for, foreach, while statements.

Generally, a functional style naturally results in crisper and shorter programs (without artificially squeezing many expressions on one single line). The model solutions have about 50-60 lines, including a few blank lines inserted for readability, and no explicit loops.

As we will see in the lectures, functional programming offers high-level functions that encapsulate and hide most required loops.

Submission

Please submit to the Assignment Dropbox (https://adb.auckland.ac.nz/) the following items:

(1) Your C# source file, upi.cs - required

(2) Your Node JS source file, upi.js - required

(3) Your F# source file, upi.fs – allowed (optional bonus)

 



References (just in case you are interested to read more about this problem and its very efficient solutions)

Bentley, J., Knuth, D., McIlroy, D.: Programming pearls: A literate program. Commun. ACM 29(6), 471-483 (Jun 1986),

http://doi.acm.org/10.1145/5948.315654

Appendix

A sample bash script solution, based on McIlroy’s pipeline of UNIX tools. Again, no explicit loops. Just in case, as you do not have to study this if are not familiar with UNIX.

cat test1.txt |
tr -cs A-Za-z '\n' | awk 'NF' |
tr a-z A-Z | gsort  | uniq -c |
gsort -k1rn -k2 | head -3
#sed 3q

 


代写CS&Finance|建模|代码|系统|报告|考试

编程类:C代写,JAVA代写 ,数据库代写,WEB代写,Python代写,Matlab代写,GO语言,R代写

金融类统计,计量,风险投资,金融工程,R语言,Python语言,Matlab,建立模型,数据分析,数据处理

服务类:Lab/Assignment/Project/Course/Qzui/Midterm/Final/Exam/Test帮助代写代考辅导

天才写手,代写CS,代写finance,代写statistics,考试助攻

E-mail:850190831@qq.com   微信:BadGeniuscs  工作时间:无休息工作日-早上8点到凌晨3点


如果您用的手机请先保存二维码到手机里面,识别图中二维码。如果用电脑,直接掏出手机果断扫描。

1513067809116994.png


代写