Midterm 1, Spring 2021: Music recommender Version 1.0

This problem builds on your knowledge of basic Python data structures and string processing. It has seven (7) exercises, numbered 0 to 6. There are eleven (11) available points. However, to earn 100%, the threshold is just 10 points. (Therefore, once you hit 10 points, you can stop. There is no extra credit for exceeding this threshold.)

Each exercise builds logically on the previous one, but you may solve them in any order. That is, if you can’t solve an exercise, you can still move on and try the next one. However, if you see a code cell introduced by the phrase, “Sample result(s) for …”, please run it. Some demo cells in the notebook may depend on these precomputed results.

The point values of individual exercises are as follows:

Exercise 0: 1 point
Exercise 1: 1 point
Exercise 2: 2 points
Exercise 3: 2 points
Exercise 4: 2 points
Exercise 5: 1 point
Exercise 6: 2 points

Pro-tips.

– Many or all test cells use randomly generated inputs. Therefore, try your best to write solutions that do not assume too much. To help you debug, when a test cell does fail, it will often tell you exactly what inputs it was using and what output it expected, compared to yours.

– If you need a complex SQL query, remember that you can define one using a triple-quoted (multiline) string.

– If your program's behavior seems strange, try resetting the kernel and rerunning everything.

– If you mess up this notebook or just want to start from scratch, save copies of all your partial responses and use Actions → Reset Assignment to get a fresh, original copy of this notebook. (Resetting will wipe out any answers you’ve written so far, so be sure to stash those somewhere safe if you intend to keep or reuse them!)

– If you generate excessive output that causes the notebook to load slowly or not at all (e.g., from an ill-placed print statement), use Actions → Clear Notebook Output to get a clean copy. The clean copy will retain your code but remove any generated output. However, it will also rename the notebook to clean.xxx.ipynb. Since the autograder expects a notebook file with the original name, you’ll need to rename the clean notebook accordingly. Be forewarned: we won’t manually grade “cleaned” notebooks if you forget!

Good luck!

1.1 Background and overview: Spotify playlist data

Suppose you are running a music service and would like to help your users discover artists based on artists they already like. In this problem, you’ll prototype a simple recommender by mining a dataset of user-generated playlists from Spotify, circa 2015.

Your overall workflow will be as follows:

Manually inspect the data and how it is stored
Gather some preliminary statistics to get a “feel” for the data
Clean the data a bit, namely by “normalizing” artist names
Use ideas from Notebook 2 to analyze artist co-occurrences in playlists

With that in mind, let’s start!

Modules and data. Run the following two code cells, which load some modules this notebook needs as well as the data itself.

The data for this problem are several hundred megabytes in size and so may take a minute to load.

In [1]: ###
        ### AUTOGRADER TEST - DO NOT REMOVE
        ###

        from pprint import pprint
        from testing_tools import load_pickle
        print("Ready!")

Opening pickle from './resource/asnlib/publicdata/user_ids.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/playlist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/track_titles.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_translation_table.pickle' ...
Ready!

In [2]: !date
        spotify_users = load_pickle('user_playlists.pickle')
        print("==> Finished loading the data.")
        !date

Sat 19 Feb 2022 02:50:39 PM PST

Opening pickle from './resource/asnlib/publicdata/user_playlists.pickle' ...
==> Finished loading the data.
Sat 19 Feb 2022 02:50:48 PM PST

1.1.1 Familiarize yourself with these data

The variable spotify_users holds the data you’ll need. It consists of a list of roughly 16,000 users:

In [3]: print(f"`spotify_users`: type == {type(spotify_users)}, number of elements == {len(spotify_users):,}.")

`spotify_users`: type == <class 'list'>, number of elements == 15,918.

Each element of this list corresponds to a distinct user. Have a look at the user at position 2526 of this list:

In [4]: pprint(spotify_users[2526])

{'playlists': [{'name': 'Favoritas de la radio',
                'tracks': [{'artist': 'Vico C', 'title': 'Desahogo'},
                           {'artist': 'Vico C',
                            'title': 'El Bueno, El Malo Y El Feo (The Good, '
                                     'The Bad & The Ugly) - Feat. Tego '
                                     'Calderón And Eddie Dee'},
                           {'artist': 'Vico C', 'title': 'Quieren'},
                           {'artist': 'Vico C',
                            'title': "Vamonos Po' Encima"}]},
               {'name': 'Starred',
                'tracks': [{'artist': 'Vico C', 'title': 'El'},
                           {'artist': 'Strike 3', 'title': 'Enamorado De Ti'},
                           {'artist': 'Strike 3', 'title': 'Es Por Ti'}]},
               {'name': 'Two',
                'tracks': [{'artist': 'Walk the Moon', 'title': 'Quesadilla'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Sleep Alone'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Something Good Can Work'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Sun'}]}],
 'user_id': '22c5af0c50b557327894d0c9ea6aa5fa'}

Every user has a unique user ID (a hex string) as well as a list of playlists that they have created. Each playlist is named and consists of a list of songs or tracks. Each track has a title and is performed by an artist (musician or group).

Take a minute to understand how this data is stored: note what data structures are being used (e.g., dictionaries versus lists), for what purpose, and how they are nested.

If you understand the storage scheme, you should be able to verify the following facts about the above user:

The user’s ID is ‘22c5af0c50b557327894d0c9ea6aa5fa’.
The user has three playlists, one named ‘Favoritas de la radio’, another named ‘Starred’, and the last named ‘Two’.
The ‘Favoritas de la radio’ playlist has four songs, all of which were performed by the same artist, ‘Vico C’.
The ‘Starred’ playlist has one song also by ‘Vico C’, but includes two songs by a different artist, ‘Strike 3’.
The ‘Two’ playlist has four songs: one by ‘Walk the Moon’ and three by ‘Two Door Cinema Club’.

Other users may have only one playlist with just one song, or many playlists with many songs by many artists.
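To make the nesting concrete, here is a minimal sketch that walks one user record and tallies tracks per artist in each playlist. The record `demo_user` is a hypothetical, abridged stand-in with a placeholder ID, and the helper name `summarize_user` is our own choice; neither is part of the notebook.

```python
from collections import Counter

# Hypothetical, abridged record with the same shape as one element of `spotify_users`.
demo_user = {
    'user_id': 'example-user-id',  # placeholder, not a real ID
    'playlists': [
        {'name': 'Starred',
         'tracks': [{'artist': 'Vico C', 'title': 'El'},
                    {'artist': 'Strike 3', 'title': 'Enamorado De Ti'},
                    {'artist': 'Strike 3', 'title': 'Es Por Ti'}]},
        {'name': 'Two',
         'tracks': [{'artist': 'Walk the Moon', 'title': 'Quesadilla'},
                    {'artist': 'Two Door Cinema Club', 'title': 'Sleep Alone'}]},
    ],
}

def summarize_user(user):
    """Map each playlist name to a Counter of tracks per artist."""
    # user (dict) -> 'playlists' (list) -> playlist (dict) -> 'tracks' (list) -> track (dict)
    return {playlist['name']: Counter(track['artist'] for track in playlist['tracks'])
            for playlist in user['playlists']}

summary = summarize_user(demo_user)
for name, per_artist in summary.items():
    print(name, dict(per_artist))
```

Keying the summary by playlist name is only safe for a single user whose playlist names are distinct; across users, names such as ‘Starred’ repeat.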

1.2 Part A: Preliminary analysis

To make sure you know how to navigate these data, let’s start with two basic exercises.

1.2.1 Exercise 0: count_playlists (1 point)

Given a user playlist dataset, users, complete the function, count_playlists(users) so that it returns the total number of playlists.

For instance, suppose the user dataset consists of the following two users:

In [5]: ex0_demo_users = [{'user_id': '0c8435917bd098dce8df8f62b736c0ed',
                           'playlists': [{'name': 'Starred',
                                          'tracks': [{'artist': 'André Rieu',
                                                      'title': 'Once Upon A Time In The West - Main Title Theme'},
                                                     {'artist': 'André Rieu',
                                                      'title': 'The  Second Waltz - From Eyes Wide Shut'}]}]},
                          {'user_id': 'fc799d71e8d2004377d6d8e861479559',
                           'playlists': [{'name': 'Liked from Radio',
                                          'tracks': [{'artist': 'The Police', 'title': 'Every Breath You Take'},
                                                     {'artist': 'Lucio Battisti', 'title': 'Per Una Lira'},
                                                     {'artist': 'Alicia Keys ft. Jay-Z', 'title': 'Empire State of Mind'}]},
                                         {'name': 'Starred', 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}]}]

Then count_playlists(ex0_demo_users) would return 1+2=3, because the first user has one playlist (named ‘Starred’) and the second has two playlists (one named ‘Liked from Radio’ and the other also named ‘Starred’).

In [6]: def get_user(users):
            # Returns the first user encountered, for a quick look at one record.
            for user in users:
                return user

        get_user(ex0_demo_users)
 
Out[6]: {'user_id': '0c8435917bd098dce8df8f62b736c0ed',
         'playlists': [{'name': 'Starred',
           'tracks': [{'artist': 'André Rieu',
             'title': 'Once Upon A Time In The West - Main Title Theme'},
           {'artist': 'André Rieu',
            'title': 'The Second Waltz - From Eyes Wide Shut'}]}]}

In [7]: def count_playlists(users):
            # Each user holds a list of playlists; sum the lengths of those lists.
            return sum(len(user['playlists']) for user in users)

In [8]: # Demo cell 
        count_playlists(ex0_demo_users) # should return 3 

Out[8]: 3

In [9]: # Test cell 0: `mt1_ex0_count_playlists` (1 point)

        ###
        ### AUTOGRADER TEST - DO NOT REMOVE
        ###

        #from testing_tools import mt1_ex0__check
        #print("Testing...")

        assert count_playlists(spotify_users) == 231_844

        from testing_tools import mt1_ex0__check
        for trial in range(250):
            mt1_ex0__check(count_playlists)

        print("\n(Passed!)")

(Passed!)

1.2.2 Exercise 1: count_artist_strings (1 point)

For your next task, suppose we wish to count how many distinct case-insensitive artist strings are in the dataset (across all users and playlists). By “distinct case-insensitive,” we mean two strings a and b would be “equal” if, after conversion to lowercase, they are equal in the Python sense of a == b. For example, we would treat ‘Jay-Z’ and ‘JAY-Z’ as equal, but we would regard ‘Jay-Z’ (with a hyphen) and ‘Jay Z’ (without a hyphen) as unequal.

In a subsequent exercise, we will try to normalize names in a different way.
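As a small illustration of this notion of equality, the sketch below lowercases a handful of strings before placing them in a set. The list `names` is hypothetical and chosen only to mirror the ‘Jay-Z’ examples above.

```python
# Hypothetical artist strings, chosen to mirror the examples above.
names = ['Jay-Z', 'JAY-Z', 'Jay Z', 'jay-z']

# Lowercasing before inserting into a set merges strings that differ only in case.
# The hyphenated and unhyphenated spellings remain distinct.
distinct = {name.lower() for name in names}

print(sorted(distinct))
print(len(distinct))
```

Because `str.lower()` returns a new string, the original list is left unchanged.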

Your task.

Given a user playlist dataset, users, complete the function count_artist_strings(users) below so that it counts the number of distinct case-insensitive artist strings contained in users.

For example, recall the demo dataset from Exercise 0:

In [10]: pprint(ex0_demo_users)

[{'playlists': [{'name': 'Starred',
                 'tracks': [{'artist': 'André Rieu',
                             'title': 'Once Upon A Time In The West - Main '
                                      'Title Theme'},
                            {'artist': 'André Rieu',
                             'title': 'The Second Waltz - From Eyes Wide '
                                      'Shut'}]}],
  'user_id': '0c8435917bd098dce8df8f62b736c0ed'},
 {'playlists': [{'name': 'Liked from Radio',
                 'tracks': [{'artist': 'The Police',
                             'title': 'Every Breath You Take'},
                            {'artist': 'Lucio Battisti',
                             'title': 'Per Una Lira'},
                            {'artist': 'Alicia Keys ft. Jay-Z',
                             'title': 'Empire State of Mind'}]},
                {'name': 'Starred',
                 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}],
  'user_id': 'fc799d71e8d2004377d6d8e861479559'}]

Looking across all users and playlists, this dataset has five (5) distinct artist strings: ‘André Rieu’, ‘The Police’, ‘Lucio Battisti’, ‘Alicia Keys ft. Jay-Z’, and ‘U2’. Observe that ‘André Rieu’ appears twice, but for our tally, we would count it just once. And if ‘the POLICE’ had been in the data, then it would be considered the same as ‘The Police’.

Note: Your function must not modify the input dataset. Even if your code returns a correct result, if it changes the input data, the autograder will mark it as incorrect.

In [11]: def count_artist_strings(users):
             # Lowercase each artist string so that strings differing only in
             # case collapse to one set element. The input is read, not modified.
             art_set = {track['artist'].lower()
                        for user in users
                        for playlist in user['playlists']
                        for track in playlist['tracks']}
             return len(art_set)

In [12]: # Demo: Should return '5' 
         count_artist_strings(ex0_demo_users)

Out[12]: 5

In [13]: # Test cell 0: `mt1_ex1_count_artist_strings` (1 point)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         #assert count_artist_strings(spotify_users) == 282_555

         from testing_tools import mt1_ex1__check
         print("Testing...")
         for trial in range(250):
             mt1_ex1__check(count_artist_strings)

         print("\n(Passed!)")

Testing...

(Passed!)

Answer for this dataset. If your function works correctly, running it on the full Spotify dataset would result in 282,555 distinct case-insensitive artist strings. That’s a lot of artists! (We have omitted this check to reduce the running time of the notebook.)

1.3 Part B: Data cleaning

Unfortunately, artist names are encoded in a messy fashion. Here are some examples:

The artist “Jay-Z” is written as “Jay-Z” and “JAY Z”, with several other variations having different capitalization.
Worse, there is no consistent standard for encoding multiple artists who worked together on a song. For example, here is how several of Jay-Z’s collaborations appear:

– 'Alicia Keys ft. Jay-Z' ==> ('ft.' used as an artist-separator)
– 'A-Trak x Kanye x Jay-Z' ==> (' x ' used as an artist-separator) 
– 'JAY Z Featuring Beyoncé' ==> (variation on "Jay-Z" and yet another variation on "featuring" to separate artists) 
– 'Jay-Z Featuring Beyoncé Knowles' ==> (Beyoncé’s last name included in this variation) 
– 'Jay-Z/Kanye West/Lil Wayne/T.I.' ==> (... you get the idea ...) 
– 'Jay Z (Dr. Dre, Rakim, & Truth Hurts)'
– 'Young Jeezy Ft. Jay-Z & Fat Joe'
– 'Lil Wayne Drake Jay-Z And Gif Majorz' ==> (spaces used ambiguously: there are four artists in this example!) 
– 'Timbaland & Magoo feat Jay-Z'
– 'OutKast/Jay-Z/Killer Mike'
– 'Jay-Z Ft.Rihanna And Kanye West'
– 'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ==> ("Benatar" is misspelled as 'Benetar') 
– 'jay z with the roots. s' ==> (yes, excess spaces between . and trailing s are real)

It is difficult to design a robust algorithm to extract individual artist names. Instead, let’s use the following approximate algorithm, given an artist-string as input.

1. Lowercase: First, convert all characters to lowercase.

2. Space-equivalents: Next, convert any hyphen (‘-’), period (‘.’), question mark (‘?’), exclamation point (‘!’), and underscores (‘_’) into a space character.

3. Separators: Then split the string, treating the following patterns as artist-name separators:
   1. All of the following words, but only when there are spaces both before and after: ‘and’, ‘with’, ‘ft’, ‘feat’, ‘featuring’, ‘vs’, and ‘x’.
   2. All of the following symbols: ‘/’ (forward slash), ‘&’ (ampersand), comma (‘,’), semicolon (‘;’), and each enclosing parenthesis or bracket (‘(’, ‘)’, ‘[’, ‘]’, ‘{’, ‘}’).

4. Whitespace compression: Lastly, for any artist name-string following the above separation steps, strip out any preceding and trailing whitespace and collapse multiple consecutive whitespace characters into a single space.

When applying this algorithm, we’ll perform steps 1-4 in the exact same sequence as shown above.
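To see why the order of the steps matters, it can help to trace the intermediate value after each one. The sketch below is one possible reading of the four steps, not a reference solution: the helper name `trace_steps` is hypothetical, and the regular expression treats any whitespace character (not just a literal space) as a valid neighbor of a separator word, which is an assumption.

```python
import re

def trace_steps(artist):
    """Return the intermediate result after each of the four steps (illustrative helper)."""
    # Step 1: lowercase.
    step1 = artist.lower()
    # Step 2: hyphen, period, question mark, exclamation point, underscore -> space.
    step2 = re.sub(r'[-.?!_]', ' ', step1)
    # Step 3: split on separator symbols, or on a separator word that has
    # whitespace before it and (via lookahead) whitespace after it.
    step3 = re.split(r'[/&,;()\[\]{}]|\s(?:and|with|ft|feat|featuring|vs|x)(?=\s)', step2)
    # Step 4: strip each piece, collapse internal whitespace, drop empty pieces.
    step4 = {' '.join(part.split()) for part in step3 if part.strip()}
    return step1, step2, step3, step4

s1, s2, s3, s4 = trace_steps('Young Jeezy Ft. Jay-Z & Fat Joe')
print(repr(s1))
print(repr(s2))
print(s3)
print(s4)
```

Note that ‘Ft.’ only becomes a recognizable separator because step 2 has already turned the period into a space; running step 3 first would leave ‘ft.’ unsplit.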

1.3.1 Exercise 2: extract_artists (2 points)

Complete the function extract_artists(artist) so that it applies the artist name-separation algorithm described above, returning a Python set consisting of the separate artist names. For example:

'Alicia Keys ft. Jay-Z' ==> {'jay z', 'alicia keys'}
'A-Trak x Kanye x Jay-Z' ==> {'a trak', 'jay z', 'kanye'}
'JAY Z Featuring Beyoncé' ==> {'jay z', 'beyoncé'}
'Jay-Z Featuring Beyoncé Knowles' ==> {'beyoncé knowles', 'jay z'}
'Jay-Z/Kanye West/Lil Wayne/T.I.' ==> {'lil wayne', 'jay z', 't i', 'kanye west'}
'Young Jeezy Ft. Jay-Z & Fat Joe' ==> {'fat joe', 'jay z', 'young jeezy'}
'Lil Wayne Drake Jay-Z And Gif Majorz' ==> {'gif majorz', 'lil wayne drake jay z'}
'Timbaland & Magoo feat Jay-Z' ==> {'jay z', 'timbaland', 'magoo'}
'OutKast/Jay-Z/Killer Mike' ==> {'outkast', 'jay z', 'killer mike'}
'Jay-Z Ft.Rihanna And Kanye West' ==> {'rihanna', 'jay z', 'kanye west'}
'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ==> {'m i a', 'beyonce', 'christina aguilera', 'pat benetar', 'britney spears', '3oh 3'}
'jay z with the roots. s' ==> {'jay z', 'the roots s'}

Note 0: Pay close attention to the target output.

Note 1: This procedure is imperfect. For example, observe that ‘Lil Wayne Drake Jay-Z And Gif Majorz’ is, in reality, four artists (Lil’ Wayne, Drake, Jay-Z, and Gif Majorz), but the algorithm cannot disambiguate the intention of spaces. Also, in the last example, even though in reality ‘the roots. s’ should resolve to ‘the roots’, it instead becomes ‘the roots s’. And a band like ‘Tom Petty and the Heartbreakers’ will be erroneously split into two artists (‘Tom Petty’ and ‘the Heartbreakers’). But it is what it is.

In [14]: def extract_artists(artist):
             import re

             # Step 1 (lowercase).
             artist = artist.lower()

             # Step 2 (space-equivalents): '-', '.', '?', '!', and '_' become spaces.
             artist = re.sub(r'[-.?!_]', ' ', artist)

             # Step 3 (separators): split on the separator symbols, and on the
             # separator words only when whitespace appears on both sides. A plain
             # substring replacement would also split inside names, e.g., at the
             # 'ft' in 'inft' or the 'and' in 'timbaland'.
             parts = re.split(r'[/&,;()\[\]{}]'
                              r'|\s(?:and|with|ft|feat|featuring|vs|x)(?=\s)', artist)

             # Step 4 (whitespace compression): strip each name and collapse runs
             # of whitespace into a single space, dropping empty pieces.
             return {' '.join(p.split()) for p in parts if p.strip()}

In [15]: # Demo
         ex0_inputs = ['Alicia Keys ft. Jay-Z',
                       'A-Trak x Kanye x Jay-Z',
                       'JAY Z Featuring Beyoncé',
                       'Jay-Z Featuring Beyoncé Knowles',
                       'Jay-Z/Kanye West/Lil Wayne/T.I.',
                       'Young Jeezy Ft. Jay-Z & Fat Joe',
                       'Lil Wayne Drake Jay-Z And Gif Majorz',
                       'Timbaland & Magoo feat Jay-Z',
                       'OutKast/Jay-Z/Killer Mike',
                       'Jay-Z Ft.Rihanna And Kanye West',
                       'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.',
                       'jay z with the roots. s']
         for a in ex0_inputs:
             print(f"'{a}' ==> {extract_artists(a)}")

'Alicia Keys ft. Jay-Z' ==> {'jay z', 'alicia keys'}
'A-Trak x Kanye x Jay-Z' ==> {'a trak', 'jay z', 'kanye'}
'JAY Z Featuring Beyoncé' ==> {'jay z', 'beyoncé'}
'Jay-Z Featuring Beyoncé Knowles' ==> {'beyoncé knowles', 'jay z'}
'Jay-Z/Kanye West/Lil Wayne/T.I.' ==> {'lil wayne', 'jay z', 't i', 'kanye west'}
'Young Jeezy Ft. Jay-Z & Fat Joe' ==> {'fat joe', 'jay z', 'young jeezy'}
'Lil Wayne Drake Jay-Z And Gif Majorz' ==> {'gif majorz', 'lil wayne drake jay z'}
'Timbaland & Magoo feat Jay-Z' ==> {'jay z', 'timbaland', 'magoo'}
'OutKast/Jay-Z/Killer Mike' ==> {'outkast', 'jay z', 'killer mike'}
'Jay-Z Ft.Rihanna And Kanye West' ==> {'rihanna', 'jay z', 'kanye west'}
'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ==> {'m i a', 'beyonce', 'christina aguilera', 'pat benetar', 'britney spears', '3oh 3'}
'jay z with the roots. s' ==> {'jay z', 'the roots s'}

In [16]: # Test cell: `mt1_ex2_extract_artists` (2 points)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         from testing_tools import mt1_ex2__check
         print("Testing...")
         for trial in range(250):
             mt1_ex2__check(extract_artists)

         extract_artists__passed = True
         print("\n(Passed!)")

1.3.2 Sample results for Exercise 2: artist_translation_table

If you had a working solution to Exercise 2, then in principle you could use it to normalize and separate the artist names. We have precomputed these translations for you, for every artist name that appears in the data; run the cell below to load a name-translation table, stored in the variable, artist_translation_table.

Read and run this cell even if you skipped or otherwise did not complete Exercise 2.

In [17]: from testing_tools import mt1_artist_translation_table as artist_translation_table

         print("\n=== Examples ===")
         for q in ex0_inputs[:5]:
             print(f"artist_translation_table['{q}'] \\\n    == {artist_translation_table[q]}")

=== Examples ===
artist_translation_table['Alicia Keys ft. Jay-Z'] \
    == {'jay z', 'alicia keys'}
artist_translation_table['A-Trak x Kanye x Jay-Z'] \
    == {'a trak', 'jay z', 'kanye'}
artist_translation_table['JAY Z Featuring Beyoncé'] \
    == {'beyoncé', 'jay z'}
artist_translation_table['Jay-Z Featuring Beyoncé Knowles'] \
    == {'jay z', 'beyoncé knowles'}
artist_translation_table['Jay-Z/Kanye West/Lil Wayne/T.I.'] \
    == {'lil wayne', 'kanye west', 'jay z', 't i'}

1.4 Part C: Gathering playlists

The data structure has a complicated nesting. Let’s “flatten” it by collecting just the playlists. And for each playlist, let’s keep only the artist names.

For example, recall the demo dataset from before:

In [18]: pprint(ex0_demo_users)

[{'playlists': [{'name': 'Starred',
                 'tracks': [{'artist': 'André Rieu',
                             'title': 'Once Upon A Time In The West - Main '
                                      'Title Theme'},
                            {'artist': 'André Rieu',
                             'title': 'The Second Waltz - From Eyes Wide '
                                      'Shut'}]}],
  'user_id': '0c8435917bd098dce8df8f62b736c0ed'},
 {'playlists': [{'name': 'Liked from Radio',
                 'tracks': [{'artist': 'The Police',
                             'title': 'Every Breath You Take'},
                            {'artist': 'Lucio Battisti',
                             'title': 'Per Una Lira'},
                            {'artist': 'Alicia Keys ft. Jay-Z',
                             'title': 'Empire State of Mind'}]},
                {'name': 'Starred',
                 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}],
  'user_id': 'fc799d71e8d2004377d6d8e861479559'}]

The first user has one playlist with two tracks by the same artist. The second user has two playlists, one playlist with three tracks and four artists (since one track has a compound artist name), and the other playlist with one track.

For our next task, we’d like to construct a copy of this data with the following simpler structure:

In [19]: ex3_demo_output = [{'André Rieu'},
                            {'The Police', 'Lucio Battisti', 'Alicia Keys ft. Jay-Z'},
                            {'U2'}]

This object is simply a Python list of Python sets, with the “outer” list containing playlists and each playlist consisting only of distinct artist strings (without postprocessing per Exercise 2—we’ll handle that later).

1.4.1 Exercise 3: extract_playlists (2 points)

Complete the function, extract_playlists(users), so that it returns the simplified list of artist names as shown above. For instance, calling extract_playlists(ex0_demo_users) should return an object that matches ex3_demo_output.

Note 0: You should not process the artist names per Exercise 2; that step comes later.

Note 1: You should preserve the exact order of playlists from the input. That is, you should loop over users and playlists in the order that they appear in the input and produce the corresponding output in that same order.

Note 2: Do not forget that the final output should be a Python list (holding playlists) of Python sets (unprocessed artist names).

Note 3: Your function should not modify the input dataset.
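One way to satisfy all four notes is a single nested comprehension that builds a new list of new sets, so the input is only read. The following is a sketch under that reading, not the official solution; the function name `flatten_playlists_sketch` and the dataset `tiny_users` are hypothetical.

```python
def flatten_playlists_sketch(users):
    """Return one set of raw artist strings per playlist, in input order."""
    # Outer list comprehension: users, then playlists, in the order given.
    # Inner set comprehension: the distinct, unprocessed artist strings.
    return [{track['artist'] for track in playlist['tracks']}
            for user in users
            for playlist in user['playlists']]

# Hypothetical dataset with the same shape as `ex0_demo_users`.
tiny_users = [
    {'user_id': 'u0',
     'playlists': [{'name': 'Starred',
                    'tracks': [{'artist': 'André Rieu', 'title': 'A'},
                               {'artist': 'André Rieu', 'title': 'B'}]}]},
    {'user_id': 'u1',
     'playlists': [{'name': 'Liked from Radio',
                    'tracks': [{'artist': 'The Police', 'title': 'C'},
                               {'artist': 'U2', 'title': 'D'}]},
                   {'name': 'Starred',
                    'tracks': [{'artist': 'U2', 'title': 'E'}]}]},
]

flattened = flatten_playlists_sketch(tiny_users)
print(flattened)
```

The result has one entry per playlist (three here), regardless of how many users own them.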

In [19]: def extract_playlists(users):
             ###
             ### YOUR CODE HERE
             ###

In [20]: # Demo cell
         ex3_your_output = extract_playlists(ex0_demo_users)
         print("=== Your output ===")
         pprint(ex3_your_output)

         assert all(a == b for a, b in zip(ex3_your_output, ex3_demo_output)), "Your output does not match the demo output!"
         print("\n(Your output matches the demo output so far, so good!)")

In [21]: # Test cell: `mt1_ex3_extract_playlists` (2 points)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         from testing_tools import mt1_ex3__check
         print("Testing...")
         for trial in range(250):
             mt1_ex3__check(extract_playlists)

         extract_playlists__passed = True
         print("\n(Passed!)")

1.4.2 Sample results for Exercise 3: simple_playlists

If you had a working solution to Exercise 3, then in principle you could use it to construct simplified playlists for the full Spotify dataset. Instead, we have precomputed these for you, for every playlist in that dataset; run the cell below to load them into a variable named simple_playlists.

Read and run this cell even if you skipped or otherwise did not complete Exercise 3.

In [22]: simple_playlists = load_pickle('simple_playlists.pickle')

         print("\n=== Examples (first three playlists) ===")
         pprint(simple_playlists[:3])

1.5 Part D: An itemset representation

Our artist-recommender system will reuse ideas from Notebook 2 (pairwise association rule mining). The next two exercises do so.

But first, we’ll need to identify analogues of baskets (or receipts) and items for our artist-recommender problem. Here is how we’ll do that.

Receipts (baskets): Let’s consider each playlist to be a receipt.
Items: Let’s consider each distinct artist (after name normalization per Exercise 2!) to be an item.

Example. Recall the simplified playlists example, ex3_demo_output, from Exercise 3:

In [23]: print(ex3_demo_output)

Since there are three playlists, there are three “receipts.” We want to treat each one as an itemset consisting of normalized artist names, per Exercise 2.

In [24]: ex4_demo_output = [{'andré rieu'}, {'the police', 'lucio battisti', 'alicia keys', 'jay z'}, {'u2'}]

Observe that the second playlist includes one track having a compound artist name, ‘Alicia Keys ft. Jay-Z’. In these instances, each collaborating artist should become an element of the itemset. Here, both ‘alicia keys’ and ‘jay z’ appear in the output.

Whether your Exercise 2 works or not, recall that we precomputed translations from raw artist name strings to itemsets. These are stored in artist_translation_table, e.g.:

In [25]: artist_translation_table['Alicia Keys ft. Jay-Z']

Code reuse from Notebook 2. In addition to its concepts, Notebook 2 also has a lot of code we want you to reuse.

For example, recall the make_itemsets(receipts) function. Given a bunch of receipts, it converts each receipt into an itemset, a Python set of its items. Here is a generalized version of that code, which allows the user to supply a function, make_set, for converting one receipt into an itemset.

In [26]: def make_itemsets(receipts, make_set=set):
             return [make_set(r) for r in receipts]

For example, recall how this function worked in the case where “words” are receipts and the individual letters are items. In that case, simply calling the default, set, on one receipt creates an itemset:

In [27]: make_itemsets(['hello', 'world'])

To use make_itemsets for our problem, we need to create a function that is compatible with the requirements of the make_set argument. That is your next task.
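To illustrate what “compatible with make_set” means, here is a small sketch: any function that accepts one receipt and returns a set will do. The helper `vowels_only` is hypothetical and exists only to show the calling convention; `make_itemsets` is repeated so the block runs on its own.

```python
def make_itemsets(receipts, make_set=set):
    # Same generalized helper as in the notebook cell above.
    return [make_set(r) for r in receipts]

def vowels_only(word):
    """Hypothetical `make_set`: keep only the lowercase vowels of a word."""
    return {ch for ch in word if ch in 'aeiou'}

default_sets = make_itemsets(['hello', 'world'])
vowel_sets = make_itemsets(['hello', 'world'], make_set=vowels_only)

print(default_sets)
print(vowel_sets)
```

The function passed as make_set is called once per receipt, so for this problem it will receive one playlist's set of raw artist strings at a time.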

1.5.1 Exercise 4: normalize_artist_set (2 points)

Complete the function, normalize_artist_set(artist_set), where artist_set is a Python set of unprocessed artist names. It should return a Python set of normalized artist names, per Exercise 2.

For instance,

normalize_artist_set({'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'})

should return

{'the police', 'lucio battisti', 'alicia keys', 'jay z'}

Note:

You may reuse your function from Exercise 2, if you are confident it is bug-free; otherwise, we recommend using the precomputed values in artist_translation_table.
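A minimal sketch of the table-lookup approach follows. It assumes the translation table maps each raw artist string to a set of normalized names, as the examples printed earlier suggest; `tiny_table` is a hypothetical miniature stand-in for artist_translation_table, and `normalize_with_table` is a name of our own choosing.

```python
# Hypothetical miniature stand-in for `artist_translation_table`.
tiny_table = {
    'Alicia Keys ft. Jay-Z': {'alicia keys', 'jay z'},
    'Lucio Battisti': {'lucio battisti'},
    'The Police': {'the police'},
}

def normalize_with_table(artist_set, table):
    """Union the normalized name-sets of every raw artist string."""
    normalized = set()  # fresh set, so neither input is modified
    for raw_name in artist_set:
        normalized |= table[raw_name]
    return normalized

result = normalize_with_table(
    {'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'}, tiny_table)
print(result)
```

A function with this shape, bound to the real table, could then be supplied as the make_set argument of make_itemsets.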

In [28]: def normalize_artist_set(artist_set):
             ### 
             ### YOUR CODE HERE 
             ### 

In [29]: # Demo cell: 
         normalize_artist_set({'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'})
             # expected output: `{'alicia keys', 'jay z', 'lucio battisti', 'the police'}` 

In [30]: # Test cell: `mt1_ex4_normalize_artist_set` (2 points)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         from testing_tools import mt1_ex4__check
         print("Testing...")
         for trial in range(250):
             mt1_ex4__check(normalize_artist_set)

         normalize_artist_set__passed = True
         print("\n(Passed!)")

1.5.2 Sample results for Exercise 4: artist_itemsets

If you had a working solution to Exercise 4, then in principle you could use it to construct artist itemsets for all of the playlists. Instead, we have precomputed these for you; run the cell below to load it into a variable named artist_itemsets.

Read and run this cell even if you skipped or otherwise did not complete Exercise 4.

In [31]: artist_itemsets = load_pickle('normalized_artist_sets.pickle')

         print("\n=== Examples (first three playlists) ===")
         pprint(artist_itemsets[:3])

1.5.3 Exercise 5: get_artist_counts (1 point)

For the Notebook 2 analysis, we also needed a way to count in how many receipts each item occurred. That’s your next task.

Given a collection of artist itemsets, complete the function get_artist_counts(itemsets) so that it returns a dictionary-like object with artist names as keys and the number of occurrences as values.

For example, suppose you start with these three itemsets:

itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'}, {'u2', 'the police'}, {'jay z'}]

Then get_artist_counts(itemsets) should return:

{'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1, 'the police': 2, 'u2': 1}

Note: By “dictionary-like,” we mean either a conventional Python dictionary or a collections.defaultdict, as you prefer.

Hint: Recall update_item_counts from Notebook 2, which we’ve provided again in the code cell below.

In [32]: def update_item_counts(item_counts, itemset):
             for a in itemset:
                 item_counts[a] += 1

         def get_artist_counts(itemsets):
             ###
             ### YOUR CODE HERE
             ###
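
For illustration only, here is a minimal sketch of one way the counting could be done. It is not the reference solution. It repeats update_item_counts so the block runs on its own, and it assumes a collections.defaultdict(int) as the "dictionary-like" result; the name get_artist_counts_sketch is made up for this example.

```python
from collections import defaultdict

def update_item_counts(item_counts, itemset):
    # Same helper as in the cell above: bump the count of every item once.
    for a in itemset:
        item_counts[a] += 1

def get_artist_counts_sketch(itemsets):
    """Count, for each artist, the number of itemsets it appears in."""
    counts = defaultdict(int)  # missing keys start at 0
    for itemset in itemsets:
        update_item_counts(counts, itemset)
    return counts

itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
            {'u2', 'the police'},
            {'jay z'}]
counts = get_artist_counts_sketch(itemsets)
print(dict(counts))
```

Since each itemset is a set, an artist is counted at most once per playlist, so each value is the number of playlists that contain that artist.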

In [33]: # Demo cell: 
         itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'}, {'u2', 'the police'}, {'jay z'}]
         get_artist_counts(itemsets)

In [34]: # Test cell: `mt1_ex5_get_artist_counts` (1 point)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         from testing_tools import mt1_ex5__check
         print("Testing...")
         for trial in range(250):
             mt1_ex5__check(get_artist_counts)

         get_artist_counts__passed = True
         print("\n(Passed!)")

1.5.4 Sample results for Exercise 5: artist_counts

If you had a working solution to Exercise 5, then in principle you could run get_artist_counts(artist_itemsets) to count the number of occurrences of all artists. Instead, we have precomputed these for you; run the cell below to load them into the object, artist_counts.

Read and run this cell even if you skipped or otherwise did not complete Exercise 5.

In [35]: artist_counts = load_pickle('artist_counts.pickle')

         print("Examples:")
         for a in ['lady gaga', 'fats domino', 'kishi bashi']:
             print(f"* Artist '{a}' appears in {artist_counts[a]:,} playlists.")

1.6 Part E: A simple artist-recommender system

We now have all the pieces we need to build a recommender system to help users find artists they might like, building on Notebook 2’s pairwise association-rule miner. However, we’ll need a modified procedure.

Why? Recall how many artists there are (run the cell below):

In [36]: print(f'The dataset has {len(artist_counts):,} artists! (After name normalization per Exercise 2.)')

That’s a lot! So rather than finding all association rules, let’s use the following procedure instead.

1. Suppose a user has given us the name of one artist they already like. Call that the root artist.
2. Filter all playlists to only those containing the root artist. Call these the root playlists (or root itemsets).
3. For each root playlist, remove any artists that are “uncommon,” based on a given threshold. However, do not remove the root artist; it should always be kept, whether common or not. Call the resulting playlists the pruned playlists.
4. Run the pairwise association rule miner on these pruned playlists, which should be smaller and thus faster to process, and report the top result(s).

For your last exercise, we’ll give you code for Step 2 and need you to combine it with Step 3. We will provide the rest, and if your procedure works, you’ll be able to try it out!

Filtering step. Here is code we are providing for Step 2 of this proposed recommender algorithm (filter playlists).

In [37]: def filter_itemsets(root_item, itemsets):
             return [s for s in itemsets if root_item in s]

Here is a demo of filter_itemsets, which generates “root playlists” for the artist, “Kishi Bashi.”

Pop-up Video / Behind The Lyrics trivia: At the time of this exam (Spring 2021), Kishi Bashi lives in Athens, Georgia, USA, about 90 minutes or so outside Atlanta!

In [38]: root_playlists_for_kishi_bashi = filter_itemsets('kishi bashi', artist_itemsets)
         print(f"Found {len(root_playlists_for_kishi_bashi)} playlists containing 'kishi bashi.'")
         print("Example:", root_playlists_for_kishi_bashi[2])
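
One detail worth keeping in mind: filter_itemsets builds a new list, but the sets inside that list are the same objects as in the input. The small sketch below, on made-up toy data, illustrates this and shows one way to take copies before removing anything. The variable names here are chosen only for this example.

```python
def filter_itemsets(root_item, itemsets):
    # Same definition as in the cell above.
    return [s for s in itemsets if root_item in s]

toy_itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
                {'u2', 'the police'},
                {'jay z'}]

filtered = filter_itemsets('the police', toy_itemsets)

# The filtered list refers to the very same set objects as the input list.
shares_objects = filtered[0] is toy_itemsets[0]
print(len(filtered), shares_objects)

# Copying each set first means later removals leave the originals untouched.
safe_copies = [set(s) for s in filtered]
safe_copies[0].discard('alicia keys')
print('alicia keys' in toy_itemsets[0])
```

Any later step that removes items from a filtered itemset in place would therefore also change the original itemsets unless it works on copies.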

1.6.1 Exercise 6: prune_itemsets (2 points)

Complete the function,

def prune_itemsets(root_item, itemsets, item_counts, min_count):
     ...

so that it implements Step 2 and Step 3 of the recommender. That is, the inputs are:

root_item: The root item (i.e., the root artist name)
itemsets: A collection of itemsets
item_counts: A pre-tabulated count of how many times each item appears in an itemset
min_count: The minimum number of itemsets in which an item should appear to be considered a recommendation

Your function should return the playlists pruned as follows:

1. Filter the itemsets to only those containing root_item. The resulting itemsets are the filtered itemsets.
2. For each filtered itemset, remove any item a where item_counts[a] < min_count. However, do not remove root_item, regardless of its count.
3. The resulting itemsets are the pruned itemsets. Discard any pruned itemsets that contain only the root item. Return the remaining pruned itemsets as a Python list of sets.

Note 0: Although the procedure above is written as though your function will modify its input arguments, it must not do so. Use copies as needed instead. The test cell will not pass if you modify the input arguments.

Note 1: You can return pruned itemsets in any order. (So if the test cell does not pass, it is not because it assumes results in a particular order.)

Example. Suppose the itemsets and item counts are given as follows:

In [39]: ex6_demo_itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'}, {'u2', 'the police'}, {'jay z'}]
         ex6_demo_item_counts = {'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1, 'the police': 2, 'u2': 1}

Then

prune_itemsets('the police', ex6_demo_itemsets, ex6_demo_item_counts, 2)

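will return a list with just one itemset, [{'the police', 'jay z'}]. That’s because only two itemsets contain 'the police', and of those, only one has at least one other item whose count is at least min_count=2 (namely 'jay z', with a count of 2). The other, {'u2', 'the police'}, is reduced to just the root item and is therefore discarded.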
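
To make the procedure concrete, here is a minimal sketch of one possible implementation, checked only against the example above. It is not the reference solution. The name prune_itemsets_sketch is made up, and the use of item_counts.get(a, 0) (treating an artist with no recorded count as having count 0, and avoiding key insertion into a defaultdict) is an assumption.

```python
def prune_itemsets_sketch(root_item, itemsets, item_counts, min_count):
    """Filter to itemsets containing root_item, then drop uncommon items.

    Builds new sets throughout, so none of the inputs are modified.
    """
    pruned = []
    for s in itemsets:
        if root_item not in s:
            continue  # filtering: keep only itemsets with the root item
        # Pruning: always keep the root item; keep others only if common enough.
        kept = {a for a in s
                if a == root_item or item_counts.get(a, 0) >= min_count}
        if len(kept) > 1:  # discard itemsets left with only the root item
            pruned.append(kept)
    return pruned

demo_itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
                 {'u2', 'the police'},
                 {'jay z'}]
demo_item_counts = {'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1,
                    'the police': 2, 'u2': 1}
result = prune_itemsets_sketch('the police', demo_itemsets, demo_item_counts, 2)
print(result)
```

Using a set comprehension to build each pruned itemset sidesteps in-place removal, which is one way to leave the input arguments unmodified.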

In [40]: def prune_itemsets(root_item, itemsets, item_counts, min_count):
             ### 
             ### YOUR CODE HERE 
             ### 

In [41]: # Demo cell: 
         prune_itemsets('the police', ex6_demo_itemsets, ex6_demo_item_counts, 2)

In [42]: # Test cell: `mt1_ex6_prune_itemsets` (2 points)

         ###
         ### AUTOGRADER TEST - DO NOT REMOVE
         ###

         from testing_tools import mt1_ex6__check
         print("Testing...")
         for trial in range(250):
             mt1_ex6__check(prune_itemsets)

         prune_itemsets__passed = True
         print("\n(Passed!)")

1.7 Fin!

If you passed the preceding exercise, then you have all the pieces necessary to try your recommendation algorithm! It is optional to do so, but if you have any time left, pick your favorite artist (assuming they are in the dataset) and see if you get reasonable results.

Otherwise, you’ve reached the end of this problem. Don’t forget to restart and run all cells again to make sure your code works when running all code cells in sequence; and make sure your work passes the submission process. Good luck!

In [43]: assert prune_itemsets__passed == True, "Are you sure you passed Exercise 6?"

         # `recommend` implements the complete recommender algorithm
         def recommend(root_artist, conf=0.2, min_count=1000, verbose=True):
             from cse6040nb2 import find_assoc_rules, print_rules
             global artist_itemsets, artist_counts
             print("Pruning...")
             pruned_playlists = prune_itemsets(root_artist, artist_itemsets, artist_counts, min_count)
             num_artists = sum(len(p) for p in pruned_playlists)
             print("\t", len(pruned_playlists), "itemsets remain with", num_artists, "artists.")
             print("Finding association rules...")
             rules = find_assoc_rules(pruned_playlists, conf)
             rules = {(a, b): c for (a, b), c in rules.items() if a == root_artist}
             print("\t", len(rules), f"rules of the form `conf('{root_artist}' => x) >= {conf}`")
             print(f"\n=== Our top recommendations for '{root_artist}' ===")
             print_rules(rules, limit=20)

         # DEMO: 'kishi bashi' produces some spurious results because
         # both "Of Monsters and Men" and "Mumford and Sons" are
         # erroneously split into two.
         recommend('kishi bashi')
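
The internals of find_assoc_rules are not shown in this notebook. As a rough guide to what the rules kept by recommend represent, the sketch below computes the standard pairwise confidence, conf(a => b) = (number of itemsets containing both a and b) / (number of itemsets containing a), for rules whose left-hand side is the root artist. The function name, the toy data, and the placeholder artist names are all made up for this example, and the real find_assoc_rules may differ in its details.

```python
from collections import defaultdict

def root_confidences_sketch(root_item, itemsets):
    """Return {(root_item, b): conf(root_item => b)} over the given itemsets."""
    n_root = 0                      # itemsets containing the root item
    pair_counts = defaultdict(int)  # itemsets containing both root item and b
    for s in itemsets:
        if root_item not in s:
            continue
        n_root += 1
        for b in s:
            if b != root_item:
                pair_counts[b] += 1
    # If no itemset contains the root item, pair_counts is empty and so is the result.
    return {(root_item, b): c / n_root for b, c in pair_counts.items()}

toy_pruned = [{'kishi bashi', 'artist a'},
              {'kishi bashi', 'artist a', 'artist b'}]
toy_rules = root_confidences_sketch('kishi bashi', toy_pruned)

# Keep only rules meeting a confidence threshold, mirroring the `conf` parameter.
strong_rules = {rule: c for rule, c in toy_rules.items() if c >= 0.75}
print(toy_rules)
print(strong_rules)
```

In this toy data, 'artist a' appears in both root playlists (confidence 1.0) and 'artist b' in one of two (confidence 0.5), so only the first rule survives a 0.75 threshold.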
