web designer and developer

MusicNet

January - May 2014

MusicNet is a social network for music enthusiasts. It allows users to search through 1+ teribyte music database, rate, play and buy music, befriend other users, discover events based on user location, and buy personalized ads.

I worked on planing, documenting, designing, setting up and optimized the MySQL database and developing the dynamic website using PHP, HTML, CSS, and Bootstrap.

- GitHub Repository
- Development Plan

Music Net

ER Model:

  • Participation Constraints: albums must have at least one song, albums must have at least one artist
  • Overlap Constraint: disallow any overlap constraints. User are either a super user or an ordinary user.
  • Covering Constraints: Yes, there is a covering constraint where users have to be either a super user or an ordinary user.

Music Net

Optimization:

  • We received a 1TB data file, which we loaded into our MySQL database. When attempting to load the data from the corpus given to us, we realized that there were issues concerning duplicate entries. In order to comply with our previously decided schema structure our team decided to filter out duplicates by removing only the first of the duplicate entries. We did this by using Unix (sort, uniq) and Python (extracting all duplicate entries that use the same album_id, artist_id, etc.)
  • Next, we worked on writing out complex queries in which we knew might cause some performance issues. From these intial queries, we attempt to optimize them so that they run, upperbound 5s, aiming at around 1s.
  • The most difficult query we had to work with was:

    Find the songs that have never been played.
    Intially, we wrote this using a subquery, but we discovered that this method took to long. The solution was to use a left join or “not in”. See results

    select * from Songs S where S.song_id not in(select P.song_id from Plays P);
    238064 rows in set (4.55 sec)

  • Another example of queries we tested:

    For each song that is rated more than 1,000 times, find the song id and the average age of the users who rate the song.
    Performance Issue: The inital query written took about 34 seconds to execute
    Optimization: Use materialization. Create a table of from Ratings of all songs that have been rated more than 1,000 times
    Result:
    mysql> create table UserRates as( Select song_id, user_id, count(rating) as counter from Ratings group by song_id having (counter>1000));
    mysql> select R.song_id, avg(U.age) as avgAge from UserRates R, Users U where R.user_id=U.user_id group by R.song_id
    2178 rows in set (0.69 sec)

  • Other optimization methods we used were: indexing, materialization and creating temporary tables.