RStudio, the power of R

 

The Power of R

 

My second blog post is about R, the power of free statistical software. To be honest, I am brand new to RStudio: I have only just finished the RStudio training on codeschool.com. At first I was a bit sceptical, but I really enjoyed it. At first glance R looks easy; the syntax feels natural and can be mastered with more practice.

For this assignment, I decided to produce a graph of how movie genres change over time. I am sure that with the right graph we can find real variation between the various movie genres. I got the data set from http://grouplens.org/datasets/movielens/ and it needed some tidying up (extracting the release year, separating the genres from the movies, etc.). Data scrubbing is one of the most important parts of any project: the data must be in a uniform state and consistent across the whole data set. In my case, some of the release years were missing. The biggest problem was that all the genres of a movie were listed on one line:

movieId   title                          genres
6365      Matrix Reloaded, The (2003)    Action|Adventure|Sci-Fi|Thriller|IMAX
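The same tidying could probably also be done in R rather than Excel. The snippet below is only a minimal sketch under a few assumptions: the year always sits in parentheses at the end of the title, the genres are separated by "|", and the names `movies_raw`, `extract_year` and `movies_long` are made up for this illustration. It turns the sample row above into one row per genre.

```r
# The sample row shown above, typed in by hand for illustration
movies_raw <- data.frame(
  movieId = 6365,
  title   = "Matrix Reloaded, The (2003)",
  genres  = "Action|Adventure|Sci-Fi|Thriller|IMAX",
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull a four-digit year in parentheses from the end
# of the title; titles without one get NA
extract_year <- function(title) {
  pattern  <- ".*\\(([0-9]{4})\\)[[:space:]]*$"
  has_year <- grepl(pattern, title)
  year     <- rep(NA_integer_, length(title))
  year[has_year] <- as.integer(sub(pattern, "\\1", title[has_year]))
  year
}

# Split the pipe-separated genres and repeat each movie once per genre
genre_list  <- strsplit(movies_raw$genres, "|", fixed = TRUE)
movies_long <- data.frame(
  movieId = rep(movies_raw$movieId, lengths(genre_list)),
  Year    = rep(extract_year(movies_raw$title), lengths(genre_list)),
  Genres  = unlist(genre_list),
  stringsAsFactors = FALSE
)

print(movies_long)
```

Titles with no year come back as NA, which makes the missing release years easy to find with `is.na()`.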

 

When I had finished working in Excel, the data had the following form:

Year   Genres        % of year
1900   Romance       100.00%
1901   Documentary   100.00%
1902   Action         25.00%
1902   Adventure      25.00%
1902   Fantasy        25.00%
1902   Sci-Fi         25.00%
1903   Crime          50.00%
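The percentage column can also be worked out in R. This is a rough sketch, assuming the percentage is each genre's share of all genre entries in that year; the data frame holds only the 1900 to 1902 rows from the table above, and the names `movies_long`, `share_df` and `PctYear` are invented for the example.

```r
# The 1900-1902 rows from the table above, one row per year and genre
movies_long <- data.frame(
  Year   = c(1900, 1901, 1902, 1902, 1902, 1902),
  Genres = c("Romance", "Documentary", "Action", "Adventure", "Fantasy", "Sci-Fi"),
  stringsAsFactors = FALSE
)

# Count entries per year (rows) and genre (columns)
counts <- table(movies_long$Year, movies_long$Genres)

# margin = 1 divides every row by its own total, giving a share per year
shares <- prop.table(counts, margin = 1) * 100

# Back to a long layout, keeping only the genres that occur in a year
share_df <- as.data.frame(as.table(shares), stringsAsFactors = FALSE)
names(share_df) <- c("Year", "Genres", "PctYear")
share_df <- share_df[share_df$PctYear > 0, ]
share_df <- share_df[order(share_df$Year, share_df$Genres), ]

print(share_df)
```

Note that `Year` comes back as text after the conversion, so it would need `as.integer()` before any arithmetic on it.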

 

I saved this spreadsheet as a .csv file and imported it into RStudio. After some research and many failed attempts, I finally got a result: a chart displaying movie releases in 1990-2000.

Movie RChart

All this was achieved with some simple code:

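# Read the tidied spreadsheet exported from Excel
movies <- read.csv("movies_csv.csv")

# Earlier attempt: `sci` is not defined in this post, so this call assumes
# it is a table or matrix of counts that was built beforehand
barplot(sci, main = "Movie distribution", xlab = "Xlabel", col = c("lightblue", "red"), legend.text = rownames(sci), beside = TRUE)

# Count the rows per genre (table rows) and per year (table columns)
counts <- table(movies$genres, movies$Year_)
barplot(counts, col = rainbow(nrow(counts)), main = "Movie releases by genre 2000 - 2010")

# Take the legend labels from the table rows so they match the bar colours
legend("topright", legend = rownames(counts), fill = rainbow(nrow(counts)))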
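The code above plots every year in the file, so showing a single decade needs a filter first. Here is a small sketch of one way to do that with `subset()`. The column names `Year_` and `genres` are taken from the code above, while the six rows of data are made up purely for illustration and the name `decade` is hypothetical.

```r
# Made-up rows standing in for the imported CSV
movies <- data.frame(
  Year_  = c(1998, 2000, 2000, 2005, 2010, 2012),
  genres = c("Drama", "Action", "Drama", "Action", "Comedy", "Drama"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose year falls inside the decade of interest
decade <- subset(movies, Year_ >= 2000 & Year_ <= 2010)

# Count genres per year and draw the bars side by side
counts <- table(decade$genres, decade$Year_)
barplot(counts, beside = TRUE, col = rainbow(nrow(counts)),
        legend.text = rownames(counts),
        main = "Movie releases by genre 2000-2010")
```

Passing `legend.text` to `barplot()` draws the legend in the same call, so the labels and colours cannot drift apart.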

I have only used R for a few days, but from my experience so far I have two observations:

  • R is very powerful, with a lot of built-in features and great community support.
  • The software has almost no GUI; everything is code based. This is by no means a bad thing, but for users with no coding experience it adds unnecessary complexity.
