Friday, April 20, 2012

How to use ggplot to group and show top X categories?

I am trying to use use ggplot to plot production data by company and use the color of the point to designate year. The follwoing chart shows a example based on sample data:
enter image description here



However, often times my real data has 50-60 different comapnies wich makes the Company names on the Y axis to be tiglhtly grouped and not very asteticly pleaseing.



What is th easiest way to show data for only the top 5 companies information (ranked by 2011 quanties) and then show the rest aggregated and shown as "Other"?



Below is some sample data and the code I have used to create the sample chart:



# create some sample data
c=c("AAA","BBB","CCC","DDD","EEE","FFF","GGG","HHH","III","JJJ")

q=c(1,2,3,4,5,6,7,8,9,10)
y=c(2010)
df1=data.frame(Company=c, Quantity=q, Year=y)

q=c(3,4,7,8,5,14,7,13,2,1)
y=c(2011)
df2=data.frame(Company=c, Quantity=q, Year=y)

df=rbind(df1, df2)

# create plot
p=ggplot(data=df,aes(Quantity,Company))+
geom_point(aes(color=factor(Year)),size=4)
p


I started down the path of a brute force approach but thought there is probably a simple and elegent way to do this that I should learn. Any assistance would be greatly appreciated.





No comments:

Post a Comment