---
title: "Data Scientists' bookmarks"
output: html_document
---
A taxonomy of the links given in the Coursera Data Science Specialization, collected here because the course materials are no longer available once you finish the courses.
I would also call it the data scientists' bookmarks.

# News
* [O'Reilly Radar - Data](http://radar.oreilly.com/data)
* [R-bloggers](http://www.r-bloggers.com)
* [Flowing Data](http://flowingdata.com)
* [r/machinelearning](http://www.reddit.com/r/machinelearning)
* [DataTau](http://www.datatau.com)
* [Simply Statistics](http://simplystatistics.org)
* [R-statistics](http://www.r-statistics.com)
* [ErrorStatistics](http://errorstatistics.com)

# Curriculum
* [Zipfian Academy](http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science)
* [Open Source Data Science Masters](http://datasciencemasters.org)
* [Metacademy](http://metacademy.org)

# Toolbox
* [R](http://www.r-project.org/)
* [R short reference card](http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
* [Bioconductor project](http://master.bioconductor.org/install/)
* [CRAN](http://cran.r-project.org)
* [RStudio](http://www.rstudio.com/)
* [GitHub](http://github.com)
* Plotting systems
    * base R
    * lattice
    * [ggplot2](http://docs.ggplot2.org/current/)
* Markdown
* LaTeX
* R Markdown
* [RPubs](https://rpubs.com)
* [WolframAlpha](http://www.wolframalpha.com)
* [rCharts](http://rcharts.io)
* [rMaps](http://rmaps.github.io)
* [slidify](http://ramnathv.github.io/slidify/)
* [Shiny](http://shiny.rstudio.com), [Shiny server](http://www.rstudio.com/shiny/server/)
* [devtools](https://github.com/hadley/devtools)
* [roxygen2](https://github.com/klutometis/roxygen)
* [testthat](https://github.com/hadley/testthat)
* [Google Ngram](https://books.google.com/ngrams)
* [httr](http://cran.r-project.org/web/packages/httr/httr.pdf) for web scraping
* [plyr](http://plyr.had.co.nz/09-user/)
* [reshape](http://www.slideshare.net/jeffreybreen/reshaping-data-in-r)
* [ProjectTemplate](http://projecttemplate.net)
* [caret](http://caret.r-forge.r-project.org/)
    * [Model training and tuning](http://caret.r-forge.r-project.org/training.html)
* [MATLAB Statistics Toolbox](http://www.mathworks.nl/help/stats/index.html)
* [MATLAB Neural Network Toolbox](http://www.mathworks.nl/products/neural-network/)
* [ML resources](http://www.sciencemag.org/site/feature/data/compsci/machine_learning.xhtml)
* [Random Forests](http://www.stat.berkeley.edu/%7Ebreiman/RandomForests/cc_home.htm)
* [Google Correlate](http://www.google.com/trends/correlate)
* [Google Visualization API](https://developers.google.com/chart/interactive/docs/gallery) and [googleVis](http://cran.r-project.org/web/packages/googleVis/googleVis.pdf)

# Data stores
* [XML](http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf)
* [RPostgreSQL](https://code.google.com/p/rpostgresql/)
* [RODBC](http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf)
* [RMongo](http://www.r-bloggers.com/r-and-mongodb/)
* [jpeg](http://cran.r-project.org/web/packages/jpeg/index.html)
* [readbitmap](http://cran.r-project.org/web/packages/readbitmap/index.html)
* [png](http://cran.r-project.org/web/packages/png/index.html)
* [EBImage](http://www.bioconductor.org/packages/2.13/bioc/html/EBImage.html)
* [rgdal](http://cran.r-project.org/web/packages/rgdal/index.html)
* [rgeos](http://cran.r-project.org/web/packages/rgeos/index.html)
* [raster](http://cran.r-project.org/web/packages/raster/index.html)
* [tuneR](http://cran.r-project.org/web/packages/tuneR/)
* [seewave](http://rug.mnhn.fr/seewave/)

# Data resources
* [Gapminder](http://www.gapminder.org/)
* [Survey data](http://www.asdfree.com/)
* [Infochimps Marketplace](http://www.infochimps.com/marketplace)
* [Kaggle](http://www.kaggle.com/)
* [PLOS](http://api.plos.org/)
* [rOpenSci](http://ropensci.org/packages/index.html)
* [Stanford Large Network Data](http://snap.stanford.edu/data/)
* [UCI Machine Learning](http://archive.ics.uci.edu/ml/)
* [KDnuggets Datasets](http://www.kdnuggets.com/datasets/index.html)
* [CMU StatLib](http://lib.stat.cmu.edu/datasets/)
* [Gene Expression Omnibus](http://www.ncbi.nlm.nih.gov/geo/)
* [arXiv Data](http://arxiv.org/help/bulk_data)
* [Public Data Sets on Amazon Web Services](http://aws.amazon.com/publicdatasets/)
* [Hilary Mason](http://bitly.com/bundles/hmason/1)
* [Peter Skomoroch](https://delicious.com/pskomoroch/dataset)
* [Jeff Hammerbacher](http://www.quora.com/Jeff-Hammerbacher/Introduction-to-Data-Science-Data-Sets)
* [Gregory Piatetsky-Shapiro](http://www.kdnuggets.com/gps.html)
* [6 dataset lists curated by data scientists](http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists)
* [Twitter](https://dev.twitter.com/) and [twitteR](http://cran.r-project.org/web/packages/twitteR/index.html) package
* [figshare](http://api.figshare.com/docs/intro.html) and [rfigshare](http://cran.r-project.org/web/packages/rfigshare/index.html)
* [PLoS](http://api.plos.org/) and [rplos](http://cran.r-project.org/web/packages/rplos/rplos.pdf)
* [Facebook](https://developers.facebook.com/) and [Rfacebook](http://cran.r-project.org/web/packages/Rfacebook/)
* [Google Maps](https://developers.google.com/maps/) and [RgoogleMaps](http://cran.r-project.org/web/packages/RgoogleMaps/index.html)
* [CKAN](http://docs.ckan.org/en/latest/user-guide.html) is a tool for making open data websites

# Books
* [OpenIntro textbook](http://www.openintro.org/stat/textbook.php)
* [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
* [Boosting tutorial](http://webee.technion.ac.il/people/rmeir/BoostingTutorial.pdf)
* [Advanced Data Analysis From An Elementary Point of View](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf)
* [Mathematical Statistics and Data Analysis](http://ihmsi.org/upload/John%20A.%20Rice%20-%20Mathematical%20Statistics%20and%20Data%20Analysis%20-%202nd%20Edition.pdf)
* [An Introduction to Statistical Learning](http://www-bcf.usc.edu/%7Egareth/ISL/)
* [Model-based clustering](http://www.stat.washington.edu/raftery/Research/PDF/fraley2002.pdf)

# Discussion/help
* [R mailing list](http://www.r-project.org/mail.html)
* [Stack Overflow (R tag)](http://stackoverflow.com/questions/tagged/r)
* [DataScience](http://datascience.stackexchange.com)
* [CrossValidated](http://stats.stackexchange.com)
* [MetaOptimize](http://metaoptimize.com/qa/)

# Online profile
* [GitHub](http://github.com)
* [RPubs](https://rpubs.com)
* [figshare](http://figshare.com/account/my_data)
* [Google Scholar](http://scholar.google.com)
* [PLOS ONE](http://www.plosone.org/), peer-reviewed open-access journal
* [ORCID](http://orcid.org/), persistent researcher identifier and single sign-on for academia
* [HackerRank](https://www.hackerrank.com)
* [KhanAcademy](https://www.khanacademy.org)