-
Notifications
You must be signed in to change notification settings - Fork 3
/
README.Rmd
57 lines (38 loc) · 1.85 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
output:
md_document:
variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE,
comment = '#',
cache = TRUE)
```
# abbagadabba
R package *abbagadabba* (**a**mplicon-**b**ased **b**iodiversity **a**ssessment, **g**ap **a**nalysis, **da**ta**b**ase **b**uilding **a**nd more) provides functions to download DNA sequence data from NCBI based on species names.
## Installation
*abbagadabba* is in active development and only exists on GitHub. It can be installed using *devtools* (which is a stable package available on CRAN).
```{r install, eval=FALSE}
library(devtools)
install_github("Maine-eDNA/abbagadabba")
```
## Example usage
Suppose we have a list of species for which we want to retrieve sequence data. This list might include "bad names" (e.g. synonyms of misspellings). We want to first correct as many of those bad names as we can. Simultaneously, we'll compile all information about the taxonomic hierarchy of those species
```{r cleanNames, message=FALSE}
library(abbagadabba)
cleanNames <- getNCBITaxonomy(c('Idiomyia sproati', 'Drosophil murphy', 'no body'))
cleanNames
```
We were able to fix `Drosophil murphy`, making it `Drosophila murphyi`, and `Idiomyia sproati`, making it `Drosophila sproati`, but we couldn't match `no body`.
Now we need to get all sequence identifiers associated with those species:
```{r seqIDs, dependson='cleanNames'}
goodNames <- cleanNames$ncbi_name[!is.na(cleanNames$ncbi_name)]
seqIDs <- getNCBISeqID(goodNames)
head(seqIDs)
```
Finally we can feed those IDs into the function to retrieve the sequences themselves. For the purpose of this example, we'll just look at a few sequences
```{r seqData, dependson='seqIDs'}
seqData <- getGenBankSeqs(seqIDs[1:2])
seqData
```