Added support to pass filetype and rt unit #68

Merged 17 commits on Sep 3, 2021
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -6,7 +6,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]
 ### Added
+- __main__.py + cli/LoadDataAction.py: Added required passing of filetype and rt unit. [#64](https://github.com/RECETOX/RIAssigner/issues/64) [#67](https://github.com/RECETOX/RIAssigner/issues/67) [#68](https://github.com/RECETOX/RIAssigner/pull/68)
 ### Changed
+- utils.py: `get_extension` function now returns extension without `.` [#68](https://github.com/RECETOX/RIAssigner/pull/68)
+- data/Data.py: Added `filetype` to constructor and made `rt_unit` non-optional. [#67](https://github.com/RECETOX/RIAssigner/issues/67) [#68](https://github.com/RECETOX/RIAssigner/pull/68)
+- data/MatchMSData.py: Added `filetype` to constructor and made `rt_unit` non-optional. [#67](https://github.com/RECETOX/RIAssigner/issues/67) [#68](https://github.com/RECETOX/RIAssigner/pull/68)
+- data/PandasData.py: Added `filetype` to constructor and made `rt_unit` non-optional. [#67](https://github.com/RECETOX/RIAssigner/issues/67) [#68](https://github.com/RECETOX/RIAssigner/pull/68)
 ### Removed

 ## [0.2.0] - 2021-08-18
1 change: 1 addition & 0 deletions README.md
@@ -2,6 +2,7 @@
 [![Python package](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package.yml/badge.svg)](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package.yml)
 [![Python Package using Conda](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/RECETOX/RIAssigner/actions/workflows/python-package-conda.yml)
 [![Anaconda Build](https://github.com/RECETOX/RIAssigner/actions/workflows/anaconda.yml/badge.svg?branch=main)](https://github.com/RECETOX/RIAssigner/actions/workflows/anaconda.yml)
+[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=hechth_RIAssigner&metric=alert_status)](https://sonarcloud.io/dashboard?id=hechth_RIAssigner)
 ## Overview
 RIAssigner is a python tool for retention index (RI) computation for GC-MS data developed at [RECETOX](https://www.recetox.muni.cz/en).

16 changes: 11 additions & 5 deletions RIAssigner/__main__.py
@@ -1,3 +1,5 @@
+import sys
+from typing import Tuple
 from RIAssigner.compute.ComputationMethod import ComputationMethod
 import argparse

@@ -14,12 +16,16 @@ def create_parser():
                           required=True,
                           type=str,
                           action=LoadDataAction,
-                          help="Reference dataset containing retention times and indices. Path to CSV or MSP.")
+                          nargs=3,
+                          help="""Reference dataset containing retention times and indices.
+                          Path to CSV or MSP, filetype and retention time unit.""")
     required.add_argument("--query",
                           required=True,
                           type=str,
                           action=LoadDataAction,
-                          help="Query dataset for which to compute retention indices. Path to CSV or MSP.")
+                          nargs=3,
+                          help="""Query dataset for which to compute retention indices.
+                          Path to CSV or MSP, filetype and retention time unit.""")
     required.add_argument("--method",
                           required=True,
                           type=str,
@@ -33,14 +39,14 @@ def create_parser():
     return parser


-def main():
+def main(argv):
     """Command line interface for the RIAssigner library.

     Args:
         argv (List[string]): Arguments passed to the program
     """
     parser = create_parser()
-    args = parser.parse_args()
+    args = parser.parse_args(argv)

     query: Data = args.query
     reference: Data = args.reference
@@ -54,4 +60,4 @@ def main():


 if __name__ == "__main__":
-    main()
+    main(sys.argv[1:])
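With `nargs=3`, each of `--reference` and `--query` consumes three tokens, and argparse hands them to the action as a list. The sketch below illustrates that mechanism together with the `main(argv)` signature, which lets a test call the entry point without touching `sys.argv`. `DatasetSpec` and `LoadSpecAction` are hypothetical stand-ins for RIAssigner's data classes and `LoadDataAction`, and the file name is made up.

```python
import argparse
from typing import List, NamedTuple


class DatasetSpec(NamedTuple):
    # Hypothetical stand-in for RIAssigner's Data classes.
    filename: str
    filetype: str
    rt_unit: str


class LoadSpecAction(argparse.Action):
    """Collect the three values following an option into one DatasetSpec."""

    def __call__(self, parser, namespace, values, option_string=None):
        # With nargs=3, argparse passes `values` as a list of three strings.
        setattr(namespace, self.dest, DatasetSpec(*values))


def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--query",
                        required=True,
                        type=str,
                        nargs=3,
                        action=LoadSpecAction,
                        metavar=("PATH", "FILETYPE", "RT_UNIT"))
    return parser


def main(argv: List[str]) -> DatasetSpec:
    # Parsing an explicit list keeps the function callable from tests.
    args = create_parser().parse_args(argv)
    return args.query


spec = main(["--query", "peaks.csv", "csv", "seconds"])
print(spec)
```

Passing fewer than three values after `--query` makes argparse exit with a usage error before the action is called.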
13 changes: 7 additions & 6 deletions RIAssigner/cli/LoadDataAction.py
@@ -1,6 +1,5 @@
 import argparse

-from RIAssigner.utils import get_extension
 from RIAssigner.data import MatchMSData, PandasData


@@ -9,10 +8,12 @@ def __init__(self, option_strings, dest, **kwargs):
         super().__init__(option_strings, dest, **kwargs)

     def __call__(self, parser, namespace, values, option_string=None):
-        filetype = get_extension(values)
-        if filetype == '.msp':
-            data = MatchMSData(values)
-        elif filetype in ['.csv', '.tsv']:
-            data = PandasData(values)
+        filename = values[0]
+        filetype = values[1]
+        rt_unit = values[2]
+        if filetype == 'msp':
+            data = MatchMSData(filename, filetype, rt_unit)
+        elif filetype in ['csv', 'tsv']:
+            data = PandasData(filename, filetype, rt_unit)

         setattr(namespace, self.dest, data)
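In the new `__call__`, `data` is assigned only when `filetype` is `msp`, `csv`, or `tsv`; any other value reaches `setattr` with `data` unbound. One possible way to make that case explicit is a lookup table that raises a descriptive error. This is a sketch, not the merged code: `load_msp`, `load_table`, `LOADERS`, and `load_data` are hypothetical names, and the loaders return plain tuples instead of constructing `MatchMSData` or `PandasData`.

```python
from typing import Callable, Dict, Sequence, Tuple

Loaded = Tuple[str, str, str, str]


def load_msp(filename: str, filetype: str, rt_unit: str) -> Loaded:
    # Hypothetical placeholder for MatchMSData(filename, filetype, rt_unit).
    return ("MatchMSData", filename, filetype, rt_unit)


def load_table(filename: str, filetype: str, rt_unit: str) -> Loaded:
    # Hypothetical placeholder for PandasData(filename, filetype, rt_unit).
    return ("PandasData", filename, filetype, rt_unit)


# One entry per supported filetype, keyed without the leading dot.
LOADERS: Dict[str, Callable[[str, str, str], Loaded]] = {
    "msp": load_msp,
    "csv": load_table,
    "tsv": load_table,
}


def load_data(values: Sequence[str]) -> Loaded:
    filename, filetype, rt_unit = values
    try:
        loader = LOADERS[filetype]
    except KeyError:
        supported = ", ".join(sorted(LOADERS))
        raise ValueError(
            f"Unsupported filetype '{filetype}'; expected one of: {supported}"
        ) from None
    return loader(filename, filetype, rt_unit)


print(load_data(["Alkanes_20210325.dat", "msp", "seconds"]))
```

Inside an `argparse.Action`, the same condition could instead be reported with `parser.error(...)`, which prints the usage message and exits.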
3 changes: 2 additions & 1 deletion RIAssigner/data/Data.py
@@ -15,8 +15,9 @@ class Data(ABC):
     def is_valid(rt: RetentionTimeType) -> bool:
         return rt is not None and rt >= 0.0

-    def __init__(self, filename: str, rt_unit: str = 'seconds'):
+    def __init__(self, filename: str, filetype: str, rt_unit: str):
         self._filename = filename
+        self._filetype = filetype
         self._rt_unit = rt_unit
         self._unit = Data.Unit(self._rt_unit)
         self.read()
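With `rt_unit` no longer defaulted, every caller has to supply all three arguments. A stripped-down sketch of that constructor contract follows. `InMemoryData` is a hypothetical subclass, and the unit handling (`Data.Unit`) is left out because its implementation is not shown in this diff.

```python
from abc import ABC, abstractmethod


class Data(ABC):
    """Reduced stand-in for RIAssigner's Data base class."""

    def __init__(self, filename: str, filetype: str, rt_unit: str):
        self._filename = filename
        self._filetype = filetype
        self._rt_unit = rt_unit
        self.read()  # subclasses load their content during construction

    @abstractmethod
    def read(self):
        ...


class InMemoryData(Data):
    # Hypothetical subclass that only records what it was asked to load.
    def read(self):
        self.request = (self._filename, self._filetype, self._rt_unit)


data = InMemoryData("Alkanes_20210325.dat", "msp", "seconds")
print(data.request)

try:
    InMemoryData("Alkanes_20210325.dat")  # old one-argument call style
except TypeError as error:
    print("rejected:", error)
```

The second call fails because `filetype` and `rt_unit` are now required positional parameters.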
6 changes: 3 additions & 3 deletions RIAssigner/data/MatchMSData.py
@@ -14,7 +14,7 @@ class MatchMSData(Data):
     def read(self):
         """Load data into object and initialize properties.
         """
-        self._read_spectra(self._filename)
+        self._read_spectra(self._filename, self._filetype)

         self._sort_spectra_by_rt()

@@ -29,7 +29,7 @@ def write(self, filename: str):
         """
         save_as_msp(self._spectra, filename)

-    def _read_spectra(self, filename: str):
+    def _read_spectra(self, filename: str, filetype: str):
         """Read spectra from 'msp' file into data.

         Args:
@@ -38,7 +38,7 @@ def _read_spectra(self, filename: str):
         Raises:
             NotImplementedError: For filetypes other than 'msp'.
         """
-        if filename.endswith('.msp'):
+        if filetype == 'msp':
             self._spectra = list(load_from_msp(filename))
         else:
             raise NotImplementedError("Currently only supports 'msp'.")
2 changes: 1 addition & 1 deletion RIAssigner/data/PandasData.py
@@ -24,7 +24,7 @@ def read(self):

     def _read_into_dataframe(self):
         """ Read the data from file into dataframe. """
-        if(self._filename.endswith('.csv') or self._filename.endswith('.tsv')):
+        if(self._filetype in ['csv', 'tsv']):
             separator = define_separator(self._filename)
             self._data = read_csv(self._filename, sep=separator)
         else:
2 changes: 1 addition & 1 deletion RIAssigner/utils.py
@@ -31,4 +31,4 @@ def get_extension(filename: str):
     Returns:
         str: Filename extension.
     """
-    return splitext(filename)[1]
+    return splitext(filename)[1][1:]
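The effect of the extra `[1:]` slice is easiest to see on a few inputs. The function body mirrors the changed line; the `os.path` import is an assumption, since the import section of `utils.py` is not part of this hunk, and the file names other than the alkanes file are illustrative.

```python
from os.path import splitext  # assumed source of `splitext`


def get_extension(filename: str) -> str:
    # splitext returns (root, ext) where ext includes the leading dot,
    # so slicing from index 1 drops it. An empty ext stays empty.
    return splitext(filename)[1][1:]


print(get_extension("tests/data/dat/Alkanes_20210325.dat"))  # dat
print(get_extension("archive.tar.gz"))                       # gz
print(repr(get_extension("README")))                         # ''
```

Only the last suffix is returned, and a name without a suffix yields an empty string rather than an error.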
7 changes: 6 additions & 1 deletion tests/builders/MatchMSDataBuilder.py
@@ -6,6 +6,7 @@ class MatchMSDataBuilder:
     def __init__(self):
         self.filename = None
         self._rt_unit = 'seconds'
+        self._filetype = "msp"

     def with_filename(self, filename: str):
         self.filename = filename
@@ -15,5 +16,9 @@ def with_rt_unit(self, rt_unit: str):
         self._rt_unit = rt_unit
         return self

+    def with_filetype(self, filetype: str):
+        self._filetype = filetype
+        return self
+
     def build(self) -> MatchMSData:
-        return MatchMSData(self.filename, self._rt_unit)
+        return MatchMSData(self.filename, self._filetype, self._rt_unit)
7 changes: 6 additions & 1 deletion tests/builders/PandasDataBuilder.py
@@ -6,6 +6,7 @@ class PandasDataBuilder:
     def __init__(self):
         self._filename = None
         self._rt_unit = 'seconds'
+        self._filetype = 'csv'

     def with_filename(self, filename: str):
         self._filename = filename
@@ -15,6 +16,10 @@ def with_rt_unit(self, rt_unit: str):
         self._rt_unit = rt_unit
         return self

+    def with_filetype(self, filetype: str):
+        self._filetype = filetype
+        return self
+
     def build(self) -> PandasData:
-        data = PandasData(self._filename, self._rt_unit)
+        data = PandasData(self._filename, self._filetype, self._rt_unit)
         return data
21 changes: 21 additions & 0 deletions tests/builders/__init__.py
@@ -0,0 +1,21 @@
+import logging
+from typing import Optional, Union
+
+from .MatchMSDataBuilder import MatchMSDataBuilder
+from .PandasDataBuilder import PandasDataBuilder
+
+logging.getLogger(__name__).addHandler(logging.NullHandler())
+
+
+def get_builder(filetype) -> Optional[Union[PandasDataBuilder, MatchMSDataBuilder]]:
+    if (filetype in ['csv', 'tsv']):
+        return PandasDataBuilder().with_filetype(filetype)
+    if (filetype in ['msp']):
+        return MatchMSDataBuilder().with_filetype(filetype)
+    return None
+
+
+__all__ = [
+    "MatchMSDataBuilder",
+    "PandasDataBuilder",
+]
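Because `get_builder` returns `None` for an unrecognised filetype, callers need to check the result before chaining `with_...` calls. The sketch below shows that usage pattern with stand-ins so it runs without the test package: `FakeData` and `FakeDataBuilder` are hypothetical and collapse the two real builders into one.

```python
from typing import Optional


class FakeData:
    # Hypothetical stand-in for PandasData / MatchMSData.
    def __init__(self, filename: str, filetype: str, rt_unit: str):
        self.filename = filename
        self.filetype = filetype
        self.rt_unit = rt_unit


class FakeDataBuilder:
    """Fluent builder with the same defaults style as the test builders."""

    def __init__(self):
        self._filename = None
        self._filetype = "csv"
        self._rt_unit = "seconds"

    def with_filename(self, filename: str):
        self._filename = filename
        return self

    def with_filetype(self, filetype: str):
        self._filetype = filetype
        return self

    def with_rt_unit(self, rt_unit: str):
        self._rt_unit = rt_unit
        return self

    def build(self) -> FakeData:
        return FakeData(self._filename, self._filetype, self._rt_unit)


def get_builder(filetype: str) -> Optional[FakeDataBuilder]:
    if filetype in ("csv", "tsv", "msp"):
        return FakeDataBuilder().with_filetype(filetype)
    return None


# The filetype is chosen independently of the file's extension.
builder = get_builder("msp")
if builder is not None:
    data = builder.with_filename("tests/data/dat/Alkanes_20210325.dat").build()
    print(data.filetype, data.rt_unit)

print(get_builder("dat"))  # no builder is registered for 'dat'
```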
161 changes: 161 additions & 0 deletions tests/data/dat/Alkanes_20210325.dat
@@ -0,0 +1,161 @@
NAME: Undecane
SCANNUMBER: -1
RETENTIONTIME: 2.08
RETENTIONINDEX: 1100
FORMULA: C11H24
Num Peaks: 2
57 1
56 0.2

Name: DODECANE
Synon: $:00in-source
DB#: JP005756
InChIKey: SNRUBQQJIBEYMU-UHFFFAOYSA-N
Spectrum_type: MS1
Instrument_type: EI-B
Instrument: HITACHI M-80
Ion_mode: P
Formula: C12H26
MW: 170
ExactMass: 170.203450832
RETENTIONINDEX: 1200
RETENTIONTIME: 2.43
Comments: "SMILES=CCCCCCCCCCCC" "cas=112-40-3" "InChI=InChI=1S/C12H26/c1-3-5-7-9-11-12-10-8-6-4-2/h3-12H2,1-2H3" "computed SMILES=CCCCCCCCCCCC" "accession=JP005756" "date=2016.01.19 (Created 2008.10.21, modified 2011.05.06)" "author=MASS SPECTROSCOPY SOC. OF JAPAN (MSSJ)" "license=CC BY-NC-SA" "exact mass=170.20345" "ionization energy=70 eV" "ion type=[M]+*" "SPLASH=splash10-052f-9000000000-e297100d4245d91a3893" "submitter=University of Tokyo Team (Faculty of Engineering, University of Tokyo)" "MoNA Rating=3.75"
Num Peaks: 36
39 8.15
40 1.49
41 41.51
42 11.33
43 98.66
44 3.45
53 2.13
54 2.05
55 17.07
56 14.21
57 99.99
58 4.93
67 1.24
68 1.11
69 6.35
70 11.79
71 52.2
72 2.82
82 0.73
83 2.35
84 7.03
85 29.3
86 0.94
97 0.92
98 5.46
99 5.66
100 0.4
112 3.05
113 4.38
126 1.38
127 3.05
128 0.77
140 0.71
141 1.58
170 5.13
171 0.65

Name: TRIDECANE
Synon: $:00in-source
DB#: JP005755
InChIKey: IIYFAKIEWZDVMP-UHFFFAOYSA-N
Spectrum_type: MS1
Instrument_type: EI-B
Instrument: HITACHI M-80
Ion_mode: P
Formula: C13H28
MW: 184
ExactMass: 184.219100896
RETENTIONTIME: 2.75
RETENTIONINDEX: 1300
Comments: "SMILES=CCCCCCCCCCCCC" "cas=629-50-5" "InChI=InChI=1S/C13H28/c1-3-5-7-9-11-13-12-10-8-6-4-2/h3-13H2,1-2H3" "computed SMILES=CCCCCCCCCCCCC" "accession=JP005755" "date=2016.01.19 (Created 2008.10.21, modified 2011.05.06)" "author=MASS SPECTROSCOPY SOC. OF JAPAN (MSSJ)" "license=CC BY-NC-SA" "exact mass=184.2191" "ionization energy=70 eV" "ion type=[M]+*" "SPLASH=splash10-0596-9000000000-562483fea4bbcff9f5aa" "submitter=University of Tokyo Team (Faculty of Engineering, University of Tokyo)" "MoNA Rating=3.75"
Num Peaks: 36
39 4.82
40 1.46
41 40.61
42 9.32
43 99.99
44 3.74
53 2.05
54 2.26
55 19.09
56 10.29
57 96.92
58 4.17
67 1.52
68 1.32
69 8.09
70 12.2
71 58.38
72 3.39
82 0.97
83 3.05
84 7.07
85 33.34
97 1.48
98 4.91
99 6.54
111 0.33
112 3.23
113 4.77
126 1.94
127 3.63
140 0.91
141 2.16
154 0.58
155 1.08
184 4.66
185 0.73

Name: Tetradecane
Synon: tetradecane
Synon: $:00in-source
DB#: HMDB0059907_c_ms_1094
InChIKey: BGHCVCJVXZWKCC-UHFFFAOYSA-N
Instrument_type: GC-MS
Retentionindex: 1393.41
RETENTIONTIME: 3.08
Formula: C14H30
MW: 198
ExactMass: 198.23475095999999
Comments: "SMILES=CCCCCCCCCCCCCC" "cas number=" "InChI=InChI=1S/C14H30/c1-3-5-7-9-11-13-14-12-10-8-6-4-2/h3-14H2,1-2H3" "computed SMILES=CCCCCCCCCCCCCC" "column=5%-phenyl-95%-dimethylpolysiloxane capillary column" "retention index type=based on 9 n-alkanes (C10–C36)" "chromatography type=GC" "SPLASH=splash10-00di-9100000000-8774faafa64b5f45d039" "submitter=David Wishart (University of Alberta)" "MoNA Rating=3.333333333333333"
Num Peaks: 35
70.0 0.208
71.0 1.0
72.0 0.053
77.0 0.003
79.0 0.005
81.0 0.005
82.0 0.03
83.0 0.078
84.0 0.105
85.0 0.543
86.0 0.033
96.0 0.01
97.0 0.045
98.0 0.078
99.0 0.141
100.0 0.008
110.0 0.002
111.0 0.013
112.0 0.055
113.0 0.063
114.0 0.005
124.0 0.002
125.0 0.003
126.0 0.033
127.0 0.04
128.0 0.002
140.0 0.02
141.0 0.028
142.0 0.002
154.0 0.007
155.0 0.015
168.0 0.002
169.0 0.007
198.0 0.028
199.0 0.003
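The file added here carries MSP-style records under a `.dat` extension, which is presumably the case the explicit `filetype` argument is meant to handle. The metadata keys are not uniformly cased (`RETENTIONINDEX` in the first three records, `Retentionindex` in the last). The sketch below is a plain-Python illustration of reading retention time and index from such records with case-insensitive keys. It is not the parser matchms uses, and `parse_records` and `EXCERPT` are hypothetical names; the excerpt is shortened from the records above.

```python
from typing import Dict, List

# Shortened excerpt of two records; peak lists are truncated for brevity.
EXCERPT = """\
NAME: Undecane
RETENTIONTIME: 2.08
RETENTIONINDEX: 1100
Num Peaks: 2
57 1
56 0.2

Name: Tetradecane
Retentionindex: 1393.41
RETENTIONTIME: 3.08
Num Peaks: 1
71.0 1.0
"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split blank-line separated records and collect 'key: value' metadata."""
    records = []
    for block in text.strip().split("\n\n"):
        metadata: Dict[str, str] = {}
        for line in block.splitlines():
            key, separator, value = line.partition(":")
            if separator:  # peak lines such as '57 1' contain no colon
                metadata[key.strip().lower()] = value.strip()
        records.append(metadata)
    return records


records = parse_records(EXCERPT)
pairs = [
    (r["name"], float(r["retentiontime"]), float(r["retentionindex"]))
    for r in records
]
print(pairs)
```

Lower-casing the keys is what lets both spellings of the retention index field resolve to the same entry.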