How to deploy on my own data?
Requirement: a Genoscapist instance up and running
Before this step, follow the Genoscapist Git deployment procedure or use the provided Genoscapist Virtual Machine.
Database creation
- Create a database (for example: seb_demo)
CREATE DATABASE seb_demo;
- Create users (for example: read_user and admin_user)
CREATE USER read_user WITH PASSWORD 'sebuser2020';
CREATE USER admin_user WITH PASSWORD 'sebadmin2020';
- Create schema and tables with the create_database.sql file available in the scripts directory
$ psql -f create_database.sql
Data integration into the database
The example below uses a small Bacillus subtilis dataset. This dataset contains:
- part of the Bacillus subtilis genome (only the first 150,000 base pairs),
- 145 genes (feature type CDS),
- 269 reannotation samples and 6 rho-mutant samples.
The integration scripts are available in the scripts directory, and the dataset files are in the scripts/data directory.
- Create a Conda environment with the environment.yml file
$ conda env create -f environment.yml
- Activate the Conda environment
$ conda activate genoscapist-database
- Integrate GenBank annotation
$ python add_annotation_gb.py
- Integrate samples and expression profiles
$ python add_samples.py
To integrate other data, note that all annotation features are stored in the features table. The mandatory fields are:
- sequence_id, a reference sequence
- type, a feature type like CDS, regulator, promoter, terminator
- complement, a feature strand
- start, a feature start position on the reference sequence
- stop, a feature end position on the reference sequence
- id_feat, a feature name
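The mandatory fields above can be checked before loading a feature into the database. The following is a minimal sketch, not part of the Genoscapist scripts: the Feature class, the validate_feature helper, the Python types, the 0/1 encoding of complement, the 1-based coordinates and the example values are all assumptions made for illustration. Check create_database.sql for the actual column types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """One row of the features table (types are assumed, not from the schema)."""
    sequence_id: str   # reference sequence
    type: str          # feature type, e.g. CDS, regulator, promoter, terminator
    complement: int    # strand; assumed here to be 0 (forward) or 1 (complement)
    start: int         # start position on the reference sequence
    stop: int          # end position on the reference sequence
    id_feat: str       # feature name


def validate_feature(feat: Feature) -> List[str]:
    """Return a list of problems found in a feature record (empty if none)."""
    problems = []
    for name in ("sequence_id", "type", "id_feat"):
        if not getattr(feat, name):
            problems.append(f"{name} is empty")
    if feat.complement not in (0, 1):
        problems.append("complement must be 0 or 1")
    if feat.start < 1:
        problems.append("start must be >= 1")
    if feat.stop < feat.start:
        problems.append("stop must be >= start")
    return problems


# Hypothetical feature; the name and coordinates are placeholders.
example = Feature("AL009126", "CDS", 0, 100, 400, "geneX")
print(validate_feature(example))  # prints []
```

Running such a check before the INSERT step reports incomplete records early instead of leaving them to fail on a database constraint.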
Sample data are stored in the exp_seq table and are organized into groups defined in the exp_group table. A sample group is defined by a name and a description. Sample data are defined by:
- sequence_id, a reference sequence
- experience, a sample name
- project, a project name (Reannotation or Rho for example)
- normalization, a normalization type (CustomCDS or Median for example)
- strand, a sample strand
- color, a sample color
- info, 1 if the sample is displayed by default, 0 otherwise
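As an illustration of how a sample record with the fields above might be turned into an INSERT statement, here is a hedged sketch. The build_sample_insert helper is hypothetical, the bacteries schema prefix is assumed by analogy with bacteries.features_manager, and the real exp_seq table may require further columns (for example a link to exp_group). The statement uses the %s placeholder style accepted by DB-API drivers such as psycopg2, so values are passed separately rather than pasted into the SQL text.

```python
# Column names come from the field list above; everything else is assumed.
SAMPLE_COLUMNS = (
    "sequence_id", "experience", "project",
    "normalization", "strand", "color", "info",
)


def build_sample_insert(sample: dict) -> tuple:
    """Return (sql, params) for one sample, or raise ValueError if invalid."""
    missing = [c for c in SAMPLE_COLUMNS if c not in sample]
    if missing:
        raise ValueError("missing sample fields: " + ", ".join(missing))
    if sample["info"] not in (0, 1):
        raise ValueError("info must be 0 or 1")
    columns = ", ".join(SAMPLE_COLUMNS)
    placeholders = ", ".join(["%s"] * len(SAMPLE_COLUMNS))
    # Assumed table location: bacteries.exp_seq
    sql = f"INSERT INTO bacteries.exp_seq ({columns}) VALUES ({placeholders});"
    params = tuple(sample[c] for c in SAMPLE_COLUMNS)
    return sql, params


# Placeholder values for illustration only.
sample = {
    "sequence_id": "AL009126",
    "experience": "sample_1",
    "project": "Reannotation",
    "normalization": "Median",
    "strand": 1,
    "color": "#1f77b4",
    "info": 1,
}
sql, params = build_sample_insert(sample)
print(sql)
print(params)
```

The returned pair can be handed to a database cursor's execute method once the column list has been checked against the actual schema.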
Track order configuration
In the features_manager table, you must define the order in which the tracks appear.
To insert data, you can use the pgAdmin tool or the SQL INSERT command:
INSERT INTO bacteries.features_manager (type, position) VALUES ('[feature_type]', [position]);
For the seb_demo database, we chose to display the CDS track first, then the profiles track.
seb_demo=# select * from bacteries.features_manager;
 id |   type   | position
----+----------+----------
  1 | CDS      |        1
  2 | profiles |        2
(2 rows)
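When there are many tracks, the INSERT statements can be generated from an ordered list rather than written by hand. The sketch below is one possible way to do this; track_order_statements is a hypothetical helper, not part of the Genoscapist scripts, and it numbers positions from 1 as in the seb_demo example.

```python
def track_order_statements(track_types):
    """Build one INSERT per track; list order gives the position (from 1)."""
    statements = []
    for position, track_type in enumerate(track_types, start=1):
        # Double any single quote so the value stays a valid SQL literal.
        escaped = track_type.replace("'", "''")
        statements.append(
            "INSERT INTO bacteries.features_manager (type, position) "
            f"VALUES ('{escaped}', {position});"
        )
    return statements


# Track order used for the seb_demo database: CDS first, then profiles.
for statement in track_order_statements(["CDS", "profiles"]):
    print(statement)
```

The printed statements can be saved to a file and run with psql -f, or pasted into pgAdmin.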
Configuration file
Before deploying the application, it is necessary to modify the configuration file config.py.
Details about parameters:
- DEPLOY, deployment environment (local, dev, prod or demo)
- SPECIE, species accession number (for example, AL009126 for Bacillus subtilis or CP000253 for Staphylococcus aureus)
- DB_USER, DB_PWD, DB_NAME, DB_HOST and DB_PORT, respectively the database user, the password, the database name, the host server and port
- NAME, the name of the deployment (for example: B. subtilis Expression Data Browser or S. aureus Expression Data Browser)
- MIN_CUSTOM, MAX_CUSTOM, MIN_MEDIAN and MAX_MEDIAN, minimum and maximum values of the expression signal for each of the two normalization types
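For the seb_demo example, the parameters might be filled in as sketched below. This assumes config.py defines them as plain module-level variables, which may not match the actual file layout. The host, the port, the choice of read_user as the application user, and all four signal bounds are placeholders to be replaced with values suited to your data. The check_config helper is hypothetical and only illustrates basic sanity checks.

```python
# Hypothetical settings for the seb_demo example. Parameter names come from
# the list above; values marked "placeholder" are not taken from this guide.
DEPLOY = "demo"                  # one of: local, dev, prod, demo
SPECIE = "AL009126"              # Bacillus subtilis accession number
DB_USER = "read_user"            # assumed: the application uses read_user
DB_PWD = "sebuser2020"
DB_NAME = "seb_demo"
DB_HOST = "localhost"            # placeholder
DB_PORT = 5432                   # placeholder (PostgreSQL default port)
NAME = "B. subtilis Expression Data Browser"
MIN_CUSTOM = 0.0                 # placeholder
MAX_CUSTOM = 1.0                 # placeholder
MIN_MEDIAN = 0.0                 # placeholder
MAX_MEDIAN = 1.0                 # placeholder


def check_config():
    """Return a list of configuration problems (empty if none)."""
    problems = []
    if DEPLOY not in ("local", "dev", "prod", "demo"):
        problems.append("DEPLOY must be local, dev, prod or demo")
    if not MIN_CUSTOM < MAX_CUSTOM:
        problems.append("MIN_CUSTOM must be lower than MAX_CUSTOM")
    if not MIN_MEDIAN < MAX_MEDIAN:
        problems.append("MIN_MEDIAN must be lower than MAX_MEDIAN")
    return problems


print(check_config())  # prints []
```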