Deploying Bioinformatics Tools on an Isolated Server
Deploying bioinformatics tools on an RTX 3090 server behind a restrictive firewall brings its own set of challenges. I recently completed a deployment report for Google AlphaMissense and the Ensembl Variant Effect Predictor (VEP) in such a restricted network environment.
The biggest hurdle is the firewall. Many tools download their databases over FTP or query a remote MySQL server. In my tests, HTTPS traffic got through, but the FTP and MySQL ports were blocked, as they often are on locked-down networks.
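Before planning downloads, it helps to confirm which protocols your own firewall lets through. The sketch below is a minimal TCP reachability probe. The hostnames are assumptions (Ensembl's public download and MySQL hosts) and may differ from what your tools actually contact. A successful connect only shows that the port is reachable, not that the full protocol exchange will work.

```python
import socket

# Assumed probe targets; swap in the hosts your own tools contact.
# Ports are the standard ones for HTTPS (443), FTP (21) and MySQL (3306).
PROBES = {
    "HTTPS": ("ftp.ensembl.org", 443),
    "FTP": ("ftp.ensembl.org", 21),
    "MySQL": ("ensembldb.ensembl.org", 3306),
}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in PROBES.items():
        status = "open" if port_open(host, port) else "blocked or unreachable"
        print(f"{name:6} {host}:{port} -> {status}")
```

Running it on the isolated server gives a quick picture of which of the three protocols the firewall allows.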
Key Technical Findings:
- HTTPS is your best friend. Ensembl serves the VEP cache and other data over HTTPS as well as FTP, so you can still download them when FTP is blocked.
- VEP requires local data. VEP cannot run on a FASTA file alone; it also needs a transcript annotation source, such as a local cache or a database connection.
- GPU limitations. Most of these tools, VEP included, are CPU-bound and do not use the GPU. Only specialized tools such as NVIDIA Parabricks provide GPU acceleration.
- Data volume is huge. A complete offline setup requires about 90GB of data.
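The HTTPS route from the findings above can be scripted on the machine that has internet access. This is a sketch assuming Ensembl's usual layout for indexed GRCh38 caches (`/pub/release-N/variation/indexed_vep_cache/`); GRCh37 caches live under a different path, and the release number in the usage line is only an example.

```python
import shutil
import urllib.request
from pathlib import Path

BASE = "https://ftp.ensembl.org/pub"


def vep_cache_url(release: int, species: str = "homo_sapiens",
                  assembly: str = "GRCh38") -> str:
    """Build the HTTPS URL of an indexed VEP cache tarball (GRCh38 layout assumed)."""
    name = f"{species}_vep_{release}_{assembly}.tar.gz"
    return f"{BASE}/release-{release}/variation/indexed_vep_cache/{name}"


def download(url: str, dest_dir: Path) -> Path:
    """Stream url into dest_dir in 1 MiB chunks instead of loading it into memory."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)
    return target
```

Usage would look like `download(vep_cache_url(112), Path("vep_data"))`. The cache tarball is tens of gigabytes, so streaming to disk keeps memory use flat.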
The Workflow for Success:
- Download everything on a machine with internet access. This includes the VEP cache, dbNSFP, and Exomiser data.
- Transfer the data to your isolated server using SCP or physical media.
- Run your tools in Docker, mounting the pre-downloaded data as local volumes.
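Moving roughly 90 GB over SCP or physical media invites silent corruption, so it is worth checking the files on arrival. The sketch below is one way to do that, not part of the original workflow: build a SHA-256 manifest on the online machine and verify it on the isolated server. The manifest lines use the `<hash>  <path>` layout that GNU `sha256sum -c` also reads; keep the manifest file outside the data directory.

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record '<hash>  <relative path>' for every file under root (run before transfer)."""
    lines = [f"{sha256sum(p)}  {p.relative_to(root).as_posix()}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    manifest.write_text("".join(line + "\n" for line in lines))


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose hash differs (run after transfer)."""
    bad = []
    for line in manifest.read_text().splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)
        p = root / rel
        if not p.is_file() or sha256sum(p) != digest:
            bad.append(rel)
    return bad
```

An empty list from `verify_manifest` means every file arrived intact.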
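For the Docker step, a VEP run that never touches the network can be assembled as shown below. This is a sketch under several assumptions: Ensembl's published `ensemblorg/ensembl-vep` image, a cache tarball already extracted into the mounted data directory, and placeholder FASTA and VCF file names. The `--network none` flag is an extra safeguard added here, not something the original report specifies. It makes any accidental attempt to reach Ensembl's servers fail immediately.

```python
import shlex
from pathlib import Path


def vep_docker_cmd(data_dir: Path, vcf_name: str, fasta_name: str,
                   assembly: str = "GRCh38",
                   image: str = "ensemblorg/ensembl-vep") -> list[str]:
    """Build a `docker run` argument list that runs VEP offline on mounted data."""
    out_name = f"{Path(vcf_name).stem}.vep.vcf"
    return [
        "docker", "run", "--rm",
        "--network", "none",                  # no network access inside the container
        "-v", f"{data_dir.resolve()}:/data",  # host data directory mounted at /data
        image, "vep",
        "--offline", "--cache", "--dir_cache", "/data",
        "--assembly", assembly,
        "--fasta", f"/data/{fasta_name}",
        "--input_file", f"/data/{vcf_name}",
        "--output_file", f"/data/{out_name}",
        "--format", "vcf", "--vcf", "--force_overwrite",
    ]


if __name__ == "__main__":
    cmd = vep_docker_cmd(Path("vep_data"), "sample.vcf", "genome.fa")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Returning an argument list rather than a shell string avoids quoting problems with paths that contain spaces.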
Essential Data Checklist:
- VEP cache (~20GB)
- dbNSFP (~30GB)
- Exomiser data (~20GB)
- ANNOVAR human database (~15GB)
- Reference genome (~3GB)
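The checklist adds up to 20 + 30 + 20 + 15 + 3 = 88 GB, which is where the ~90 GB figure comes from. A quick pre-flight check on the target disk can catch shortfalls early. The 1.2 headroom factor is an assumption meant to leave room for unpacking archives and for output files.

```python
import shutil

# Approximate sizes from the checklist, in GB (10**9 bytes).
DATASETS_GB = {
    "VEP cache": 20,
    "dbNSFP": 30,
    "Exomiser data": 20,
    "ANNOVAR human database": 15,
    "Reference genome": 3,
}


def check_space(path: str = ".", headroom: float = 1.2) -> tuple[float, float, bool]:
    """Return (needed_gb, free_gb, enough) for the filesystem holding path."""
    needed = sum(DATASETS_GB.values()) * headroom
    free = shutil.disk_usage(path).free / 1e9
    return needed, free, free >= needed


if __name__ == "__main__":
    needed, free, ok = check_space(".")
    print(f"need ~{needed:.0f} GB, free {free:.0f} GB -> {'OK' if ok else 'NOT ENOUGH'}")
```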
If you face heavy restrictions, consider lighter alternatives such as SnpEff, or use bcftools to merge pre-annotated VCF files.
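If "merge" here means copying annotation fields from an already-annotated VCF onto your own calls, `bcftools annotate` is one way to do it. The sketch below only builds the command. The file names and the `INFO/AM_SCORE` field are hypothetical, and bcftools expects the annotation file to be bgzipped and tabix-indexed.

```python
def bcftools_annotate_cmd(input_vcf: str, annotations_vcf: str,
                          columns: list[str], output_vcf: str) -> list[str]:
    """Build a bcftools command that copies the given columns from annotations_vcf."""
    if not columns:
        raise ValueError("at least one column (e.g. INFO/<TAG>) is required")
    return [
        "bcftools", "annotate",
        "-a", annotations_vcf,     # bgzipped + tabix-indexed source of annotations
        "-c", ",".join(columns),   # which fields to transfer
        "-O", "z",                 # write bgzipped VCF
        "-o", output_vcf,
        input_vcf,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because bcftools works directly on VCFs, it avoids the cache and GPU questions entirely.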
Source: https://dev.to/jh5_pulse/google-alphamissense-yu-ce-jie-gou-bian-yi-54gp