About - CABase

Overview

CABase is a panel-based interactive platform for exploring the human carbonic anhydrase (CA) gene family across genomic, transcriptomic, epigenomic, proteomic, and clinical dimensions. It integrates data from GTEx, Human Protein Atlas, TCGA, CELLxGENE, Tahoe-100M, ENCODE, ChIP-Atlas, LINCS L1000, and more into a unified sidebar + tabbed interface with interactive Plotly.js charts, an IGV.js genome browser, D3 force-directed co-expression networks, and an AI-powered research assistant grounded in 11K+ carbonic anhydrase publications.

The platform covers 15 human carbonic anhydrase genes organized by subcellular localization: cytosolic (CA1, CA2, CA3, CA7, CA13), mitochondrial (CA5A, CA5B), membrane-associated (CA4 GPI-anchored, CA9/CA12/CA14 transmembrane), secreted (CA6), and CA-related proteins lacking catalytic activity (CA8, CA10, CA11). Carbonic anhydrases catalyze the reversible hydration of CO2 to bicarbonate and are critical for pH regulation, electrolyte secretion, and biosynthetic reactions. Key disease associations include tumor hypoxia (CA9, CA12), renal tubular acidosis (CA2), and glaucoma (CA2, CA4). Carbonic anhydrases are also important drug targets for sulfonamide inhibitors.

Data Sections

Gene Summary tab: Gene Summary

Gene Identity Card (full name, genomic coordinates, aliases, NCBI summary, cross-reference IDs: Entrez, HGNC, Ensembl, UniProt, RefSeq, Pfam, PDB) plus AI-generated, citation-grounded summaries across 7 dimensions: Genomic Context, Expression Patterns, Protein Biology, Pathway Analysis, Perturbation Response, Functional Role, Clinical Relevance.

Good for: Quick overview of any gene's role, function, disease associations, expression patterns, and database identifiers.

Genomic Context tab: Genomic

IGV.js genome browser (hg38) with an expandable track catalog:

Reference & Variants: RefSeq gene models, ClinVar VCF variants, PhyloP 100-way conservation
Regulatory Elements: ENCODE cCREs (color-coded promoters, enhancers, CTCF-bound, DNase-only)
Histone Marks: H3K27ac, H3K4me3, H3K4me1 — 7 ENCODE cell types
DNA Methylation: Human Methylation Atlas WGBS — 39 merged cell types
Transcription: ENCODE RNA-seq signal — 7 cell types
ChIP-Atlas TF Binding: 1,846 chromatin-associated factors, individual or merged view

Good for: Gene structure, exon/intron layout, clinical variants, regulatory landscape, TF binding, epigenomics, conservation.

RNA-Seq tab: RNA-Seq

Plotly.js charts across 6 data sources, with a Normal Tissue / Cancer & Disease toggle:

GTEx V10: 53 subtissues · in-house build. Bar, Radar (top 20, organ systems, brain, GI). Expression computed directly from the GTEx V10 gene TPM parquet (per-subtissue median TPM → nTPM, ≥20 samples per subtissue)
HPA: 43 tissues. Bar, Radar
FANTOM5 CAGE: Bar, Radar
Cross-Database: Parallel coordinates, Slope chart
DepMap: Cancer cell lines. Waterfall, Box, Violin, Median by lineage
TCGA: 33 cancers. Diverging bar, Box plot with scatter, KM survival curves (426 curves across 15 genes × 33 cancers, median expression split, UCSC Xena log2 TPM)
Multi-Gene: Searchable multi-select (2-5 genes), database selector (GTEx/HPA/FANTOM5), grouped bar + heatmap
PRECOG Survival Analysis: PRECOG v2. Three sub-sections: Adult (51 cancers, ~28K patients, 710 records), ICI immunotherapy (20 cancers, ~4K patients, 685 records), Pediatric (12 cancers, ~3K patients, 172 records). Each has a waterfall chart and a filterable table side by side. The ICI table includes cancer, ICI target (anti-PD-1/PD-L1/CTLA-4), stage, treatment, patients, outcome, and study. Small cohorts (<20) are flagged with a warning icon. A positive z-score indicates unfavorable prognosis; a score is significant if |z| > 3.09
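
As a rough illustration of the PRECOG z-score convention above, the sketch below converts a z-score to a two-sided normal p-value and labels its direction. The function names are hypothetical, and treating a significant negative z-score as favorable is an assumed mirror image of the stated rule; this is not CABase's own code.

```python
import math

Z_THRESHOLD = 3.09  # significance cutoff quoted in the PRECOG description


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify(z: float, threshold: float = Z_THRESHOLD) -> str:
    """Label a survival z-score; the 'favorable' label is an assumption."""
    if abs(z) <= threshold:
        return "not significant"
    return "unfavorable" if z > 0 else "favorable"


if __name__ == "__main__":
    for z in (4.0, -3.5, 2.0):  # toy values, not real PRECOG results
        print(z, classify(z), round(two_sided_p(z), 5))
```

Under a standard normal, |z| = 3.09 corresponds to a two-sided p-value of roughly 0.002.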

Good for: Tissue expression patterns, cancer expression, cell line data, survival analysis, immunotherapy survival, cross-database comparison, multi-gene analysis.

Proteomics tab: Proteomics

Protein-level data across 5 sources with interactive 3D structure viewer:

HPA Protein (IHC): 46 tissues · 110 cell types. Tissue × cell type heatmap, tissue distribution bar chart. Subcellular localization (IF-based) shown inline
CPTAC Mass-Spec: 11 cancers. Tumor vs normal protein fold change (logFC + adjusted p-values)
HPA Cancer IHC: 20 cancers. Stacked bar of patient counts per staining level (High/Medium/Low/Not detected)
ProteomicsDB: 67 tissues. Mass-spec protein abundance (normalized intensity) across human tissues
Post-Translational Modifications (iPTMnet): 103 PTM sites. Phosphorylation, ubiquitination, acetylation, and methylation sites with kinase-substrate relationships. Lollipop scatter plot + detail table
3D Structure Viewer (PDBe Mol*): Interactive PDB and AlphaFold structures with PTM site overlays (phosphorylation, ubiquitination). Structure descriptions from RCSB PDB / AlphaFold APIs

Good for: Protein expression across tissues and cancers, post-translational modifications, kinase-substrate relationships, 3D structure visualization with PTM overlays.

scRNA-Seq tab: scRNA-Seq

CELLxGENE: 64 tissues · 867 cell types · Census 2026-03-26. Dot plots (15 CA genes, 72,105 aggregated records), tissue selector, PNG/SVG/CSV export
Tahoe-100M: 2.3M DMSO cells · 50 cell lines · 28 cancers. Bubble plots + detailed table, gene/cancer filters. Pseudobulked from the DMSO-control subset of the Tahoe-100M atlas (~77M cells total across 14 plates)

Good for: Cell-type-specific expression, which cell types express a gene, tumor microenvironment expression.

Correlation Analysis tab: Correlation

Co-expression Network: D3 force-directed two-hop ego networks from GTEx Spearman correlations (p-values + BH FDR). 53 subtissues + ALL_SAMPLES. 553 network files (per gene × per tissue)
Correlated Genes: Positive/negative tables with CSV/TSV/XLSX export
Pathway Enrichment: MSigDB enrichment with hyperlinks and filters
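
To make the "two-hop ego network" idea concrete, here is a minimal sketch that extracts the first- and second-hop neighbours of a gene from an edge list. It assumes the edges have already been filtered for significant correlations; the function name, the toy gene labels (G1 to G4), and the choice to keep every edge among retained nodes are illustrative assumptions, not the platform's build script.

```python
from collections import defaultdict


def two_hop_ego(edges, center):
    """Return (hop1 nodes, hop2 nodes, kept edges) around `center`.

    edges: iterable of (gene_a, gene_b) pairs, treated as undirected.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    hop1 = set(adjacency[center])
    hop2 = set()
    for neighbour in hop1:
        hop2 |= adjacency[neighbour]
    hop2 -= hop1 | {center}  # keep only nodes first reached at distance two

    nodes = {center} | hop1 | hop2
    # Assumption: keep every edge whose endpoints both survive (induced subgraph).
    kept = sorted({tuple(sorted(e)) for e in edges if e[0] in nodes and e[1] in nodes})
    return hop1, hop2, kept


if __name__ == "__main__":
    toy_edges = [("CA9", "G1"), ("CA9", "G2"), ("G1", "G3"), ("G3", "G4")]
    print(two_hop_ego(toy_edges, "CA9"))
```

In the toy graph, G4 sits three hops from the centre and is therefore excluded.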

Good for: Co-expression partners, tissue-specific networks, pathway enrichment, functional associations.

Perturbations tab: Perturbations

Two complementary perturbation datasets accessible via sub-tab toggle, identifying drugs and genetic manipulations that significantly alter expression of carbonic anhydrase genes.

LINCS L1000 — Compound Perturbations

LINCS L1000: 720K experiments · 14 CA genes. Level 5 moderated z-scores from compound perturbation experiments (Subramanian et al., Cell 2017). 732,609 filtered records (|modz| ≥ 3.0)
Waterfall Chart: Top 15 activators and repressors ranked by dominant effect size, with hatched paired bars for drugs showing both directions
Enrichment Analysis: Switchable between Mechanism of Action, Drug Target, Cell Lineage, and Primary Disease (pill toggle)
Full Table: Filterable by direction, cell lineage, disease, dose, timepoint. Compound names link to CLUE.io. Export CSV/TSV/XLSX
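
The filtering and ranking behind the waterfall chart can be sketched as follows. This is an assumption-laden illustration: "dominant effect" is interpreted here as the record with the largest |modz| per compound, and the function name and toy compound labels are hypothetical.

```python
def top_perturbagens(records, threshold=3.0, top_n=15):
    """Rank compounds by dominant moderated z-score.

    records: iterable of (compound, modz) pairs.
    Returns (activators, repressors, paired), where `paired` holds compounds
    with significant effects in both directions.
    """
    dominant = {}    # compound -> modz with the largest absolute value
    directions = {}  # compound -> set of observed signs (True = up)
    for compound, modz in records:
        if abs(modz) < threshold:
            continue  # drop records below the |modz| cutoff
        directions.setdefault(compound, set()).add(modz > 0)
        if compound not in dominant or abs(modz) > abs(dominant[compound]):
            dominant[compound] = modz

    activators = sorted((c for c, z in dominant.items() if z > 0),
                        key=lambda c: -dominant[c])[:top_n]
    repressors = sorted((c for c, z in dominant.items() if z < 0),
                        key=lambda c: dominant[c])[:top_n]
    paired = {c for c, signs in directions.items() if len(signs) == 2}
    return activators, repressors, paired


if __name__ == "__main__":
    toy = [("drugA", 4.2), ("drugA", -3.1), ("drugB", -5.0),
           ("drugC", 2.9), ("drugD", 3.0)]
    print(top_perturbagens(toy))
```

Compounds in `paired` are the ones that would get a hatched secondary bar under this reading.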

LINCS L1000 — Genetic Perturbations (CRISPR, shRNA, Overexpression)

View 1: Genetic Perturbations · 11,456 records · 14 genes. Which CRISPR knockouts, shRNA knockdowns, or overexpression experiments significantly alter a CA gene's expression? Identifies upstream genetic regulators
View 2: Downstream Effects · 6,581 records. What happens to the transcriptome when a CA gene is knocked out or overexpressed? Split into Knockout (CRISPR + shRNA) and Overexpression sections
Data Sources: CRISPR (142K experiments), shRNA (238K), overexpression (34K) — all Level 5 GCTX, |modz| ≥ 3.0

14 of 15 CA genes are measured in the L1000 panel. Not measured: CA13.

Tahoe-100M — Single-Cell Perturbations

Tahoe-100M: 72,505 records · 379 drugs · 50 cell lines · 15 genes · 55.6% BH-significant. Pseudobulk replicate differential expression from ~77 million single cells across 14 plates (Zhang et al., bioRxiv 2025)
Full Transcriptome: All 15 CA genes measured (vs. 14 in LINCS L1000) — scRNA-seq captures complete gene expression, not a limited panel
Volcano Plot: Effect score vs −log10(BH p-value) with significance + effect-size thresholds
Methodology: Cells split into pseudobulk replicates (25 cells for groups of 50–199 cells, 50 cells for groups of 200+). Wilcoxon rank-sum test (treatment vs DMSO replicates) with Benjamini-Hochberg FDR correction. Effect score = log2FC z-scored within cell line and dose. Three dose levels (0.05, 0.5, 5.0 μM)
Waterfall + Enrichment + Table: Same visualization pattern as LINCS (including enrichment pill toggle and paired waterfall bars), with additional columns for p-value, BH p-value, treatment/DMSO replicate counts, cell count, fraction expressing, approval status, and confidence tier
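
Two pieces of the methodology above lend themselves to a short sketch: the replicate-size rule and the Benjamini-Hochberg adjustment. The function names are hypothetical, and returning None for groups under 50 cells is an assumption, since the text does not say how such groups are handled.

```python
def replicate_size(n_cells):
    """Pseudobulk replicate size for a treatment group of n_cells."""
    if n_cells >= 200:
        return 50
    if n_cells >= 50:
        return 25
    return None  # assumption: groups under 50 cells are not split


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(replicate_size(120), replicate_size(450))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # toy p-values
```

In practice the raw p-values would come from a rank-sum test such as `scipy.stats.mannwhitneyu`, comparing treatment replicates with DMSO replicates.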

Good for: Drug discovery, target validation, single-cell perturbation responses, identifying compounds that modulate CA gene expression, cross-referencing bulk (LINCS) and single-cell (Tahoe) evidence.

Cross References tab: Cross Refs

Dynamic links to 30+ external databases (GeneCards, UniProt, NCBI, Ensembl, KEGG, Reactome, ClinVar, OMIM, etc.).

Good for: Finding a gene in other databases, jumping to external resources.

AI Research Assistant

RAG-powered chatbot grounded in 11,712 indexed carbonic anhydrase publications (15,883 text chunks). Accessible via the floating widget from any tab. Supports multi-turn conversation with history, markdown rendering, and adjustable temperature.

Architecture

Query Classifier: Qwen3-14B pre-classifies every query as site, hybrid, or science. Site questions (e.g. "which species?", "where can I find variant data?") skip the RAG pipeline entirely and are answered in ~1-2s from platform knowledge. Science questions proceed through the full RAG pipeline. Hybrid questions get both
Embeddings: BAAI/bge-base-en-v1.5 (768-dim), Neon PostgreSQL + pgvector HNSW
Search: Hybrid — vector similarity (70%) + BM25 (30%), 100 candidates
Reranking: Qwen3-Reranker (4B/8B) with quality threshold
LLM: Qwen3-Next-80B MoE (default) · Qwen3-235B (deep analysis)
Response Types: Concise (3K), Overview (5K), Interpretive (10K), Deep (10K)
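
The 70/30 hybrid search weighting can be illustrated with a small score-fusion sketch. Only the weights and the 100-candidate limit come from the description above; the min-max normalisation, the function names, and the toy scores are assumptions, and the production pipeline may combine scores differently.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(vector_scores, bm25_scores, w_vec=0.7, w_bm25=0.3, top_k=100):
    """Fuse vector-similarity and BM25 scores into one ranked candidate list."""
    vec = min_max(vector_scores)
    bm25 = min_max(bm25_scores)
    fused = {
        doc: w_vec * vec.get(doc, 0.0) + w_bm25 * bm25.get(doc, 0.0)
        for doc in set(vec) | set(bm25)
    }
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}  # toy cosine similarities
    bm25_scores = {"b": 12.0, "c": 4.0, "d": 8.0}   # toy BM25 scores
    print(hybrid_rank(vector_scores, bm25_scores))
```

The fused candidates would then be passed to the reranker, which applies its own quality threshold.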

Data Integration

The AI assistant can optionally include structured experimental data from the site alongside literature context. Selectable data sections:

Genomic: Gene structure, ClinVar variants, ChIP-Atlas TF peaks
HPA / DepMap / CELLxGENE / Tahoe: Tissue and cancer cell line expression
Pathways: MSigDB pathway enrichment (KEGG, Reactome, GO, Hallmark)
GTEx Correlations: Top co-expressed genes per tissue (Spearman r ≥ 0.5). Automatic pairwise lookup when multiple genes are queried
TCGA: Tumor expression (log₂ TPM) across 33 TCGA cancer types with tissue-aware cancer detection (e.g., "liver cancer" → LIHC)

Changelog

May 13, 2026 — AI Gene Summaries: 15 CA Genes Generated for the First Time

All 15 CA Genes Now Have AI Summaries: CA1, CA2, CA3, CA4, CA5A, CA5B, CA6, CA7, CA8, CA9, CA10, CA11, CA12, CA13, CA14 each carry seven AI-generated, citation-grounded summary dimensions: Genomic Context, Expression Patterns, Protein Biology, Pathway Analysis, Perturbation Response, Functional Role, and Clinical Relevance. ~12 minutes wall time and ~$0.08 DeepInfra spend total, generated via the v2 pipeline from CABase’s combined experimental data + RAG literature corpus.
v2 Pipeline (Engine Cherry-Pick from WntHub): Three engine commits brought the 7-dimension gene-summary generator over from the sister site: a new scripts/lib/gene-data-extractors.js (21 site-agnostic per-source extractors covering ChIP-Atlas with ±1.5 kb promoter-window peak clustering and TSS-distance reporting, ClinVar with cross-assembly dedup, PRECOG with ICI cohort regimen detail, MSigDB enrichment with ALL_SAMPLES-empty per-tissue fallback for tissue-restricted genes like CA9, etc.); a rewritten scripts/generate-gene-summaries.js with a --preview flag that writes a numbered markdown dossier of every prompt for human review before any LLM submission; and a BATCH_BYPASS_RATE_LIMIT env var on the netlify query-initiate function to enable 8-way parallel batches without self-throttling.
Frontend: CA-Native Branding Throughout: The lingering “WntHub” leaks in the engine JS — the bookmark document.title (browser bookmarks now save as CA-gene — Tab | CABase), the AI assistant panel heading (now CABase AI Assistant), the chat greetings, the LLM prompt-template persona (the LLM now reads “specializing in carbonic anhydrase biology” rather than “Wnt signaling pathways”), the AI Insights Report title / footer, and the citation strings — now read from window.SITE_CONFIG. Engine JS byte-identical between WntHub and CABase, parameterised entirely through js/site-config.js.

May 11, 2026 — Tampere Rebrand & Engine Sync from WntHub

Tampere University / Parkkila Group Branding: CABase is a Parkkila-group project (Faculty of Medicine and Health Technology, Department of Anatomy, Tampere University) — the Oulu branding was carried over by accident from the shared engine template. Replaced University of Oulu logo (and link) with the official Tampere University logo across the sidebar, about-page subtitle, and about-page footer; updated credit lines to Parkkila Group · Tampere University; updated contact email to tuni.fi.
Engine Cherry-Picks from WntHub: Pulled in several robustness improvements developed during WntHub’s gene-set expansion: hash-based cache invalidation in build_network_json.py (per-(gene, tissue) network JSONs self-invalidate when the gene set changes via embedded _geneSetHash); LINCS pipelines gained per-gene skip-if-exists with --force override; gene-list dedup across build_*.py and gtex/06_build_site_correlations.py (single source of truth in config.py); HPA pipeline fixed for renamed FANTOM5 column and retired rna_cancer.tsv.zip endpoint.
Site-Neutral Engine Identifiers: CABase’s engine code now uses GENES / GENE_SET / GENE_SET_HASH / _geneSetHash / pathway_gene / gene_adj instead of CA-prefixed names. CA_GENES kept as a one-line alias in config.py for site-specific scripts. Engine commits from WntHub now cherry-pick auto-clean instead of conflicting on parallel renames.
Site-Config-Driven Identity: config.py reads site-config.json at import and re-exports SITE_NAME, GENE_SET_LABEL, GENE_SET_FULL, DATA_SUFFIX. Engine code never needs hardcoded “CABase” / “CA” / “CAs” strings.
Numeric-Aware Gene Picker: Sidebar dropdown now uses Intl.Collator({numeric: true}), so CA9 renders before CA10 instead of after CA1 (lexicographic sort previously put CA10..CA14 before CA2). Same fix benefits any sister-site gene family with embedded numbers.
Frontend: CELLxGENE & Symlink Fix: Front-end now loads CELLxGENE_gene_expression.latest.CAs.tsv.gz rather than chasing a dated filename. The .latest file is a real copy (not a symlink) because Netlify’s CDN doesn’t follow symlinks — lesson learned during the WntHub deploy.
Stub-ified gtex/04_network_json.py: The unused per-tissue duplicate of the active scripts/build_network_json.py is now a tiny deprecation stub that prints a redirect and exits with code 2. Closes the dual-maintenance hazard that had already let the duplicate drift from its sibling.
About Page RAG Corpus Counts Corrected: The RAG section had been carrying WntHub’s 23,323 publications / 31,773 chunks — copied at fork time and never updated. CABase’s actual indexed corpus: 11,712 indexed publications, 15,883 text chunks. Tagline updated to “grounded in 11K+” (was “grounded in CA publications” without a count).
Automated Site Stats: New engine scripts (scripts/build_site_stats.py + scripts/render_site_text.py) compute every data-derived number from data/ + config.py + manual overrides and substitute them into HTML/JS via <span data-stat="key"> markers. Wired into master_rebuild_all.sh as Step 17, before the _site/ rsync. About-page stat drift caught en route: GTEx 55 → 53 subtissues; ChIP-Atlas 1,086 → 931 TFs; TCGA 1,451 → 426 curves; PRECOG 2,145/2,129/490 → 710/685/172 records; iPTMnet 1,118 → 103 sites; networks 609 → 553; LINCS V1 22,927 → 11,456; LINCS V2 13,177 → 6,581; Tahoe 145,026 → 72,505 records.
Sidebar Logo Aspect Fix: Tampere logo’s “2-line” layout has a ~1.74:1 aspect ratio (vs Oulu’s ~2.47:1). The legacy width="200" height="60" HTML attributes were forcing browsers to stretch the new logo ~1.9× horizontally. Added width: auto + object-fit: contain to the CSS so the rendered width derives from the image’s actual intrinsic ratio.
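
The hash-based cache invalidation mentioned in the May 11 entry can be sketched roughly as follows. Only the `_geneSetHash` key comes from the changelog; the helper names, the SHA-256 choice, and the 16-character truncation are assumptions for illustration and may differ from build_network_json.py.

```python
import hashlib


def gene_set_hash(genes):
    """Order-insensitive fingerprint of a gene set."""
    canonical = ",".join(sorted(set(genes)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def is_stale(network_json, genes):
    """True when a cached network JSON was built for a different gene set."""
    return network_json.get("_geneSetHash") != gene_set_hash(genes)


if __name__ == "__main__":
    genes = ["CA1", "CA2", "CA9"]
    cached = {"_geneSetHash": gene_set_hash(genes), "nodes": [], "links": []}
    print(is_stale(cached, genes))            # same gene set
    print(is_stale(cached, genes + ["CA12"]))  # gene set changed
```

Embedding the hash in each file lets a rebuild skip unchanged networks without tracking state elsewhere.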

April 16, 2026 — Engine Port & Feature Sync from WntHub

SITE_CONFIG Parameterization: Engine code now reads all site-specific values (brand name, gene-set labels, data-file suffix, function URLs, RAG metadata) from js/site-config.js (frontend) and site-config.json + site-knowledge.txt (backend). Engine JS files are now byte-identical with WntHub — future fixes cherry-pick between repos.
RNA-Seq Normal/Cancer Toggle: New sub-tab toggle on the RNA-Seq tab. Normal Tissue view (GTEx, HPA, FANTOM5, Cross-Database) and Cancer & Disease view (DepMap, TCGA+Survival, PRECOG). Cancer charts lazy-render on first switch. Multi-Gene Comparison always visible.
Perturbation Enrichment Pills: Enrichment chart now switchable between MOA, Drug Target, Cell Lineage, and Primary Disease (both LINCS and Tahoe sub-tabs).
Tahoe Volcano Plot: New panel on Tahoe sub-tab: effect score vs −log10(BH p-value) with significance + effect-size thresholds.
Waterfall Paired Bars: Drugs with experiments in both directions now show a dominant solid bar + a hatched secondary bar in the same row, placed in the dominant-direction section.
Radar Hover Fix: Top Tissues radar plots now show tissue name + expression value on hover (was showing "trace 0").
Netlify Functions Renamed: ca-query-* → query-* (shared filenames across sites). SITE_CONFIG.functionPrefix drives URL routing.
About Page Counts Corrected: ChIP-Atlas 1,086 → 1,846; GTEx 53 → 55 subtissues; CELLxGENE 38/187 → 64/867; Tahoe 116M → 2.3M DMSO; networks 2,088 → 609; L1000 coverage + Tahoe record counts updated to CA-specific values.

March 24, 2026 — Tahoe-100M Perturbations & iPTMnet Improvements

Tahoe-100M Perturbations: New sub-tab on the Perturbations page integrating single-cell RNA-seq drug response data from the Tahoe-100M dataset (~77M cells, 379 drugs, 50 cell lines, 14 plates). Pseudobulk replicate approach: cells split into 25- or 50-cell replicates, Wilcoxon rank-sum test vs matched DMSO replicates, BH FDR correction. 217,915 records across 32 genes (84.3% BH-significant). Waterfall chart, MOA enrichment, and filterable table with p-values, replicate counts, and confidence tiers
Info Tooltips: Added contextual info icons across all 10 site-wide data tables explaining statistical metrics, column meanings, and data sources. Floating tooltip design escapes all CSS stacking contexts
Tab Persistence Fix: Fixed Plotly charts going blank when navigating away from a tab and returning. Charts now re-render from cached data on tab switch
iPTMnet Plot: Y-axis now uses iPTMnet score (instead of known enzymes), circle size represents number of enzymes. Hover tooltip includes enzyme list and publication count
iPTMnet Table: Rebuilt with TableViewer for sorting, filtering, and export. Added Score and Position columns, enzyme names link to UniProt, dropdown filters on Type/Score/Known/Evidence
Mol* Viewer Fix: Fixed issue where switching from PTM overlays to structure property themes would fail. Viewer now reloads cleanly when transitioning between overlay categories

March 22, 2026 — Perturbations Tab & Proteomics Enhancements

Perturbations Tab: New tab integrating LINCS L1000 compound perturbation data (720K experiments, 33K compounds, 230 cell lines). Waterfall chart of top activators/repressors, MOA enrichment analysis, and full filterable/exportable table with CLUE.io compound links. 39 of 46 genes measured
Genetic Perturbations: LINCS L1000 CRISPR (142K), shRNA (238K), and overexpression (34K) data in two views: (1) genetic perturbations affecting each CA gene (45,868 records, 39 genes), (2) downstream effects split into Knockout (CRISPR + shRNA) and Overexpression sections (59,533 records, 37 genes). Waterfall charts + filterable tables for all sections
PRECOG Survival Analysis: PRECOG v2 survival z-scores added to RNA-Seq tab. Three databases: Adult (46/46 genes, 51 cancers, ~28K patients), Pediatric (44/46 genes, 12 cancers, ~3K patients), ICI immunotherapy (46/46 genes, 20 cancers, ~4K patients). Waterfall charts + filterable tables. ICI table enriched with ICI target, tumor stage, treatment status, cohort size, outcome type, and study source. Pipeline: scripts/pipelines/precog/01_extract_survival_zscores.py
LINCS Pipelines: Compound pipeline (scripts/pipelines/lincs/01_extract_perturbations.py) extracts per-gene perturbation data at |modz| ≥ 3.0 from Level 5 GCTX files (401K perturbations, 12,735 compounds, 702 MOA classes). Genetic pipeline (scripts/pipelines/lincs/02_extract_genetic_perturbations.py) extracts CRISPR/shRNA/overexpression data with the same threshold
PTM Table: PMID counts replaced with clickable PubMed links (collapsible when >3). Table card scrollable with sticky header
Structure Viewer: Overlay selection persists across gene changes. New built-in Mol* overlays: secondary structure, hydrophobicity, residue type, sequence position, B-factor/pLDDT. Fullscreen now fills the viewport
Dev Workflow: Added [dev] publish = "." to netlify.toml — local dev serves from project root, no rsync needed. Production deploys via netlify deploy --prod (build is automatic)

March 21, 2026 — Proteomics Tab & AI Query Classifier

Proteomics Tab: New dedicated tab with protein-level data from 5 sources — HPA IHC (normal tissue + cancer), CPTAC mass-spec proteomics (11 cancers), ProteomicsDB (67 tissues), iPTMnet PTM sites (1,118 sites with kinase-substrate relationships)
3D Structure Viewer: PDBe Mol* integration — interactive PDB and AlphaFold structures with PTM overlays (phosphorylation, ubiquitination). Structure descriptions fetched from RCSB PDB and AlphaFold APIs. 26 genes with PDB structures, all genes with AlphaFold
AI Query Classifier: Qwen3-14B pre-classifies queries as site/hybrid/science. Site questions ("which species?", "where is variant data?") skip RAG entirely and answer in ~1-2s. Science questions proceed through full RAG pipeline with zero overhead
Data Pipelines: New pipelines for HPA protein IHC, CPTAC, ProteomicsDB (OData API), and iPTMnet (REST API). All follow existing retrieve-filter-compress pattern
Gene Identity Card: New card at top of Gene Summary tab with full name, genomic coordinates, aliases, NCBI summary, and cross-reference IDs (Entrez, HGNC, Ensembl, UniProt, RefSeq, Pfam, PDB) with clickable links
Tab Rename & Reorder: Expression → RNA-Seq, Single Cell → scRNA-Seq. New order: Gene Summary → Genomic → RNA-Seq → Proteomics → scRNA-Seq → Correlation → Cross Refs
Sidebar Fixes: Ensembl ID parsing improvements (nested JSON fallback). Ensembl ID text removed from sidebar info card. CABase logo links to Overview tab. University of Oulu logo enlarged

March 20, 2026 — AI Data Integration & TCGA Restructure

AI — GTEx Correlations: New data section sends top co-expressed genes to AI. Automatic pairwise lookup when multiple genes queried. Tissue-aware (uses specific tissue or ALL_SAMPLES)
AI — TCGA Expression: New data section sends tumor expression stats (median, IQR, n) across 33 TCGA cancer types. Smart cancer detection from natural language ("liver cancer" → LIHC)
AI — Conversation: Multi-turn chat with history. Gene detection from conversation context. Temperature slider. Markdown rendering. Inline PMID citations. New Chat button
AI — Model: Upgraded to Qwen3-Next-80B MoE (fast inference with reliable instruction following)
Data: TCGA data reorganized under data/TCGA/ (survival + expression). All paths and pipelines updated
UI: AI tab removed (floating widget only). Hero page updated. About page restyled. Chatbot colors matched to site palette

March 20, 2026 — TCGA Survival, Pipeline & Network Upgrades

TCGA Survival: KM curves from UCSC Xena (33 cancers, 1,451 curves). Box plot with scatter overlay. All on same TCGA row
Co-expression Networks: Proper p-values + BH FDR. 55 subtissues. 2,088 network files
Data Pipeline: 3-stage architecture (Retrieve/Analyze/Format). Vectorized extraction is 10x faster. Site size reduced from 524 MB to 404 MB
Multi-Gene: Database selector (GTEx/HPA/FANTOM5). Auto-populated. Tall charts
UI: AI tab removed (floating widget only). Compact export buttons. Model display fixed

March 19, 2026 — Phase 7: Tracks, Networks & Polish

IGV.js Tracks: ENCODE cCREs, DNA Methylation Atlas (39 cell types), RNA-seq signal. Info tooltips. Auto-reload
Networks: All-vs-all Spearman across 50+ GTEx tissues. D3 force-directed two-hop ego networks
Skeleton Loading: Shimmer animations across all tabs
UI: Unified sidebar gene selector. Side-by-side network + tables. Fullscreen fix for SVG

March 18, 2026 — Phases 1-6: Full Redesign

Vertical-scroll SPA replaced with sidebar + 8 tabbed views + panel grid
Plotly.js kitchen-sink expression charts, IGV.js genome browser, enhanced TableViewer
LLM optimization: Nemotron-3 Super default (~10s responses), BAAI/bge-base-en-v1.5 embeddings

September 2025 — Initial Release

CABase platform launch with genomic context, expression, correlation, gene regulation, cross-references, and AI assistant

Credits

CABase integrates data from:

GTEx (V10) · Human Protein Atlas · FANTOM5
TCGA via UCSC Xena (Pan-Cancer Atlas, Liu et al. 2018)
CELLxGENE (CZI) · Tahoe-100M · DepMap
ENCODE · Human Methylation Atlas · ChIP-Atlas
CPTAC (via HPA) · ProteomicsDB · iPTMnet
LINCS L1000 (Subramanian et al., Cell 2017; compound + CRISPR/shRNA/overexpression) · Connectivity Map (Broad Institute)
PRECOG v2 (Benard et al., Nucleic Acids Research 2026; adult, pediatric, and ICI survival z-scores. CC BY-NC 4.0)
ClinVar · MSigDB · Ensembl · NCBI

Developed at Tampere University, Faculty of Medicine and Health Technology, Department of Anatomy (Parkkila Group). Contact: harlan[dot]barker[at]tuni.fi
