samcov (1.0.0a16)
Installation
pip install --index-url samcov
About this package
A simple SAM/BAM file coverage extraction tool.
samcov
Extract per-base coverage from SAM/BAM alignment files, compute aggregate statistics, and identify low-coverage regions across multiple samples.
Features
- Per-base coverage extraction from SAM or BAM files via samtools depth
- Multi-sample aggregation: collect coverage maps from any number of alignments
- Statistical summaries: mean, median, and mode coverage per position across samples
- Low-coverage region detection: find contiguous gaps below a configurable depth threshold
- Consensus generation: produce FASTA consensus sequences with samtools consensus
- Threshold strip plots: export color-coded threshold regions as SVG strips (mean coverage vs. configurable depth thresholds)
- CSV export: sparse or dense output for downstream analysis in R, pandas, Excel, etc.
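To make the first feature concrete, here is a minimal sketch of turning samtools depth output into a per-position coverage map. It assumes the three-column, tab-separated format that samtools depth prints (reference name, 1-based position, depth). The helper name parse_depth_output is hypothetical and is not part of samcov; the real implementation may differ.

```python
from collections import defaultdict


def parse_depth_output(text: str, sample: str) -> dict[str, dict[int, int]]:
    """Parse `samtools depth` text into {"sample/ref": {position: depth}}.

    Hypothetical helper, not part of samcov. Positions are converted from
    samtools' 1-based coordinates to 0-based ones.
    """
    maps: dict[str, dict[int, int]] = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        ref, pos, depth = line.split("\t")[:3]
        maps[f"{sample}/{ref}"][int(pos) - 1] = int(depth)
    return dict(maps)


# Example input in the three-column format printed by `samtools depth`.
example = "ref1\t1\t42\nref1\t2\t45\nref1\t3\t0\n"
coverage = parse_depth_output(example, "sample1.bam")
print(coverage)
```

Keying each map by "sample/ref" mirrors the column names shown in the coverage CSV, so several references in one alignment stay separate.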
System Requirements
- Python >= 3.10
- samtools (>= 1.20) must be installed on your PATH. The tool uses samtools depth for coverage extraction and samtools consensus for FASTA generation.
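If you want to verify the samtools requirement before running samcov, a check along the following lines may help. This is a sketch, not something samcov is known to ship: both function names are hypothetical, and it assumes the first line of samtools --version has the form "samtools 1.20".

```python
import re
import shutil
import subprocess


def parse_samtools_version(version_output: str) -> tuple[int, ...]:
    """Extract the version tuple from the first line of `samtools --version`."""
    match = re.match(r"samtools (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("unrecognized samtools version output")
    return tuple(int(part) for part in match.group(1).split("."))


def samtools_is_recent_enough(minimum: tuple[int, ...] = (1, 20)) -> bool:
    """Return True if samtools is on PATH and its version is at least `minimum`."""
    if shutil.which("samtools") is None:
        return False
    result = subprocess.run(
        ["samtools", "--version"], capture_output=True, text=True, check=True
    )
    return parse_samtools_version(result.stdout) >= minimum


# The parser can be exercised without samtools installed.
print(parse_samtools_version("samtools 1.20\nUsing htslib 1.20\n"))
```

Comparing integer tuples avoids the string-comparison trap where "1.9" would sort after "1.20".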
From the Reslate Solutions package registry
pip install samcov --index-url https://git.reslate.solutions/api/packages/ydeng/pypi/
From source (with uv)
git clone https://git.reslate.solutions/ydeng/samcov.git
cd samcov
uv pip install -e ".[dev]"
From source (with pip)
git clone https://git.reslate.solutions/ydeng/samcov.git
cd samcov
pip install -e ".[dev]"
Quick start
# Extract coverage for a single BAM
samcov alignment.bam --csv coverage.csv
# Process multiple alignments
samcov sample1.bam sample2.bam sample3.bam --csv coverage.csv
# Also compute per-position statistics (mean / median / mode)
samcov *.bam --csv coverage.csv --centers-csv centers.csv
# Find regions with depth < 5 in ANY sample
samcov *.bam --low-coverage-csv low_cov.csv --low-coverage 5
# Find regions with depth < 5 in ALL samples (shared gaps)
samcov *.bam --shared-low-coverage-csv shared_gaps.csv --low-coverage 5
# Find regions with depth >= 50 in ALL samples (shared high-coverage)
samcov *.bam --shared-high-coverage-csv shared_high.csv --high-coverage 50
# Export shared gaps as BED for IGV / genome browsers
samcov *.bam --shared-low-coverage-bed shared_gaps.bed --low-coverage 5
# Export shared high-coverage as BED
samcov *.bam --shared-high-coverage-bed shared_high.bed --high-coverage 50
# Export per-sample low-coverage regions as BED
samcov *.bam --low-coverage-bed per_sample_gaps.bed --low-coverage 5
# Generate a threshold strip plot (one panel per reference)
samcov *.bam --plot-strips coverage_plot.svg --below 1,5,10 --above 50,100
# Generate a single-panel threshold strip plot (single reference)
samcov *.bam --plot-strip coverage_plot.svg --below 1,5,10
CLI reference
usage: samcov [-h] [--csv CSV] [--centers-csv CENTERS_CSV]
[--low-coverage-csv LOW_COVERAGE_CSV]
[--low-coverage LOW_COVERAGE] [--start-at START_AT] [--sparse]
[--verbosity VERBOSITY]
[--shared-low-coverage-csv SHARED_LOW_COVERAGE_CSV]
[--shared-low-coverage-bed SHARED_LOW_COVERAGE_BED]
[--shared-high-coverage-csv SHARED_HIGH_COVERAGE_CSV]
[--shared-high-coverage-bed SHARED_HIGH_COVERAGE_BED]
[--high-coverage HIGH_COVERAGE] [--low-coverage-bed LOW_COVERAGE_BED]
[--consensus CONSENSUS]
[--plot-strips PLOT_STRIPS] [--plot-strip PLOT_STRIP]
I [I ...]
| Flag | Description |
|---|---|
| --csv | Dense or sparse per-position coverage CSV |
| --centers-csv | Per-position mean / median / mode |
| --low-coverage-csv | Low-coverage ranges per sample |
| --shared-low-coverage-csv | Low-coverage ranges shared across all samples |
| --shared-low-coverage-bed | Shared low-coverage regions in BED6 |
| --shared-high-coverage-csv | High-coverage ranges shared across all samples |
| --shared-high-coverage-bed | Shared high-coverage regions in BED6 |
| --low-coverage-bed | Per-sample low-coverage regions in BED6 |
| --low-coverage N | Depth threshold for low coverage (default: 1) |
| --high-coverage N | Depth threshold for high coverage (default: 50) |
| --start-at N | Coordinate offset (e.g. 1 for 1-based output) |
| --sparse | Omit rows where all samples have zero coverage |
| --verbosity LEVEL | DEBUG, INFO, WARNING, ERROR |
| --consensus DIR | Generate FASTA consensus via samtools consensus |
| --plot-strips PATH | Multi-panel SVG threshold strip plot (one panel per reference) |
| --plot-strip PATH | Single-panel SVG threshold strip plot (single reference) |
| --below 1,5,10 | Threshold strips for regions with mean coverage below each value |
| --above 50,100 | Threshold strips for regions with mean coverage above each value |
Threshold strip plots
Generate compact strip-style SVGs showing where mean coverage falls below or above configurable thresholds:
# Multi-panel strip — below and above thresholds on one plot
samcov *.bam --plot-strips coverage.svg --below 1,5,10 --above 50,100
# Single-panel strip — only below-threshold regions
samcov *.bam --plot-strip coverage.svg --below 1,5,10
# Only high-coverage regions
samcov *.bam --plot-strip coverage.svg --above 50,100
- X-axis: base position (respects --start-at)
- Below-threshold tracks sit below the axis, above-threshold tracks sit above
- Each threshold renders as a horizontal colored strip
- Colored segments appear wherever mean coverage is strictly below (or above) that threshold
- Tracks are stacked with small gaps for readability
- Color-blind-friendly palette: red → orange → gold → green → blue
- Legend labels show <1×, <5×, >50×, >100×, etc.
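The segment rule described in the list above can be sketched as follows. This only illustrates the "strictly below (or above)" comparison on a list of mean coverage values; threshold_segments is a hypothetical name, and how samcov actually computes and draws the strips is not shown here.

```python
from itertools import groupby


def threshold_segments(mean_coverage, threshold, below=True):
    """Return inclusive (start, end) index pairs where the mean coverage is
    strictly below (below=True) or strictly above (below=False) the threshold."""
    flags = [(v < threshold) if below else (v > threshold) for v in mean_coverage]
    segments = []
    pos = 0
    for hit, run in groupby(flags):
        length = len(list(run))
        if hit:
            segments.append((pos, pos + length - 1))
        pos += length
    return segments


mean_coverage = [0.0, 3.0, 7.5, 60.0, 120.0, 4.0]
for t in (1, 5, 10):
    print(f"<{t}x", threshold_segments(mean_coverage, t, below=True))
for t in (50, 100):
    print(f">{t}x", threshold_segments(mean_coverage, t, below=False))
```

Each returned pair corresponds to one colored segment on the track for that threshold.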
Output formats
Coverage CSV (--csv)
| position | sample1.bam/ref | sample2.bam/ref | … |
|---|---|---|---|
| 0 | 42 | 38 | … |
| 1 | 45 | 40 | … |
| 2 | 0 | 1 | … |
Use --sparse to omit rows where all samples have zero coverage.
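The difference between dense and sparse output can be illustrated with a small writer. This is a sketch under assumptions: write_coverage_csv is a hypothetical function (samcov's own exporter is export_coverages_as_csv, whose internals are not documented here), and positions missing from a sample's map are treated as depth 0.

```python
import csv
import io


def write_coverage_csv(coverage_maps, max_length, handle, sparse=False):
    """Write one row per position; in sparse mode skip all-zero rows."""
    names = sorted(coverage_maps)
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["position", *names])
    for pos in range(max_length):
        depths = [coverage_maps[name].get(pos, 0) for name in names]
        if sparse and not any(depths):
            continue  # every sample has zero coverage here
        writer.writerow([pos, *depths])


coverage_maps = {
    "sample1.bam/ref": {0: 42, 1: 45},
    "sample2.bam/ref": {0: 38, 1: 40, 3: 1},
}

dense, sparse = io.StringIO(), io.StringIO()
write_coverage_csv(coverage_maps, 4, dense)
write_coverage_csv(coverage_maps, 4, sparse, sparse=True)
print(dense.getvalue())
print(sparse.getvalue())
```

In this example only position 2 is dropped from the sparse file, because it is the only position where both samples have zero depth.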
Centers CSV (--centers-csv)
| position | mean | median | mode |
|---|---|---|---|
| 0 | 40.0 | 42.0 | 42 |
| 1 | 42.5 | 45.0 | 45 |
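For readers who want to reproduce the three statistics by hand, the sketch below computes them per position with the standard library. The function centers_by_position and the three-sample input are made up for illustration; samcov's metrics.measure_centers may handle ties or missing positions differently.

```python
import statistics


def centers_by_position(coverage_maps, max_length):
    """Return {position: (mean, median, mode)} across all samples.

    Positions missing from a sample's map are counted as depth 0.
    """
    centers = {}
    for pos in range(max_length):
        depths = [depths_map.get(pos, 0) for depths_map in coverage_maps.values()]
        centers[pos] = (
            statistics.fmean(depths),
            float(statistics.median(depths)),
            statistics.mode(depths),
        )
    return centers


# Illustrative data: three samples, two positions.
coverage_maps = {
    "a.bam/ref": {0: 42, 1: 45},
    "b.bam/ref": {0: 42, 1: 45},
    "c.bam/ref": {0: 36, 1: 39},
}
centers = centers_by_position(coverage_maps, 2)
print(centers)
```

Note that statistics.mode returns the first of several equally common values, so the mode column is only meaningful when samples actually share depths.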
Low-coverage CSV (--low-coverage-csv)
| sample | low coverage ranges |
|---|---|
| sample1.bam/ref | [3, 4], [150, 155] |
| sample2.bam/ref | [2, 5] |
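The ranges in this table are runs of consecutive positions whose depth is below the threshold. A minimal sketch of that idea, assuming zero-based inclusive ranges and treating positions absent from the map as depth 0, might look like this (low_coverage_ranges is a hypothetical name, not the samcov function):

```python
def low_coverage_ranges(depths, max_length, threshold):
    """Return inclusive, zero-based [start, end] runs where depth < threshold."""
    ranges = []
    start = None
    for pos in range(max_length):
        low = depths.get(pos, 0) < threshold
        if low and start is None:
            start = pos  # a new run begins
        elif not low and start is not None:
            ranges.append([start, pos - 1])  # the run ended at the previous position
            start = None
    if start is not None:
        ranges.append([start, max_length - 1])  # run extends to the end
    return ranges


# Positions 8 and 9 are absent from the map and therefore count as depth 0.
depths = {0: 20, 1: 18, 2: 9, 3: 2, 4: 0, 5: 12, 6: 15, 7: 30}
ranges = low_coverage_ranges(depths, max_length=10, threshold=5)
print(ranges)
```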
Shared low-coverage CSV (--shared-low-coverage-csv)
| start | end | length | threshold |
|---|---|---|---|
| 3 | 4 | 2 | 5 |
| 150 | 155 | 6 | 5 |
Intervals where all samples have depth below the threshold. Use this to find consensus assembly gaps or universally problematic regions.
Ranges are zero-based, inclusive by default. Use --start-at for one-based output.
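The "all samples" condition can be sketched as an interval scan in which a position qualifies only if every sample is below the threshold. The function below is a hypothetical illustration rather than samcov's implementation; the same scan with a depth >= threshold test would describe the shared high-coverage case.

```python
def shared_low_coverage(coverage_maps, max_length, threshold):
    """Return rows (start, end, length, threshold) for intervals where every
    sample has depth < threshold. Ranges are zero-based and inclusive."""
    rows = []
    start = None
    # Iterate one position past the end so an open interval is always closed.
    for pos in range(max_length + 1):
        shared = pos < max_length and all(
            depths.get(pos, 0) < threshold for depths in coverage_maps.values()
        )
        if shared and start is None:
            start = pos
        elif not shared and start is not None:
            rows.append(
                {"start": start, "end": pos - 1, "length": pos - start, "threshold": threshold}
            )
            start = None
    return rows


coverage_maps = {
    "sample1.bam/ref": {0: 10, 1: 10, 2: 10, 3: 1, 4: 0, 5: 9, 6: 10},
    "sample2.bam/ref": {0: 8, 1: 7, 2: 3, 3: 2, 4: 1, 5: 4, 6: 9},
}
rows = shared_low_coverage(coverage_maps, max_length=7, threshold=5)
print(rows)
```

Here sample2 is low from 2 to 5 but sample1 only at 3 and 4, so the shared interval is the overlap [3, 4].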
Low-coverage BED (--low-coverage-bed)
Per-sample low-coverage intervals in BED6 format:
.\t3\t5\tsample1.bam/ref\t0\t+
.\t150\t156\tsample1.bam/ref\t0\t+
.\t2\t6\tsample2.bam/ref\t0\t+
Columns: chrom, start (0-based), end (exclusive), name, score, strand.
The chromosome defaults to . because samcov processes alignments agnostically.
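The relationship between the inclusive CSV ranges and the half-open BED intervals is a one-position shift of the end coordinate. The sketch below shows that conversion; ranges_to_bed6 is a hypothetical helper, and the fixed score 0 and strand + simply copy the example lines above.

```python
def ranges_to_bed6(ranges, name, chrom="."):
    """Convert inclusive, zero-based [start, end] ranges to BED6 lines."""
    lines = []
    for start, end in ranges:
        # BED end coordinates are exclusive, so the inclusive end becomes end + 1.
        fields = [chrom, str(start), str(end + 1), name, "0", "+"]
        lines.append("\t".join(fields))
    return lines


bed_lines = ranges_to_bed6([[3, 4], [150, 155]], "sample1.bam/ref")
for line in bed_lines:
    print(line)
```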
Shared low-coverage BED (--shared-low-coverage-bed)
Shared low-coverage intervals in BED6 format:
.\t3\t5\tshared_low_coverage\t0\t+
.\t150\t156\tshared_low_coverage\t0\t+
Use --start-at to shift coordinates (e.g. for one-based reference indexing).
Shared high-coverage CSV (--shared-high-coverage-csv)
| start | end | length | threshold |
|---|---|---|---|
| 50 | 99 | 50 | 50 |
| 200 | 250 | 51 | 50 |
Intervals where all samples have depth >= the threshold. Use this to find universally well-covered regions.
Ranges are zero-based, inclusive by default. Use --start-at for one-based output.
Shared high-coverage BED (--shared-high-coverage-bed)
Shared high-coverage intervals in BED6 format:
.\t50\t100\tshared_high_coverage\t0\t+
.\t200\t251\tshared_high_coverage\t0\t+
Use --start-at to shift coordinates.
Python API
from samcov import count, metrics, export, visualize
# Load coverage from one or more BAMs
coverage_maps, max_length = count.count_all_sam_positions(["sample1.bam", "sample2.bam"])
# coverage_maps = {
# "sample1.bam/NC_000962.3": {0: 42, 1: 45, ...},
# "sample2.bam/NC_000962.3": {0: 38, 1: 40, ...},
# }
# Compute mean / median / mode per position
centers = metrics.measure_centers(coverage_maps, max_length)
# Find contiguous low-coverage regions in ANY sample (depth < 5)
low_cov = metrics.calculate_consecutive_low_coverage(coverage_maps, max_length, threshold=5)
# Find contiguous low-coverage regions in ALL samples (shared gaps)
# NOTE: function name assumed by analogy with calculate_shared_high_coverage; check samcov.metrics
shared_gaps = metrics.calculate_shared_low_coverage(coverage_maps, max_length, threshold=5)
# Find contiguous high-coverage regions in ALL samples (shared peaks)
shared_peaks = metrics.calculate_shared_high_coverage(coverage_maps, max_length, threshold=50)
# Export to CSV
export.export_coverages_as_csv(coverage_maps, max_length, "coverage.csv", sparse=False)
export.export_centers_as_csv(centers, max_length, "centers.csv", sparse=False)
export.export_low_coverage_csv(low_cov, max_length, "low_cov.csv")
export.export_shared_low_coverage_csv(shared_gaps, max_length, "shared_gaps.csv", threshold=5)
export.export_shared_high_coverage_csv(shared_peaks, max_length, "shared_peaks.csv", threshold=50)
# Export to BED
export.export_low_coverage_bed(low_cov, "low_cov.bed")
export.export_shared_low_coverage_bed(shared_gaps, "shared_gaps.bed")
export.export_shared_high_coverage_bed(shared_peaks, "shared_peaks.bed")
# Generate SVG plots
visualize.plot_coverage(coverage_maps, max_length, "coverage.svg")
visualize.plot_all(coverage_maps, max_length, "multi_ref_coverage.svg")
Consensus generation
from samcov.consensus import generate_all_consensus
# Requires samtools on PATH
generate_all_consensus("sample1.bam", "sample2.bam", output_folder="consensus/")
# → consensus/sample1.fasta
# → consensus/sample2.fasta
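For orientation, the following sketch shows roughly what a per-file samtools consensus call could look like when driven from Python. It is an assumption about the general shape, not samcov's actual code: consensus_command and run_consensus are hypothetical names, and the options used are the -f (output format) and -o (output file) flags of samtools consensus.

```python
import subprocess
from pathlib import Path


def consensus_command(bam_path, output_folder):
    """Build the samtools consensus command for one alignment file."""
    out_path = Path(output_folder) / (Path(bam_path).stem + ".fasta")
    return ["samtools", "consensus", "-f", "fasta", "-o", str(out_path), str(bam_path)]


def run_consensus(bam_path, output_folder):
    """Create the output folder and run samtools consensus (requires samtools on PATH)."""
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    subprocess.run(consensus_command(bam_path, output_folder), check=True)


# Building the command does not require samtools to be installed.
print(consensus_command("sample1.bam", "consensus"))
```

Deriving the output name from the input file's stem gives the sample1.bam to consensus/sample1.fasta mapping shown above.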
Requirements
- Python ≥ 3.10
- tqdm (progress bars)
- matplotlib (for SVG plots)
- samtools (not installed by pip; required at runtime for coverage extraction and consensus generation)
Development
# Run the test suite
uv run pytest tests/ -v
# Build a wheel
uv build
# Release (semantic-release, CI only)
npx semantic-release
License
MIT