cell-extract (1.3.1)
Installation
pip install --index-url cell-extract
About this package
Extract specific columns from multiple tabular files and merge by row identifier.
cell-extract
Extract specific columns from multiple tabular files and merge them by row identifier.
cell-extract reads CSV, TSV, or custom-delimited files, extracts a user-specified column from each, and produces a single merged output table. Duplicate row IDs within a file are handled by averaging their numeric values.
Installation
pip install -e .
Or with dev dependencies (for testing):
pip install -e ".[dev]"
Quick Start
Given two files:
alpha.csv
gene,expr,pval
BRCA1,2.3,0.01
TP53,5.1,0.001
beta.csv
gene,expr,pval
BRCA1,4.7,0.02
TP53,3.2,0.05
Run:
cell-extract alpha.csv beta.csv --column expr
Output:
Origin,alpha,beta
BRCA1,2.3,4.7
TP53,5.1,3.2
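The merge shown above can be approximated in a few lines of Python using only the standard library. This is a minimal sketch for illustration, not the package's actual implementation: the function name `merge_column` is hypothetical, the column labels are passed in directly rather than derived from file names, and rows are emitted in first-seen order, which is an assumption about ordering.

```python
import csv
import io


def merge_column(sources, column):
    """Merge one column from several CSV texts, keyed by the first column.

    sources: mapping of output label -> CSV text (hypothetical interface).
    """
    merged = {}  # row id -> {label: value}
    for label, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        id_field = reader.fieldnames[0]  # first column is the row identifier
        for row in reader:
            merged.setdefault(row[id_field], {})[label] = row[column]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Origin", *sources])
    for row_id, cells in merged.items():
        # A row missing from one source gets an empty cell for that source.
        writer.writerow([row_id, *(cells.get(label, "") for label in sources)])
    return out.getvalue()


alpha = "gene,expr,pval\nBRCA1,2.3,0.01\nTP53,5.1,0.001\n"
beta = "gene,expr,pval\nBRCA1,4.7,0.02\nTP53,3.2,0.05\n"
result = merge_column({"alpha": alpha, "beta": beta}, "expr")
print(result)
```

The sketch does not average duplicate row IDs; a later value for the same ID simply overwrites the earlier one.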
Usage
cell-extract [OPTIONS] FILE [FILE ...]
Required Argument
| Option | Description |
|---|---|
| --column COLUMN, -c COLUMN | Name of the column to extract from each file |
Options
| Option | Description |
|---|---|
| --output FILE, -o FILE | Write output to FILE instead of stdout |
| --output-format {csv,tsv}, -f {csv,tsv} | Output format (default: csv) |
| --with-source, -s | Add a Source column recording which original column was extracted |
| --delimiter DELIM, -d DELIM | Input delimiter override (e.g. tab, pipe, \|, ;) |
| --skip-lines N | Skip N leading lines in each file before parsing headers |
| --id-column COLUMN, -i COLUMN | Column to use as row identifier (default: first column) |
| --version, -V | Show version and exit |
| --help, -h | Show help message |
Behavior
- Row union: All row identifiers from all files are merged. If a row exists in file A but not file B, the cell for file B will be empty.
- Duplicate rows averaged: If a row identifier appears multiple times within a file, the values are averaged. Non-numeric values in the target column are skipped with a warning.
- Integer formatting: Averages that are whole numbers display without a decimal point (e.g. 6); fractional results include a decimal (e.g. 1.5).
- Empty cells: Missing values are written as empty fields.
- Delimiter detection: .csv → comma; .tsv or .tab → tab. Use --delimiter to override.
- Output format: CSV by default; use -f tsv for TSV output.
- Row identifier: Defaults to the first column of each file. Use --id-column to specify a different column.
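The delimiter rule in the list above can be sketched as a small lookup. This is an illustrative sketch, not the tool's code: `resolve_delimiter` and both tables are hypothetical names, and two behaviors are assumptions that the documentation does not state, namely that extension matching is case-insensitive and that unknown extensions fall back to a comma.

```python
from pathlib import Path

# Named overrides mentioned in the option table; any other value is used literally.
NAMED_DELIMITERS = {"tab": "\t", "pipe": "|"}

# Extension-based detection as described in the Behavior list.
EXTENSION_DELIMITERS = {".csv": ",", ".tsv": "\t", ".tab": "\t"}


def resolve_delimiter(path, override=None):
    """Return the delimiter for a file, honoring an explicit override first."""
    if override is not None:
        return NAMED_DELIMITERS.get(override, override)
    # Assumption: fall back to comma when the extension is not recognized.
    return EXTENSION_DELIMITERS.get(Path(path).suffix.lower(), ",")


print(repr(resolve_delimiter("sample_A.tsv")))
print(repr(resolve_delimiter("data1.csv", override="pipe")))
print(repr(resolve_delimiter("data1.csv", override=";")))
```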
Examples
Multiple files with different row sets
cell-extract set1.csv set2.csv set3.csv -c expression
TSV input and output
cell-extract -c count -f tsv sample_A.tsv sample_B.tsv
Save to file
cell-extract -c fold_change -o merged.csv *.csv
With source tracing column
cell-extract alpha.csv beta.csv -c expr -s
# Origin,alpha,beta,Source
# BRCA1,2.3,4.7,expr
# TP53,5.1,3.2,expr
Custom delimiter (pipe-delimited files)
cell-extract -c val -d pipe data1.csv data2.csv
cell-extract -c val -d "|" data1.csv data2.csv
Skip metadata header lines
# Files with experiment labels before the column headers:
# Experiment: RNA-seq
# Date: 2025-02-14
# gene,expr,pval
# BRCA1,2.3,0.01
cell-extract -c expr --skip-lines 2 file1.csv file2.csv
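For readers who want to see what skipping leading lines amounts to, here is a minimal standard-library sketch. It is not taken from the package; `read_after_skipping` is a hypothetical helper, and it assumes the header row is the first line after the skipped ones.

```python
import csv
import io
from itertools import islice


def read_after_skipping(text, skip_lines):
    """Parse CSV text after discarding the first `skip_lines` lines."""
    lines = io.StringIO(text)
    # islice drops the metadata lines; the next line is treated as the header.
    remaining = islice(lines, skip_lines, None)
    return list(csv.DictReader(remaining))


raw = (
    "Experiment: RNA-seq\n"
    "Date: 2025-02-14\n"
    "gene,expr,pval\n"
    "BRCA1,2.3,0.01\n"
)
rows = read_after_skipping(raw, 2)
print(rows[0]["gene"], rows[0]["expr"])
```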
Custom row identifier column
# Use the "label" column instead of the first column as the row key
cell-extract -c val -i label data.csv
Duplicate rows averaged
# input.csv:
# id,val
# X,10
# X,30
cell-extract input.csv -c val
# Origin,input
# X,20 ← (10 + 30) / 2
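The averaging and number formatting described under Behavior can be sketched as follows. This is an approximation under stated assumptions, not the package's implementation: `average_duplicates` and `format_number` are hypothetical names, and dropping a row whose values are all non-numeric is an assumption, since the documentation only says such values are skipped with a warning.

```python
import warnings
from collections import defaultdict


def average_duplicates(pairs):
    """Average values per row ID from (row_id, raw_value) string pairs."""
    values = defaultdict(list)
    for row_id, raw in pairs:
        try:
            values[row_id].append(float(raw))
        except ValueError:
            # Non-numeric values are skipped with a warning.
            warnings.warn(f"skipping non-numeric value {raw!r} for row {row_id!r}")
    # Assumption: rows with no numeric values at all are left out.
    return {row_id: sum(v) / len(v) for row_id, v in values.items() if v}


def format_number(x):
    """Whole numbers print without a decimal point; others keep it."""
    return str(int(x)) if x.is_integer() else str(x)


pairs = [("X", "10"), ("X", "30"), ("Y", "1"), ("Y", "2")]
averaged = average_duplicates(pairs)
formatted = {row_id: format_number(v) for row_id, v in averaged.items()}
print(formatted)
```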
All features together
cell-extract -c result -i sample_id -d pipe --skip-lines 3 -s -f tsv -o out.tsv *.csv
Development
Run tests:
pip install -e ".[dev]"
pytest tests/
License
MIT
Requirements
Requires Python: >=3.9