cell-extract (1.4.0)

Published 2026-05-21 17:06:15 +00:00 by ross in ross/cell-extract

To install the package using pip, run the following command:

pip install --index-url  cell-extract

cell-extract

Extract specific columns from multiple tabular files and merge them by row identifier.

cell-extract reads CSV, TSV, or custom-delimited files, extracts one user-specified column from each, and writes a single merged table with one output column per input file. Duplicate row IDs within a file are resolved by averaging their numeric values.

Installation

pip install -e .

Or with dev dependencies (for testing):

pip install -e ".[dev]"

Quick Start

Given two files:

alpha.csv

gene,expr,pval
BRCA1,2.3,0.01
TP53,5.1,0.001

beta.csv

gene,expr,pval
BRCA1,4.7,0.02
TP53,3.2,0.05

Run:

cell-extract alpha.csv beta.csv --column expr

Output:

Origin,alpha,beta
BRCA1,2.3,4.7
TP53,5.1,3.2

Usage

cell-extract [OPTIONS] FILE [FILE ...]

Required Argument

Option	Description
`--column COLUMN`, `-c COLUMN`	Name of the column to extract from each file

Options

Option	Description
`--output FILE`, `-o FILE`	Write output to FILE instead of stdout
`--output-format {csv,tsv}`, `-f {csv,tsv}`	Output format (default: csv)
`--with-source`, `-s`	Add a `Source` column recording which original column was extracted
`--delimiter DELIM`, `-d DELIM`	Input delimiter override (e.g. `tab`, `pipe`, `|`, `;`)
`--skip-lines N`	Skip N leading lines in each file before parsing headers
`--id-column COLUMN`, `-i COLUMN`	Column to use as row identifier (default: first column)
`--version`, `-V`	Show version and exit
`--help`, `-h`	Show help message

Behavior

Row union: All row identifiers from all files are merged. If a row exists in file A but not file B, the cell for file B will be empty.
Duplicate rows averaged: If a row identifier appears multiple times within a file, the values are averaged. Non-numeric values in the target column are skipped with a warning.
Integer formatting: Averages that are whole numbers display without a decimal point (e.g. 6); fractional results include a decimal (e.g. 1.5).
Empty cells: Missing values are written as empty fields.
Delimiter detection: .csv → comma; .tsv or .tab → tab. Use --delimiter to override.
Output format: CSV by default; use -f tsv for TSV output.
Row identifier: Defaults to the first column of each file. Use --id-column to specify a different column.
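
The row-union rule above can be illustrated with a short Python sketch. This is not the package's implementation: `read_column` and `merge_union` are hypothetical names, first-seen row order is an assumption, and duplicate-row averaging is left out to keep the example small.

```python
import csv
import io

def read_column(text, column, id_column=None, delimiter=","):
    """Return {row_id: raw string value} for one column of a delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    key = id_column or reader.fieldnames[0]  # default: first column
    return {row[key]: row[column] for row in reader}

def merge_union(tables):
    """Merge {label: {row_id: value}} into rows; missing cells become ''."""
    # Union of row IDs across all tables, in first-seen order (an assumption).
    row_ids = list(dict.fromkeys(rid for t in tables.values() for rid in t))
    header = ["Origin", *tables]
    rows = [[rid, *(t.get(rid, "") for t in tables.values())] for rid in row_ids]
    return [header, *rows]

alpha = "gene,expr,pval\nBRCA1,2.3,0.01\nTP53,5.1,0.001\n"
beta = "gene,expr,pval\nBRCA1,4.7,0.02\nEGFR,1.9,0.2\n"

merged = merge_union({
    "alpha": read_column(alpha, "expr"),
    "beta": read_column(beta, "expr"),
})

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue(), end="")
# Origin,alpha,beta
# BRCA1,2.3,4.7
# TP53,5.1,
# EGFR,,1.9
```

TP53 is absent from the second table and EGFR from the first, so each gets an empty field in the corresponding column.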

Examples

Multiple files with different row sets

cell-extract set1.csv set2.csv set3.csv -c expression

TSV input and output

cell-extract -c count -f tsv sample_A.tsv sample_B.tsv

Save to file

cell-extract -c fold_change -o merged.csv *.csv

With source tracing column

cell-extract alpha.csv beta.csv -c expr -s
# Origin,alpha,beta,Source
# BRCA1,2.3,4.7,expr
# TP53,5.1,3.2,expr

Custom delimiter (pipe-delimited files)

cell-extract -c val -d pipe data1.csv data2.csv
cell-extract -c val -d "|" data1.csv data2.csv
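
For readers who want to mirror the delimiter rules in their own scripts, here is a minimal Python sketch. `resolve_delimiter` is a hypothetical helper, not part of cell-extract; the alias table covers only the names shown in the option table, and the comma fallback for unknown extensions is an assumption.

```python
from pathlib import Path

# Aliases named in the option table; the mapping itself is an assumption.
ALIASES = {"tab": "\t", "pipe": "|"}
# Extension defaults as described under Behavior.
BY_EXTENSION = {".csv": ",", ".tsv": "\t", ".tab": "\t"}

def resolve_delimiter(path, override=None):
    """Pick the input delimiter: explicit override first, then file extension."""
    if override is not None:
        # A literal character such as "|" or ";" passes through unchanged.
        return ALIASES.get(override.lower(), override)
    suffix = Path(path).suffix.lower()
    # Falling back to comma for unknown extensions is a guess, not documented behavior.
    return BY_EXTENSION.get(suffix, ",")

print(repr(resolve_delimiter("data1.csv")))          # ','
print(repr(resolve_delimiter("data1.csv", "pipe")))  # '|'
print(repr(resolve_delimiter("sample_A.tsv")))       # '\t'
```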

Skip metadata header lines

# Files with experiment labels before the column headers:
#   Experiment: RNA-seq
#   Date: 2025-02-14
#   gene,expr,pval
#   BRCA1,2.3,0.01

cell-extract -c expr --skip-lines 2 file1.csv file2.csv
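
The effect of `--skip-lines` can be approximated in plain Python as follows. This is a sketch under the assumption that skipped lines are simply discarded before the header is read; `read_rows` is a hypothetical name.

```python
import csv
import io
from itertools import islice

def read_rows(handle, skip_lines=0, delimiter=","):
    """Discard `skip_lines` leading lines, then parse the next line as the header."""
    remaining = islice(handle, skip_lines, None)
    return list(csv.DictReader(remaining, delimiter=delimiter))

sample = io.StringIO(
    "Experiment: RNA-seq\n"
    "Date: 2025-02-14\n"
    "gene,expr,pval\n"
    "BRCA1,2.3,0.01\n"
)

rows = read_rows(sample, skip_lines=2)
print(rows[0]["gene"], rows[0]["expr"])  # BRCA1 2.3
```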

Custom row identifier column

# Use the "label" column instead of the first column as the row key
cell-extract -c val -i label data.csv

Duplicate rows averaged

# input.csv:
# id,val
# X,10
# X,30

cell-extract input.csv -c val
# Origin,input
# X,20        ← average: (10 + 30) / 2 = 20
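
The averaging and number-formatting rules can be sketched in Python like this. The function names are hypothetical, the warning text is invented for illustration, and how the real tool treats a row whose values are all non-numeric is not documented here; this sketch simply omits such rows.

```python
import sys
from collections import defaultdict

def average_duplicates(pairs):
    """Average numeric values per row ID; warn about and skip non-numeric ones."""
    groups = defaultdict(list)
    for row_id, raw in pairs:
        try:
            value = float(raw)
        except ValueError:
            print(f"warning: skipping non-numeric value {raw!r} for {row_id}",
                  file=sys.stderr)
            continue
        groups[row_id].append(value)
    return {rid: sum(vals) / len(vals) for rid, vals in groups.items()}

def format_number(value):
    """Render whole numbers without a decimal point, others unchanged."""
    return str(int(value)) if value.is_integer() else str(value)

averages = average_duplicates(
    [("X", "10"), ("X", "30"), ("Y", "1"), ("Y", "2"), ("Z", "n/a")]
)
for row_id, mean in averages.items():
    print(f"{row_id},{format_number(mean)}")
# X,20
# Y,1.5
```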

All features together

cell-extract -c result -i sample_id -d pipe --skip-lines 3 -s -f tsv -o out.tsv *.csv

Development

Run tests:

pip install -e ".[dev]"
pytest tests/

License

MIT

Requires Python: >=3.9

Details

PyPI

ross/cell-extract

2026-05-21 17:06:15 +00:00

MIT

19 KiB

Assets (2)

cell_extract-1.4.0-py3-none-any.whl 7.5 KiB

cell_extract-1.4.0.tar.gz 11 KiB

Versions (6)

1.4.0

2026-05-21

1.3.1

2026-05-20

1.3.0

2026-05-05

1.2.0

2026-05-05

1.1.0

2026-05-05
