interleaved output is not reproducible with multiple threads #552

Open
briankegerreis opened this issue Mar 15, 2024 · 2 comments

Comments

@briankegerreis

When writing interleaved output to stdout with the --stdout flag, reads appear to be written in a random order: md5sums from repeated runs differ.

Steps to reproduce with v0.23.3:

for TRY in 1 2 3; do fastp -w $THREADS --dont_eval_duplication -i in1.fq.gz -I in2.fq.gz -A -G -L -Q --stdout > interleave_attempt${TRY}.fastq; done
md5sum interleave*
for TRY in 1 2 3; do fastp -w $THREADS --dont_eval_duplication -i in1.fq.gz -I in2.fq.gz -A -G -L -Q -o split_attempt${TRY}_1.fastq -O split_attempt${TRY}_2.fastq; done
md5sum split*
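The repeated-run comparison above can be wrapped in a small helper that hashes a command's stdout over several runs. This is a minimal sketch: `check_reproducible` is a hypothetical name, not part of fastp, and it assumes bash with GNU coreutils `md5sum` available.

```shell
# Hypothetical helper (not part of fastp): run a command several times and
# check whether its stdout is byte-identical across runs, using md5sum as in
# the reproduction steps.
check_reproducible() {
  local runs=$1 i first="" sum
  shift
  for ((i = 1; i <= runs; i++)); do
    # Hash stdout directly rather than writing each run to a file.
    sum=$("$@" | md5sum | cut -d' ' -f1)
    if [[ -z $first ]]; then
      first=$sum
    elif [[ $sum != "$first" ]]; then
      echo "not reproducible: run $i differs from run 1"
      return 1
    fi
  done
  echo "reproducible across $runs runs"
}
```

For example, `check_reproducible 3 fastp -w 1 --dont_eval_duplication -i in1.fq.gz -I in2.fq.gz -A -G -L -Q --stdout` would compare three single-threaded interleaved runs using the same flags as the reproduction steps.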
@ckrushton

I have also encountered this, and upon investigation it appears to be caused by a race condition when writing to stdout with multiple threads. When writing to an output file, FASTP seems to always be consistent, because a thread that finishes faster than the others waits until they complete before writing to the output. Unfortunately, this logic is not applied when writing to stdout.
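The ordering difference described above can be illustrated with a small shell analogy. This is not fastp's implementation: background jobs stand in for worker threads, and the function names (`process_chunk`, `unordered`, `ordered`) are hypothetical. It assumes bash and a `sleep` that accepts fractional seconds, as GNU coreutils does.

```shell
# Illustrative analogy only (not fastp's code): background "workers" each
# process one chunk; how their results reach stdout decides reproducibility.

process_chunk() {
  # Simulate uneven work with a random 0-90 ms delay, then emit the chunk.
  sleep "0.0$((RANDOM % 10))"
  printf 'chunk %d\n' "$1"
}

# Unordered: every worker writes to stdout as soon as it finishes, so the
# output order follows completion time and can differ between runs.
unordered() {
  local n=$1 i
  for ((i = 1; i <= n; i++)); do
    process_chunk "$i" &
  done
  wait
}

# Ordered: workers still run in parallel, but each writes to its own buffer
# (a temp file here). Only after all of them finish are the buffers written
# out in input order, so a worker that finishes early effectively waits.
ordered() {
  local n=$1 i dir
  dir=$(mktemp -d)
  for ((i = 1; i <= n; i++)); do
    process_chunk "$i" > "$dir/$i" &
  done
  wait
  for ((i = 1; i <= n; i++)); do
    cat "$dir/$i"
  done
  rm -r "$dir"
}
```

Running `unordered 5` several times may print the chunks in different orders, while `ordered 5` always prints chunk 1 through chunk 5. The cost of the ordered variant is that results are held until earlier chunks are ready, which matches the behaviour the comment reports for file output.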

We ended up using --stdout with a single thread in our workflow, then modified the source code directly to allow FASTP to work with named pipes (currently it appears to write the output in a consistent order, but then blocks and hangs after it finishes processing).

@tshtatland

I also found fastp output to be irreproducible with the default number of threads (3) when writing to STDOUT with fastp v0.23.2. Switching to a single thread (-w 1), as suggested by @ckrushton, made the output reproducible.

3 participants