a second awk to insert a column with the length of the sequence · sort on the first column · remove the first column · convert back to FASTA.
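The steps listed above (attach the length, sort on it, drop it again, write FASTA back out) can be sketched in Python as a decorate-sort-undecorate pass. This is only an illustration, not the awk pipeline the snippet refers to: the names `read_fasta` and `sort_fasta_by_length` are hypothetical, the parser is deliberately minimal, and for real data an established FASTA library is the safer choice.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical minimal parser: multi-line sequences are joined into one
    string, blank lines are skipped, and text before the first '>' is ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_fasta_by_length(lines, longest_first=False):
    """Return FASTA records (one string each) ordered by sequence length."""
    # Decorate: put the length in the "first column".
    decorated = [(len(seq), header, seq) for header, seq in read_fasta(lines)]
    # Sort on the length only; Python's sort is stable, so ties keep input order.
    decorated.sort(key=lambda rec: rec[0], reverse=longest_first)
    # Undecorate: drop the length and convert back to FASTA text.
    return [f"{header}\n{seq}" for _, header, seq in decorated]


example = [">a", "ACGT", "AC", ">b", "AC", ">c", "ACGTACGT"]
print("\n".join(sort_fasta_by_length(example)))
```

To sort a file, the same function can be given an open file object instead of a list, since it only iterates over lines.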
shuffle: shuffle sequences
sliding: sliding sequences, circular genome supported
sort: sort sequences by id/name/sequence/length
split: split sequences into files ...
Duplicate of "How to rearrange fasta file according to its length".
This probably was to allow for preallocation of fixed line sizes in software: at the time most users relied on Digital
Don't implement a FASTA reader yourself! As in most cases, some smart people have already done this for you. Use for example ...
sizeseq -osformat swiss
Sort sequences by size
Input sequence set: globins.fasta
Return longest sequence first [N]:
output sequence(s) [globins.swiss]: ...
Seems like opening two files for each sequence is probably contributing a lot to the run time. You could pass file handles to your get/write ...
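A minimal sketch of that suggestion, assuming the cost comes from reopening files inside the per-sequence loop: open the output once and pass the handle down. The functions `write_record` and `write_all` are hypothetical stand-ins, not the poster's actual get/write routines, and the 60-column line width is an arbitrary choice.

```python
import io


def write_record(handle, header, seq, width=60):
    """Write one FASTA record to an already-open handle (hypothetical helper)."""
    handle.write(header + "\n")
    # Wrap the sequence at `width` characters per line.
    for start in range(0, len(seq), width):
        handle.write(seq[start:start + width] + "\n")


def write_all(records, handle):
    """Write every (header, sequence) pair through the same handle."""
    count = 0
    for header, seq in records:
        write_record(handle, header, seq)
        count += 1
    return count


# In-memory demo; with a real file, open it once outside the loop, e.g.
#   with open("out.fasta", "w") as out: write_all(records, out)
buffer = io.StringIO()
written = write_all([(">a", "ACGT"), (">b", "AC")], buffer)
print(written, "records written")
```

The point of the design is that the open/close happens once per run rather than once per sequence; the per-record function only ever sees a handle.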
In order to reduce the size of the cache and allow the serialisation to occur, some changes ...
rename: Renaming duplicated IDs.
Ordering
shuffle: Shuffling sequences.
sort: Sorting sequences by ID/name/sequence/length ...