How to delete lines from a very large file


I can't read the entire 500-600 MB file at once, because all of the data would be loaded into RAM, which is too expensive for me.

I want to read the file with file_get_contents with a limit on the number of lines (say, 1000 lines), and then delete particular lines, without using $f = file(...).

In more detail: I read a very large file 1000 lines at a time (starting with the first 1000 lines), process them, and depending on certain conditions some lines have to be deleted and some kept.

I could write the result to a temporary file, but what if the script stops or something else goes wrong? There is no way to roll back.

Asked: 28-11-2012 at 13:49:58
And won't sed save the father of Russian democracy? - 28-11-2012 at 14:01:22
@ifrops, I don't understand your concern about writing to a temporary file. Do it exactly the way you described: read the old lines and write the ones you keep to a temporary file. Then rename (mv) it over the old file. Just make sure both files are on the same file system. That's all. Record the name of the temporary file in some other file. If the script crashes, nothing terrible happens. Of course, something (perhaps the script itself, before it starts its main work) has to remove stale temporary files. - 28-11-2012 at 14:16:38
@avp What do you mean by "in place"? Without creating an output file? Then deleting a line at the beginning of the file, followed by shifting all 600 MB of remaining lines, is an apocalyptic sight. If you mean that sed has to be given the name of an output file, it doesn't (see -i). - 28-11-2012 at 14:21:34
@ifrops: what's the problem if the script crashes? The temporary file will simply contain incomplete data; delete it and start the script again. - 28-11-2012 at 15:08:52
@ifrops, if none of the proposed solutions fit your problem, it's because you haven't really described it (the task, that is). By the way, suppose you open the file (as a whole) and send the data, and then the script crashes. After a restart it will still re-send the file, which hasn't changed. So in that respect nothing has improved. - 28-11-2012 at 15:45:36
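
The temporary-file approach from the comments above can be sketched as follows. This is only an illustration in Python, not the asker's PHP code; the same steps map onto PHP's fgets, fwrite and rename. The function name `filter_file_in_place`, the `keep` callback and the `.filter-` prefix are made up for the example. The file is streamed line by line, so memory use does not depend on the file size, and the original is replaced only after the temporary file has been written completely.

```python
import os
import tempfile
from typing import Callable


def filter_file_in_place(path: str, keep: Callable[[bytes], bool]) -> None:
    """Rewrite `path`, keeping only the lines for which keep(line) is true.

    Hypothetical helper: `keep` receives each raw line (bytes, including
    the trailing newline) and returns True to keep it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original, so both are on the
    # same file system and the final rename is a single atomic step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".filter-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            for line in src:  # streams the file one line at a time
                if keep(line):
                    dst.write(line)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp_path, path)  # the old file is swapped out only here
    except BaseException:
        # On any failure the original is untouched; drop the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed outright, the cleanup code never runs and a `.filter-*.tmp` file may be left behind, which is why the comment above mentions removing stale temporary files before the main work starts. Note also that the replacement file gets the temporary file's permissions, not the original's.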

Answers   2


The idea is this.

Read 100 (200, 1000) lines from the file, filter them, and write the result to the output file. Then record in a separate state file how many lines were read and from which position (or simply the block number). Repeat this in a loop.

If the script crashes and is restarted, it reads the saved position from the state file and continues processing from there.

There are two downsides:

  • some blocks will be filtered two or more times (when the script resumes after a crash).
  • the output file has to record that a whole block was written. For example, append a marker to the end of the file, then remove it before writing the next block and append it again at the end.
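
A rough sketch of this scheme in Python (the asker uses PHP, so treat it as an illustration only). All names here (`filter_resumable`, `load_state`, `save_state`, the JSON state file with `in_offset` and `out_offset`) are assumptions made for the example. One deliberate deviation: instead of the end-of-file marker described above, the sketch stores the output size in the state file and truncates the output to that size on restart, which has the same effect of discarding a half-written block.

```python
import json
import os
from itertools import islice
from typing import Callable


def load_state(state_path: str) -> dict:
    """Return the last saved checkpoint, or a fresh one if none exists."""
    try:
        with open(state_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"in_offset": 0, "out_offset": 0}


def save_state(state_path: str, state: dict) -> None:
    """Write the checkpoint via a temp file so it is never half-written."""
    tmp = state_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_path)


def filter_resumable(src_path: str, dst_path: str, state_path: str,
                     keep: Callable[[bytes], bool], block_lines: int = 1000) -> None:
    """Filter src into dst block by block, checkpointing after each block."""
    state = load_state(state_path)
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        # Drop anything written after the last checkpoint (a partial block).
        dst.truncate(state["out_offset"])
        src.seek(state["in_offset"])
        while True:
            block = list(islice(src, block_lines))  # at most block_lines lines
            if not block:
                break
            kept = b"".join(line for line in block if keep(line))
            dst.write(kept)  # append mode: always lands at the end of dst
            dst.flush()
            os.fsync(dst.fileno())
            state = {"in_offset": src.tell(),
                     "out_offset": state["out_offset"] + len(kept)}
            save_state(state_path, state)
```

After a crash, calling `filter_resumable` again with the same arguments re-filters only the block that was in progress. Once it returns normally, the caller can rename the output over the original file and delete the state file.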
Answered: 28-11-2012 at 13:59:33
Downvoters, at least write what you don't like. - 06-12-2012 at 12:00:03
cat myfile.txt | grep -v textscrollexample > newfile.txt

But please give more detail: a couple of lines from the source file, what has to be removed, and on what basis. The example above is crude and obviously doesn't fit your case, but there isn't enough information ))

Answered: 28-11-2012 at 13:58:29
It's better to avoid the unnecessary command: grep -v filter myfile.txt > newfile.txt - 28-11-2012 at 14:45:42