When we try to open a big file (e.g. a server log) in a text editor, we may get an error:
Notepad: File is too large for Notepad.
Notepad++: File is too big to be opened.
At this point, the only option is to split the file with a program. A "close to the metal" language like C++ would certainly do the job quickly, but you may not want to install a compiler, SDK, and so on.
Is there a more convenient way? Yes: a batch script is always an option.
Just keep in mind that it may take more than half an hour to split a 1 GB file into 100 small files.
Code in Batch .bat script (Method 1, faster)
@echo off
setlocal EnableDelayedExpansion

REM Rows per output file
set limit=50000

REM The input can have any extension (e.g. .csv), as long as it is a text file
set file=YourFileName.txt

set lineCounter=1
set filenameCounter=1

REM Split the input name into base name and extension
set name=
set extension=
for %%a in (%file%) do (
    set "name=%%~na"
    set "extension=%%~xa"
)

REM Output filename pattern: YourFileName-part1.csv, YourFileName-part2.csv, ...
REM Caveat: for /f skips empty lines, and delayed expansion removes exclamation marks from the data lines (Method 2 below avoids the latter).
for /f "tokens=*" %%a in (%file%) do (
    if !lineCounter! gtr !limit! (
        set /a filenameCounter=!filenameCounter! + 1
        set lineCounter=1
        echo Created !splitFile!.
    )
    set splitFile=!name!-part!filenameCounter!!extension!
    REM Redirection goes first so a line ending in a digit is not mistaken for a stream redirect
    >>!splitFile! echo %%a
    set /a lineCounter=!lineCounter! + 1
)
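A quick usage sketch (split.bat and server.log are placeholder names, not from the original post): set the file and limit variables at the top of the script, save it next to the file to split, and run it from a command prompt:

    C:\logs> split.bat
    Created server-part1.log.
    Created server-part2.log.

The pieces appear in the same folder as server-part1.log, server-part2.log, and so on. The last part does not get a "Created" message, because the script only prints it when rolling over to the next file.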
Code in Batch .bat script (Method 2, slower)
@echo off
setlocal enableextensions disabledelayedexpansion
REM Record the start time (used for the timing comparison below)
set STARTTIME=%TIME%

REM Rows in each output file
set "nLines=50000"
set "line=0"

REM The input can have any extension (e.g. .csv), as long as it is a text file
REM Output filename pattern: OutputName_0.txt, OutputName_1.txt, ...
REM Note: the output prefix does NOT follow the input filename in this method.
for /f "usebackq delims=" %%a in ("InputFileName.txt") do (
    set /a "file=line/%nLines%", "line+=1"
    REM Delayed expansion is enabled only long enough to read !file!, then turned off again
    REM so that exclamation marks in the data survive; this per-line setlocal/endlocal
    REM toggling is likely the main reason this method is slower.
    setlocal enabledelayedexpansion
    for %%b in (!file!) do (
        endlocal
        >>"OutputName_%%b.txt" echo %%a
    )
)
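The comments above note that Method 2's output prefix does not follow the input file. As a small sketch (my own variation, not from the original post), the prefix and extension can be taken from the input name with %%~n and %%~x, the same trick Method 1 uses:

@echo off
setlocal enableextensions disabledelayedexpansion

REM Rows in each output file
set "nLines=50000"
set "line=0"

REM "InputFileName.txt" is the same placeholder name used above
set "input=InputFileName.txt"
set "name="
set "ext="
for %%f in ("%input%") do (
    set "name=%%~nf"
    set "ext=%%~xf"
)

REM Output filename pattern: InputFileName_part0.txt, InputFileName_part1.txt, ...
for /f "usebackq delims=" %%a in ("%input%") do (
    set /a "file=line/%nLines%", "line+=1"
    setlocal enabledelayedexpansion
    for %%b in (!file!) do (
        endlocal
        >>"%name%_part%%b%ext%" echo %%a
    )
)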
- Efficiency Comparison
The way the script is written can double the processing time (a sketch of how such timings can be measured follows the results below).
- Test with 100K rows (>18 MB file)
Example: splitting an 18.7 MB file into 3 files (at most 31,670 rows per file).
Method 1 takes 23 seconds, while Method 2 takes 46 seconds.
- Test with 5,000K rows (>1 GB file)
With Method 1, splitting a 1.17 GB file (1,230,188 KB, around 5,000K rows) into 100 small files (50K rows per file) takes 30 minutes 42 seconds.
- PC configuration reference
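The post does not show how these timings were taken. One common way (a sketch, assuming %TIME% looks like HH:MM:SS.cc with a two-digit hour, which is locale dependent; split.bat is a placeholder name) is to capture %TIME% before and after the run and convert both to centiseconds:

@echo off
set "STARTTIME=%TIME%"

REM Run the script being measured (placeholder name)
call split.bat

set "ENDTIME=%TIME%"

REM Convert both timestamps to centiseconds.
REM The leading "1...-100" keeps values such as 08 or 09 from being read as bad octal numbers.
set /a START=(1%STARTTIME:~0,2%-100)*360000 + (1%STARTTIME:~3,2%-100)*6000 + (1%STARTTIME:~6,2%-100)*100 + (1%STARTTIME:~9,2%-100)
set /a END=(1%ENDTIME:~0,2%-100)*360000 + (1%ENDTIME:~3,2%-100)*6000 + (1%ENDTIME:~6,2%-100)*100 + (1%ENDTIME:~9,2%-100)
set /a DURATION=END-START

REM If the run crosses midnight, add one day of centiseconds
if %DURATION% lss 0 set /a DURATION+=24*360000

set /a ELAPSEDSECONDS=DURATION/100
echo Elapsed: %ELAPSEDSECONDS% seconds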