When we try to open a big file (e.g. a server log) with a text editor, we may hit an error:
Notepad : File is too large for Notepad .
Notepad++ : File is too big to be opened .
At this point, the only way is to split the file with a program. A "close to metal" language like C++ is always a good choice, but you may not want to install a compiler, SDK, etc.
Is there a more convenient way? Yes, a batch script is always a solution.
But keep in mind: it may take you more than half an hour to split a 1 GB file into 100 small files.
Code in Batch .bat script (Method 1, faster)
@echo off
setLocal EnableDelayedExpansion
REM Rows per output file (# is not a comment character in batch, so use REM)
set limit=50000
REM The input can have any extension (e.g. .csv), as long as it is a text file
set file=YourFileName.txt
set lineCounter=1
set filenameCounter=1
set name=
set extension=
for %%a in (%file%) do (
set "name=%%~na"
set "extension=%%~xa"
)
for /f "tokens=*" %%a in (%file%) do (
if !lineCounter! gtr !limit! (
set /a filenameCounter=!filenameCounter! + 1
set lineCounter=1
echo Created !splitFile!.
)
REM Output filename pattern: YourFileName-part1.csv, YourFileName-part2.csv, ...
set splitFile=!name!-part!filenameCounter!!extension!
REM Put the redirection first, so a line ending in a digit is not parsed as a stream number
>>!splitFile! echo %%a
set /a lineCounter=!lineCounter! + 1
)
Code in Batch .bat script (Method 2, slower)
@echo off
setlocal enableextensions disabledelayedexpansion
set STARTTIME=%TIME%
REM Rows in each output file
set "nLines=50000"
set "line=0"
REM The input can have any extension (e.g. .csv), as long as it is a text file
for /f "usebackq delims=" %%a in ("InputFileName.txt") do (
set /a "file=line/%nLines%", "line+=1"
setlocal enabledelayedexpansion
for %%b in (!file!) do (
endlocal
>>"OutputName_%%b.txt" echo %%a
REM Output filename pattern: OutputName_0.txt, OutputName_1.txt, ... (numbering starts at 0, since line/nLines is 0 for the first chunk)
REM The output prefix does NOT follow the input filename in this method.
)
)
The way the script is written can double the processing time: method 2 pays for a setlocal/endlocal pair on every single line.
- Test with 100K rows, a >18 MB file.
For example, splitting an 18.7 MB file into 3 files (at most 31,670 rows each) takes 23 s with method 1, while method 2 takes 46 s.
- Test with 5000K rows, a >1 GB file.
With method 1, splitting a 1.17 GB file (1,230,188 KB, around 5000K rows) into 100 small files (50K rows per file) takes 30m42s.
With method 2, believe me, you don't want to try.
- PC configuration reference
16 GB RAM, quad-core Intel i5 CPU (8 logical threads).
Conclusion
- Better to use C++ to split files over 1 GB. ;)
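Following that conclusion, here is a minimal C++ sketch of the same per-line split. The `splitFile` helper, the file names, and the 50,000-row limit are illustrative assumptions, not a drop-in tool; it mirrors Method 1's `name-partN.ext` naming scheme:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Split a text file into name-part1.ext, name-part2.ext, ... with at most
// `limit` lines each (same naming scheme as the Method 1 batch script).
// Returns the number of part files written, or 0 if the input cannot be opened.
std::size_t splitFile(const std::string& path, std::size_t limit) {
    std::ifstream in(path);
    if (!in) return 0;

    // Derive "name" and ".ext" from the input path, like %%~na and %%~xa.
    const auto dot = path.rfind('.');
    const std::string name = (dot == std::string::npos) ? path : path.substr(0, dot);
    const std::string ext  = (dot == std::string::npos) ? ""   : path.substr(dot);

    std::ofstream out;
    std::string line;
    std::size_t lineCount = 0, part = 0;
    while (std::getline(in, line)) {
        if (lineCount % limit == 0) {          // start the next part file
            if (out.is_open()) out.close();
            ++part;
            out.open(name + "-part" + std::to_string(part) + ext);
        }
        out << line << '\n';
        ++lineCount;
    }
    return part;
}
```

Calling, say, `splitFile("YourFileName.txt", 50000)` reads each line exactly once with buffered streams, so it should finish in a small fraction of the batch script's time.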