Please help. I was shocked to discover that all of the files I am
working with have corrupt text at the end.
I have 500 .txt text files in folder c:\rules\. Each file is about
40MB. I wanted to delete the corrupt text manually, but it takes
too long to open each file in Notepad because the files are so
large.
The first line of each file contains text "Rule=1"
The number following text "Rule=" that is found throughout the files
is incremented by 1 each time this text appears in the file.
May I please ask you to help me by writing a macro that will do the following:
1. Open folder c:\rules\
2. Open the first file in that folder.
3. The macro has to find the second to last line that contains the
text "Rule=". Let us refer to this line as 2toLast
4. The macro should delete all text below the 2toLast line. The
macro has to also delete the 2toLast line.
5. The macro should save the changes made to the file.
6. The macro should go to the next file in the folder c:\rules\, and
repeat steps 3-5
7. Repeat step 6 until there are no more files in folder c:\rules\.
Example
The first file in c:\rules\ is called acred2sub1.txt
It begins with text
"Id=23859, Rule=1, Gen=0, training =
11.5444689897386>0.0000>100670.4175{1.9803 1.0000}, selection =
11.6146812264607>0.0000>100688.0767{1.9983 1.0000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth =
0, time = 1051111.04:43:32 (parents: 0 and 0)
0 Probability GT [1488 0.000000 0.986820]"
This file ends with the text
******************************************************
******************************************************
******************************************************
***********************************************
***********************************************
***********************************************
"Id=2428985, Rule=2291, Gen=48, training =
11.5627062540301>0.0000>100670.4175{2.2533 1.0000}, selection =
11.5616177181233>98280.6602>98280.6602{1.8662 0.9000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth =
48, time = 1051112.05:31:48 (parents: 2383210 and 2383552)
0 Probability if-thenProb [1488 0.000000 1.000000]
1 Boolean if-then-else [1488 0 1]
2 Boolean if-then-else [1488 0 1]
3 Boolean and [1488 0 1]
4 Boolean any_boolean True [1488 1 1]
5 Boolean < [1488 0 1]
6 Real data [1488 29.125000 143.250000]
7 Variable
8 Real lag [1488 31.107955 139.937500]
9 Real ln [1488 3.371597 4.964591]
10 Real data [1488 29.125000 143.250000]
11 Variable
12 Real mov [1488 31.107955 139.937500]
13 Variable
14 Real days-remaining [1488 1.000000 62.000000]
15 Boolean < [635 0 1]
16 Real power [635 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
17 Real abs [635 17777125493.869579 947106196758301.370000]
18 Real power [635 17777125493.869579 947106196758301.370000]
19 Real data [635 29.125000 137.875000]
20 Variable
21 Real any_real 7.000000 0x0000000000001c40 [635 7.000000 7.000000]
22 Real data [635 29.125000 137.875000]
23 Variable
24 Real power [635 0.000000
92510233423514822000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
25 Real abs [635 1.000000 3521614606208.000000]
26 Real power [635 1.000000 3521614606208.000000]
27 Real days-remaining [635 1.000000 62.000000]
28 Real any_real 7.000000 0x0000000000001c40 [635 7.000000 7.000000]
29 Real data [635 29.125000 137.875000]
30 Variable
31 Boolean < [853 0 1]
32 Real data [853 33.250000 143.250000]
33 Variable
34 Real days-remaining [853 1.000000 62.000000]
35 Boolean < [78 0 1]
36 Real data [78 29.500000 73.000000]
37 Variable
38 Real power [78 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
39 Real abs [78 19442589725.148437 11047398519097.000000]
40 Real power [78 19442589725.148437 11047398519097.000000]
41 Real data [78 29.500000 73.000000]
42 Variable
43 Real any_real 7.000000 0x0000000000001c40 [78 7.000000 7.000000]
44 Real data [78 29.500000 73.000000]
45 Variable
46 Boolean any_boolean False [1410 0 0]
47 Probability ANY_PROB 1.000000 0x000000000000f03f [40 1.000000 1.000000]
Id=2402972, Rule=2292, Gen=48, training =
11.5417736815012>0.0000>100085.4204{1.8968 1.0000}, selection =
11.5609984677956>91748.2451>91748.2451{1.4932 0.9000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth =
48, time = 1051112.05:07:44 (parents: 2363747 and 2384852)
0 Probability if-thenProb [1488 0.000000 1.000000]
1 Boolean if-then-else [1488 0 1]
2 Boolean and [1488 0 1]
3 Boolean any_boolean True [1488 1 1]
4 Boolean < [1488 0 1]
5 Real data [1488 29.125000 143.250000]
6 Variable
7 Real lag [1488 31.107955 139.937500]
8 Real ln [1488 3.467114 4.843345]
9 Real mov [1488 32.044118 126.893145]
10 Variable
11 Real mov [1488 29.125000 143.250000]
12 Variable
13 Real any_real 0.309681 0xc0520aa2d1d1d33f [1488 0.309681 0.309681]
14 Real mov [1488 31.107955 139.937500]
15 Variable
16 Real days-remaining [1488 1.000000 62.000000]
17 Boolean < [636 0 1]
18 Real data [636 29.125000 137.875000]
19 Variable
20 Real power [636 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
21 Real abs [636 17777125493.869579 947106196758301.370000]
22 Real power [636 17777125493.869579 947106196758301.370000]
23 Real data [636 29.125000 137.875000]
24 Variable
25 Real any_real 7.000000 0x0000000000001c40 [636 7.000000 7.000000]
26 Real data [636 29.125000 137.875000]
27 Variable
28 Boolean < [852 0 1]
29 Real + [852 10.625000 257.000000]
30 Real data [852 33.250000 143.250000]
31 Variable
32 Real lag [852 -26.500000 115.750000]
33 Real data [852 33.250000 143.250000]
34 Variable
35 Real - [852 -26.500000 115.750000]
36 Real maximum [852 34.250000 116.750000]
37 Variable
38 Real maximum [852 31.000000 116.750000]
39 Variable
40 Real any_real 7.000000 0x0000000000001c40 [852 7.000000 7.000000]
41 Real days-remaining [852 1.000000 62.000000]
42 Real minimum [852 33.250000 143.250000]
43 Variable
44 Real * [852
1377493611034874400000000000000000000000000000.000000
19946234872205364000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
45 Real + [852 34.250000 144.250000]
46 Real any_real 1.000000 0x000000000000f03f [852 1.000000 1.000000]
47 Real maximum [852 33.250000 143.250000]
48 Variable
49 Real days-remaining [852 1.000000 62.000000]
50 Real lag [852
35548222220254824000000000000000000000000000.000000
138275458386172380000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
51 Real data [852 33.250000 143.250000]
52 Variable
53 Real * [852
35548222220254824000000000000000000000000000.000000
138275458386172380000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
54 Real / [852
1205024482042536400000000000000000000000000.000000
1184372234571069500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
55 Real power [852
22895465158808191000000000000000000000000000.000000
22503072456850322000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
56 Real data [852 29.500000 116.750000]
57 Variable
58 Real data [852 29.500000 116.750000]
59 Variable
60 Real any_real 19.000000 0x0000000000003340 [852 19.000000 19.000000]
61 Real data [852 29.500000 116.750000]
62 Variable
63 Probability GT [635 0.000000 1.000000]
64 Real mov [635 32.050000 126.873904]
65 Variable
66 Real minimum [635 29.125000 113.750000]
67 Variable
68 Real + [635 34.283784 179.740079]
69 Real abs [635 32.156250 126.388542]
70 Real mov [635 32.156250 126.388542]
71 Variable
72 Real data [635 29.125000 137.875000]
73 Variable
74 Real days-remaining [635 1.000000 62.000000]
75 Real abs [635 29.125000 137.875000]
76 Real data [635 29.125000 137.875000]
77 Variable
78 Real abs [635 1.000000 62.000000]
79 Real days-remaining [635 1.000000 62.000000]
Id=2426880, Rule=2293, Gen=48, training =
11.5420824603975>0.0000>100670.4175{2.0713 1.0000}, selection =
11.5609196233777>91443.0849>91443.0849{1.4792 0.9000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth =
48, time = 1051112.05:31:23 (parents: 2386050 and 2356912)
0 Probability if-thenProb [1488 0.000000 1.000000]
1 Boolean if-then-else [1488 0 1]
2 Boolean and [1488 0 1]
3 Boolean any_boolean True [1488 1 1]
4 Boolean > [1488 0 1]
5 Real any_real 76.000000 0x0000000000005340 [1488 76.000000 76.000000]
6 Real data [1488 29.125000 143.250000]
7 Variable
8 Boolean < [743 0 1]
9 Real data [743 29.125000 75.875000]
10 Variable
11 Real power [743 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
12 Real abs [743 17777125493.869579 14477411826665.816000]
13 Real power [743 17777125493.869579 14477411826665.816000]
14 Real data [743 29.125000 75.875000]
15 Variable
16 Real any_real 7.000000 0x0000000000001c40 [743 7.000000 7.000000]
17 Real data [743 29.125000 75.875000]
18 Variable
19 Boolean any_boolean False [745 0 0]
20 Probability BOOLEAN [653 0.000000 1.000000]
21 Boolean or [653 0 1]
22 Boolean any_boolean False [653 0 0]
23 Boolean if-then-else [653 0 1]
24 Boolean any_boolean False [653 0 0]
25 Boolean any_boolean False [unused]
26 Boolean < [653 0 1]
27 Real mov [653 32.044118 82.789474]
28 Variable
29 Real data [653 29.125000 75.875000]
30 Variable
31 Real days-remaining [653 1.000000 62.000000]"
***********************************************
***********************************************
***********************************************
***********************************************
***********************************************
***********************************************
The text "Rule=" appears 2293 times in this file. Thus the macro
should find the line containing text "Rule=2292" and delete this line
and all of the text below this line. When this is done, file
acred2sub1.txt will end with text
***********************************************
***********************************************
***********************************************
***********************************************
***********************************************
***********************************************
"Id=2428985, Rule=2291, Gen=48, training =
11.5627062540301>0.0000>100670.4175{2.2533 1.0000}, selection =
11.5616177181233>98280.6602>98280.6602{1.8662 0.9000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth =
48, time = 1051112.05:31:48 (parents: 2383210 and 2383552)
0 Probability if-thenProb [1488 0.000000 1.000000]
1 Boolean if-then-else [1488 0 1]
2 Boolean if-then-else [1488 0 1]
3 Boolean and [1488 0 1]
4 Boolean any_boolean True [1488 1 1]
5 Boolean < [1488 0 1]
6 Real data [1488 29.125000 143.250000]
7 Variable
8 Real lag [1488 31.107955 139.937500]
9 Real ln [1488 3.371597 4.964591]
10 Real data [1488 29.125000 143.250000]
11 Variable
12 Real mov [1488 31.107955 139.937500]
13 Variable
14 Real days-remaining [1488 1.000000 62.000000]
15 Boolean < [635 0 1]
16 Real power [635 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
17 Real abs [635 17777125493.869579 947106196758301.370000]
18 Real power [635 17777125493.869579 947106196758301.370000]
19 Real data [635 29.125000 137.875000]
20 Variable
21 Real any_real 7.000000 0x0000000000001c40 [635 7.000000 7.000000]
22 Real data [635 29.125000 137.875000]
23 Variable
24 Real power [635 0.000000
92510233423514822000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
25 Real abs [635 1.000000 3521614606208.000000]
26 Real power [635 1.000000 3521614606208.000000]
27 Real days-remaining [635 1.000000 62.000000]
28 Real any_real 7.000000 0x0000000000001c40 [635 7.000000 7.000000]
29 Real data [635 29.125000 137.875000]
30 Variable
31 Boolean < [853 0 1]
32 Real data [853 33.250000 143.250000]
33 Variable
34 Real days-remaining [853 1.000000 62.000000]
35 Boolean < [78 0 1]
36 Real data [78 29.500000 73.000000]
37 Variable
38 Real power [78 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
39 Real abs [78 19442589725.148437 11047398519097.000000]
40 Real power [78 19442589725.148437 11047398519097.000000]
41 Real data [78 29.500000 73.000000]
42 Variable
43 Real any_real 7.000000 0x0000000000001c40 [78 7.000000 7.000000]
44 Real data [78 29.500000 73.000000]
45 Variable
46 Boolean any_boolean False [1410 0 0]
47 Probability ANY_PROB 1.000000 0x000000000000f03f [40 1.000000 1.000000]"
May I please ask you to provide the actual Visual Basic code for a
macro [that I can execute in MS-Word, or Excel], that performs the
seven steps above?
Thank you for your kind help!
Clarification of Question by billbauer-ga on 13 Dec 2005 13:36 PST
Here is the same example, but some text is removed so as to clarify the example:
Example
The first file in c:\rules\ is called acred2sub1.txt
It begins with text
"Id=23859, Rule=1, Gen=0, <snip>"
This file ends with the text
"Id=2428985, Rule=2291, Gen=48, <snip>
Id=2402972, Rule=2292, Gen=48, <snip>
Id=2426880, Rule=2293, Gen=48, <snip>"
The text "Rule=" appears 2293 times in this file. Thus the macro should
find the line containing text "Rule=2292" and delete this line and all of
the text below this line. When this is done, file acred2sub1.txt will end
with text
"Id=2428985, Rule=2291, Gen=48, training =
11.5627062540301>0.0000>100670.4175{2.2533 1.0000}, selection =
11.5616177181233>98280.6602>98280.6602{1.8662 0.9000}, testing =
11.5258096838284>0.0000>101230.9377{-2.3595 1.0000}, trial=0, birth = 48,
time = 1051112.05:31:48 (parents: 2383210 and 2383552)
0 Probability if-thenProb [1488 0.000000 1.000000]
1 Boolean if-then-else [1488 0 1]
2 Boolean if-then-else [1488 0 1]
3 Boolean and [1488 0 1]
4 Boolean any_boolean True [1488 1 1]
5 Boolean < [1488 0 1]
6 Real data [1488 29.125000 143.250000]
7 Variable
8 Real lag [1488 31.107955 139.937500]
9 Real ln [1488 3.371597 4.964591]
10 Real data [1488 29.125000 143.250000]
11 Variable
12 Real mov [1488 31.107955 139.937500]
13 Variable
14 Real days-remaining [1488 1.000000 62.000000]
15 Boolean < [635 0 1]
16 Real power [635 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
17 Real abs [635 17777125493.869579 947106196758301.370000]
18 Real power [635 17777125493.869579 947106196758301.370000]
19 Real data [635 29.125000 137.875000]
20 Variable
21 Real any_real 7.000000 0x0000000000001c40 [635 7.000000
7.000000]
22 Real data [635 29.125000 137.875000]
23 Variable
24 Real power [635 0.000000
92510233423514822000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
25 Real abs [635 1.000000 3521614606208.000000]
26 Real power [635 1.000000 3521614606208.000000]
27 Real days-remaining [635 1.000000 62.000000]
28 Real any_real 7.000000 0x0000000000001c40 [635 7.000000
7.000000]
29 Real data [635 29.125000 137.875000]
30 Variable
31 Boolean < [853 0 1]
32 Real data [853 33.250000 143.250000]
33 Variable
34 Real days-remaining [853 1.000000 62.000000]
35 Boolean < [78 0 1]
36 Real data [78 29.500000 73.000000]
37 Variable
38 Real power [78 0.000000
3297966952784501300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000]
39 Real abs [78 19442589725.148437 11047398519097.000000]
40 Real power [78 19442589725.148437 11047398519097.000000]
41 Real data [78 29.500000 73.000000]
42 Variable
43 Real any_real 7.000000 0x0000000000001c40 [78 7.000000 7.000000]
44 Real data [78 29.500000 73.000000]
45 Variable
46 Boolean any_boolean False [1410 0 0]
47 Probability ANY_PROB 1.000000 0x000000000000f03f [40 1.000000
1.000000]"
Thank you for your kind help!
Request for Question Clarification by lotd-ga on 13 Dec 2005 18:38 PST
Hi billbauer,
I am posting my answer as a request for question clarification. Once
you, hopefully, confirm the code works as desired I will post it as an
official answer.
Please find the code below.
To use this code, open a new Excel file and follow the instructions below:
1. Select Tools -> Macro -> Visual Basic Editor.
2. Select View -> Code.
3. Copy and paste the code below into the code window.
4. Click the save button.
5. Run the macro by selecting Run -> Run Sub/UserForm.
PLEASE NOTE
I would suggest testing the script on a few copies of the files in a
different folder to ensure you receive the desired results. To do
this, please modify the RuleFilesFolder variable value from
"C:\Rules\" to your preferred folder.
As you have so many large files in a single folder, I would strongly
advise processing the files in batches: create a "processing"
folder, move a set number of files (50-100 at a time) into it, run the
macro, and then move them out of the "processing" folder to a
"completed" folder.
Once you have processed all the files you can then simply rename the
"completed" folder to "Rules".
Please ensure that when changing the RuleFilesFolder variable that you
always include a backward slash ( \ ) at the end of the folder as
shown above for the Rules folder example.
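As a small illustration of the trailing-backslash requirement, the helper below is a sketch only and is not part of the macro. The name WithTrailingSlash is hypothetical. It returns the path unchanged if it already ends in a backslash and appends one otherwise.

```vb
' Hypothetical helper, not part of the posted macro:
' returns FolderPath with a trailing backslash.
Function WithTrailingSlash(ByVal FolderPath As String) As String
    If Right$(FolderPath, 1) = "\" Then
        WithTrailingSlash = FolderPath
    Else
        WithTrailingSlash = FolderPath & "\"
    End If
End Function
```

If you choose to use something like this, the assignment could read RuleFilesFolder = WithTrailingSlash("C:\Rules"), and the rest of the macro would stay as it is.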
Also, I do not know how much disk space you have and whether you wish
to save the original files as a precaution. If you DO NOT wish to save
the original files then please comment out the following line in the
code by adding a single quote ( ' ) character to the start of the
line.
File.Copy (RuleFilesFolder & File.Name & ".old")
You can also, if you wish, monitor the progress of the process by
uncommenting the following line in the code. This line will display a
pop-up after each file is processed, showing the name of the file
and the running count. However, you will have to click the OK button
after each file for the process to resume.
MsgBox File.Name & " Processed. File " & FileCounter & " of " & FileCollection.Count, vbOKOnly, "File Processed"
Please let me know if you have any questions.
Regards,
lotd
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Option Explicit

Sub Main()
    Dim RuleFilesFolder
    Dim objFSO
    Dim File
    Dim FileCollection
    Dim arrFileLines()
    Dim folder
    Dim MaxRulesLine
    Dim PrevMaxRulesLine
    Dim txtStream
    Dim i, l
    Dim NewFile
    Dim FileCounter

    'File path containing all the files
    RuleFilesFolder = "C:\Rules\"
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    'Set the folder that contains all the files
    Set folder = objFSO.GetFolder(RuleFilesFolder)
    'Create a file collection
    Set FileCollection = folder.Files
    FileCounter = 0

    'Loop through each file in the folder
    For Each File In FileCollection
        'Only process .txt files; this skips .old backups left by an earlier run
        If LCase(objFSO.GetExtensionName(File.Name)) = "txt" Then
            'Reset the line markers for every file
            MaxRulesLine = 0
            PrevMaxRulesLine = 0
            i = 0
            'Open the file to read the contents
            Set txtStream = File.OpenAsTextStream(1)
            'Read each line into an array and make a note of the
            'penultimate line that contains the text "Rule="
            Do Until txtStream.AtEndOfStream
                ReDim Preserve arrFileLines(i)
                arrFileLines(i) = txtStream.ReadLine
                If InStr(1, arrFileLines(i), "Rule=") <> 0 Then
                    PrevMaxRulesLine = MaxRulesLine
                    MaxRulesLine = i
                End If
                i = i + 1
            Loop
            'Close the file before it is copied and overwritten
            txtStream.Close

            'The following line will save a copy of the original file with
            'the .old extension.
            'If you do not wish to save the original files then comment out
            'the line by adding a single quote (') character
            'to the start of the line.
            'The original files are saved in the same folder as the file
            'being processed but with the .old extension.
            File.Copy (RuleFilesFolder & File.Name & ".old")

            'Write to a new file which has the same name as the original
            'file, hence overwriting the original file.
            'Only the lines above the penultimate "Rule=" line are written.
            Set NewFile = objFSO.OpenTextFile(RuleFilesFolder & File.Name, 2, 0)
            For l = LBound(arrFileLines) To (PrevMaxRulesLine - 1) Step 1
                NewFile.WriteLine arrFileLines(l)
            Next
            NewFile.Close
            FileCounter = FileCounter + 1

            'The following line will display a pop-up with the name of the
            'file processed as well as the file count.
            'MsgBox File.Name & " Processed. File " & FileCounter & " of " & FileCollection.Count, vbOKOnly, "File Processed"
        End If
    Next

    Set objFSO = Nothing
    Set NewFile = Nothing
    Set File = Nothing
End Sub
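The macro above holds every line of a 40MB file in an array before writing. If memory or speed turns out to be a problem, one possible alternative is to read each file twice instead: once to find the line number of the second-to-last "Rule=" line, and once to copy the lines above it to a temporary file. The sketch below assumes that approach. The name TrimRuleFile and the ".tmp" suffix are hypothetical, it keeps no .old backup, and it has not been run against the real rule files, so please try it on copies first.

```vb
' Sketch only: trims a single file in two passes without storing it in memory.
' TrimRuleFile and the ".tmp" suffix are illustrative names, not part of the macro above.
Sub TrimRuleFile(ByVal FilePath As String)
    Const ForReading As Long = 1
    Dim fso As Object, src As Object, dst As Object
    Dim lineNo As Long, lastHit As Long, prevHit As Long
    Dim textLine As String, tmpPath As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Pass 1: remember the line numbers (1-based) of the last two "Rule=" lines.
    Set src = fso.OpenTextFile(FilePath, ForReading)
    Do Until src.AtEndOfStream
        lineNo = lineNo + 1
        textLine = src.ReadLine
        If InStr(1, textLine, "Rule=") > 0 Then
            prevHit = lastHit
            lastHit = lineNo
        End If
    Loop
    src.Close

    ' Fewer than two "Rule=" lines: leave the file untouched.
    If prevHit = 0 Then Exit Sub

    ' Pass 2: copy every line above the second-to-last "Rule=" line.
    tmpPath = FilePath & ".tmp"
    Set src = fso.OpenTextFile(FilePath, ForReading)
    Set dst = fso.CreateTextFile(tmpPath, True)
    For lineNo = 1 To prevHit - 1
        dst.WriteLine src.ReadLine
    Next lineNo
    src.Close
    dst.Close

    ' Replace the original file with the trimmed copy.
    fso.DeleteFile FilePath
    fso.MoveFile tmpPath, FilePath
End Sub
```

It could be called once per file, for example TrimRuleFile "C:\Rules\acred2sub1.txt". Unlike the macro above, this sketch skips a file that contains fewer than two "Rule=" lines rather than emptying it.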