Can someone help me with this?
I can't make the replace text feature work to remove extra paragraph returns or manual page breaks.
I see that both are there in the "replace this with that" list, but they don't get replaced in my test files.
I also need to know how to repair broken sentences... such as are common when working with a file that has come from a scanner, or has been converted from one format to another.
Typically, a sentence will just get broken for no known reason, with the second half ending up on the line below. So there is one sentence (or half a sentence) ending without an "end of sentence" punctuation mark, and the line below starts without the usual capitalized letter.
As best I can figure out, what I need is to find a pilcrow that is NOT preceded by an "end of sentence" punctuation mark... but I don't know how to write out the formula to do that.
Any help would be much appreciated.
Thanks!
Autocorrect Replace Text
-
cjseasyaspie
- Posts: 34
- Joined: Tue Jul 03, 2012 9:10 am
Here is how to replace manual page breaks with a paragraph end mark (removing the manual page breaks only might result in undesirable consequences):

Here is how to replace two paragraph end marks with only one:

As you noticed, documents produced by an OCR application often contain sentences that are broken over two lines for no apparent reason. Most of the time, one of the following patterns can be found (below, "(space)" stands for a single space character and "^p" for a paragraph end mark):
1. pilcrow+space
In such cases, use this setup:
Find box: ^p(space)
Replace with box: (space)
2. space+pilcrow+space
Use this setup:
Find box: (space)^p(space)
Replace with box: (space)
3. comma+pilcrow
Use this setup:
Find box: ,^p
Replace with box: ,(space)
4. comma+space+pilcrow+space
Use this setup:
Find box: ,(space)^p(space)
Replace with box: ,(space)
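For anyone who prefers to clean the text outside Atlantis before importing it, the four setups above can be sketched in Python. This is only an illustration, not an Atlantis feature: it assumes each paragraph end mark is a single "\n" in the exported plain text, and the names RULES and join_broken_lines are made up for this example. The rules are applied from most specific to least specific so that a break with spaces on both sides does not leave a double space behind.

```python
import re

# Assumption: a paragraph end mark (^p) is represented by "\n" in the text.
# Each rule is (find, replace), mirroring the Find / Replace with boxes.
# Order matters: the more specific patterns run first.
RULES = [
    (re.compile(r", \n "), ", "),  # Find: ,(space)^p(space)  Replace: ,(space)
    (re.compile(r" \n "), " "),    # Find: (space)^p(space)   Replace: (space)
    (re.compile(r"\n "), " "),     # Find: ^p(space)          Replace: (space)
    (re.compile(r",\n"), ", "),    # Find: ,^p                Replace: ,(space)
]


def join_broken_lines(text: str) -> str:
    """Apply every rule in turn and return the repaired text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = (
        "The cat sat on the\n mat, and the dog \n slept,\n"
        "while it rained.\nNew paragraph."
    )
    print(join_broken_lines(sample))
```

Paragraph marks that follow a full stop are left alone, because none of the four patterns matches them.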
Note that the Atlantis spellchecker (“Tools | Spellcheck…”) and the Atlantis AutoCorrect after-you-type feature (“Tools | AutoCorrect…”) will show you real misspellings or punctuation problems, but also typos created by the OCR application that you might have overlooked.
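The more general rule raised in the question (a paragraph mark that is NOT preceded by end-of-sentence punctuation, with the next line starting in lower case) can be written as a regular expression. The sketch below is Python, not Atlantis Find syntax, and rests on a few assumptions: "\n" stands for the paragraph end mark, only ". ! ?" count as end-of-sentence punctuation, and the names BROKEN_BREAK and repair_broken_sentences are hypothetical. It will miss breaks that fall before a capitalized word such as a proper noun, so the result still needs a read-through.

```python
import re

# Assumption: "\n" is the paragraph end mark in the exported text.
# (?<![.!?\s])  the character before the break is not . ! ? or whitespace
# [ \t]*\n[ \t]* the break itself, plus any stray spaces around it
# (?=[a-z])     the next line starts with a lower-case letter
BROKEN_BREAK = re.compile(r"(?<![.!?\s])[ \t]*\n[ \t]*(?=[a-z])")


def repair_broken_sentences(text: str) -> str:
    """Replace each suspicious paragraph mark with a single space."""
    return BROKEN_BREAK.sub(" ", text)


if __name__ == "__main__":
    sample = (
        "This sentence was broken by the\nscanner for no reason.\n"
        "This one is fine.\nSo is this"
    )
    print(repair_broken_sentences(sample))
```

Requiring a lower-case letter after the break is a deliberate choice: it keeps headings and genuinely new paragraphs apart, at the cost of skipping some real breaks.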
HTH.
Cheers,
Robert

-
cjseasyaspie
- Posts: 34
- Joined: Tue Jul 03, 2012 9:10 am
Thank you!
I did try a simple search and replace, but maybe I did something wrong.
But I had no idea how to attack the broken sentence problem.
I'll go through your steps.
btw... I write tutorials for newbies... the reason I need to work everything down to the easiest possible routines.
Thanks again!
CJ