Wow, this forum looks great. I'm new here and hope that with a little help I can get on with my work. Could somebody please help me?
My problem arises because I'm converting documents from PDF to Word format: I didn't save the Word files after I had written them.
So here I am. I've figured out just about everything I need to know about the software for my project, but one thing has me stumped...
The problem: whenever I convert the PDF files on my computer to Word, the documents retain their original hyphenation, so whenever I try to edit text in the body of a document there are hundreds of hyphenated words spread throughout the entire text, even in words in the center of the page. Is there a way to automatically remove all of these nuisance hyphens, or do I have to weed them all out by hand?
Nuisance Hyphenated Text
-
- Posts: 5
- Joined: Wed Jul 31, 2013 10:47 pm
- Location: Joshua Tree, CA
- Contact:
hyphens
(Please see EDIT at end of this text. I did not see, originally, that the hyphenated words were all over the place. However, what I wrote before the EDIT is still relevant.)
I could help you if I knew more about the actual text file.
How did you get the text file that you are using in Atlantis? Did you do a copy-and-paste or did you "Save as Text" from within a PDF viewer? Either way, you would have obtained an unformatted text file.
When you make a text file from a PDF, it retains the same line endings as the lines in the PDF. The lines still have "hard returns" that make the line break at that point. If you had hyphenation turned on in Word, when you created the Word files, then you will have a zillion hyphens at the ends of lines. These are so-called "soft hyphens" that do not really need to be in the word; they are present only because the line break caused the word to be split.
First, for the future, it is almost ALWAYS better to turn hyphenation off in a document. Unless you have very narrow columns, the use of end-of-line soft hyphens is more problematical than helpful. Of course, words that require hyphenation (like *self-conscious*) are a different matter. That kind of hyphen is not a soft hyphen, but a normal hyphen.
If you have an understanding of how to handle "hard returns" in Search and Replace, you can use the Atlantis "Search and Replace" to find hyphens that occur only at the end of a line and replace them with just a space. That is one way to go about the procedure. Of course, you will zap a few hyphens that really needed to be there (as in *self-conscious*), but you avoid zapping any of the hyphenated words that are NOT at the end of a line.
Try this on a COPY of your file (not the original):
Under EDIT go to Replace. (You can also just hit Ctrl+H.)
In Find, enter this code: -^p
(The - is the hyphen, and the ^p is the code for a hard return.)
In Replace, hit the spacebar one time.
Now run Replace. That ought to delete all hyphens at the end of a line. You may delete a few hyphens that you would have wanted to keep. But you will not be deleting any hyphens except those at the end of a line. You may need to go through and remove most of the hard returns at line ends, too. Again, without seeing your file, I cannot say for sure what you need to do.
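For anyone who would rather do the same thing outside Atlantis, on a plain-text copy of the file, here is a minimal Python sketch of the end-of-line replacement. The function name and the sample text are made up for illustration. The replacement string is a parameter because whether the two halves should be joined directly or separated by a space depends on how your file came out; try both on a copy.

```python
import re


def join_line_end_hyphens(text: str, replacement: str = "") -> str:
    """Replace every hyphen that sits directly before a line break.

    The pattern is the plain-text counterpart of searching for -^p:
    a hyphen followed by a hard return (with or without a carriage return).
    """
    return re.sub(r"-\r?\n", replacement, text)


# Hypothetical sample: one word split at a line end, one genuine compound.
sample = "The docu-\nment keeps self-conscious intact.\n"
print(join_line_end_hyphens(sample))
```

As with the Find/Replace version, a compound such as self-conscious that happens to break at a line end will lose its hyphen, so the result still needs a read-through.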
I have been doing this kind of thing for 25 years, but it is extremely difficult to walk someone through this, because the steps can be intricate.
Just make a copy of a file and play around with the Search and Replace until you develop an idea of what to do.
Good luck.
Roland
EDIT: Ah, yes ... I went back and reread your post. You have a lot of hyphens in the center of a line. This is because the hyphenated words were originally at the end of a line in a PDF. I have seen this a zillion times.
Your best bet is to use Search and Replace and zap each instance individually, because you will have to decide whether the word NEEDS a hyphen or not. Most will not. The process is very quick. The Search will jump to the next instance of a hyphen and you will answer yes or no as to the replacement. Of course, if you want to zap them all, you can do that with one click of a button.
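The yes-or-no review can also be sketched in Python for a plain-text copy. This is only an illustration: the function name, the `keep` callback, and the small list of known compounds are assumptions, not anything Atlantis provides. The callback plays the role of the yes/no answer for each hyphenated word.

```python
import re
from typing import Callable

# One or more word parts joined by hyphens, e.g. "docu-ment" or "mother-in-law".
HYPHENATED = re.compile(r"\w+(?:-\w+)+")


def review_hyphens(text: str, keep: Callable[[str], bool]) -> str:
    """Visit each hyphenated word and drop its hyphens unless keep(word) is true."""

    def decide(match: "re.Match[str]") -> str:
        word = match.group(0)
        return word if keep(word) else word.replace("-", "")

    return HYPHENATED.sub(decide, text)


# Hypothetical list of words that really need their hyphen.
KNOWN_COMPOUNDS = {"self-conscious", "blue-green"}

sample = "A self-conscious writer edits the docu-ment."
print(review_hyphens(sample, lambda w: w.lower() in KNOWN_COMPOUNDS))
```

To answer by hand instead of from a list, the callback could prompt for each word, for example `lambda w: input(f"Keep hyphen in {w}? [y/n] ") == "y"`.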
Last edited by rstroud on Thu Aug 01, 2013 12:16 pm, edited 1 time in total.
It all depends on what exactly these “hyphens” are. Are they hyphens joining the two parts of a compound word such as “blue-green”, “over-ripe”, “freeze-dry”, etc., or are they “conditional hyphens”, i.e. hyphens that will only be used when the word is divided at the end of a line of text?
If they are “conditional hyphens”, they will only be seen in the document window when “Optional hyphens” is checked on the “View” tab of the Atlantis “Tools | Options…” dialog, and when the “View | Special Symbols” mode is on.
Now if they are “conditional hyphens”, you can easily remove them all at one go:
1. Press “Ctrl+Home” to place the insertion cursor at the top of the document.
2. Press “Ctrl+H” to open the “Find/Replace” dialog.
3. In the “Find” box, enter the symbol for “Optional Hyphen” ("^-", without the quote marks).

4. If necessary, clear the “Replace with” box.
5. Press the “Replace All” button.
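Outside Atlantis, the same clean-up can be sketched in Python on a plain-text export. This assumes the optional hyphens arrive in the text as the Unicode character U+00AD (SOFT HYPHEN), which may not hold for every export route; the function name is made up for illustration.

```python
# Assumption: optional hyphens are stored as U+00AD in the exported text.
SOFT_HYPHEN = "\u00ad"


def strip_optional_hyphens(text: str) -> str:
    """Remove every soft hyphen and leave ordinary hyphens untouched."""
    return text.replace(SOFT_HYPHEN, "")


# Hypothetical sample with two soft hyphens and one ordinary hyphen.
sample = "hy\u00adphen\u00adation and self-conscious"
print(strip_optional_hyphens(sample))
```

Because only U+00AD is removed, compound words such as “blue-green” are not affected.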
HTH.
Cheers,
Robert
Optional hyphens
Robert, I did not check, but I think that the hyphens that get copied from a PDF file, in a copy-and-paste, all come out as just regular "hard" hyphens, don't they?
If I copy/paste from a PDF file using Adobe Reader or PDF-XChange Viewer, I don’t get any extra hyphens, soft or hard. What I invariably get are broken paragraphs with tons of unnecessary paragraph end marks.

So if you use Copy/paste into Atlantis or any other word processor, you lose all the formatting, and you get tons of unnecessary paragraph end marks. This is just not the way to go.
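If you are stuck with pasted text anyway, the stray paragraph end marks can be cleaned up roughly like this. It is a minimal Python sketch that assumes genuine paragraph breaks show up as blank lines in the pasted text; if they do not, there is nothing reliable to tell a real break from a line wrap. The function name is made up for illustration.

```python
import re


def merge_broken_lines(text: str) -> str:
    """Join the lines inside each paragraph; blank lines separate paragraphs."""
    text = text.replace("\r\n", "\n")
    # Split on blank lines, which are assumed to mark real paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text)
    merged = [
        " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
        for paragraph in paragraphs
    ]
    return "\n\n".join(p for p in merged if p)


# Hypothetical pasted text: two paragraphs, each broken across two lines.
sample = "First line\nof a paragraph.\n\nSecond\nparagraph.\n"
print(merge_broken_lines(sample))
```

This only repairs the line breaks; the lost formatting does not come back.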
You won’t get any of these headaches if instead you convert the PDF to a word-processing format such as RTF, DOC, or DOCX.
I just did a simple test. I went to the Zamzar site and uploaded a short PDF file. I converted it to the DOCX format. I got it as an attachment to an email a minute later. The converted DOCX file looks almost like a perfect clone of the original PDF file. I can now edit this new DOCX file to correct a few blemishes and imperfections.
Give it a try. In any case, you’ll get better results and less aggravation.

I don't know which method you use to convert your PDF files to Word, but may I suggest that you have a look at "Calibre", which is a program to organise e-books, but it can also convert between different formats. I use it to convert PDF and EPUB files to RTF. This way the files can be edited directly in Atlantis.
The formatting will be very close to the original file. Even pictures from the file can be found in the converted file. You then can use the Find/Replace in Atlantis to remove unwanted double spacings etc.
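If you have many files, the conversion can be scripted. Calibre installs a command-line converter called ebook-convert, which takes an input file and an output file. The Python sketch below only builds and runs that command; the function names and the file name are made up for illustration, and it assumes ebook-convert is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, target_ext: str = "rtf") -> List[str]:
    """Build the ebook-convert call: same file name, new extension."""
    src = Path(source)
    return ["ebook-convert", str(src), str(src.with_suffix("." + target_ext))]


def convert(source: str, target_ext: str = "rtf") -> bool:
    """Run the conversion; return False if ebook-convert is not installed."""
    if shutil.which("ebook-convert") is None:
        return False
    subprocess.run(build_convert_command(source, target_ext), check=True)
    return True


# Hypothetical file name, shown without actually running the converter.
print(build_convert_command("report.pdf"))
```

Calling `convert("report.pdf")` would then write report.rtf next to the original, ready to open in Atlantis.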
You'll find Calibre here:
http://calibre-ebook.com/
There is also a portable version of Calibre:
http://calibre-ebook.com/download_portable
--
Torben
Haven't heard of that one yet. I would like to say, though, that I use the Nitro 8 PDF software with relative ease and success.
I just crop the page so that only the body of the document is visible, save it, and edit it with the Atlantis Word Processor. So I have everything I need without the cost of Microsoft 365 in the long run. Thank you Atlantis! krb.crc