Reader Benjamin e-mailed me recently with this request:
Reader Question
I’ve got text (imported badly – I don’t have access to the original source) which is spaced badly in Microsoft Word 2010 — meaning I have to manually cursor + delete then space-bar to put it back together without the green wiggles. It’s time-consuming and I would like to know if there is an automated alternative. I’m sure I’m one of millions who are suffering with this. Can you help us?
He attached a video demonstrating his problem, which immediately made clear what he was up against:
When he says he’s “one of millions who are suffering with this,” I believe him. Because I’m one of them, too. And between the two of us, we might’ve come up with a good solution.
I come across this problem a lot myself. I quite often copy and paste text from Adobe Acrobat documents rather than retype it, and by far the biggest hassle is having to remove the hard returns at the end of every line. The best solution I’d come up with before this, and the one I suggested to him (with the admission that it was less than ideal), was to create a macro that automated the several keystrokes it took to go to the end of each line, insert a space, delete the hard return, return to the beginning of the line, and cursor down one line. I assigned the macro to the key combination Alt-Z and just executed it for each line that had a hard return I wanted to remove.
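For the curious, a recorded macro for those keystrokes would look roughly like the sketch below. This is a reconstruction rather than my exact code, and the name JoinNextLine is made up for illustration. It assumes the cursor is sitting on a line that ends in a hard return.

```vba
Sub JoinNextLine()
    ' Hypothetical reconstruction of the one-line-at-a-time macro.
    ' Assumes the cursor is on a line that ends in a hard return.
    Selection.EndKey Unit:=wdLine                 ' go to the end of the line
    Selection.TypeText Text:=" "                  ' insert a space
    Selection.Delete Unit:=wdCharacter, Count:=1  ' delete the hard return that follows
    Selection.HomeKey Unit:=wdLine                ' return to the beginning of the line
    Selection.MoveDown Unit:=wdLine, Count:=1     ' cursor down one line
End Sub
```

The Alt-Z shortcut is not part of the code; it is assigned separately through Word's keyboard customization.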
I uploaded a quick video on my YouTube channel to show him what I’d done. I told him I knew it wasn’t a perfect solution, and I promised to look into it further when I had time.
Within a few hours, though, Benjamin had replied that, using the keywords that were embedded in my YouTube video to find other videos on the subject, he had pieced together a better solution: a macro that, instead of working one line at a time as mine did, went through the entire document and cleaned it up.
Benjamin’s macro
Here’s the code he sent me:
Sub pagebreaks()
    '
    ' pagebreaks Macro
    '
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting

    ' Pass 1: replace each double paragraph break with two
    ' placeholder characters separated by a space
    With Selection.Find
        .Text = "^p^p"
        .Replacement.Text = "¬ ¬"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 2: replace each placeholder character with a space
    With Selection.Find
        .Text = "¬"
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Briefly, his macro (called “pagebreaks”) uses the Find and Replace function in Word to perform two find-and-replace actions: the first pass replaces two adjacent paragraph breaks (the “^p^p” you see next to “.Text =”) with two placeholder characters separated by a space; the second pass replaces the placeholder text with a space.
My macro … same method, but slightly different
Sub pagebreaks()
    '
    ' pagebreaks Macro
    '
    ' Note: this has the same name as Benjamin's macro above, so keep
    ' only one of the two in a module (or rename one of them).
    '
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting

    ' Pass 1: replace double paragraph breaks (true paragraph
    ' separations) with the placeholder character |
    With Selection.Find
        .Text = "^p^p"
        .Replacement.Text = "|"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 2: replace the remaining single paragraph breaks
    ' (premature ends of lines) with a space
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 3: turn each placeholder back into a double paragraph break
    With Selection.Find
        .Text = "|"
        .Replacement.Text = "^p^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Because the text I was working with was apparently somewhat different than what Benjamin was working with, my macro did the following:
- Replaced all double paragraph breaks (which signified true separations between paragraphs, not a premature end of a line) with a placeholder character |
- Replaced all single paragraph breaks (left over from the first pass and signifying premature end-of-line) with a space
- Replaced the placeholder character from Step 1 with a double paragraph break to restore the separations between paragraphs
So my “pagebreaks” macro changed text that looked like this:
… into text that looked like this:
(Those paragraph symbols ¶ you see at the ends of the lines are the symbols displayed by the Show/Hide feature for the end-of-line/carriage-return code, a.k.a. a “hard return.” To display these codes, click on the button with the paragraph symbol in the middle of the Home tab.)
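Since the three passes listed above differ only in what they find and what they replace it with, the same logic can be written more compactly. The version below is a sketch, not the macro I actually recorded: the names ReplaceAllInDocument and RemoveLineBreaksCompact are made up for illustration, and it works on the main body of the active document through ActiveDocument.Content rather than through the Selection.

```vba
' Hypothetical helper: one Replace All pass over the main document body.
Private Sub ReplaceAllInDocument(ByVal findText As String, ByVal replaceText As String)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop          ' the range already covers the whole body
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub RemoveLineBreaksCompact()
    ReplaceAllInDocument "^p^p", "|"    ' protect true paragraph separations
    ReplaceAllInDocument "^p", " "      ' join prematurely broken lines
    ReplaceAllInDocument "|", "^p^p"    ' restore the paragraph separations
End Sub
```

As with the original, this assumes the | character does not already appear in your text. If it does, pick a different placeholder.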
If you’ve never pasted ready-made macro code into Word for your own use, click here for a good tutorial on that process.
“But what if I don’t want a macro?”
You can replicate these same steps with Microsoft Word’s Find and Replace feature. You’ll just need to make 2-3 passes through the document to make all of the changes.
I’ve always done a find/replace. Find the hard return (under more, special, paragraph character or whatever your version is) and replace with a space. It’s simple and quick.
And if that works with the text you’ve got, that’s great. One of the problems with the text Benjamin had was that the incorrect line breaks had one hard return while the “true” paragraph breaks had two. We had to replace the single hard returns without disturbing the double ones. Make sense?
I created the macros per your tutorial link – both of them – and tried using them. Neither one worked or did anything. I’ve never created macros like this before so I tried selecting, not selecting text, enabling macros using the Macro security menu. Still nothing. What am I doing wrong?
@Rich –
Importing macro code like this admittedly takes a little practice. And if your text isn’t formatted quite like the examples above (say, your line breaks are done with soft returns rather than hard returns), then neither of the examples above will work for you. If they’re soft returns, you’ll see a code that looks like the return arrow on your Enter key rather than the paragraph symbol in the examples above.
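If soft returns do turn out to be the culprit, a variation like the one below might be a starting point. It is only a sketch, and the macro name is made up: it assumes every soft return (Word's ^l code for a manual line break) in the document is unwanted and should become a space.

```vba
Sub ReplaceSoftReturns()
    ' Sketch: assumes every manual line break (^l) should become a space.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"                ' manual line break, a.k.a. soft return
        .Replacement.Text = " "
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

If your true paragraph separations are doubled soft returns, the same placeholder trick from the macros above should apply, with ^l in place of ^p.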
You might try experimenting with the Find and Replace feature to see if you can figure out how (in one pass or several) you can successfully replace the incorrect line breaks without disrupting the legitimate ones. Once you get the sequence just right, you might try recording your own macro for later use (see http://legalofficeguru.com/simple-word-macros/ for some tips).
All this to say that it depends on how the text you’re working with is formatted. Are you viewing your text with Show/Hide turned on? If so, does it look somewhat like the examples above, or is it different (and how so)?
@Guru. thanks. recopied and repasted it and now it works fine 🙂
When I copy and paste from a pdf of a case, there are no line breaks between paragraphs.
It looks like this.
We disagree with [redacted] and affirm the judgments.¶
¶2 In late 2006, fifteen-year-old [redacted] met [redacted], then twenty,¶
on MySpace, an online social networking website. [redacted] knew [redacted] as “RJ”¶
and knew he was twenty. They mainly communicated online. Sometime in¶
December, with the permission of [redacted] father, [redacted] visited [redacted] and¶
her girlfriend at [redacted] house. The girlfriend introduced [redacted] to [redacted]¶
father as an eighteen-year-old who attended the same high school as she and¶
[redacted] did. [redacted] did not dispute his represented age.¶
¶3 On or about December 22, 2006, [redacted] told [redacted] her father¶
denied her request to allow [redacted] to visit because neither parent would be home.¶
After I run the macro, it looks like this.
We disagree with [redacted] and affirm the judgments. ¶2 In late 2006, fifteen-year-old [redacted] met [redacted], then twenty, on MySpace, an online social networking website. [redacted] knew [redacted] as “RJ” and knew he was twenty. They mainly communicated online. Sometime in December, with the permission of [redacted] father, [redacted] visited [redacted] and her girlfriend at [redacted] house. The girlfriend introduced [redacted] to [redacted] father as an eighteen-year-old who attended the same high school as she and [redacted] did. [redacted] did not dispute his represented age. ¶3 On or about December 22, 2006, [redacted] told [redacted] her father denied her request to allow [redacted] to visit because neither parent would be home.
Is there a way I can add a line between the numbered paragraphs?
TIA.
Jeff
Is the “¶2” a literal paragraph symbol followed by a 2, or a line break followed by a 2?
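If it is a literal paragraph symbol, something like the sketch below might work when run after the line-break macro. It rests on an assumption I can't confirm from here: that each numbered paragraph starts with an actual ¶ character (character code 182) immediately followed by a digit. The macro name is made up for illustration.

```vba
Sub BreakBeforeNumberedParagraphs()
    ' Assumption: paragraph numbers are a literal pilcrow character
    ' (code 182) immediately followed by a digit, e.g. the "¶2" above.
    Dim pilcrow As String
    pilcrow = ChrW(182)

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = pilcrow & "([0-9])"                   ' pilcrow plus first digit
        .Replacement.Text = "^p^p" & pilcrow & "\1"   ' blank line, then put both back
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True                        ' needed for [0-9] and \1
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

If the “¶2” is really a line break followed by a 2, this won't match anything, and a different approach would be needed.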
Hello, I’ve just seen your posts above, I keep finding ways to fix the problem once it has already happened. I’m creating PDFs so surely I should be able to fix this problem at source so no one has to faff around afterwards removing line breaks… I have seen posts saying add tags (under Accessibility) but Adobe won’t let me do this (it warns that it can’t for some reason). Any ideas? I’d like to make easy to copy and paste PDFs…
Thanks, Alison
How can i get this macro to work in Microsoft office?
i mean microsoft outlook, not office.
I’ve always done it this very simple way: Select find/replace. Click more, click special, select “paragraph mark” (top of list). Click in replace block and space once. Replace all. Voila!
Hey Guru,
Thanks for this article. In the case where the copied PDF text does NOT have double paragraph breaks to separate each paragraph (no double spacing), what would be the change to the macro you made to preserve the paragraphs while removing line breaks? Is it even possible? Any feedback would be awesome.
KL
Assuming all the line breaks are the same code (hard line break/carriage return), that would be pretty tough to do programmatically.
thank you for the timely response! since i’m working with text that isn’t too long, i’ve just decided to utilize your script by adding in the extra paragraph breaks manually. 🙂
This was very useful and it saved me a lot of time.
I would need the same macro to work in excel. Is that possible?
What needs to be changed?
Thanks
Unfortunately, I’m not really an Excel macro expert. Try posting your question at answers.microsoft.com to see what one of their MVPs can tell you.
Hi, I tried your macro, but it converts all document, not only the selected text. What could be the cause of that?
Thank you in advance,
Marcelo
Well, it’s designed to convert the entire document. If you’ve got some text within a document that needs converting, I suggest copying that over to a blank Word document and running the macro there instead.
Yes, I tried it and it worked perfectly. Thank you.
Anyway, I thought that the “With Selection.Find” statement would make it restricted to the selection and not the whole text. How could I change the macro to make the replacements only within a block of selected text?
Thank you in advance,
Marcelo
Try the technique found here: http://www.wordbanter.com/showthread.php?t=117367
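For reference, one way a selection-only version could look is sketched below. It is not necessarily the technique from the linked thread, and the names are made up for illustration. The idea is that the macros above set .Wrap to wdFindContinue, which lets the search carry on through the rest of the document. Searching the selected range with .Wrap set to wdFindStop should keep the replacements inside the selection.

```vba
' Hypothetical helper: one Replace All pass limited to the selected text.
Private Sub ReplaceInSelection(ByVal findText As String, ByVal replaceText As String)
    With Selection.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop          ' stop at the end of the selected range
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub RemoveLineBreaksInSelection()
    ' With nothing selected there is no range to limit the search to.
    If Selection.Type = wdSelectionIP Then
        MsgBox "Select the text to clean up first."
        Exit Sub
    End If

    ReplaceInSelection "^p^p", "|"    ' protect true paragraph separations
    ReplaceInSelection "^p", " "      ' join prematurely broken lines
    ReplaceInSelection "|", "^p^p"    ' restore the paragraph separations
End Sub
```

It's worth trying this on a copy of your document first, since I haven't tested how it behaves when the selection starts or ends right at a paragraph break.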
It worked perfectly. Thank you very much.
Marcelo
I used Adobe Acrobat to convert to Word and then the copy and paste as unformatted text worked.
I have Acrobat Professional XI, so I don’t know if it does this in earlier versions. I found this out by accident, but it works like a charm. Open the PDF containing the annoying returns and do a Save As as a Plain Text document (which you can later throw out, as you won’t need it). Once it has finished converting the file, do a Save As of the original PDF and give it a new name. Open this new PDF, and voila, the only hard returns are at the ends of paragraphs where they should be.
I’ll have to try that in the version I have (9?). That would be awesome if I could do that!