Notepad++: A guide to using regular expressions and extended search mode
29 June 2008
The information in this post details how to clean up DMDX .zil files, allowing for easy importing into Excel. However, the explanations following each Find/Replace term will benefit anyone looking to understand how to use Notepad++ extended search mode and regular expressions.
If you are specifically looking for multiline regular expressions, look at this post.
You may already know that I am a big fan of Notepad++. Apparently, a lot of other people are interested in Notepad++ too. My introductory post on Notepad++ is the most popular post on my speechblog. I have a feeling that that is about to change.
Since the release of version 4.9, the Notepad++ Find and Replace commands have been updated. There is now a new Extended search mode that allows you to search for tabs (\t), newlines (\r\n), and a character by its value (\o, \x, \b, \d, \t, \n, \r and \\). Unfortunately, the Notepad++ documentation is lacking in its description of these new capabilities. I found Anjesh Tuladhar's excellent slides on regular expressions in Notepad++ useful. After six hours of trial and error, I managed to bend Notepad++ to my will. And so I decided to post what I think is the most detailed step-by-step guide to Search and Replace in Notepad++, and certainly the most detailed guide to cleaning up DMDX .zil output files on the internet.
What's so good about Extended search mode?
One of the major disadvantages of using regular expressions in Notepad++ was that it did not handle the newline character well—especially in Replace. Now, we can use Extended search mode to make up for this shortcoming. Together, Extended and Regular Expression search modes give you the power to search, replace and reorder your text in ways that were not previously possible in Notepad++.
Search modes in the Find/Replace interface
In the Find (Ctrl+F) and Replace (Ctrl+H) dialogs, the three available search modes are specified in the bottom right corner. To use a search mode, click on the radio button before clicking the Find Next or Replace buttons.
Cleaning up a DMDX .zil file
DMDX allows you to run experiments where the user responds by using the mouse or some other input device. Depending on the number of choices/responses (and of course the kind of task), DMDX will output a .zil file containing the results (instead of the traditional .azk file). This is specified in the header along with the various response options available to the participant. For some reason, DMDX outputs the reaction time twice, and on separate lines, in .zil files. Here's a guide for cleaning up these messy .zil files with Notepad++. Explanations of the Notepad++ search terms are provided in bullet points at the end of each step.
Step 1: Back up your original result file (e.g. yourexperiment.zil) and create a copy of that file (yourexperiment_copy.zil) that we will edit and clean up.
Step 2: Open yourexperiment_copy.zil in Notepad++ (version 4.9 or later).
Step 3: Remove all error messages. All lines containing DMDX error messages begin with an exclamation mark. Let's get rid of them.
Bring up the Replace dialog box (Ctrl+H) and select the Regular Expression search mode.
Find what: [!].*
Replace with: (leave this blank)
Press Replace All. All the error messages are gone.
[!] finds the exclamation character.
.* selects the rest of the line.
Step 4: Get rid of all these blank lines.
Switch to Extended search mode in the Replace dialog.
Find what: \r\n\r\n
Replace with: (leave this blank)
\r\n is a newline character (in Windows).
\r\n\r\n finds two newline characters (what you get from pressing Enter twice).
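For readers who want to try Steps 3 and 4 outside Notepad++, here is a minimal Python sketch of the same two replacements. It is only an illustration: the sample text is made up (a real .zil file will look different), and the variable names are my own.

```python
import re

# Made-up sample text; a real .zil file will contain different fields.
raw = "Item 1, 1200.5\r\n! hypothetical error text\r\nItem 2, 980.0\r\n"

# Step 3: delete from each "!" to the end of its line.
# In Python, "." would also swallow "\r", so the line ending is excluded explicitly.
no_errors = re.sub(r"[!][^\r\n]*", "", raw)

# Step 4: delete every pair of consecutive Windows newlines.
no_blanks = no_errors.replace("\r\n\r\n", "")

print(repr(no_errors))
print(repr(no_blanks))
```

Note that deleting both newlines joins the neighbouring lines together. That is expected at this point, because the next step puts each Item back on its own line.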
Step 5: Put each Item (DMDXspeak for trial) on a new line. Switch to Regular Expression search mode.
Find what: (\+.*)(Item)
Replace with: \1\r\n\2
\+ finds the + character.
.* selects the text after the + up until the word "Item".
Item finds the string "Item".
() allow us to access whatever is inside the parentheses. The first set of parentheses may be accessed with \1 and the second set with \2.
\1\r\n\2 will take + and whatever text comes after it, will then add a new line, and place the string "Item" on the new line.
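The capture-group idea in Step 5 carries over to most regex engines. Below is a small Python sketch of the same find/replace pair, assuming a made-up line in which one "Item" follows a "+"; it is not taken from a real .zil file.

```python
import re

# Made-up line: two trials run together after the blank lines were removed.
line = "Item 1, +1200.5Item 2, +980.0"

# Group 1 captures "+" and the text after it, group 2 captures "Item".
# The replacement writes group 1, a Windows newline, then group 2.
split = re.sub(r"(\+.*)(Item)", r"\1\r\n\2", line)

print(repr(split))
```

Because .* is greedy, a line holding several "+...Item" sequences would only be split at the last "Item" in a single pass, so this sketch deliberately uses a line with one such sequence.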
So far so good. Our aim now is to delete duplicate or redundant information (reaction time data).
Step 6: Remove all newline characters using Extended search mode, replacing them with a unique string of text that we will use as a signpost for redundant data later in RegEx. Choose a string of text that does not appear in your .zil file; I have chosen mork.
Find what: \r\n
Replace with: mork
Press Replace All. All the newline characters are gone. Your entire DMDX .zil file is now one very long line of (in my case word-wrapped) text.
Step 7: We're nearly there. Using our mork signpost keyword, let's separate the different RT values. Stay in Extended search mode.
Find what: ,
Replace with: ,mork
Step 8: Let's put the remaining Items on new lines.
Switch to and stay in Regular Expression search mode for the remaining steps.
Find what: mork(Item)
Replace with: \r\n\1
Press Replace All. All "Item"s should now be on new lines.
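The placeholder trick in Steps 6 to 8 can be sketched in Python as follows. This is an illustration under assumptions: the sample text is invented, and the guard that checks the placeholder is unused is my own addition rather than part of the Notepad++ procedure.

```python
import re

# Invented sample: each trial is followed by a second line repeating the RT.
text = "Item 1, 1200.5\r\n 1200.5, +1\r\nItem 2, 980.0\r\n 980.0, +1"

PLACEHOLDER = "mork"
if PLACEHOLDER in text:
    raise ValueError("choose a placeholder that does not occur in the data")

# Step 6: every newline becomes the placeholder, giving one long line.
one_line = text.replace("\r\n", PLACEHOLDER)

# Step 7: mark the end of every comma-separated value.
marked = one_line.replace(",", "," + PLACEHOLDER)

# Step 8: a placeholder directly before "Item" turns back into a newline.
items = re.sub(re.escape(PLACEHOLDER) + r"(Item)", r"\r\n\1", marked)

print(repr(items))
```

The remaining placeholders now sit exactly where the duplicated values begin and end, which is what the next step relies on.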
Step 9: Let's get rid of those duplicate RTs.
Find what: mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork
Replace with: \1,
A-Z finds all letters of the alphabet in upper case. a-z finds all lower case letters.
A-Za-z will find all alphabetic characters.
[^...] is the inverse. So, if we put these three together: [^A-Za-z] finds any character except an alphabetic character.
Notice that only one of the [^A-Za-z] is in parentheses (). This is recalled by \1 in the Replace with field. The characters outside of the parentheses are discarded.
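Here is the Step 9 expression applied in Python to a single made-up line, as a rough check of how the kept and discarded parts behave. The sample string is an assumption about what a line might look like after Step 8, not real DMDX output.

```python
import re

# Made-up line: the same RT appears twice between the "mork" signposts.
line = "Item 1,mork 1200.5mork 1200.5,mork +1"

# Only the first run of non-letters is captured; the second run, the comma
# and all three signposts are matched but thrown away by the replacement.
pattern = r"mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork"
deduplicated = re.sub(pattern, r"\1,", line)

print(deduplicated)
```

The negated class [^A-Za-z] is what stops each run at the next "mork", since "m" is a letter.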
Step 10: Let's get rid of all those morks.
Find what: mork
Replace with: (leave blank)
Press Replace All. The morks are gone.
Step 11: Separate each participant's data from the next.
Find what: (\**\*)
Replace with: \r\n\r\n\1\r\n\r\n
Press Replace All. The final product is a beautiful, comma-delimited .zil result file that is ready to be imported into Excel for further analysis.
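The Step 11 pattern can be tried in Python as well. This sketch assumes, as the search term implies, that a run of asterisks marks the boundary between participants; the sample string is invented.

```python
import re

# Invented sample: two participants' data separated by a run of asterisks.
joined = "Subject 1 data***Subject 2 data"

# \**\* matches zero or more asterisks followed by one more asterisk,
# i.e. any run of at least one asterisk (the same as \*+).
separated = re.sub(r"(\**\*)", r"\r\n\r\n\1\r\n\r\n", joined)

print(repr(separated))
```

The whole run is captured as one group, so it is written back unchanged with a blank line on either side.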
Please post your questions in the comments below, rather than emailing me. This way, others can refer to my answers here, saving me many hours of responding to similar emails over and over.
Update 20/2/2009: Having trouble understanding regexp? I have created a new Guide for regular expressions. Check it out.
Posted by Mark Antoniou at 11:28 AM
Labels: DMDX, experiments, guides, notepad++, productivity
398 comments:
James said...
Hi, can those steps be automated in notepad++ ? like actions in photoshop?
July 20, 2008 at 11:13 PM
Mark said...
James, that is the million dollar question. I immediately tried to automate this somehow but could not get Notepad++ to save these steps in a macro. If I find a solution, I will post it.
July 21, 2008 at 7:00 PM
ninj said...
Nice article!
However, the reason why I arrived on your blog still remains unanswered: How to replace a multiple line regexp by a simple value (in my case: nothing). Here is the case:
In Symfony YAML generated files, I have the created_at and updated_at fields dumped, which I don't want.
I need to replace something like this: / *created_at:.*\n *updated_at:.*\n/ by
//
Of course I know it is possible to do it in two or three steps, but I'd like to find how to achieve it in one only, I'm a regexp maniac ;)
Maybe you or someone else own a solution... i couldn't manage to get one neither through CTRL-H nor through CTRL-R dialogs.
Thanks!
July 31, 2008 at 1:44 AM
Mark said...
ninj, currently you cannot do this in Notepad++. This is because replacing newlines is possible in Extended search mode, and regular expressions are available in Regexp search mode. You are trying to combine the two search modes, and in the current version of Notepad++ you cannot.
Since I wrote this post, I too have caught regexp mania. If you are serious about using regular expressions for more advanced search and replace (as you are) then you need to use a more powerful text editor. I recommend XEmacs—I've been using it for about a month, and it is very powerful. I'm working on a post for XEmacs right now.
As for your specific problem, it is possible to get rid of the created_at and updated_at information. I would need to see the text file (feel free to send a sample to me as an email attachment). I have made a few assumptions: 1. that created_at and updated_at always occur on consecutive lines, 2. that there is information above and below these lines that is useful. The XEmacs regular expression would be this:
Search for: \(.*\) newline .*created_at:.* newline .*updated_at:.* newline \(.*\)
Replace with: \1 newline \2
Note: In XEmacs, the newline character is created by pressing Ctrl+Q Ctrl+J.
July 31, 2008 at 10:44 AM
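For readers who are happy to step outside a text editor, ninj's one-step multiline replacement can also be sketched in Python. This assumes, as above, that created_at and updated_at always sit on consecutive lines; the sample YAML is made up.

```python
import re

# Made-up YAML fixture with the two unwanted fields on consecutive lines.
yaml_text = (
    "Article:\n"
    "  article_1:\n"
    "    title: Hello\n"
    "    created_at: 2008-07-30 10:00:00\n"
    "    updated_at: 2008-07-30 10:05:00\n"
    "    body: Text\n"
)

# re.MULTILINE lets ^ match at the start of every line, so the pattern
# removes both lines, including their trailing newlines, in one pass.
pattern = re.compile(r"^ *created_at:.*\n *updated_at:.*\n", re.MULTILINE)
cleaned = pattern.sub("", yaml_text)

print(cleaned)
```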
Anonymous said...
Quick bleg. I would like to replace all occurrences of number+comma with number + TAB. So 12.8, 100 would become 12.8 TAB 100.
I'm using "\d," for the [Find What] value and "\1\t" for the [Replace With] value. Unfortunately I lose that last digit in the number that I'm replacing.
Any help would be appreciated.
August 1, 2008 at 5:27 AM
Anonymous said...
Ok, I actually figured it out.
The [Find What] value should be "(\d)," and the [Replace With] value should be "\1\t". In other words I just needed the parentheses around "\d" criteria. Thanks for the useful article Mark.
August 1, 2008 at 5:51 AM
Flick said...
Thank you for the guide! I have to admit it's a little advanced for me, and I've only just found out about REGEX expressions, but am still very excited nonetheless!
I'm alittle confused by what to do in my situation. I have a mySQL file that I'd like to run, and the first part of each line is something like this:
INSERT INTO my_table (id,uid,my_msg,my_date,the_ip) VALUES ('2',
I would very much like to be able to change the '2' part to just NULL and REGEX seems to be the way forward. However, I think I'd have to use ( as a unique identifier, and given that REGEx uses brackets as the separators, I'm now a little stuck.
Apologies in advance for this simple question, but my brain is really not working today. Thanks!
p/s: I'll continue looking into it in the meantime.
August 11, 2008 at 2:06 AM
Flick said...
Just a quick update: I've been able to use Column Mode select (Alt+mouse) to select the column and replace the NULL, since thankfully everything is in the same column! I wonder if it is still possible in Regex though?
Thanks :)
August 11, 2008 at 2:20 AM
Mark said...
Hi Flick, thanks for your comments. I do have a regex solution for you that is very easy and quick. Note that this regex syntax is specific to Notepad++.
First, let me answer your question re: the curved bracket (or parenthesis) character: in order to search for and find the open parenthesis character, place the parenthesis within square brackets like this: [(]
However, you do not need to use the parentheses or square brackets at all to achieve what you want to (if I have understood you correctly).
Search for: '.*',
Replace with: NULL
If you do not want to get rid of the comma, then delete it from the search term. If this then stuffs up your search and finds incorrect portions of text, you could insert a comma after null in the replace with expression: NULL,
August 11, 2008 at 3:38 PM
Anonymous said...
Mark,
Do you have some advice for the following. I have a set of text lines... and I want to delete duplicate lines. But the redundant information will occur only at the beginning of the line, the end of those lines differ in their information. I'm just starting to use notepad++ RegExp utilities, but I'm no whiz yet with the format.
Thanks
October 7, 2008 at 2:00 AM
Mark said...
example, and I'll show you the correct regexp.
October 7, 2008 at 3:56 AM
Anonymous said...
ok... I've made the text file simpler so that the duplicates I want to delete all have the same information.
[19-766]
???^Los Angeles^60-638^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
[19-767]
???^Los Angeles^60-638^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
[19-773]
???^Los Angeles^60-638^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
[19-1581]
???^Los Angeles^60-638^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
the phrases in brackets, on separate lines, are ignored by the final use of the text file. They can remain, but I do want to delete the duplicates of the ??? lines. I'll have other cities with similar format.
thanks
October 7, 2008 at 8:55 AM
Anonymous said...
... this group of lines is followed, for example, by:
[19-773]
???^Los Angeles^60-639^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
[19-1580]
???^Los Angeles^60-639^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
county name between the 1st and 2nd ^
October 7, 2008 at 8:59 AM
Mark said...
Ok, I understand the problem. Can you provide me with what you would like the output to look like after applying the regexp.
For e.g., should it look like this:
Los Angeles 60-638
Los Angeles 60-639
Is this the only useful info? Should everything else be deleted?
Also, are the number of repetitions (lines of redundant info) the same for each city/number?
October 7, 2008 at 11:13 PM
Anonymous said...
Mark,
As further background, you are looking at the content of the 1930 census districts laundered into the 1940 census districts. I have transcribed a cross table between 1930 and 1940, and we seeded the 1940 EDs with the 1930 information. Those 1930 ED numbers are in brackets, and point to the next text line (where that information came from). Since census districts change boundaries between federal censuses, especially in large cities, you will see multiple 1940 entries from different 1930 EDs that are partially contained within the 1940 ED. I don't think there would be any more than 10 such contribution EDs. For rural areas the data from 1930 to 1940 is accurate, for urban areas we have transcribed street indexes for over 200 large cities, thus instead of repeating their 1940 ED streets (I have scanned 28 rolls of 1940 ED descriptions), I just direct them to the other utility. For smaller areas of 25,000 or more, I intend to get street indexes for them, and have replaced their descriptions with "TO BE DONE BY
BOUNDARY OR STREET INDEX".
When there are multiple ED entries for a single 1940 ED # (which is a two part number), they will occur together as a block with no blank line between the various lines. If a 1940 ED has only a single 1930 entry, it should have a blank line above the brackets, and one below the text line.
I fooled with TextFX but it moves the brackets from the text lines, doesn't show a numerical sort of numbers (thus one sees 2, 20, 21, ...) and for some didn't get me to a unique line.
I need the entire line. So for the first example I want:
[19-766]
???^Los Angeles^60-638^LOS ANGELES CITY USE ONE STEP 1940 LARGE CITY ED FINDER
[19-767] [19-773] [19-1581]
but I'm willing to give up the brackets lines, but I do want a blank line between the statements.
I've done 2 states, and with California decided to do some more automation. To see Alabama and Arkansas... go to http://www.stevemorse.org/ed/ed.php
and choose 1940 and one of those two states.
Thanks... I'll ask Steve Morse to acknowledge you on the One Step site if you can pull this off.
Joel Weintraub
Dana Point, CA
October 8, 2008 at 10:47 AM
Anonymous said...
Mark,
Steve Morse wrote a utility to do what I want.
But it was interesting to see if RegEx could do the same thing. So... thanks for your help... don't do any more.
Thanks Joel Weintraub
October 14, 2008 at 2:32 PM
Mark said...
Glad to hear that your problem got solved. Apologies for not responding as quickly as I usually would, but you caught me at a bad time (wedding and honeymoon). My wife doesn't let me post about regular expressions while on honeymoon!
Basically, the problems with your data are twofold:
a) There is no unique identifier in the first occurrence of a 'new' number; and b) The number of repetitions varies.
You cannot use regexp to compare two strings of text and decide if a change has occurred (i.e. a new number/city, whatever). In summary, getting a parser/utility written was a smart move.
I am writing up a guide about how to use regular expressions, going from basics to more advanced stuff. Stay tuned.
October 28, 2008 at 3:22 AM
Anonymous said...
Regular Expressions - User guide
http://www.zytrax.com/tech/web/regex.htm
November 17, 2008 at 2:27 AM
liz said...
thanks, helped me out a bunch :)
November 25, 2008 at 4:36 AM
fresh332 said...
I have an output file from a program which contains "\n" characters instead of line breaks, e.g.: "Text\nNew line\nAnother line"
Similar to your "mork" solution I do a consecutive replace, first in "normal mode" replacing "\n" characters with something unique like "ZZZ", then in "extended mode" replacing "ZZZ" with "\n" so I finally have the line breaks.
There should be a way to do this in one step, or to automate the two steps, either in notepad++ or with some other tool - has anyone got an idea?
December 3, 2008 at 9:24 PM
Mark said...
fresh332, yes there is a way to do this in one step; and no, you cannot do this with Notepad++.
I now use a very powerful text editor called XEmacs. It really leaves Notepad++ for dead when it comes to regexp. It's so good that I'm working on a more detailed guide to regexp using XEmacs right now.
FYI: in XEmacs, you specify a newline character by first pressing Ctrl+Q and then Ctrl+J. This creates a newline character that takes care of \n and other "newline" characters.
December 4, 2008 at 8:31 AM
Dave Bui said...
Brilliant! I love the replacement double blank lines to a single blank lines.
December 4, 2008 at 10:22 PM
David Leigh said...
I didn't see a mention of Notepad++'s other Find/Replace facility: The TextFX plugin. I did not look to see if any of the "unsolvable" problems would be solved by TextFX, but in the case that they might be, it's worth looking at the TextFX Find/Replace facility (CTRL+R or via the menus) because of the way it can handle newlines and tabs. That being said, connecting Find/Replace (any flavor) with the macro recording facility of Notepad++ would elevate this software to "perfect" in my eyes...it's the one thing remaining that really aggrevates me on a semi-regular basis. Other than that, I LOVE this editor.
January 3, 2009 at 12:29 AM
Anonymous said...
Jay Fulton said...
Thank you VERY much. The documentation helps you, of course, but it saves time for the rest of us, too! Much appreciated
January 31, 2009 at 3:44 AM
Ninad said...
Hi,
Can anyone tell me how to use regexp and convert upper case to lower case?
February 3, 2009 at 9:32 AM
Mark said...
Good question, Ninad. I'm not sure if regexp can change upper to lower case for you. And I'm not sure how complicated your text file is. However, if you simply want to change text to lower case or vice versa, you can do this without using regexp. In Notepad++, select the text that you would like to change, then click on the TextFX menu, then TextFX Characters, and then select lower case.
Easy, huh?
February 3, 2009 at 10:00 AM
Anonymous said...
Hi.
Is it possible to search and replace the following in notepad++?
/*
...
...
*/
I can do it if it's all on one line, e.g. /* ... */
But I can't seem to find the regex command to select across multiple lines. Is this because n++ regex can't handle line returns?
- J
February 14, 2009 at 9:49 AM
Mark said...
That's exactly right. The problem is the line returns (or newlines). This is quite problematic isn't it?
If you would like to be able to do these types of regular expressions then you should use a more powerful text editor. I use XEmacs.
I've been working on a very comprehensive "Guide to regexp using XEmacs" post for a while now. Hopefully I will publish it in the next month or two.
February 14, 2009 at 7:04 PM
Jolas Arvin said...
April 16, 2009 at 8:20 PM
Jolas Arvin said...
i just search the net for multiline regex replacements and i bumped into this post. im experiencing same problems on n++. poor thing n++ can handle multiline regex. :( oh well im looking forward to see the XEmacs guide to regex. hope multiline regex
replacement will be included in it. tnx.
i'm somewhat into coding that feature in java to fully customize regex commands into my needs (specially the multiline replacements). :) if anyone did that, please share. many thanks.. :)
April 16, 2009 at 8:22 PM
Anonymous said...
Can I do a logical-OR regular expression search in Notepad++? In TextPad I used "^Alert|^Error|^Warning" to find all lines in a system log that started with either of the three words. The "|" operator does not seem to work in Notepad++.
Of course, I could do three separate searches, but it would be nice if NotePadd++ did this for me by interpreting an OR operator, e.g. "|".
Mark said...
No, Notepad++ cannot perform logical OR regexp searches. That was an easy question :)
However, the excellent and free XEmacs can handle your search without any problems. Note that your Textpad OR operator | would become \| in XEmacs, i.e.,
^Alert\|^Error\|^Warning
April 30, 2009 at 6:40 PM
Anonymous said...
I have a text file full of blocks of text like this:
"STRING1" =>
{ url => "URL1",
visibleif => sub {
!$is_temporarily_terminated && padlock("STRING2");
},
},
... more blocks like the above separated by a blank row.
End state: I need an excel file with 3 columns: string1, url1, and string2
Any ideas? I am completely new to regex and using notepad++ for now. If someone who is really good at this replies quickly, then there also could be some work that we could pay them to do in the future as we get a lot of projects like this.
May 15, 2009 at 5:27 AM
Mark Antoniou said...
That's pretty easy to fix. I wouldn't use Notepad++ for this. Instead use the excellent and free XEmacs.
In XEmacs, the correct regex search term would be (newline character at end of each line is made by Ctrl+Q, Ctrl+J):
"\(.*\)".*
.*"\(.*\)".*
.*
.*"\(.*\)".*
.*
.*
and the correct replace term would be: \1,\2,\3
This would create the following output: STRING1,URL1,STRING2
which you could then open in Excel as a comma delimited file, which would place each string/url in a separate column.
May 15, 2009 at 1:13 PM
Abhishek said...
Mark,
One question. The contents of file are following.
ABC
XYZ
123
I want to the file contents to be following.
'ABC','XYZ','123'
Thanks, Abhishek
May 25, 2009 at 6:45 PM
Mark Antoniou said...
Hey Abhishek. This is an easy task. I would advise that you use XEmacs rather than Notepad++. The reason for this is that Notepad++ does not deal well with newlines. In XEmacs, you would search for:
\(.*\) \(.*\)
and replace this with: \1,\2
Done :)
Abhishek said...
Thanks Mark. But, I work on client network where we cannot install XEmacs. We have only notepad ++ installed. Any other thoughts please?
ABC 123 XYZ
Need to change into 'ABC','123','XYZ'
May 27, 2009 at 12:15 AM
Mark Antoniou said...
Ok, well there is a way around it, so long as your data is exactly as you have specified here, i.e.:
ABC XYZ 123
So, in order to get to this: ABC,XYZ,123
All you need to do is replace the newline character with a comma.
If that is the case, you would use extended search mode and search for: \r\n and replace this with: ,
That should do the trick.
May 27, 2009 at 10:22 AM
Vladimir said...
Hi Mark,
Found your blog and hoping you can help me. I have a batch file that I receive daily. I need some help trying to modify it.
I need to insert a page break before it says PAGENO throughout the whole document. I tried to do Find and Replace with PAGENO & \fPAGENO, but it didn't work. It puts FF in black box in front of PAGENO, but doesn't create a page break when I print. What did I do incorrectly and is this the way to do a page break with regexp?
Also, is there a way to automate this process with Notepad++ or any other app? Thank you very much for your help!
August 5, 2009 at 7:21 AM
Mark Antoniou said...
Hey Vladimir,
This was a tough one! Let me begin by saying that I have an answer for you... kind of. First of all, as far as I am aware, you cannot have page breaks in a text document. Ok, now that we've got that out of the way, what are we going to do to help you? I would say that inserting a page break requires a rich text editor. So, Notepad++ is not going to cut it.
I have achieved what you requested in one easy step using Microsoft Word. Open your file in Word and select Replace (Ctrl+H), and enter the following search term:
Find what: \fPAGENO
Replace with: ^12
and then hit Replace All.
All of the \fPAGENO are now page breaks. Easy.
If you wish to remove PAGE from the top of each page, you could replace it with nothing. Be sure to match the case when searching so that you do not remove any legitimate occurrences of the word "page" that are in the content of your file (if there are any).
As for automating this, it can be done (although I am not hugely experienced in task automation in Word). Take a look at this URL:
http://www.microsoft.com/technet/scriptcenter/resources/qanda/jul07/hey0710.mspx
Good luck. Let me know how it works out.
August 5, 2009 at 11:21 AM
Puiufly said...
Don't waste time. move perl.
August 5, 2009 at 10:28 PM
Mark Antoniou said...
...or you could learn to use Perl, as suggested :) This is going a bit beyond regexp though!
August 5, 2009 at 11:21 PM
jp said...
Hi Mark,
I must say its a very useful post.
However i would be very grateful to u if u can solve one of my problems in notepad++. Input: { "arc_on_sf::set_end(...)" } 25848 0.041144 0.000002 0.1 { "pt_on_sf::evaluate" } 24408 0.032451 0.000001 0.0 { "pt_on_cv::evaluate" }
Output: when i place the cursor on any of the open braces and press ctrl-B in a LISP file(got by using alt-l-l enter) i can see the open bracket n the closed bracket
highlighted. Now i need a command to delete the text inbetween teh brackets.
for ex: In the above input if I select { "pt_on_cv::evaluate" } then it should get deleted upon using a shortcut.
so the final output will be Output:
{ "arc_on_sf::set_end(...)" }
25848 0.041144 0.000002 0.1 { "pt_on_sf::evaluate" }
24408 0.032451 0.000001 0.0
April 8, 2010 at 9:03 PM
Mark Antoniou said...
Thanks for your question jp.
Some more information would be helpful. As your search involves multiple lines, I would strongly recommend using a more powerful text editor than Notepad++. I use XEmacs on Windows and Aquamacs on OSX. The solutions below will work in any text editor that supports multiline regular expressions (not Notepad++).
If you simply want to remove all instances of curly brackets, and everything that is in between them, you would search for:
{.* }
Note that in Emacs, the way to insert a newline into your search query is to press Ctrl+Q then Ctrl+J. In the above example, you would insert the newline after the asterisk * and before the close curly braces }
and replace this with nothing.
However, I am assuming that you want to keep some of the information in the curly brackets. From your question, I cannot tell if it is every second instance, or curly brackets that contain "cv". Some more information would allow me to give you a more tailored answer. For the time being, I will assume that you want to remove curly brackets containing "cv", but want to leave those containing "sf" (or anything else) unaffected. To accomplish this, you would search for:
{.*cv.* }
and replace this with nothing.
April 9, 2010 at 10:09 AM
sourabh bora said...
May 7, 2010 at 12:39 AM
sourabh bora said...
Hey Mark,
Awesome blog. I could not make {n} (repeats the previous item n times) work. Specifically, I am looking at deleting a string of 10 numbers.
Thanks
May 7, 2010 at 12:39 AM
Mark Antoniou said...
Thanks sourabh bora.
Could you copy and paste a sample from your file so that I can have a look at what patterns might work?
May 7, 2010 at 10:08 AM
Christopher said...
Wow, this guide is very helpful and makes debugging code or even reformatting jumbled scan text from books a snap to clear up.
Always used Notepass++ and these search and replace tips really makes things so much easier and faster.
May 11, 2010 at 9:42 PM
sourabh bora said...
Thanks for your reply. Here is an example:
Post123456 This is a nice post
Post12345678 This is not a nice post
Post324567 This is another nice post
I want to delete the "nice" posts (Post--Followed by exactly 6 numbers, ) Thanks
May 12, 2010 at 11:22 PM
Mark Antoniou said...
This is actually a lot easier than I thought. If the text preceding the 6 numbers is always the same, then you have an easy way of uniquely identifying the "nice" posts.
Search for: nice post Post...
Replace with: nothing
This will get rid of the words "nice post Post" and the six characters directly after.
May 13, 2010 at 9:26 AM
sourabh bora said...
Thanks. Unfortunately, no text in the passage is same. The only pattern is
"Post" followed by 6 and exactly six random digits. There can be "Post" followed by 8 or 9 random digits, but they are of no interest to us.
Example
If you are working on something Post123456 cool, let #delete this Post123456789 him know.#dont delete Post234567 They select a #delete Post1 forum member#dont delete Post23 each month for a#dont delete
grant of up to $100 in hardware or software or other products. (Products do not have to be available on the mp3Car Store.)
May 13, 2010 at 12:18 PM
Mark Antoniou said...
Ok, so I didn't understand your previous message properly, then. It still looks to me that there is a pattern there though.
Search for: Post...
Replace with: nothing
The problem is that if you search for "Post..." it will replace longer strings too, such as "Post12345678" will become "78", and this is not good. So, in order to make it unique, you might include a space after the final period in your search expression. I will put the search term in quotes to illustrate that there is a space on the end. Do not use the quotes in your text editor
This search term will leave longer strings of numbers unaffected.
May 13, 2010 at 12:34 PM
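In regex engines that support counted repetition and lookahead, the "exactly six digits" requirement can be written directly. The sketch below uses Python's re module as an example engine, not Notepad++, and the sample is adapted from the text in the comment above with the #delete markers left out.

```python
import re

# Sample adapted from the comment above, without the "#delete" markers.
sample = (
    "If you are working on something Post123456 cool, let "
    "Post123456789 him know. Post234567 They select a "
    "Post1 forum member Post23 each month"
)

# \d{6} repeats "a digit" exactly six times; (?!\d) rejects the match
# when a seventh digit follows, so longer numbers are left alone.
pattern = re.compile(r"Post\d{6}(?!\d)")

matches = pattern.findall(sample)
cleaned = pattern.sub("", sample)

print(matches)
print(cleaned)
```

Using \d instead of a dot also means that a word such as "Posting" followed by digits is never matched.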
Mark Antoniou said...
Here is the output from your sample of text above:
If you are working on something
cool, let #delete this
Post123456789 him know.#dont delete They select a #delete
Post1 forum member#dont delete Post23 each month for a#dont delete
grant of up to $100 in hardware or software or other products. (Products do not have to be available on the mp3Car Store.)
May 13, 2010 at 12:36 PM
sourabh bora said...
May 13, 2010 at 12:38 PM
sourabh bora said...
Thanks. This is exactly what I did.
However, regexp has a more elegant solution. You can specify exactly how many characters you are searching for.
What if the number of digits was 60 instead of 6? you can write +{60} instead of typing 60 dots.
I was wondering if notepad has this feature implemented.
And also, we need to search only for digits.. so we will have to type [0-9] sixty times. (otherwise, posting123 will be selected)
May 13, 2010 at 12:41 PM
marius said...
Hi, I am new to regular expressions
and I don't quite get it. As I do not want to write a program to replace what I have here, I would like you to help me.
My file is AAABBBCCC etc. with all sorts of characters from the ASCII table.
The problem is that I want the text (code) to be ABC, and to search for all hex ASCII codes, not just numbers or letters.
Thanx a lot
June 13, 2010 at 9:44 PM Mark Antoniou said...
Thanks for your question Marius. I'm just not exactly clear on what you want to do. To help me, could you provide me with a sample of what your text looks like (a few lines), and then provide me with what you want those lines to look like after you run the regular expression.
June 14, 2010 at 7:05 PM marius said...
well my text looks like aaafffcccddd777gggzzziiippp¶¶¶▬▬▬---000▄▄▄ and i would like all the triplets to be replaced with only one character.
As you can see it is not only a to z and A to Z there are all type of characters with code between 0 and 255 ( Ascii code )
June 14, 2010 at 9:51 PM Mark Antoniou said...
Ok. If that is all that your file contains, then you could simply search for: ..(.)
and replace with: \1
Easy.
Note, I don't use Notepad++ any more, since I have moved on to Emacs. In Emacs the search term would be:
..\(.\)
but the concept is exactly the same: Discard the first two occurrences and keep the third.
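The same idea can be sketched in Python. The first function mirrors the ..(.) search above; the second is a stricter variant (an addition, not something from the thread) that only collapses a triplet when all three characters are identical. Both function names are made up for this example.

```python
import re

def collapse_triplets(text):
    # Same idea as searching for ..(.) and replacing with \1:
    # skip two characters, keep the third.
    # re.DOTALL lets "." match newlines too, in case the data contains them.
    return re.sub(r"..(.)", r"\1", text, flags=re.DOTALL)

def collapse_repeats(text):
    # Stricter variant: (.)\1\1 only matches three identical characters.
    return re.sub(r"(.)\1\1", r"\1", text, flags=re.DOTALL)

print(collapse_triplets("aaafffcccddd777"))
print(collapse_repeats("aaafffccc---000"))
```

The stricter version is safer if the file might contain stray characters that are not tripled, since it leaves those untouched.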
June 14, 2010 at 10:23 PM marius said...
thank you a lot
June 14, 2010 at 10:47 PM Mark N said...
I am trying to do 2 things:
1. Find lines with MORE than 95 characters (including white space) and
2. Find lines with LESS than 95 characters (including white space)
I can do perl regular expressions, but they just don't work for notepad++ for some reason. Can you please help?
June 24, 2010 at 6:21 AM Afzaal Ameer said...
Hey man as per your wish i have shifted to Xemacs now can you please explain the regex to remove multiline comments
June 27, 2010 at 3:01 PM Mark Antoniou said... Hey Afzaal,
It's very easy with Emacs. You get the newline character by pressing Ctrl+Q Ctrl+J. For example, if you had two lines and wanted to remove the line break you would Search for: Ctrl+Q Ctrl+J
Replace with: nothing/leave blank June 27, 2010 at 9:29 PM
Mark Antoniou said... Mark N,
I'm not ignoring you. I've had a bit of trouble getting the regular expression to work in Notepad++. It definitely can be done as a regular expression though.
Must you use Notepad++? June 27, 2010 at 9:31 PM Mark N said...
Well I prefer that it be done in notepad++... besides I don't want to write a script that does this.
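For anyone who does not mind leaving Notepad++, the two searches described above can be sketched in Python with counted repetition. This is only an illustration under the assumption that "more than 95" means 96 or more characters and "less than 95" means 94 or fewer; the function name split_by_length is made up.

```python
import re

def split_by_length(text, limit=95):
    # ^.{96,}$ matches lines longer than the limit,
    # ^.{0,94}$ matches lines shorter than the limit.
    longer = re.compile(r"^.{%d,}$" % (limit + 1))
    shorter = re.compile(r"^.{0,%d}$" % (limit - 1))
    lines = text.splitlines()
    return ([ln for ln in lines if longer.match(ln)],
            [ln for ln in lines if shorter.match(ln)])

text = "a" * 96 + "\n" + "b" * 95 + "\n" + "c" * 10
long_lines, short_lines = split_by_length(text)
print(len(long_lines), len(short_lines))
```

Lines of exactly 95 characters fall into neither group, which matches the way the question is worded.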
July 13, 2010 at 5:36 AM Garioch said...
hi, i have a somewhat similar problem ... i have a sql export-file
i want to "edit" the lines automatically .. coz its almost 6000 of them each Insert-Line starts with
(id, another_id, third_id, NULL, ...
here i want to "delete" the 3rd id - while leaving all other things i tried with several search patterns - but to no luck ..
July 14, 2010 at 4:36 PM Garioch said...
to be more precise all id , 2nd ID and 3rd ID ar actual numbers July 14, 2010 at 4:42 PM
Mark Antoniou said...
Garioch, if you want me to give you the exact answer, paste a few lines of code into a comment. But, the general principle is this:
Group the ids that you want to keep as \1,\2 and don't insert id 3 into the replace term. Make sense?
July 14, 2010 at 4:43 PM Garioch said...
4 of the lines of those 6000
(1, 1, 1, NULL, 'delayed billing', '2007-02-16', 0, 17 more fields),
(2, 1, 2, NULL, 'delayed billing', '2007-02-16', 0, 17 more fields),
(3, 1, 3, NULL, 'delayed billing', '2007-03-01', 0, 17 more fields),
(4, 1, 4, NULL, 'delayed billing', '2007-03-01', 0, 17 more fields),
since my question only concerns the start of each line i omitted some info at the end ... but this should give a picture of the data i want to Replace
until now i was able with some info from other web-pages to find the start of a line with a regex like
[(][0-9]*[, ][0-9]*[, ]
this marks exactly (1, 1, from the first insert-line
so how do i "mark" this as pattern 1 and how do i progress from there July 14, 2010 at 4:59 PM
Mark Antoniou said...
Sometimes, the best solution is not to get too fancy. How about if we group everything from the start that you want to keep into \1.
Then we group: Id3, NULL.
Then we group everything from there to the end of the line .* as \2.
So, your replace term would be: \1NULL\2 That would work.
July 14, 2010 at 5:14 PM Garioch said...
thanks mark
Find what :", [0-9]*, NULL, " Replace with : ", NULL, " then a quick "Replace All"
but again thanks for you advice (from previous answers) July 14, 2010 at 5:20 PM
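The grouping approach discussed above can be sketched in Python as follows. It assumes each line starts with three numeric ids followed by NULL, as in the sample lines; the function name drop_third_id is made up for this example.

```python
import re

def drop_third_id(line):
    # \1 captures "(id, another_id, " and \2 captures "NULL, ".
    # The third id sits between them and is matched but not captured,
    # so it disappears from the result.
    return re.sub(r"^(\(\d+, \d+, )\d+, (NULL, )", r"\1\2", line)

print(drop_third_id("(2, 1, 2, NULL, 'delayed billing', '2007-02-16', 0),"))
```

Anchoring at the start of the line with ^ keeps the pattern from touching numbers further along in the row.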
user said...
Hi Mark is it possible to make something like this, im not a programmer so ill try to explain it easy
find any content between two specific custom tags and replace it with the same tags and a new content between them like
find [customtag]*[customtag]
replace [customtag] This is new content replacing whatever was between custom tags.[customtag]
im using * like a wildcard to explain that should select every single character between tags
and more specific what i want is find *
replace some html marked text like \\Let change some hmtl paragraphs\\
(ive put slashes mixed with html tags because blogger does not allow me to post those tags)
ive read you cannot use regular with multiline so i ask myself if this is possible in notepad++ in some extent and in multiple opened files simultaneously, preferable as i do all my work with this program, and only xemacs as a last option, or alternative if you want to show next to notepad++ that it is easier to accomplish this in xemacs. But i ask myself if xemacs is not for non programmer ppl like (i know html css and more or less can read php and python with a very rough idea of whats going on, sometimes)
thanks again for this super post the best in internet explaining regular expressions for notepad++ and introducing xemacs for the same.
July 15, 2010 at 10:22 PM user said...
(blogger screwed my poorly scaped html tags ill try again with parenthesis) and more specific what i want is
find (<)!--tag1--(>)*(<)!--tag1--(>)
replace (<)!--tag1--(>)some html marked text like (<)div\(>)(<)p\(>)Let change some hmtl paragraphs(<)/p(>)(<)/div(>)(<)!--tag1--(>)
July 15, 2010 at 10:27 PM teddan00 said...
if have a filename i.e. a song called "Born To Run-E Street Band-Bruce Springsteen.mp3"
I try to make "E Street Band-Bruce Springsteen" switch place with "Born To Run".
Find: (.*)-(.*)\.
Replace: \2-\1.
But I get the following filename: "Bruce Springsteen-Born To Run-E Street Band.mp3"
It seems that the last occurrence of "-" is found. Is it possible to find the first occurrence, AND still make it compatible with filenames that only have one "-" in its filename.
July 16, 2010 at 1:54 AM TechnologyYogi said...
I used NP++'s regular expressions for find and replace for the first time - successfully, before this I depended on MS SQL Server's Management studio for this, as it has very cool easy to use find/replace features (using regular expressions).
Thanks for the post! July 17, 2010 at 1:00 AM
Mark Antoniou said...
First of all, apologies for taking so long to respond. I was on holidays overseas and only recently arrived back in Sydney.
teddan00, I will answer your question first because it is an easy one. If the character "-" is giving you trouble, simply change it to something else via a simple Find+Replace. For instance,
Search for: -
Replace with: mork
Now, run a regular expression like this
Search for: (.*)mork(.*)mork(.*).mp3
Replace with: \2-\3-\1.mp3
For songs with only one "-",
Search for: (.*)mork(.*).mp3
Replace with: \2-\1.mp3
Easy.
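In a regex engine that supports non-greedy matching, the same swap can be done in one step, because (.*?) stops at the first hyphen rather than the last. Here is a minimal Python sketch; the function name swap_title is made up, and whether a given Notepad++ version supports *? is not something this example assumes.

```python
import re

def swap_title(filename):
    # (.*?) is non-greedy, so \1 ends at the FIRST hyphen.
    # (.*) then takes everything up to the .mp3 extension.
    return re.sub(r"^(.*?)-(.*)\.mp3$", r"\2-\1.mp3", filename)

print(swap_title("Born To Run-E Street Band-Bruce Springsteen.mp3"))
print(swap_title("Born To Run-Bruce Springsteen.mp3"))
```

The same expression covers filenames with one hyphen or two, and a filename with no hyphen is returned unchanged.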
August 9, 2010 at 11:07 PM Mark Antoniou said... @user
Thanks for your question. The short answer is "yes", that is exactly what regexp is for. I couldn't understand your second post, so I will do my best to answer your first post. Let's say that you had two custom tags and wanted to replace the text between them.
Find: ([customtag1]).*([customtag2])
Replace: \1Type replacement text here\2
The \1 and \2 will re-insert custom tags 1 and 2, respectively back into the text file. Hope I understood and answered your question.
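One caveat when trying this in other regex engines: if the tags literally contain characters such as [ ] or < >, they need to be escaped, otherwise [customtag1] is read as a character class. A hedged Python sketch, with a made-up helper name replace_between and made-up sample text:

```python
import re

def replace_between(text, open_tag, close_tag, new_content):
    # re.escape makes special characters in the tags literal.
    # .*? is non-greedy and re.DOTALL lets it span line breaks.
    pattern = re.compile(
        "(" + re.escape(open_tag) + ").*?(" + re.escape(close_tag) + ")",
        flags=re.DOTALL,
    )
    # A function replacement avoids backslash surprises in new_content.
    return pattern.sub(lambda m: m.group(1) + new_content + m.group(2), text)

page = "a <!--tag1-->old\ntext<!--tag1--> b"
print(replace_between(page, "<!--tag1-->", "<!--tag1-->", "<p>new</p>"))
```

The two capture groups play the same role as \1 and \2 in the Replace term above: the tags are put back and only the content between them changes.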
August 9, 2010 at 11:13 PM Der Bloggende Nomade said...
from now on it´s possible (5.7.1) to record search and replace events within a macro. September 7, 2010 at 7:38 PM
Tiberius Gracchus said...
There's a very simple workaround for searching multiple lines. Replace \r\n with something that is never present naturally. I like the ANSI character 167, but Notepad doesn't have a facility for inserting ANSI characters easily.
Anyway then you run your search specifying the character or string as your endline equivalent, go to town and replace the puppies with \r\n.
October 8, 2010 at 8:18 AM Mark Antoniou said...
Clever workaround. I like it. However, this doesn't address the main reason that forced me to move from Notepad++ to Emacs:
By using a more powerful text editor, workarounds are not required. New line characters can be searched for and/or replaced at will. This simplifies the search and replace expressions and saves me time.
October 8, 2010 at 9:47 AM Luc said...
Thank you for the guide! I'm a little confused by what to do in my situation. I have a file with such a structure:
BEGIN:VCARD
VERSION:2.1
N:Doe;John;;;
FN:John Doe
TEL;CELL;PREF:+41800800800
EMAIL;PREF;WORK:[email protected]
ORG:Test
END:VCARD
I want the "FN:" section to be changed in that way: FN: Doe, John (and no more FN: John Doe). Is that possible?
Mark Antoniou said...
Thanks for your question, Luc. Here's the Notepad++ solution:
Search for: (FN:)(.*) (.*)
Replace with: \1 \3, \2
Note that this expression assumes that people only have two names.
November 9, 2010 at 9:13 PM
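The same search and replace can be tried in Python. This sketch assumes the file uses plain \n line endings and, like the expression above, that each person has exactly two names; the function name last_name_first is made up.

```python
import re

def last_name_first(vcard):
    # \1 = "FN:", \2 = first name, \3 = last name.
    # re.MULTILINE makes ^ and $ match at every line, not just the whole text.
    return re.sub(r"^(FN:)(.*) (.*)$", r"\1 \3, \2", vcard, flags=re.MULTILINE)

card = "BEGIN:VCARD\nN:Doe;John;;;\nFN:John Doe\nORG:Test\nEND:VCARD"
print(last_name_first(card))
```

Because the pattern is anchored on FN: at the start of a line, the other vCard fields are left as they are.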
Edward said... Hi,
Is there a way for notepad++ to do an "or" operation? Something like: find A or B or C
I would especially like this for when I do a find of all in current document. Thanks, Ed
January 8, 2011 at 9:21 AM Pushkar said...
Hi Mark,
Thanks for the wonderful article, but I still couldn't resolve one of my problems. Could you please tell me how to replace "@#$%" with <@#$%>. Thank you. :) keep up the good work
January 30, 2011 at 6:40 AM Shamik said...
Awesome post...kudos for the great work February 2, 2011 at 6:37 AM
Mark Antoniou said...
trans-continental move. Now, to your questions: @Edward: To my knowledge, no.
@Pushkar: Do you literally mean replacing @#$% with <@#$%>? This can be achieved using a simple Find + Replace:
Find: @#$%
Replace with: <@#$%>
If you are talking about some sort of larger-scale find and replace based on some criterion, you need to give me more information, and preferably a snippet of text showing what the text looks like before and what you would like it to look like after. @Shamik: Glad you liked it :)
February 5, 2011 at 7:35 AM Martin said...
to answer the question, "is there anything it can't do"
well look ahead and look behind in regexp fails, and newlines (pretty much anything supported in extended) isn't supported in regexp.
and in case any one is wondering, yes vim supports this just fine.
but I'm still in love with notepad++ because it's just so much more simple to use, but learning vim is still well worth the effort (in my 1st week now and starting to get some real work done with it xD)
but who knows, maybe these issues will get addressed in the next version of notepad++
anyway nice article it did help a little even for an issue that couldn't be fixed in notepad++ xD
February 10, 2011 at 12:30 AM e22 said...
If you want to use Notepad++ to do regex over multiple lines simply start off by replacing \r\n with something like !NEWLINE! using the extended settings then do the reverse when finished!
Mark Antoniou said...
Yes, e22, that is what I did in the original post above, though I used a nonsense word "mork" rather than !NEWLINE!
Still though, it is quite unacceptable to me that three steps are required rather than one. And once you start using very complex regular expressions in text files that are hundreds of thousands of lines long, it becomes very tedious to have to worry about whether you missed any of your newly inserted !NEWLINE!s, or if any subsequent expressions modified something in your nonsense word (e.g., if I then got rid of all exclamation marks, it would be hard to go back). My point is that regular expressions are meant to save you time...
February 24, 2011 at 2:03 AM Shikhar Kumar said...
nice article, got my work done.
March 12, 2011 at 6:37 PM Nico said...
Hello, nice guide.
I have a (newbie) question: I have the following text: Minradio#23-567
The result that I want is: 23567
What should be my regexp? Thanks
March 24, 2011 at 8:45 AM Mark Antoniou said...
This is quite a straightforward example, Nico. Haven't had one of these in a while ;) So we start off with this:
Minradio#23-567
In Notepad++ regular expression search mode,
Search for: .*#(.*)-(.*)
Replace with: \1\2
What you end up with is this:
23567
It might seem a little tricky, but the concept is simple: What information do you want to keep? And how does the other unimportant information border it? In the regexp above, I used the hash (#) and hyphen (-) as anchors. This means that:
a) the text before the hash is free to vary
b) the number of digits between the hash and hyphen is free to vary
c) the number of digits after the hyphen is free to vary.
The limitation is that if some of your lines of text do not contain # or - then it will break my regexp.
March 24, 2011 at 9:01 AM Nico said...
Hey Mark, thanks for your help. Almost worked!!
The result that I've got is 16-103
The "-" was not removed. Any clue?
March 24, 2011 at 9:15 AM Mark Antoniou said...
Make sure that the hyphen is not enclosed within the parentheses. March 24, 2011 at 9:19 AM
Nico said...
That's my string:
What is the "\1\2" that you said to use as replacement? The "-" never goes away :-/
March 24, 2011 at 9:35 AM Mark Antoniou said...
Ok, let's back up a bit. Your original text is this: Minradio#23-567
You want to keep the numbers, and get rid of whatever is before the numbers as well as the hyphen. So, in Notepad++ regular expression search mode,
Search for: .*#(.*)-(.*)
Let me break down this search term. The first three characters .*# will search for anything until a hash # is found (Minradio# in the above example). We don't put parentheses around this because we don't want to use it in our Replace term; we simply discard it. The next five characters (.*)- will search for anything until a hyphen - is found. The parentheses around the period and asterisk mean that that text (which is in this instance the text immediately after the hash #, that is, the number 23) can be recalled in our Replace term. The way to recall the contents of this first set of parentheses is by typing \1. The hyphen is not enclosed within the parentheses and therefore cannot be recalled in the Replace term; it is simply discarded. Finally, the last four characters (.*) select the remaining text (in this example 567) and the parentheses mean that it can be recalled in the Replace term, this time by \2, because it is the second set of parentheses. So, the Replace term looks like this:
Replace with: \1\2
What you end up with is this: 23567
So, why are you ending up with 23-567? There are a few possibilities:
1. The original text had two hyphens:
If that is the case change your search term to this: .*#(.*)--(.*)
2. You are including the hyphen within one of the sets of parentheses:
.*#(.*-)(.*)
or
.*#(.*)(-.*)
The hyphen therefore will not be discarded. It will be recalled when you use \1 (top) or \2 (bottom).
3. You are reinserting the hyphen in your Replace term:
Replace with: \1-\2
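The difference between the working search term and possibility 2 can be checked quickly in Python. This is just a sketch for experimenting; the function names are made up.

```python
import re

def keep_digits(line):
    # Hyphen outside the parentheses: it is matched but never recalled.
    return re.sub(r".*#(.*)-(.*)", r"\1\2", line)

def keep_digits_wrong(line):
    # Hyphen inside the first set of parentheses: \1 brings it back.
    return re.sub(r".*#(.*-)(.*)", r"\1\2", line)

print(keep_digits("Minradio#23-567"))
print(keep_digits_wrong("Minradio#23-567"))
```

The first call prints 23567 and the second prints 23-567, which is the symptom described above.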
March 24, 2011 at 11:49 AM prozaker said...
you could take a look at the pythonscript plugin, it has a python replace method that everyone could use. It looks complete, textfx or regular n++ regular expression lack options.
http://sourceforge.net/projects/npppythonscript/
---
editor.pyreplace('id\=\"A\d+\" ','') # delete all id="A##"
---
April 1, 2011 at 5:13 AM el Mauri said...
Hello, nice guide.
I have a (newbie) question: I have the following list of emails:
[email protected], [email protected], frojasd08_hotmail.com ... and the list so on
And I want to take with that email that does not comply with the format in a regular email, in my example:
frojasd08_hotmail.com (it hasn't the character @)
Thanks, Mauri
April 3, 2011 at 5:43 AM Mark Antoniou said...
Mauri, it turns out that this is not as trivial as it first appears. Handling email addresses is quite a controversial issue in the regexp world. See http://www.regular-expressions.info/email.html for a discussion of the various issues and disagreements. Your sample text has two unique characteristics that allow us to sidestep the messy world of identifying 'what is an email address?', so I have taken advantage of these two unique conditions:
1. Each email is separated by a comma followed by a space ", "
2. Some of the email addresses are missing a "@"
I have written the solution below for Notepad++. It involves several steps, but as long as conditions 1 and 2 from above are satisfied, it will always work.
So, we start with this:
[email protected], [email protected], frojasd08_hotmail.com, [email protected], steve#yahoo.com, [email protected]
Step 1: Place each email address on its own line
Search for (Extended mode): ", " (without the quotation marks)
Replace with: ,\n
You end up with this:
[email protected],
[email protected],
frojasd08_hotmail.com,
[email protected],
steve#yahoo.com,
[email protected]
Step 2: Remove correctly formatted emails that contain "@"
Search for (Regular expression mode): .*@.*
Replace with: (nothing, leave blank)
You end up with this:
frojasd08_hotmail.com,
steve#yahoo.com,
Step 3: Remove blank lines
Search for (Extended mode): \n
Replace with: (nothing, leave blank)
The result is this:
frojasd08_hotmail.com,steve#yahoo.com,
Optional step 4: If desired, you could at this point insert a space after each comma
Search for: ,
Replace with: ", " (without quotes)
End result:
frojasd08_hotmail.com, steve#yahoo.com,
So, only those email addresses that do not contain the @ are left, and they may now be corrected, logged, or whatever.
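The steps above can also be sketched in a few lines of Python, under the same two conditions (entries separated by a comma, and bad entries missing the "@"). The function name and the two well-formed example addresses are made up for illustration.

```python
def addresses_missing_at(raw):
    # Step 1 equivalent: split the list on commas and trim the spaces.
    entries = [entry.strip() for entry in raw.split(",")]
    # Steps 2 and 3 equivalent: drop entries containing "@" and empty ones.
    return [entry for entry in entries if entry and "@" not in entry]

raw = "alice@example.com, frojasd08_hotmail.com, bob@example.org, steve#yahoo.com"
print(", ".join(addresses_missing_at(raw)))
```

Like the Notepad++ version, this only checks for a missing "@"; it makes no attempt to decide whether the remaining addresses are otherwise valid.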
April 8, 2011 at 8:35 AM BK said...
I need your help.
I built a reg expression using regmagic tool. The expression is:
\b(?:(?:[1-9][0-9]{1,3}|[5-9])[0-9]{4}|[0-9]+|[0-9]+)\b
This expression is supposed to find numbers between 50000 and 99999999
appstore.gearlive.com/member/76234/|0
I have 1000 lines like this. But despite I check, regular expression as the searchmode, it finds nothing.
What am I missing. Please Help! April 11, 2011 at 6:38 PM Mark Antoniou said...
Yeah, BK, it's not going to happen. Not with Notepad++, at least. From past experience, Notepad++ has problems both with repetition {1} and with word boundaries \b. You could achieve what you want to do in four (fairly inelegant) steps, starting from the largest number of digits:
[1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
and removing one digit at each step
[1-9][0-9][0-9][0-9][0-9][0-9][0-9]
then
[1-9][0-9][0-9][0-9][0-9][0-9]
and so on, until you arrive here
[5-9][0-9][0-9][0-9][0-9]
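In an engine that does support counted repetition, the separate patterns above collapse into one. The following Python sketch is an illustration only; the name find_in_range is made up, and the lookarounds are used here in place of \b so that a match cannot start or end in the middle of a longer number.

```python
import re

# 6 to 8 digits starting with 1-9, or 5 digits starting with 5-9,
# which together cover 50000 through 99999999.
RANGE = re.compile(r"(?<!\d)(?:[1-9]\d{5,7}|[5-9]\d{4})(?!\d)")

def find_in_range(text):
    return RANGE.findall(text)

print(find_in_range("appstore.gearlive.com/member/76234/|0"))
```

Numbers below 50000 and numbers with nine or more digits are skipped entirely rather than partially matched.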
April 13, 2011 at 5:36 AM warm up said...
@Mark,
[5-9][0-9][0-9][0-9][0-9]
this indeed finds what I want. Thank you very much. April 13, 2011 at 7:55 PM
Wavetrain said...
Hey, just wanted to say thanks for the pointers. Really helped me clean up a massive wiki list, it probably cut down editing time to 1/4 what it would have been.
warm up said...
I need a regex builder. Will you please suggest me a good one? Thanks in advance.
April 23, 2011 at 4:18 AM Mark Antoniou said...
Sorry warm up, I've never used one, and definitely couldn't recommend a good one. If you've got a specific regexp query I might be of more use.
April 23, 2011 at 6:21 AM warm up said...
Thanks for offering your help and your time. I want to find a string in a text like this;
For example, every line in the text file has a string #links#
and after this string there are several words that do not interest me. I want to find and mark #links# and the words after that so that I can delete them. How can I do that with notepad++?
April 26, 2011 at 5:01 PM Mark Antoniou said...
That's not too difficult. You just need to identify #links# on each line and delete it along with everything after it.
Search for: #links#.*
Replace with: nothing, just leave it blank
April 27, 2011 at 12:12 AM
warm up said...
But I do not want to delete only #links#. I want to delete #links# and the words that are coming afterthat.
something is important but #links# this is not after the process I want to get only;
something is important but April 27, 2011 at 1:49 AM Mark Antoniou said...
Yep, I understood what you were after. This still works. Let me break it down for you: Make sure that you are searching in Regular Expression mode
Search for: #links#.*
Replace with: nothing, just leave it blank
Note that #links# is followed by a period and asterisk .* which will select everything after #links# until the end of the line.
So, when you use that term on this:
something is important but #links# this is not
What will be left over is this:
something is important but
April 27, 2011 at 2:08 AM warm up said...
Ok. That works. Thank you very much. April 27, 2011 at 4:12 AM Constantin said...
Searching for multiple lines doesn't seem to be working. I am searching for this
@Text.*\r\n.*;
Any ideas ?
May 17, 2011 at 4:50 AM Mark Antoniou said...
If you are trying to perform this search in Notepad++, it's not going to happen.
Having said that, if you insist on using Notepad++, you are going to need to get creative and will need to break the search down into steps because \r\n cannot be used in Regular Expression mode - you need to use Extended Search mode for \r\n. So, how many steps do you need? I'm not sure, because it depends on your text, but my guess is at least three:
1. Turn the newline into something unique.
2. Run the regexp.
3. Put the newlines back or do something else with them (not sure what, because you didn't specify).
May 17, 2011 at 4:59 AM Constantin said...
Well :), if the regular expression implementation in Notepad++ would implement the multi line pattern that could solve it. I am not familiar with how Notepad++ is implemented but Java would allow multi line patterns. I bet .NET would do the same. The solution you suggested would work nicely but what I was trying to do was to search thru a large set of java files for a certain multi line pattern. So I can't have the option to replace the \r\n with a special token since that will alter the code base.
Thanks for looking! May 17, 2011 at 7:22 AM Mark Antoniou said...
If you do not *have* to use Notepad++, why not just use a more powerful text editor (XEMacs), which will give you the one-line solution that you are looking for?
May 17, 2011 at 7:29 AM Dee said...
Hi Mark,
replace field dynamically.. I had a situation like this:
Text:
Step 1
Step 2
Step 3
Find : Step\s\d
Replace : Step\s\d|
which of course gave me this!
Step\s\d|
Step\s\d|
Step\s\d|
Eventually, it clicked that \1 represents the text matched by the first set of parentheses and I stumbled upon this:
Find: (Step\s\d)
Replace: \1|
which gave me the desired result:
Step 1|
Step 2|
Step 3|
Just wanted to get that out there in case anyone else is struggling with that. Once again cheers Mark for the help on that one.. the fist in the air celebration was priceless.
Dee
June 16, 2011 at 1:31 AM Mark Antoniou said...
after you have that "ahah" moment! June 16, 2011 at 2:11 AM
Manuel said... hi,,
i need to do a massive replacement from: tcp10102/172.20.225.246_PROBE to
tcp10102_PROBE
can you tell me the syntax to use for this replacement? text after tcp and text after/ and before _PROBE varies.. June 16, 2011 at 11:39 PM
Mark Antoniou said... Hey Manuel,
This is pretty straightforward. You want to keep everything before the forward slash and everything after the underscore.
In Regular Expression search mode,
Search for: (.*)/.*_(.*)
Replace with: \1_\2
You can see that in the search term, I am using the forward slash and underscore as signposts, and am keeping everything before and after (enclosed in parentheses), but am discarding everything in between (not enclosed in parentheses).
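Here is the same signpost idea as a small Python sketch, for anyone who wants to test it on a few lines before running it over a whole file. The function name strip_address is made up.

```python
import re

def strip_address(line):
    # \1 = everything before the forward slash,
    # \2 = everything after the last underscore.
    # The text between the two signposts is matched but not captured.
    return re.sub(r"(.*)/.*_(.*)", r"\1_\2", line)

print(strip_address("tcp10102/172.20.225.246_PROBE"))
```

Lines that do not contain both a forward slash and a later underscore are left unchanged.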
June 17, 2011 at 12:13 AM Nate said...
I am interested in searching a document and replacing everything from a href=" to " and change all the links quickly with notepad ++ can you tell me how to do this?
I tried searching for ahref=".*" and it selected everything up to the LAST " Please advise!
Thanks
June 18, 2011 at 3:45 AM Mark Antoniou said...
Ok, I'm not sure exactly what you want the end result to be, but I'll give it a go. Say that you start with something like this:
ahref="www.google.com"
ahref="www.facebook.com"
ahref="www.blogger.com"
ahref="www.twitter.com"
If you want to keep the ahref=" and the final " you could
Search for (regexp mode): (ahref=").*(")
Replace with: \1\2
The end result would be
ahref=""
ahref=""
ahref=""
ahref=""
If you want to keep everything but the ahref=" and the final " you could
Search for (regexp mode): ahref="(.*)"
Replace with: \1
The end result would be
www.google.com
www.facebook.com
www.blogger.com
www.twitter.com
If you want to do something else, you're going to have to be more specific. Ideally, show me what a few lines of text look like before, and what you want them to look like after.
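The expressions above work when each link sits on its own line. When several links share a line, .* runs on to the LAST quotation mark, which is the behaviour Nate described. In engines that support it, a non-greedy .*? stops at the first closing quote instead. A minimal Python sketch, with made-up function names:

```python
import re

def extract_links(text):
    # .*? is non-greedy: it stops at the first closing quotation mark.
    return re.findall(r'ahref="(.*?)"', text)

def extract_links_greedy(text):
    # .* is greedy: it runs on to the last quotation mark on the line.
    return re.findall(r'ahref="(.*)"', text)

line = 'ahref="www.google.com" ahref="www.facebook.com"'
print(extract_links(line))
print(extract_links_greedy(line))
```

An alternative that works even without non-greedy support is to search for ahref="([^"]*)" so that the match cannot cross a quotation mark.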
June 18, 2011 at 4:01 AM Nate said...
Awesome! Thanks for the quick reply, worked great! What an awesome trick for rewriting!
June 18, 2011 at 4:10 AM Rakesh Juyal said...
Mark, is it possible to replace all ? in any text file with '${abc' then an incrementing number then '}$'
example:
---
where ( col1 = ? or col1 = ? ) and col2 = ?
replaced to
where ( col1 = ${abc1}$ or col1 = ${abc2}$ ) and col2 = ${abc3}$
---
July 18, 2011 at 6:15 PM Mark Antoniou said...
Yes it is. But it will require a very long and convoluted process and several search and replace steps (similar to the blog post above). The problem is the "increment by one" part.
In Notepad++, you can insert incremented numbers from the Edit | Column Editor menu command. This places numbers at the front of each line.
You could possibly position each ? so that it occurs at the end of each line, then replace it with ${abc\1}$, where \1 represents the number at the beginning of the line. Not sure if you want to go ahead with this, but if you do, here are the steps:
1. Get rid of all line breaks, replacing them with some unique string that does not occur in your original text file, such as "thereisnoothertextlikethis".
2. Search for ? and replace with a ? followed by a linebreak.
3. Add numbers to the beginning of each line using the Edit | Column Editor menu command.
4. Use a regular expression to search for the number at the beginning of each line and move it to ${abc\1}$
5. Remove all linebreaks.
6. Replace all instances of thereisnoothertextlikethis to restore your original linebreak structure.
If you want to go ahead with this, paste a larger portion of your text file (10-20 lines) and I'll show you how to do it in more detail.
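If a short script is an option, the "increment by one" part becomes much simpler, because the replacement can be computed for each match. This Python sketch is an alternative to the six steps above, not a Notepad++ feature; the function name and the prefix parameter are made up.

```python
import re
from itertools import count

def number_placeholders(sql, prefix="abc"):
    counter = count(1)  # yields 1, 2, 3, ...
    # \? matches a literal question mark; the lambda builds ${abc1}$, ${abc2}$, ...
    return re.sub(r"\?", lambda m: "${%s%d}$" % (prefix, next(counter)), sql)

print(number_placeholders("where ( col1 = ? or col1 = ? ) and col2 = ?"))
```

The counter runs across the whole string passed in, so reading the entire file into one string keeps the numbering continuous across lines.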
July 19, 2011 at 2:08 AM ג ו ד א י יתיא said... Hello,
Thank you for the time investing publishing and answering - Helped me a lot ...My Question is :
* If I have Emails with NOT similar text before and After and I would like to extract those Emails...for example :
In the same Text : First String : =============== "21-Feb-2011 12:16:49 GMT+02:00 PM","alternateContactBusinessPhone":"","databasePlatform":"","productLine":"Oracle E-Business Suite","lastPublicActivityCreatedBy":"[email protected]","accountStatus" :"Active","commitTime":"22-Feb-2011 9:09:06 GMT+02:00 AM","HWCity":"","conflictId":"0","outageType":"","contactLogin":"GOREN.NAAMA@GM AIL.COM","subCategory":"","SRContactEmail":"[email protected]","alertMe":"fa lse","SRContactPhone":"(972) 542-1341 x76" Second String : =============== "09-Jun-2011 10:42:57 GMT+03:00 AM","alternateContactBusinessPhone":"","databasePlatform":"","productLine":"Oracle Database Products","lastPublicActivityCreatedBy":"[email protected]","acc ountStatus":"Active","commitTime":"10-Jun-2011 10:20:52 GMT+03:00 AM","HWCity":"","conflictId":"0","outageType":"","contactLogin":"ITSHAK@HADASSA H.ORG.IL","subCategory":"","SRContactEmail":"[email protected]","alertMe":"fals e","SRContactPhone":"02-6778113"
Regards Etay G
August 3, 2011 at 1:15 AM Mark Antoniou said...
Glad you have found the blog useful, Etay G. I'm not sure exactly what you are trying to get from the text. Do you want to get rid of everything, leaving only the email addresses?
August 5, 2011 at 1:23 AM RatA said...
Mark, thanks for the post, is very usefull. following the first example, how about not erasing all the line, but only a part.
like i want to remove the $_POST['abc']; part in all lines
$abc = $_POST['abc'];
$bbb = $_POST['def'];
i try [$_POST].* but it erase all the line, and not the final part.
August 19, 2011 at 2:26 AM
Mark Antoniou said...
RatA, if I understood correctly, you want to turn this:
$abc = $_POST['abc'];
$bbb = $_POST['def'];
into this:
$abc =
$bbb =
is that right? To do this,
Search for (regular expression mode): $_POST.*
Replace with: nothing
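A note for anyone trying this in another regex engine: in most of them an unescaped $ means "end of line", so the dollar sign has to be escaped to match it literally. A minimal Python sketch, with a made-up function name:

```python
import re

def strip_post_assignment(line):
    # \$ matches a literal dollar sign; .* removes the rest of the line.
    return re.sub(r"\$_POST.*", "", line)

print(strip_post_assignment("$abc = $_POST['abc'];"))
```

This also shows why [$_POST].* erased the whole line: square brackets create a character class, so that search matches from the first $, _, P, O, S or T it finds, which is the $ at the very start of $abc.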
RatA said...
thanks, u are a genius. August 21, 2011 at 5:06 AM Mikazza said...
September 15, 2011 at 11:34 PM Mikazza said...
Hi Mark,
Thanks for all the great info on regular expressions, although I have a problem I can't seem to find the solution for.
I have a data file which I would like to strip out some sections as they are useless, first I replaced all the \r\n with @NEWLINE@ so I could get the whole file in one line, now i'm trying to replace anything between and with
e.g.
**Data I want to keep is here 1** message
called today but nobody was home /message
**Data I want to keep is here 2** message
called today but nobody answered /message
the words message have < and > around them but the site wont let me post them. As I said I removed all the line breaks from this and tried to run this regular expression. Find: (messages.*)(/messages)
Replace: deleted
I couldn't work out how to find the < or > symbols.
what it does though is finds the 1st occurance of the word messages then finds the last occurance and replaces everything in between with the word deleted. In my example above its deleting **Data I want to keep is here 2**
Is there any way of doing this using regular expressions? September 15, 2011 at 11:37 PM
Mark Antoniou said...
Hi Mikazza, I am not sure that I have understood exactly what you are trying to do, but will give it a shot. So this is your original text:
**Data I want to keep is here 1** < message >
called today but nobody was home < /message >
**Data I want to keep is here 2** < message >
called today but nobody answered < /message >
In order to remove the < message > and < /message > tags, you should Search for (regular expression mode): <.*>
Replace with: nothing This will give you this:
**Data I want to keep is here 1**
called today but nobody was home
**Data I want to keep is here 2**
called today but nobody answered
If you then want to get rid of the lines that begin with "called", you could Search for (regular expression mode): called.*
Replace with: nothing which will give you this:
**Data I want to keep is here 1**
**Data I want to keep is here 2**
And then fix the blank lines as you see fit. Hope this helps.
p.s. I inserted spaces before and after the greater and less than symbols so that they would show up in the post. You would not include the spaces in the search term. September 17, 2011 at 12:53 AM
Mikazza said...
Thanks for the quick response Mark, what I want to replace is the < message > and < /message > and everything in between them. I can get it to work if there is only one set of these tags in the file (unfortunately there are thousands), if there are more than 1 set it goes wrong and deletes everything between the 1st < message > and the last < /message >.
Since the < message > and < /message > are on different lines in the file and the content between them can also vary on how many lines it's over, I removed all the line breaks to make it a bit easier to do the search and replace.
Thanks!
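The behaviour described above, where everything between the first opening tag and the last closing tag is deleted, is greedy matching at work. In an engine with non-greedy matching that can span lines, each pair of tags can be removed in one pass without touching the line breaks first. A hedged Python sketch, with a made-up function name and a shortened sample:

```python
import re

def remove_messages(text):
    # .*? stops at the FIRST closing tag after each opening tag;
    # re.DOTALL lets "." cross line breaks.
    return re.sub(r"<message>.*?</message>", "", text, flags=re.DOTALL)

data = (
    "**Data I want to keep is here 1**\n"
    "<message>\ncalled today but nobody was home\n</message>\n"
    "**Data I want to keep is here 2**\n"
    "<message>\ncalled today but nobody answered\n</message>\n"
)
print(remove_messages(data))
```

Both "Data I want to keep" lines survive, and the leftover blank lines can be tidied afterwards as needed.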
September 17, 2011 at 4:47 AM Mark Antoniou said...
Ok got it. So, you start off with this:
**Data I want to keep is here 1** < message >
called today but nobody was home < /message >
**Data I want to keep is here 2** < message >
called today but nobody answered < /message >
Notepad++ has a hard time handling multiline regular expressions. One option is to use a different text editor with more powerful regexp capabilities (ahem, Emacs). The other option is to use Notepad++ and break this down into a few steps (3 to be precise).
Step 1: Remove the newlines
Search for (extended mode): \r\n
Replace with: nothing
This will give you this:
**Data I want to keep is here 1**< message >called today but nobody was home< /message >**Data I want to keep is here 2**< message >called today but nobody answered< /message >
Step 2: Make all instances of < /message > occur at the end of a line. The reason for this is because we want to discard everything before < /message >, apart from that bit at the front that we want to keep.
Search for (extended mode): < /message >
Replace with: \r\n