Thursday, April 21, 2011

Problem using regex to extract text

Hello!

I need to extract a value from the following lines of code:

<label for="<%=foobar.bar %>">Password:</label>

<label for="foobar">Password:</label>

I need to extract foobar. I can use this: (?<=for=")[^"]+(?=(")) to extract:

<%=foobar.bar %>

and

foobar

but I don't want the <%= or the .bar. If I try to combine the two as (?<=for=")[^"]+(?=(")) | (?<=for="<%=)[^"]+(?=(")), it doesn't work because the label that includes <%= meets both conditions, and I don't think you can use XOR. Can anyone help me with this?

Thanks :)

From Stack Overflow
  • You can try this:

    (?<=for="(<%=)?)[^" ]*(?=( %>)?")
    

    This assumes that what you want to capture never includes spaces. Otherwise, you can use a non-greedy form of [^"]*:

    (?<=for="(<%=)?)[^"]*?(?=( %>)?")

  • I believe that it's better to not create uber-regexes. Do your task in several steps:

    1. Extract <%=foobar.bar %> or foobar with your regex (?<=for=")[^"]+(?=("))
    2. Check whether the result matches a regex like <%=([\w]+)\.bar\s*%>.
    3. If it does, use the $1 group from the match; otherwise use the result of step 1.
    4. Either way, you get foobar.
    Sara: Thank you. I used (?<=for=\")[^\"]+(?=(\")) and then went on to use (?<=<%=\s*)[^\s]+(?=\.bar\s*%>)
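
The multi-step approach above can be sketched in Python. This is only an illustration under stated assumptions, not the code used in the thread: the function names extract_label_target and extract_single_pass are hypothetical, and because Python's re module rejects variable-width lookbehind such as (?<=<%=\s*), capturing groups stand in for the lookarounds. The second function is a possible single-pattern variant that assumes the wanted identifier never contains a dot, a quote, or whitespace.

```python
import re

# Step 1: capture whatever sits inside for="...". A capturing group is used
# here in place of the lookbehind/lookahead pair from the question.
FOR_ATTR = re.compile(r'for="([^"]+)"')

# Step 2: recognise a server tag of the form <%=foobar.bar %> and capture the
# identifier in front of ".bar" (same shape as the regex in the answer above).
SERVER_TAG = re.compile(r'<%=\s*(\w+)\.bar\s*%>')

# Single-pattern variant (an assumption, not from the thread): optionally
# skip "<%=", then capture up to the first dot, quote, or whitespace.
SINGLE_PASS = re.compile(r'for="(?:<%=\s*)?([^".\s]+)')


def extract_label_target(line):
    """Two-step extraction: return the bare identifier, or None."""
    attr = FOR_ATTR.search(line)
    if attr is None:
        return None
    value = attr.group(1)
    tag = SERVER_TAG.fullmatch(value)
    # Use the captured group if the value is a server tag, else the raw value.
    return tag.group(1) if tag else value


def extract_single_pass(line):
    """One-pattern extraction: return the bare identifier, or None."""
    match = SINGLE_PASS.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        '<label for="<%=foobar.bar %>">Password:</label>',
        '<label for="foobar">Password:</label>',
    ]
    for sample in samples:
        print(extract_label_target(sample), extract_single_pass(sample))
        # prints "foobar foobar" for both samples
```

Keeping the two patterns separate makes each one easy to read and test on its own, which is the point of the multi-step answer; the single-pattern variant is shorter but bakes in the assumption about dots.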
