Regular Expression That Removes Attributes From Tags?
Jun 22, 2010
What I'm interested in is a regular expression that will accept HTML input and remove all attributes inside the tag while leaving the tag itself intact. For example, I want this...
<p class="test" id="TestParagraph">This is some test text right here.</p>
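A minimal JavaScript sketch of that request: strip every attribute from start tags and keep the tags themselves. It assumes attribute values never contain a ">" character, and it ignores comments, CDATA and script bodies. The regex and function names are illustrative only.

```javascript
// Group 1: the tag name (letters, digits, "_", ":" and "-", so prefixed tags also work).
// Group 2: a self-closing "/" if present. Everything else up to ">" is dropped.
// Closing tags such as </p> never match, because "/" cannot start a tag name.
const startTag = /<([a-z][\w:-]*)\b[^>]*?(\/?)>/gi;

function stripAttributes(html) {
  return html.replace(startTag, "<$1$2>");
}

console.log(stripAttributes('<p class="test" id="TestParagraph">This is some test text right here.</p>'));
// <p>This is some test text right here.</p>
```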
We have a set of custom control tags, e.g. <ourTag:OurControl runat="server" />. Throughout our project we have discovered Visual Studio's marvelous (sarcasm) helper, which automatically pastes in an ID made of the tag's name followed by a counter number. I am now trying to remove these IDs globally.
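For the Visual Studio IDs, a narrower pattern can remove only IDs that consist of the control's own name plus a counter, so hand-written IDs survive. This sketch assumes the generated attribute is written exactly as ID="ControlName<digits>" in double quotes; the sample markup and the function name are hypothetical.

```javascript
// Group 1: "<prefix:ControlName" plus any attributes that come before ID.
// Group 2: ControlName, reused through the \2 backreference, so only IDs of the
// form ID="ControlName<digits>" are removed. The \b stops the name from being shortened.
const generatedId = /(<\w+:(\w+)\b[^>]*?)\s+ID="\2\d+"/g;

function removeGeneratedIds(markup) {
  return markup.replace(generatedId, "$1");
}

console.log(removeGeneratedIds('<ourTag:OurControl ID="OurControl1" runat="server" />'));
// <ourTag:OurControl runat="server" />
console.log(removeGeneratedIds('<ourTag:OurControl ID="MainMenu" runat="server" />'));
// unchanged: "MainMenu" is not the tag name followed by a number
```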
<div id="mydiv">This is a "div" with quotation marks</div>
I want to use regular expressions to return the following:
<div id='mydiv'>This is a "div" with quotation marks</div>
Notice how the id attribute in the div is now surrounded by apostrophes?
How can I do this with a regular expression?
Edit: I'm not looking for a magic bullet to handle every edge case in every situation. We should all be wary of using regex to parse HTML, but in this particular case, and for my particular need, regex IS the solution.
Edit #2: Jens Ameskamp helped me find a solution, but anyone randomly coming to this page should think long and very hard about using it. In my case it works because I am very confident of the type of strings I'll be dealing with. I know the dangers and the risks; make sure you do too. If you're not sure whether you know, that probably indicates you don't, and you shouldn't use this method.
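One possible approach to the quote swap, sketched in JavaScript: an outer regex visits each tag and an inner one rewrites name="value" pairs only inside that tag, so quotation marks in the text content are left alone. It assumes values contain no ">" characters; values that contain an apostrophe are deliberately left in double quotes. The names are illustrative.

```javascript
// Matches an attribute name, "=", and a double-quoted value without quotes inside it.
const attrPair = /([A-Za-z_][\w:.-]*)\s*=\s*"([^"']*)"/g;

function useApostrophes(html) {
  // The outer regex hands each tag to the inner replace; text between tags is untouched.
  return html.replace(/<[^>]+>/g, tag => tag.replace(attrPair, "$1='$2'"));
}

console.log(useApostrophes('<div id="mydiv">This is a "div" with quotation marks</div>'));
// <div id='mydiv'>This is a "div" with quotation marks</div>
```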
I haven't been able to find relevant information through searches. I'm very green when it comes to server-side scripting. I have an ASPX page with a standard form. In the head I have meta tags, the title tag, and a link tag, neatly ordered on their own lines. However, when I view the source after publishing to the server, the line breaks between these tags are removed and it looks quite messy. (There are also <style> and <script> tags that follow, but they remain unaffected.)
I realize this has no practical effect on the site itself (in an SEO sense or otherwise), but my project manager shows the source code to our clients to educate them on meta tags and page titles, so it would help if it didn't become jumbled like this. I wonder whether this is a common issue and whether it can be prevented through better coding practices. HTML as authored, with each tag on its own line:
HTML Code:
<head runat="server">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="Welcome to Lawn Care Waukesha - Cut My Lawn. Cut My Lawn - Lawn Care Services has offered quality lawn cutting, fertilizing, aerating, and much more at affordable pricing since 2002! We currently offer lawn care service to Waukesha, Brookfield, Pewaukee, Menomonee Falls, and surrounding communities." />
<meta name="keywords" content="lawn cutting, lawn mowing, lawn care, fertilizing, aeration, mulching, shrub trimming, lawn mowing, edging, pruning, mulching, weed control, waukesha, Brookfield, Pewaukee, menomonee falls" />
<title>Lawn Care Waukesha — Cut My Lawn, Lawn Care Service</title>
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />

HTML after being processed by the server, with all the tags running together:
HTML Code:
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="description" content="Welcome to Lawn Care Waukesha - Cut My Lawn. Cut My Lawn - Lawn Care Services has offered quality lawn cutting, fertilizing, aerating, and much more at affordable pricing since 2002! We currently offer lawn care service to Waukesha, Brookfield, Pewaukee, Menomonee Falls, and surrounding communities." /><meta name="keywords" content="lawn cutting, lawn mowing, lawn care, fertilizing, aeration, mulching, shrub trimming, lawn mowing, edging, pruning, mulching, weed control, waukesha, Brookfield, Pewaukee, menomonee falls" /><title> Lawn Care Waukesha — Cut My Lawn, Lawn Care Service </title><link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
I'm not sure it's relevant, but here's the script used to send the form (which I didn't write, by the way). It's the final tag inside the page head:
HTML Code:
<script type="" runat="server">
Protected Sub SubmitForm_Click(ByVal sender As Object, ByVal e As System.EventArgs)
    If Not Page.IsValid Then Exit Sub
    Dim SendResultsTo As String = "email"
    Dim smtpMailServer As String = "smtp"
    Dim smtpUsername As String = "email"
    Dim MailSubject As String = "subject"
    Try
        Dim txtQ As TextBox = Me.FormContent.FindControl("TextBoxQ")
        If txtQ IsNot Nothing Then
            Dim ans As String = ViewState("hf1")
            If ans.ToLower <> txtQ.Text.ToLower Or ans.ToUpper <> txtQ.Text.ToUpper Then
                Me.CutMyLawnForm.ActiveViewIndex = 3
...
I'm trying to use a RegularExpressionValidator on an email form to ensure that users enter a valid From email address. That part works. I also want the expression to accept the text that I pre-populate in the text box ("Enter your email address"), so that on postback, after sending the message, I can clear the fields and repopulate that box.
How do I add that wording to the current expression: \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
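One way to do this is alternation: accept either the exact placeholder text or an address. As I understand it, the ASP.NET validator only treats a value as valid when the pattern matches the whole value, so the JavaScript sketch below anchors the pattern with ^ and $ to behave the same way. The constant and function names are illustrative.

```javascript
// The question's email pattern, with the backslashes that escape \w and \. restored.
const EMAIL = String.raw`\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*`;
// Text pre-populated in the box (from the question); it contains no regex metacharacters.
const PLACEHOLDER = "Enter your email address";
// Alternation: the whole value must be either the placeholder or an address.
const emailOrPlaceholder = new RegExp(`^(?:${PLACEHOLDER}|${EMAIL})$`);

function isAccepted(value) {
  return emailOrPlaceholder.test(value);
}

console.log(isAccepted("Enter your email address")); // true
console.log(isAccepted("someone@example.com"));      // true
console.log(isAccepted("not an email"));             // false
```

Written directly into ValidationExpression, the same idea would be ^(?:Enter your email address|\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)$.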
I am using a RegularExpressionValidator for a text box, shown below. It works for ordinary numbers, but it does not reject values with leading zeros, such as 0000..001. How can I modify the validation expression to reject those while still allowing values like 100, ..., 5000, i.e. zeros after a nonzero digit?
<asp:RegularExpressionValidator ID="reg2" runat="server" ControlToValidate="rng2" ValidationExpression="^[0-9]+" ErrorMessage="*Please Enter a Valid Number for Second Range." ForeColor="Red" Font-Bold="True"></asp:RegularExpressionValidator>
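The original ^[0-9]+ accepts any run of digits, including leading zeros. A sketch of a stricter rule, assuming only whole numbers are wanted and that 0 itself should also be rejected: require the first digit to be 1-9. In the markup this would be ValidationExpression="^[1-9][0-9]*$".

```javascript
// The first digit must be 1-9, so "0001" and "0" are rejected;
// any digits, zeros included, may follow ("100", "5000").
const noLeadingZeros = /^[1-9][0-9]*$/;

["100", "5000", "0001", "0"].forEach(v => console.log(v, noLeadingZeros.test(v)));
// 100 true
// 5000 true
// 0001 false
// 0 false
```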
I need some help with a regular expression. I am validating a text box's text when updating records: when I click the Update button, the first five characters must be CM000 or cm000. How can I validate this with a regular expression in ASP.NET? Does anyone know a ValidationExpression for this?
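A sketch that accepts exactly the two casings named in the question. The trailing .* lets anything follow the prefix, which matters because, as far as I know, the ASP.NET validator requires the pattern to match the entire value. Mixed casings such as "Cm000" are rejected under this reading.

```javascript
// (?:CM|cm)  one of the two allowed casings
// 000        the literal zeros
// .*         any remaining characters
const cmPrefix = /^(?:CM|cm)000.*$/;

["CM000123", "cm000abc", "Cm000123", "CM001"].forEach(v => console.log(v, cmPrefix.test(v)));
// CM000123 true
// cm000abc true
// Cm000123 false
// CM001 false
```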
How can I find all tags that have any attribute whose name starts with a given character? Something like ('TR[^a]). Here I am trying to find all TR tags that have an attribute starting with 'a'.
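A sketch in JavaScript, assuming attributes are written as name=value and that attribute values never contain ">" or text like " abc=" (either would confuse it). Valueless attributes are not matched. The sample HTML is hypothetical.

```javascript
// <tr\b         a TR start tag (the i flag also matches <TR>)
// [^>]*         any other attributes, without leaving the tag
// \sa[\w-]*     whitespace, then an attribute name that starts with "a"
// \s*=[^>]*>    its "=" and the rest of the tag
const trWithAttrA = /<tr\b[^>]*\sa[\w-]*\s*=[^>]*>/gi;

const html = '<table><tr align="left"><td>1</td></tr><tr class="x"><td>2</td></tr><TR id="r" abbr="y"></TR></table>';
console.log(html.match(trWithAttrA));
// the <tr align="left"> and <TR id="r" abbr="y"> start tags
```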
I have an FCKeditor in which the user enters some text, and in the code I want to strip the class and id attributes from the posted text. I know this can be done with regular expressions, and I have written some code to do so, but unfortunately it's not working.
using System.Text.RegularExpressions;

private string RemoveScripts(string input)
{
    string re1 = "(.*?";        // Non-greedy match on filler; this group is closed at the end of re4
    string re2 = "(class)";     // Attribute name
    string re3 = "(=)";         // Literal equals sign
    string re4 = "(\".*?\"))";  // Double-quoted value, then the ")" that closes re1's group
    string re5 = "(id)";        // Attribute name
    Regex regClass = new Regex(re1 + re2 + re3 + re4, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Regex regID = new Regex(re1 + re5 + re3 + re4, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Every match starts with the (.*?) filler, so replacing a whole match with ""
    // deletes all the text in front of each class/id attribute, not only the attribute.
    input = regClass.Replace(input, new MatchEvaluator(ReplaceClassID));
    input = regID.Replace(input, new MatchEvaluator(ReplaceClassID));
    return input;
}

private string ReplaceClassID(Match m)
{
    return "";
}
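Because every match of the pattern above begins with the (.*?) filler, replacing a match with an empty string also removes the text before the attribute. A narrower pattern that matches only the attribute itself avoids that. The sketch below is in JavaScript; the same pattern string should also work with .NET's Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase), though I have not verified it against FCKeditor output. It assumes quoted values and would also strip matching text that appears outside tags.

```javascript
// Whitespace, then a class or id attribute with a "..." or '...' value.
// The i flag also catches CLASS= and ID=; names like width= or data-id= are not touched.
const classOrId = /\s+(?:class|id)\s*=\s*(?:"[^"]*"|'[^']*')/gi;

function stripClassAndId(html) {
  return html.replace(classOrId, "");
}

console.log(stripClassAndId('<p class="test" id="TestParagraph">Hi</p>'));
// <p>Hi</p>
```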
I want to allow a limited set of HTML tags that users can enter in my form, and from looking at other posts, regular expressions seem to be the way forward. I don't want to use JavaScript or a third-party library, as this is for a university assignment.
The examples I have found use ASP.NET control IDs, whereas my code below is set out differently, so I am unsure whether I can use these examples.
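A whitelist sketch: match every tag, keep the allowed ones without their attributes, and drop the rest. Since the form is processed server-side, the same logic should carry over to .NET's Regex.Replace with a MatchEvaluator; JavaScript is used here only so the sketch is short and runnable. The whitelist contents are hypothetical, and this is a coursework-level filter rather than a real HTML sanitizer: attribute values containing ">" or malformed markup can get past it.

```javascript
// Hypothetical whitelist; adjust to the tags the form should allow.
const ALLOWED = new Set(["b", "i", "u", "p", "br"]);
// Any start, end or self-closing tag; group 1 is the tag name.
const anyTag = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;

function keepAllowedTags(input) {
  return input.replace(anyTag, (tag, name) => {
    const lower = name.toLowerCase();
    if (!ALLOWED.has(lower)) return "";            // drop tags not on the whitelist
    const closing = tag.startsWith("</") ? "/" : "";
    return `<${closing}${lower}>`;                  // re-emit the tag without attributes
  });
}

console.log(keepAllowedTags('<b onclick="x()">bold</b> <script>alert(1)</script>'));
// <b>bold</b> alert(1)
```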
I am building a forum and I want users to be able to format text with simple square-bracket tags. I currently do this by parsing the string and looking for the tags. It's very tedious, especially for a tag like [URL]: having to parse the attribute and the value, and to make sure it has proper opening and closing tags, is a pain and seems silly. I know how powerful regular expressions are, but I'm not good at them and they frustrate me to no end. I think an example would get me started: just a regex for finding tags like [b]bolded text[/b], and tags with attributes like the [URL] link.
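A starting-point sketch in JavaScript for the two cases in the question. Lazy quantifiers (*?) stop at the first closing tag, so nested tags of the same name are not handled. The sketch also does not HTML-encode the text or check the URL scheme, which a real forum should do before output.

```javascript
// [b]text[/b]  ->  <b>text</b>
const bold = /\[b\]([\s\S]*?)\[\/b\]/gi;
// [url=address]text[/url]  ->  <a href="address">text</a>; group 1 is the attribute value
const link = /\[url=([^\]]+)\]([\s\S]*?)\[\/url\]/gi;

function toHtml(post) {
  return post.replace(bold, "<b>$1</b>").replace(link, '<a href="$1">$2</a>');
}

console.log(toHtml("[b]bolded text[/b] and [url=http://www.google.com]click me[/url]"));
// <b>bolded text</b> and <a href="http://www.google.com">click me</a>
```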
I would like to know the expression for "letters only" in my ASP.NET web application. I tried ^[A-Za-z], but it does not work. I would also like a regular expression for "numbers only, except '-'".
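^[A-Za-z] describes a single character; in the ASP.NET validator, which (as I understand it) requires the pattern to match the whole value, it therefore accepts only one letter. Adding + and $ fixes that. The second pattern assumes that "numbers only, except '-'" means digits with hyphens also allowed.

```javascript
// + allows one or more letters; ^ and $ make the pattern cover the whole value.
const lettersOnly = /^[A-Za-z]+$/;
// Assumption: digits, with "-" also permitted anywhere.
const digitsAndHyphens = /^[0-9-]+$/;

console.log(lettersOnly.test("Hello"), lettersOnly.test("Hello1"));              // true false
console.log(digitsAndHyphens.test("555-1234"), digitsAndHyphens.test("55a"));    // true false
```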
I am trying to write a RegularExpressionValidator that allows users to input any number, positive or negative, except 0. I tried the following, but it failed:
^[1-9-]{1,10}$ <= fails whenever any digit after the first is 0
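The character class [1-9-] excludes 0 everywhere, and it also lets "-" appear in any position. A sketch that only excludes zero itself, assuming whole numbers without leading zeros: an optional minus sign, a first digit 1-9, then any digits. If leading zeros should be allowed, ^-?0*[1-9][0-9]*$ is a variant.

```javascript
// -?        optional minus sign, only at the start
// [1-9]     the first digit cannot be 0, so "0" and "-0" fail
// [0-9]*    later digits may be 0, so "10" and "-250" pass
const nonZeroInteger = /^-?[1-9][0-9]*$/;

["10", "-250", "0", "-0", "1-2"].forEach(v => console.log(v, nonZeroInteger.test(v)));
// 10 true
// -250 true
// 0 false
// -0 false
// 1-2 false
```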
I am using a regex to detect forum tags within posts, such as "[quote]text[/quote]", and then replace them with HTML formatting. I previously posted a question, "Forum tags. What is the best way to implement them?", about a problem with nested tags.
Right now this matches an opening and closing tag, with match groups for the tag name, the tag content, and an optional value like the value after the equals in this forum tag [url=www.google.com]click me[/url].
What I need is for the expression to match the opening tag OR the closing tag and have a match group containing the tag name (including the '/' for the closing tag). I then want to iterate through them sort of like this:
Dictionary<string, int> tagCollection = new Dictionary<string, int>();
inputString = Regex.Replace(inputString, @"expression I'm asking for here", match => {
[Code]....
So now each tag has a number appended, and I could use the original regex to perform the tag functions and handle nested tags properly as separate tags.
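A sketch of one expression that matches either an opening or a closing tag, with group 1 holding the name (including "/" for closing tags) and group 2 holding an optional value after "=". The numbering scheme, which appends the current nesting depth per tag name, is an illustrative stand-in for the elided Dictionary logic. It is written in JavaScript rather than C#, and it assumes tags are properly nested.

```javascript
// Group 1: tag name, including a leading "/" for closing tags.
// Group 2: optional value after "=", e.g. the address in [url=...].
const anyForumTag = /\[(\/?\w+)(?:=([^\]]*))?\]/g;

function numberTags(input) {
  const depth = new Map();                       // tag name -> current open depth
  return input.replace(anyForumTag, (match, name, value) => {
    if (name.startsWith("/")) {
      const bare = name.slice(1).toLowerCase();
      const d = depth.get(bare) || 0;
      depth.set(bare, Math.max(d - 1, 0));       // closing tag reuses the depth, then pops
      return `[/${name.slice(1)}${d}]`;
    }
    const key = name.toLowerCase();
    const d = (depth.get(key) || 0) + 1;         // opening tag pushes one level deeper
    depth.set(key, d);
    return value === undefined ? `[${name}${d}]` : `[${name}${d}=${value}]`;
  });
}

console.log(numberTags("[quote]a [quote]b[/quote] c[/quote]"));
// [quote1]a [quote2]b[/quote2] c[/quote1]
```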
I got this code from a regular expression library to get the src of the image found in a given input. However, I can't get it to work, because the double quotes (") in the regular expression close the string literal passed as the second parameter to the .Match function. I tried changing the double quotes (") to " but that doesn't work either.
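In C#, a double quote inside a regular string literal is written \", and inside a verbatim @"..." string it is written as two double quotes (""). The sketch below shows the same src extraction in JavaScript, where a regex literal needs no quote escaping at all. The pattern is a simple illustration: it assumes the value is quoted and, because of the \b, it would also match attributes such as data-src.

```javascript
// Captures the src value of each <img> tag, quoted with either " or '.
const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;

function imageSources(html) {
  // matchAll (which requires the g flag) yields every match; m[1] is the captured src.
  return Array.from(html.matchAll(imgSrc), m => m[1]);
}

console.log(imageSources(`<p><img class="x" src="a.png"> <IMG SRC='b.gif' /></p>`));
// a.png and b.gif
```

As a C# verbatim string, the same pattern would be written @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']".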
I have a custom control for creating users. I use a RegularExpressionValidator with the expression (?=\W{1,}){7,} (minimum length of 7 characters, including at least 1 non-alphanumeric character). I tested this expression with:
function GetText(AInputText) {
    var VRegExp = new RegExp(/(?=\W{1,}){7,}/);
    var VResult = VRegExp.test(AInputText);
    return VResult;
}
It works. But the RegularExpressionValidator does not accept inputs such as 1234567! or 12345qwer!.
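As far as I can tell, (?=\W{1,}){7,} can only ever match an empty string: a lookahead consumes no characters, so repeating it never adds length. test() only asks whether a match exists anywhere, so it returns true whenever a non-alphanumeric character is present, regardless of length, while the validator needs the match to cover the whole value. A sketch that puts the lookahead first and the length check after it follows. Note that \W treats "_" as a word character, so an underscore alone does not satisfy the rule.

```javascript
// ^ and $    the match must cover the whole value, as the validator requires
// (?=.*\W)   lookahead: at least one non-word character somewhere in the value
// .{7,}      at least seven characters in total
const passwordRule = /^(?=.*\W).{7,}$/;

["1234567!", "12345qwer!", "1234567", "ab!"].forEach(v => console.log(v, passwordRule.test(v)));
// 1234567! true
// 12345qwer! true
// 1234567 false
// ab! false
```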