Web Forms :: Strip Out MS Word Formatting Programmatically?
Oct 30, 2010
I'm about to start an application that will allow users to upload an amount of text to a SQL database via an ASP.NET webform. I am certain that many users will cut and paste the text from Word, together with all the formatting and other baggage that Word creates. I'm looking for a way of programmatically stripping out all of this stuff and leaving just plain text.
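If the text comes in through a plain multi-line TextBox (a `<textarea>`), the browser already delivers plain text. The Word baggage appears when a rich-text editor posts HTML instead. Below is a minimal sketch, assuming the input is such an HTML string. It drops comments (including Word's conditional comments), style/script/xml blocks and all tags, turns paragraph and line-break tags into newlines, and decodes entities. `WordPasteCleaner` and `ToPlainText` are hypothetical names. Regex cleaning is a heuristic, not a real HTML parser. A parser library such as HtmlAgilityPack would be more robust.

```csharp
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class WordPasteCleaner
{
    // Ordinary comments and Word's conditional comments, e.g. <!--[if gte mso 9]>...<![endif]-->.
    private static readonly Regex Comments =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // <style>, <script> and <xml> blocks: their contents are never visible text.
    private static readonly Regex HiddenBlocks =
        new Regex(@"<(style|script|xml)\b[^>]*>.*?</\1>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Tags that visually end a line; replaced with "\n" so paragraphs survive.
    private static readonly Regex LineEnds =
        new Regex(@"<br\s*/?>|</(p|div|li|h[1-6]|tr)>", RegexOptions.IgnoreCase);

    // Any remaining tag, including Word's prefixed ones such as <o:p>.
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");

    // Runs of spaces, tabs and non-breaking spaces (&nbsp; after decoding).
    private static readonly Regex Blanks = new Regex(@"[ \t\u00A0]+");

    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        string s = Comments.Replace(html, "");
        s = HiddenBlocks.Replace(s, "");
        s = LineEnds.Replace(s, "\n");
        s = AnyTag.Replace(s, "");
        s = WebUtility.HtmlDecode(s); // &amp; -> &, &nbsp; -> U+00A0
        s = Blanks.Replace(s, " ");

        // Trim every line (this also removes stray '\r') and drop empty lines.
        return string.Join("\n", s.Split('\n').Select(line => line.Trim()).Where(line => line.Length > 0));
    }
}
```

Because entities are decoded, the stored text can contain characters such as `<` and `&`. It should therefore be HTML-encoded again whenever it is written back into a page.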
I am reading an MS Word document (.docx) in my C# code. I then get the InnerText of the Word document into a string, which goes into the body of an email. The problem is that when I do something like the following:
emailBody = xmlDocument.InnerText.ToString();
Everything comes in one line and all the formatting is lost.
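`InnerText` concatenates every text node, so paragraph boundaries disappear. Below is a sketch, assuming `xmlDocument` holds the `word/document.xml` part of the .docx. It walks the WordprocessingML paragraphs (`w:p`) and emits one line per paragraph, mapping `w:tab` and `w:br` inside runs to a tab and a line break. `DocxText` and `ExtractWithLineBreaks` are hypothetical names. Fonts, bold and other run formatting are still dropped, and edge cases such as text boxes are not handled. Keeping formatting would mean building an HTML email body instead.

```csharp
using System;
using System.Text;
using System.Xml;

public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml inside a .docx package.
    private const string WordNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the document text with one line per paragraph (<w:p>).
    public static string ExtractWithLineBreaks(XmlDocument xmlDocument)
    {
        var ns = new XmlNamespaceManager(xmlDocument.NameTable);
        ns.AddNamespace("w", WordNs);

        var sb = new StringBuilder();
        foreach (XmlNode paragraph in xmlDocument.SelectNodes("//w:body//w:p", ns))
        {
            // Only look inside runs (w:r): w:tab also appears in paragraph
            // properties as a tab-stop definition, which is not text.
            foreach (XmlNode node in paragraph.SelectNodes(".//w:r/w:t | .//w:r/w:tab | .//w:r/w:br", ns))
            {
                switch (node.LocalName)
                {
                    case "t": sb.Append(node.InnerText); break;
                    case "tab": sb.Append('\t'); break;
                    case "br": sb.Append(Environment.NewLine); break;
                }
            }
            sb.Append(Environment.NewLine);
        }
        return sb.ToString();
    }
}
```

Usage would then be `emailBody = DocxText.ExtractWithLineBreaks(xmlDocument);` in place of the `InnerText` call.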
I'm having some difficulty merging multiple word documents together using Microsoft Office Interop Assemblies (Office 2007) and ASP.NET 3.5. I'm able to merge the documents, but some of my formatting is missing (namely the fonts and images). My current merge code is shown below.
I'm creating an RDLC report in C#. Is it possible to insert the content of a Word 2003 document (with formatting) into it, either at design time or programmatically, before exporting to PDF? The final result would be a PDF file containing the initial report (fields from the database) followed by the Word document content.
Why? I need to let the user fill in a form, attach a Word document and export it all to PDF as described above (ASP.NET). I don't have Word installed on the server, so I can't interact with its COM objects.
I have a web application and I need to convert DOCX files to PDF to generate some reports written in Word 2007. First I used automation, ran into a DCOM problem, and finally discovered that Microsoft doesn't support automating Word on the server side. Now I'm looking for free alternatives: Word and similar tools aren't free, and iTextSharp doesn't convert DOCX to PDF.
The purpose is to generate proposal documents that can manually be edited in Word after the fact, but before sending them out to the customers.
Much of the proposal content would be drawn from existing HTML website content (backed by a CMS), plus some custom (non-HTML) content injected for certain scenarios. The conditional logic could of course go into server-side ASP.NET code to vary the content appropriately.
I'm open to 3rd-party tools if raw manipulation of the Word API is arduous. In fact a good 3rd party tool might be the answer.
I'm using ASP.NET/C# and I'm looking to create (possibly unique) URIs for a small CMS I am building.
I am generating the URI segment from the article's title, so, for example, if the title is "My amazing article" the URI would be www.website.com/news/my-amazing-article
There are two parts to this. First, which characters do you think I need to strip out? I am replacing spaces with "-" and I think I should strip out the "/" character too. Can you think of any others that might cause problems? "?" perhaps? Should I remove all non-alpha characters?
Second question: as mentioned above, the URIs MAY need to be unique. I was going to check the existing URI list before adding a new one to ensure uniqueness; however, I see that Stack Overflow uses a number plus a URI segment. I assume this allows titles to be duplicated?
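A whitelist is simpler than deciding which characters to blacklist. Keep only a-z and 0-9 (after folding accented letters to their base letter) and collapse everything else, including spaces, "/", "?", "#", "&", "%" and quotes, into single hyphens. Below is a sketch with hypothetical names `Slug.Make` and `Slug.MakeUnique`. The uniqueness check uses an in-memory set as a stand-in for the database lookup a real CMS would perform.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class Slug
{
    // "My amazing article" -> "my-amazing-article"
    public static string Make(string title)
    {
        // Decompose accented letters ("é" -> "e" + combining accent), then drop the accents.
        string decomposed = (title ?? string.Empty).Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        string s = sb.ToString().ToLowerInvariant();
        // Every run of characters outside a-z/0-9 becomes one hyphen.
        s = Regex.Replace(s, "[^a-z0-9]+", "-");
        return s.Trim('-');
    }

    // Appends -2, -3, ... until the slug is not already taken.
    // "existing" is a hypothetical stand-in for a database query.
    public static string MakeUnique(string title, ISet<string> existing)
    {
        string baseSlug = Make(title);
        if (baseSlug.Length == 0) baseSlug = "article"; // e.g. titles written only in non-Latin scripts
        string candidate = baseSlug;
        for (int n = 2; existing.Contains(candidate); n++)
            candidate = baseSlug + "-" + n;
        return candidate;
    }
}
```

The Stack Overflow style (/questions/{id}/{slug}) sidesteps the uniqueness check entirely. The numeric ID identifies the article and the slug is only descriptive, so duplicate titles are harmless.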
I have a Word document that opens in a web browser using ASP.NET 2.0. This is the code:
[Code]....
This works: it prompts the user with "Open", "Save", "Cancel" (or, for some users, just "Save", "Cancel", depending on their Internet security settings). But for security reasons, my boss wants this Word document to be opened in the Word program (Microsoft Word 93, 97, etc.). Is this possible? Of course saving the file is okay; the document just should not open in the browser.
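The usual server-side lever is the Content-Disposition response header. `attachment` asks the browser to hand the file off (open with the associated program or save) instead of rendering it inline. The server can only ask: whether Word opens it directly or the Open/Save dialog still appears is decided by the browser and the user's settings. Below is a sketch that builds the header value with `System.Net.Mime.ContentDisposition`. `DownloadHeaders.AttachmentFor` is a hypothetical helper name.

```csharp
using System.Net.Mime;

public static class DownloadHeaders
{
    // Builds a Content-Disposition header value that marks the response as an
    // attachment (a file to hand off or save) rather than content to render inline.
    public static string AttachmentFor(string fileName)
    {
        var disposition = new ContentDisposition
        {
            DispositionType = DispositionTypeNames.Attachment,
            FileName = fileName
        };
        return disposition.ToString();
    }
}
```

In the page, the value would be passed to `Response.AppendHeader("Content-Disposition", …)`. Set `Response.ContentType = "application/msword"` before writing the file, for example with `Response.TransmitFile`.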
I need to be able to remove non-XHTML tags from a string containing XHTML that has been stored in a database. The string also contains references to controls inside the XHTML (see the example below), but I need clean XHTML with the contents of all standard tags unchanged.
These control tags are varied (they could be any ASP.NET control), so there are too many to go looking for each one and remove them. The way they are closed is also varied, so not all of them have closing tags, some are self closing.
How can I go about doing this? I've found some HTML cleaners online to include in my project, but they either remove everything or just HTML-encode the entire string.
Also, I'm dealing with parts of XHTML documents, not entire documents; I don't know if that makes a difference.
An example (not fantastic, but gives you the idea of what I'm working with):
<p><mycontrols:mycontrol myproperty="hello world" myproperty2="7"><SPAN><a href="#"><img title="an example image" height="68" width="180" alt="an example image" src="images/example1.gif"></a></span></mycontrols:mycontrol><a href="#"></a></p>
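In the example, every control tag carries a namespace prefix (`mycontrols:`) while standard XHTML tags do not, so the prefix itself can serve as the filter. Below is a sketch that assumes this convention holds, as it does for ASP.NET markup such as `asp:Label` or user controls registered with a TagPrefix. It removes opening, closing and self-closing prefixed tags and keeps everything between them, so it also works on fragments. `ControlTagStripper` is a hypothetical name. A regex cannot cope with `>` inside attribute values. It also does not repair problems already in the fragment, such as the `<SPAN>`/`</span>` case mismatch or the unclosed `<img>` above.

```csharp
using System.Text.RegularExpressions;

public static class ControlTagStripper
{
    // Opening, closing and self-closing tags whose name has a prefix, e.g.
    // <mycontrols:mycontrol a="1">, </mycontrols:mycontrol>, <uc1:Footer runat="server" />.
    // Unprefixed tags such as <p>, <a href="#"> or <img ...> never match.
    private static readonly Regex PrefixedTag =
        new Regex(@"</?[A-Za-z_][\w.-]*:[\w.-]+(?:\s[^>]*)?/?>", RegexOptions.Compiled);

    // Removes the control tags themselves but keeps their inner content.
    public static string Strip(string xhtmlFragment) =>
        PrefixedTag.Replace(xhtmlFragment ?? string.Empty, "");
}
```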
How can I split string X and extract its values into separate variables? X has the value:
itemA=myvalue&itemB=anothervalue&itemC=andanother
I have three strings (var1, var2, var3) to hold the extracted values. In string X, find "itemA=" and copy everything after the "=" character up to the "&" character, or to the end of the string if no "&" is found; store this value in var1. Then find "itemB=" in string X and copy everything after the "=" up to the "&", or to the end of the string if no "&" is found; store this value in var2.
Finally, find "itemC=" in string X and copy everything after the "=" up to the "&", or to the end of the string if no "&" is found; store this value in var3.
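Instead of searching for each key separately, the string can be split once on "&" and then on the first "=" of each pair, building a dictionary. A key that is missing is then simply absent from the dictionary. Below is a sketch, where `PairParser.Parse` is a hypothetical name. It does not URL-decode values. If X is a real query string, `HttpUtility.ParseQueryString` in System.Web performs both the splitting and the decoding.

```csharp
using System;
using System.Collections.Generic;

public static class PairParser
{
    // "itemA=myvalue&itemB=anothervalue" -> { itemA: myvalue, itemB: anothervalue }
    public static Dictionary<string, string> Parse(string x)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string pair in (x ?? string.Empty).Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split only at the first '=' so a value may itself contain '='.
            string[] parts = pair.Split(new[] { '=' }, 2);
            result[parts[0]] = parts.Length > 1 ? parts[1] : string.Empty;
        }
        return result;
    }
}
```

With this, `var values = PairParser.Parse(x);` followed by `values.TryGetValue("itemA", out var1);` (and likewise for itemB and itemC) fills the three variables, leaving a variable null when its key is absent.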
My requirement: I have to open a Word document, which is stored on the server, in Microsoft Word from my web application.
In JavaScript I wrote the following code to open the Word document:
var dsWordApp = new ActiveXObject("Word.Application");
var WordDoc = dsWordApp.Documents.Open(FileName);
Using the above script I am able to open local Word files, but I am unable to open the Word file stored on the server.
Is it at all possible to use IIS7's rewrite capability in web.config to strip a particular HTTP header from a client request?
We have an application that makes an HTTP POST to our website, and apparently the request contains the HTTP Expect header. Previously this was not a problem, but we've switched hosts and now the site returns HTTP error 417 Expectation Failed. The real solution is to fix the software so it doesn't send the Expect header, but that can't happen soon enough for the folks in charge, who'd like an immediate web-based fix.
I've used ISAPI_Rewrite before and I've read that it can strip a header, and the new host claimed they had ISAPI installed... but that seems to have been a lie, as I cannot get it to work, and support's only response on the subject is "use IIS7 Rewrite instead."
I've got a Literal control to display the username of the user logged into our company's intranet system. Originally I had a LoginName control, but I couldn't get it to strip the domain from the username (the format is domainname\username), so I'm trying it this way.
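Under Windows authentication, the login name (for example from `User.Identity.Name`) has the form `DOMAIN\username`, so everything after the last backslash is the bare username. Below is a sketch with the hypothetical helper `UserNames.WithoutDomain`. It does not handle the `user@domain` (UPN) form.

```csharp
public static class UserNames
{
    // "CORP\jsmith" -> "jsmith"; names without a backslash are returned unchanged.
    public static string WithoutDomain(string loginName)
    {
        if (string.IsNullOrEmpty(loginName)) return loginName ?? string.Empty;
        int slash = loginName.LastIndexOf('\\');
        return slash >= 0 ? loginName.Substring(slash + 1) : loginName;
    }
}
```

In the page this might look like `litUserName.Text = HttpUtility.HtmlEncode(UserNames.WithoutDomain(User.Identity.Name));`. Here `litUserName` is a hypothetical Literal ID, and the encoding is there because a Literal does not encode its text by default.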
I have used a DataGridView to display certain data to users.
It works fine, but my concern is that a few columns contain long text, and the text wraps onto the next line. I want the text to be cut off after a fixed number of characters and shown with "......" at the end.
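Below is a small helper, with the hypothetical name `TextTrim.Ellipsize`, that cuts text to a fixed maximum length including a trailing "...". For a WinForms DataGridView, one place to apply it is the CellFormatting event: assign the shortened string to `e.Value` and set `e.FormattingApplied = true`, so only the display changes and the underlying data stays intact. For an ASP.NET GridView, RowDataBound plays a similar role.

```csharp
public static class TextTrim
{
    // Shortens text to at most maxLength characters, ending with "..." when cut.
    public static string Ellipsize(string text, int maxLength)
    {
        const string Ellipsis = "...";
        if (text == null) return string.Empty;
        if (text.Length <= maxLength) return text;
        // Too short to fit the ellipsis: just cut.
        if (maxLength <= Ellipsis.Length) return text.Substring(0, System.Math.Max(0, maxLength));
        return text.Substring(0, maxLength - Ellipsis.Length) + Ellipsis;
    }
}
```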
I have a page with two textboxes (one for English and one for Korean).
I want users to be able to enter an English word or a Korean word in the respective textbox.
Right now I can only enter English in both of them. Even when my keyboard language is switched to Korean (I have the language pack installed), I cannot type Korean in the Korean textbox.
I thought I could simply change this using something like:
I'm hoping this is an easy one. I'm using Microsoft.Office.Interop.Word to convert uploaded Word documents into previewable HTML files. I haven't implemented it fully, but I've played around with it enough that I think I have a plan that will work. My question revolves around the following:
[Code]....
I'm not in love with the idea of opening and closing Word every time there's an upload (which I hope will be often). I'd like to make it a shared object that loads at application start. I have two questions that go along with this:
1. I imagine that winword could lock up, and that would be a problem, right?
2. To save a document I use wordapp.ActiveDocument; this could cause problems with a shared object, right?
I think I'm talking myself right out of this...