Every night a program of mine runs and creates a natural gas report. The report includes something called "expert analysis." This is just text, a commentary on the current natural gas market, and because the market is volatile there is new commentary every day. So every day an employee in our office opens a public webpage that displays this text, then copies and pastes it into a page that is part of our system. It gets saved to a file, which is pulled into our nightly report.
The page in our system is an ASP page, and it uses FileUp to create the file. We want to break our dependency on FileUp (it's a SoftArtisans product, an excellent vendor, but we are trying to save some money). Right now we don't have FileUp because our trial expired, so every day I open the public webpage, copy the text and paste it into my expert.txt file, all manually. Another problem is that the expert analysis is updated after our office staff has all gone home.
Here is my question: can I write a program in .NET that requests a page, parses the resulting HTML and extracts the expert analysis, all automatically? I won't need FileUp anymore because I can use the .NET upload control, and I won't have to log on every night to manually copy and paste the expert analysis. Is that what web/screen scraping is? What happens when the public page's format changes because they do maintenance on their site? Is it just like if I were to request [URL] and try to read the Moderator column of the ASP.NET forum? I'd be okay for a while (would I?), but then if the layout changed I wouldn't find the Moderator anymore?
I'm looking for a way to parse Excel in ASP.NET. The problem I'm experiencing is the same as in [URL]
I've searched all over the web, but no one seems to have an answer.
Here's the error:
The Microsoft Jet database engine could not find the object 'Cities'. Make sure the object exists and that you spell its name and the path name correctly.
<link> <linkto>[URL]/broadcasts/index.cfm?fuseaction=usrbrd&broadcasterid=57468</linkto> <image>report_button</image> <alt>Report a Problem</alt> <target></target> </link> The <linkto> tag holds the URL, but the "&" symbol in the URL is completely crashing the page. I tried putting amp; next to the & symbol, but this doesn't give the correct URL.
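A bare "&" is not allowed in XML text, and the escaped form is the full entity "&amp;" (ampersand, "amp", semicolon), which a parser turns back into a plain "&" when it reads the element. A minimal Python sketch of that round trip follows; the example.com address and the make_linkto helper are placeholders for illustration, not part of the original feed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_linkto(url):
    """Wrap a URL in a <linkto> element, escaping &, < and > for XML."""
    return "<linkto>" + escape(url) + "</linkto>"


# Hypothetical URL with the same query string shape as in the question.
url = "http://example.com/broadcasts/index.cfm?fuseaction=usrbrd&broadcasterid=57468"
xml_text = make_linkto(url)

# The serialized XML carries "&amp;", but the parsed value is the original URL.
parsed_url = ET.fromstring(xml_text).text
```

If the consumer of the XML shows a literal "&amp;" in the final link, the value is probably being escaped twice or read without an XML parser.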
I'm developing a small ASP.NET MVC project in Mono 2.4 on Ubuntu 10.10. There is an array of objects, each of which corresponds to a certain XML file. The XML files are read with XmlTextReader. That does not work, because the XML files use the rare "cp866nav" encoding, which XmlTextReader does not support ("System.ArgumentException: Encoding name 'cp866nav' not supported"). It works fine if the encoding in the XML header is changed to "cp866". I found a kind of solution, which is to initialize XmlTextReader with a StreamReader that has an explicit encoding instead of with a file name, as in the code below:
XmlTextReader reader = new XmlTextReader(new StreamReader(Server.MapPath(filename), Encoding.GetEncoding("cp866")));
The issue is that the directory which contains xml files is read only (I can not change it), so I get "System.UnauthorizedAccessException: Access to the path '' is denied.". Rather strange, because XmlTextReader initialized with a filename seems to read the files. Is there any solution, considering that program cannot modify or create files?
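The same idea (decode the bytes yourself, so the parser never sees the unsupported encoding name) can be sketched in Python without writing anything to disk. This is only an illustration under assumptions: parse_cp866_xml is a made-up helper, and it assumes "cp866nav" content can be decoded as plain cp866, as the question suggests.

```python
import re
import xml.etree.ElementTree as ET


def parse_cp866_xml(raw_bytes):
    """Parse XML bytes whose header names an encoding the parser rejects.

    The bytes are decoded as cp866 in memory and the XML declaration is
    dropped, so the parser only ever receives already-decoded text.
    """
    text = raw_bytes.decode("cp866")
    # Remove the leading <?xml ... ?> declaration that names "cp866nav".
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", text, count=1)
    return ET.fromstring(text)


# Sample bytes standing in for a file opened read-only, e.g. open(path, "rb").read()
sample = '<?xml version="1.0" encoding="cp866nav"?><city name="Москва"/>'.encode("cp866")
root = parse_cp866_xml(sample)
```

Only read access to the file is needed for this approach; nothing is created or modified.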
When my XML file is big, I get the following error: "unexpected end of file while parsing name has occurred. Line 1, position 2034". Debugging xmlData showed that not all of the XML file is being read.
[Code]....
I read that the memory of the string variable is not big enough to handle the XML file. What can I use to read big XML files? My program works for small XML text.
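An "unexpected end of file" error usually means the parser was handed incomplete data, rather than that a string ran out of room. One way to avoid holding the whole document in a string at all is to parse it incrementally from the stream. A small Python sketch of that approach follows; count_elements and the "row" tag are hypothetical names used only for the demonstration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(stream, tag):
    """Count elements named `tag` while reading the XML incrementally."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Free the element's children and text once it has been handled.
        elem.clear()
    return count


# A large in-memory document standing in for a big file or network stream.
data = b"<rows>" + b"<row id='1'>x</row>" * 5000 + b"</rows>"
total = count_elements(io.BytesIO(data), "row")

# Feeding a truncated copy reproduces a parse error like the one in the question.
try:
    count_elements(io.BytesIO(data[:-10]), "row")
    truncated_failed = False
except ET.ParseError:
    truncated_failed = True
```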
I have the following data that basically I only need a few bits of information from:
Resource:X - Y;Z - Å;Type:(all) From Date: 07/12/2010 - To Date 07/12/2010 Sort by:Time Include Referring source/physician:No Footer:Default Criteria:None ","Appointments","X, Y","ZAssociates","Monday, July 12, 2010","Time","Patient Name","Patient ID","Appt. Type","Ref. Source/ Physician","Phone","Type","DOB ","Z, X","Y","7/12/2010 12:00:00AM","Time","Patient Name","Patient ID","Appt. Type","Phone","Type","DOB "," 7:30 [snip]
The only things I need from this are:
Patient's Name, Dr's Name, Patient's Phone Number, Appt Time, Appt Date
and the rest of the information I can discard. A customer uploads this as a .csv file (even though it really isn't as you can see) and I'd like to parse the needed information and post that to my SQL database and discard the rest. I think I can do this with a dataset but I've never built that before. The fields from the customer will always be the same and the fields I will need will always be the same. Also, the date time has to be in the format of yyyy/mm/dd:hhmm and the phone number always has to have 512 as a prefix. Here is the code I currently have for my site:
Imports System.IO
Imports System.Data
Imports System.Data.SqlClient

Partial Class _Default
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    End Sub

    Protected Sub Submit1_Click(ByVal sender As Object, ByVal e As System.EventArgs)
        Dim SaveLocation = "\xxxWEB3wwwrootWebfile1RemindersDoug_Ancildoug.csv"
        If UploadFile(SaveLocation) Then
            'the file was uploaded: now try saving it to the database
            SaveToDatabase(SaveLocation)
        End If
    End Sub

    Private Function UploadFile(ByVal SavePath As String) As Boolean
        Dim fileWasUploaded As Boolean = False 'indicates whether or not the file was uploaded
        'Checking if the file upload control contains a file
        If Not File1.PostedFile Is Nothing And File1.PostedFile.ContentLength > 0 Then
            Try
                'checking if it was .txt file BEFORE UPLOADING IT!
                'You used to upload it first...but the file could be a virus
                If File1.FileName = ("doug.csv") = False Then
                    'The file is not the expected type...do not upload it
                    'just post the validation message
                    message.Text = "Sorry, thats not the correct file."
                    message2.Text = "Please locate and upload 'doug.csv'"
                Else
                    'The file is a .txt file
                    'checking to see if the file exists already
                    'If it does exist Deleting the existing one so that the new one can be created
                    If IO.File.Exists(SavePath) Then
                        IO.File.Delete(SavePath)
                    End If
                    'Now upload the file (save it to your server)
                    File1.PostedFile.SaveAs(SavePath)
                    'After saving it check to see if it exists
                    If File.Exists(SavePath) Then
                        'Upload was sucessful
                        message.Text = "Thank you for your submission."
                        fileWasUploaded = True
                    Else
                        'the file was not saved
                        message.Text = "Unable to save the file."
                    End If
                End If
            Catch Exc As Exception
                'We encountered a problem
                message.Text = "Your file was not in the correct format. Please contact Customer Service at xxxx-xxxx-xxxx."
            End Try
        Else
            'No file was selected for uploading
            message.Text = "Please select a file to upload."
        End If
        Return fileWasUploaded
    End Function

    Private Sub SaveToDatabase(ByVal SavePath As String)
        Try
            Dim sqlQueryText As String = _
                "BULK INSERT dialerresults " + _
                "FROM '" & SavePath & "' " + _
                "WITH ( FIELDTERMINATOR = ',' , ROWTERMINATOR = ' ' )"
            ' and bulk import the data:
            'If ConfigurationManager.ConnectionStrings("Dialerresults") IsNot Nothing Then
            'Dim connection As String = ConfigurationManager.ConnectionStrings("Dialerresults").ConnectionString
            Dim connection As String = "data source=10.2.1.40;initial catalog=IVRDialer;uid=xxxx;password=xxxxx;"
            Using con As New SqlConnection(connection)
                con.Open()
                ' execute the bulk import
                Using cmd As New SqlCommand(sqlQueryText, con)
                    cmd.ExecuteNonQuery()
                End Using
            End Using
            'Else
            'message.Text="ConfigurationManager.ConnectionStrings('Dialerresults') is Nothing!"
            'End If
        Catch ex As Exception
            message.Text = "Your file was not in the correct format. Please contact Customer Service at xxxxxxx."
        End Try
    End Sub
End Class
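The two formatting rules stated above (date and time as yyyy/mm/dd:hhmm, and a phone number that always carries the 512 prefix) can be sketched separately from the upload code. The Python below is only an illustration: the function names are invented, and it assumes the appointment time arrives as something like " 7:30 AM" and the phone number as 7 or 10 digits with arbitrary punctuation, neither of which the truncated sample confirms.

```python
import re
from datetime import datetime


def format_appointment(date_text, time_text):
    """Combine e.g. '7/12/2010' and ' 7:30 AM' into 'yyyy/mm/dd:hhmm'."""
    # Assumption: 12-hour times with an AM/PM marker, US-style m/d/yyyy dates.
    combined = f"{date_text.strip()} {time_text.strip()}"
    parsed = datetime.strptime(combined, "%m/%d/%Y %I:%M %p")
    return parsed.strftime("%Y/%m/%d:%H%M")


def format_phone(raw, area_code="512"):
    """Strip punctuation and make sure the number starts with the area code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 7:
        return area_code + digits
    if len(digits) == 10 and digits.startswith(area_code):
        return digits
    raise ValueError(f"unexpected phone number: {raw!r}")


appointment = format_appointment("7/12/2010", " 7:30 AM")  # "2010/07/12:0730"
phone = format_phone("555-1234")                           # "5125551234"
```

Rows would be cleaned this way after the needed columns are picked out and before they are inserted, which also means the odd file would not have to go through BULK INSERT unchanged.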
Now when I copy the same code into my project, which uses a master page, there is no compiler error, but it generates an Excel file with no data in it, even though there is data in the grid view at runtime.
I need to remove the password protection from an uploaded Excel file. I have been doing this directly with the Excel assemblies (ASP.NET/C#), and then I tried ooxmlcrypto. Both worked, but the problem is that I cannot deploy either of those solutions because of the dependencies.
I mean, my sysadmin does not want us installing Excel on the server and, although I have been searching for alternatives, I cannot find one.
My question is: is there a way to put the necessary DLLs on the server without installing Excel or the Office suite?
I found this: Office 2007 Primary Interop Assemblies redistributable package but still, it requires a Microsoft Office Product.
I need to scrape a remote HTML page looking for images and links. I need to find the image that is "most likely" the product image on the page, and the links that are "near" that image. I currently do this with a JavaScript bookmarklet, so I am able to get the rendered x/y coordinates of images and links to help me decide whether those are the ones I want. What I want is the ability to get this information from just a URL, without the bookmarklet. The issue is that if I use the URL with something like HttpWebRequest and fetch the HTML on the server, I will not have location values, since the page wasn't rendered in a browser. I need the location of images and links to help me pick the ones I want. So how can I get HTML from a remote site on the server AND use the rendered location?
I am creating an app in which I need to parse documents uploaded by users. They can upload either of two document types, .doc or .pdf. I want to know what methods are available to do that and which one is best. I am creating the app in ASP.NET with C#.
I'm having difficulty scraping a dynamically generated table in ASPX. I'm trying to scrape the gas prices from a site like this: GasPrices. I can extract all the information in the gas price table (address, time submitted etc.), except for the actual gas price.
Is there a way I could scrape the gas prices? i.e. somehow get a text representation of it. I'm not very familiar with ASP/ASPX - but what's being generated now is not showing up in the final HTML. I'm using Python to do the scraping, but that's irrelevant unless there's a specific library...
We have a site that scrapes a third-party site to gather all available models of a product. The third-party site recently changed so that it now uses ajax: users select the manufacturer, and once they have selected it, a dropdown with products is loaded using ajax.
Until now I was using HttpWebRequest for all requests (see below).
Code:
Public Function fnRequest(ByVal sPOSTData As String, Optional ByVal bAutoRedirect As Boolean = False) As String
    Dim uriSite As Uri
    Dim sReturn As String
    Dim srReader As StreamReader
    Dim sTemp As String
[Code] ....
Now in fiddler the post appears to be done with ajax. I tried to send the post data the normal way but it didn't like that. Any example of how to do this? To get an idea go to [URL] ... and see Mount finder and select Projector.
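An ajax post is still an ordinary HTTP POST, so it can usually be reproduced by copying what Fiddler shows: the same URL, the same body encoding, and the same headers. One header that scripts often add and servers sometimes check is X-Requested-With. A hedged Python sketch of building such a request follows; the URL and the field names "manufacturer" and "category" are invented placeholders, and the real ones have to be taken from the captured traffic.

```python
import urllib.parse
import urllib.request


def build_ajax_request(url, fields):
    """Build (but do not send) a form-encoded POST that looks like an ajax call."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # Commonly sent by JavaScript libraries; an assumption for this site.
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, data=body, headers=headers)


# Placeholder values; copy the real ones from the Fiddler capture.
request = build_ajax_request(
    "http://example.com/mountfinder/products",
    {"manufacturer": "Acme", "category": "Projector"},
)
# Sending it would be: urllib.request.urlopen(request).read()
```

If the captured body is JSON rather than form fields, the body and Content-Type would need to match that instead.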
I've created a web application in ASP.NET in which I try to get some data (site scraping) from a secure page of a web site. I've used the HttpWebRequest class for this, but I haven't been able to access the secure page yet: every time, the login page is scraped instead of the secure page. I have the site's user ID and password, but I don't know which language the site was developed in.
I'm trying to scrape data with PHP/cURL from a .NET site (one of those with __VIEWSTATE and __EVENTVALIDATION). I monitor headers and post vars using Tamper Data, so I'm pretty sure I haven't missed anything. My approach is to mimic the postback that happens when the user clicks on one of the links, and then parse the response. But the response I'm getting is a page redirect to "Unable to validate data".
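One thing worth checking with this kind of postback is that the hidden fields come from a fresh GET of the same page and are URL-encoded exactly once; a __VIEWSTATE whose "+" or "=" characters get mangled will generally fail validation. The sketch below shows the idea in Python rather than PHP. HiddenFieldCollector, build_postback_body and the "ctl00$lnkNext" target are illustrative names, not taken from the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback_body(page_html, event_target, event_argument=""):
    """Return a form-encoded body that mimics clicking a postback link."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # urlencode percent-encodes "+", "/" and "=" so the values survive intact.
    return urlencode(fields)


page = (
    '<form><input type="hidden" name="__VIEWSTATE" value="/wEPabc+==" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="/wEWxyz=" /></form>'
)
body = build_postback_body(page, "ctl00$lnkNext")
```

The cookies from the initial GET would normally be sent with the POST as well.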
I'm trying to write a small application to collect(Scrape) one piece of data from a web site. I would like to be able to simply run the app and it will open the page, find the one piece of data and display it. So far so good...my problem is that the web site is a secure site, meaning I have to provide a user name and password. I've searched all over the web, found many discussions but have yet to find anything that provides specifics on how to accomplish this. I understand a little bit about tokens etc, but I'm really looking for a detailed description of how to do this. Please feel free to direct me to a different forum if I'm in the wrong place.
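For sites that use a login form, the usual pattern is: post the credentials once, keep the cookies the server sends back, and reuse them for the protected page. A rough Python sketch of that flow follows. The URLs and the "username"/"password" field names are assumptions; the real ones come from the login form's HTML, and sites using HTTP authentication or extra hidden tokens need more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Create an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the POST that submits the login form (field names are assumed)."""
    form = {"username": username, "password": password}
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(login_url, data=body)


def fetch_protected(opener, login_request, protected_url):
    """Log in, then fetch the protected page with the same cookies."""
    opener.open(login_request).close()
    with opener.open(protected_url) as response:
        return response.read()


opener, jar = make_session()
login = build_login_request("https://example.com/login", "user", "pass word")
# page = fetch_protected(opener, login, "https://example.com/account")
```

Getting the login page back instead of the protected page typically means the second request did not carry the session cookie.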
I am building a site that needs to scrape information from a partner site. My scraping code works great with other sites, but not with this one. It is a regular .html site. My thought is that it might be generated somehow with PHP (the site is built with PHP).
If it matters here is my code I use. The htmlDocument is htmlAgilityPack but that has nothing to do with it. Result is null on the site I try.
[Code]....
This is from the W3 validator; might it have something to do with this? The site checked is this:
[URL]
I am unable to validate this document because on line 422 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). check both the content of the file and the character encoding indication.
The error was: utf8 "\xA9" does not map to Unicode
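The validator message suggests the page declares UTF-8 but contains a byte (0xA9, the copyright sign in Latin-1 and Windows-1252) that is not valid UTF-8, and a strict decoder could fail or return nothing on such input. A small Python sketch of one tolerant strategy is below: try the declared encoding first, then fall back to a single-byte one. The choice of cp1252 as the fallback is an assumption, and decode_html is a made-up helper name.

```python
def decode_html(raw, declared="utf-8", fallback="cp1252"):
    """Decode page bytes, falling back when the declared encoding is wrong."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # Single-byte fallback; undefined bytes become U+FFFD instead of failing.
        return raw.decode(fallback, errors="replace")


mislabelled = decode_html(b"\xa9 2010 Example")          # "© 2010 Example"
well_formed = decode_html("© 2010".encode("utf-8"))      # "© 2010"
```

The decoded text can then be handed to the HTML parser as a string instead of letting it guess the encoding from the bytes.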
I need to screen scrape a web page and change its style to match the look and feel of the site where it will be displayed in. Is this possible? I'll be using asp.net to do the screen scraping.
I am using OLE DB in ASP.NET to export information to an Excel file. I run into problems when the information to export becomes too big: with the code given below, the generated Excel file is an empty spreadsheet. If I change the loop bound to 1123 for the row insertion, the generated Excel file is fine, showing 1125 rows and 4 columns. A test program in Windows Forms also works fine regardless of how many rows there are. The code has been simplified; "information ..." in the SQL insert command stands for 1803 characters.
ExcelObjConn = "Provider=Microsoft.Ace.OLEDB.12.0;" & _
    "Data Source=" & fileName & ";Extended Properties=Excel 12.0 XML"
ExcelConnection = New System.Data.OleDb.OleDbConnection(ExcelObjConn)
ExcelConnection.Open()
Try
    SqlCommand = "CREATE TABLE ABC ([row1] text, [row2] text, [row3] text, [row4] text)"
    ExcelCommand = New OleDb.OleDbCommand(SqlCommand, ExcelConnection)
    ExcelCommand.ExecuteNonQuery()
    ExcelCommand.Dispose()
    For i As Integer = 0 To 1124
        SqlCommand = "Insert into ABC ([row1], [row2], [row3], [row4]) Values ('information...', 'information ...', 'information ...', 'information ...')"
        ExcelCommand = New OleDb.OleDbCommand(SqlCommand, ExcelConnection)
        ExcelCommand.ExecuteNonQuery()
        ExcelCommand.Dispose()
    Next
Catch ex As Exception
Finally
    If ExcelConnection IsNot Nothing Then
        ExcelConnection.Close()
        ExcelConnection.Dispose()
    End If
End Try
I couldn't find a solution to my problem as well. What I did eventually was to run the process using another separate windows service. The code works perfectly fine running from a windows form or service program, but not asp.net, not sure why.
I have created an Excel sheet from a DataTable using a function. I want to read the Excel sheet programmatically using the connection string below. This string works fine for all other Excel sheets, but not for the one I created using the function. I guess it is because of an Excel version problem.
OleDbConnection conn = new OleDbConnection("Data Source='" + path + "';provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=Excel 8.0;");
Is there a way in which I can create an Excel sheet such that it is readable again using the above connection string? I cannot use the Microsoft Interop library, as it is not supported by my host. I have even tried different encoding formats. It still doesn't work.