Web Forms :: Webrequest And Uses Data Scraping Technique To Retrieve Some Url Links
Feb 24, 2010
I am working on a project that uses data scraping technique to retrieve some url links. I encounter this problem when i pass in the url of a [previous page button] link frm the html code and pass it in to httpWebRequest, the httpWebResponse that i get back is different form the actually content. i have been try to sovle this problem for days and no result, as anyone encounter similar problem and manage to sovle it? below is my sample code: [previous page button] [URL] note: i have change the domain name to a dummy address which is localhost
I am new to web programming. I am developing a web program using asp.net(vb) that scrapes data of a certain website. I am using System.Net.HttpWebRequest and System.Net.HttpWebResponse to read the HTTP codes.My problem is I can not retrieve the codes of certain frame/container where the data that I needed is located in the website.I understand that iframe has its own URL or link aside from the main URL of the website
Every night I have a program that runs that creates a natural gas report. In this report is something called "expert analysis." The is just text which is a commentary of the current Natural Gas market and because it's a volatile market there is new commentary every day. So also every day, an employee in our office runs a public webpage which displays this text, copies and pastes it into a webpage that is part of our system and it gets saved to a file which gets pulled into our nightly report.
The page that is in our system is as ASP page and it uses FileUp to create the file. We want to break our dependancy on FileUp (it's a SoftArtisans product, an excellent vendor, but we are trying to save some money). Right now we don't have FileUp because our trial expired. So every day I run the public webpage and copy the text and paste it to my expert.txt file all manually. Another problem is that the expert analysis is updated after our office staff has all gone home,
Here is my question - can I write a program in .NET that runs a page, parses the resulting HTML and extracts out the expert analysis all automatically? I won't need FileUp anymore becaus I can use the .NET upload control. And I won't have to log on every night to manually copy and paste the expert analysis. Is that what web/screen scraping is? What happens then the public page's format changes because they do maintenance on their site? Is it just like if I were to run [URL] and try to read the Moderator column of the ASP.NET forum? I'd be okay for a while (would I?) but then if the layout changed I wouldn't find the Moderator anymore?
I'm trying to php/curl scrape data from an .NET site (those with __VIEWSTATE, __EVENTVALIDATION). I monitor headers and post vars using Tamper Data so I'm pretty sure I haven't missed anything. My approach is to micmic the post back when the user click on one of the links and parse the response. But the response I'm getting is a page redirect to "Unable to validate data".
Need to have the server make a POST to an API, how do I add POST values to a WebRequest object and how do I send it and get the response (it will be a string) out?I need to POST TWO values, and sometimes more, I see in these examples where it says string postData = "a string to post"; but how do I let the thing I am POSTing to know that there is multiple form values?
I am trying to send data to DotNetOpenAuth website as described here [URL] Sender receive (500) Internal Server Error. The same code for blank website without DotNetOpenAuth works fine. Should I tweak something?
I am building a site that need to scrape information from a partner site. Now my scraping code works great with other sites but not this one. It is a regular .html site. My thoughts is that it might be generated some how with php (site is build with php).
If it matters here is my code I use. The htmlDocument is htmlAgilityPack but that has nothing to do with it. Result is null on the site I try.
[Code]....
this is from the w3 validator, might have something with this? The site checked is this
[URL]
I am unable to validate this document because on line 422 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). check both the content of the file and the character encoding indication.
The error was: utf8 "xA9" does not map to Unicode""
I have created a simple webrequest script to return the HTML content of a remote site and everything works great on almost every site I tap. However, the site I want to utiilize this script on returns no content. It does return headers, but no content!! Does anyone know if AJAX or some other method could be utilized to block the use of webrequests?
I've created a web application in asp.net so far. where I've tried to get some data(site scraping) from secure page of a web site.I've used the HttpWebRequest class for this functionality but I haven't accessed the secure page yet. Every time the login pages was scraped not secure page.I have the site user id and password and don't know that which language site has been developed in.
I have a parent page which has a button.When the button is clicked , it will bring user to the target page with variables from parent page.
My problem is my target page do not show up in the ie even though i have successfully pass the variable to the target page.By debugging, i can really see that i have already reach the target with all the variable from parent page is successfully read. BUT the target page don not show up in the ie, the page in the ie is still parent page.
I seem to be having some challenges with the data I am retriveing from a Webpage using the Webclient class. The code works fine, however I observe that the regular expression is not picking up the negative or positive sign in the Daily_Movement data. For example, a daily movement can be -0.31 or +0.31 but the code is not picking the sign in front of the decimal values.Here is my code
[Code]....
I think where the problem lies is the part of the code Regex r1 = new Regex("<span class="quoteData">.*</span>"); It picks up the values between the tag quite well, but not the signs in front of it. [Code]....
I need to scrape a remote html page looking for images and links. I need to find an image that is "most likely" the product image on the page and links that are "near" that image. I currently do this with a javascript bookmarklet so that I am able to get the rendered x/y coordinates of images and links to help me determine if those are the ones that I want. What I want is the ability to get this information by just using a url and not the bookmarklet. The issues it that by using the url and trying something like httpwebrequest and getting the html on the server, I will not have location values since it wasn't rendered in a browser. I need the location of images and links to help me determine the images and links that I want.So how can I get html from a remote site on the server AND use the rendered location
I get an error on da line objPost.Close();.....the unusual error is that when I debug this code line by line slowly using F10 in visual studio 2010...the code works..but when I just run the program or even debug the program fast...it throws an error at that line.. it gives an error that the connection which was expected to be open was closed by the server..
How the does the gridview populate all records?.As the reader is capable of reading one row per read,I thought only the last row is available for GridView to display.
I'm having difficulties scraping dynamically generated table in ASPX. Trying to scrape the gas prices from a site like this GasPrices. I can extract all the information in the gas price table (address, time submitted etc.), except for the actual gas price.
Is there a way I could scrape the gas prices? i.e. somehow get a text representation of it. I'm not very familiar with ASP/ASPX - but what's being generated now is not showing up in the final HTML. I'm using Python to do the scraping, but that's irrelevant unless there's a specific library...
We have a site that was scraping a site to gather all models available of a product. The 3rd party site recently changed the website so it now uses ajax for users to select the manufacturer and then once they select that it loads a dropdown with products using ajax.
I currently was using httpwebrequest for all requests (see below).
Code: Public Function fnRequest(ByVal sPOSTData As String, Optional ByVal bAutoRedirect As Boolean = False) As String Dim uriSite As Uri Dim sReturn As String Dim srReader As StreamReader Dim sTemp As String
[Code] ....
Now in fiddler the post appears to be done with ajax. I tried to send the post data the normal way but it didn't like that. Any example of how to do this? To get an idea go to [URL] ... and see Mount finder and select Projector.
I am trying to "behind the scenes" log myself into a website, from the VB code behind my ASP.NET website. But I am dumbfounded as to how to do this.
As far as I know I should be using the WebRequest or Webclient class. That is about as much as I know. I am not sure how to use the class.
I want to click a button on my website and have its Click event send a username and password to another website. This other site isot affiliated with mine. I realize the concept may seem stupid, but I plan on taking this further later, but Just need to know this now.
I have MVC application (applies to non MVC as well) where a user is posting in data. I need to take this data, send it off to two seperate end points (one using a WebRequest form POST and one using a Web Service), parse the result, and send the result back to the original user.
The issue at hand is that both end points take about 20-30 seconds to respond (response is a string) which means that I should probably execute these two calls asynchronously. At the same time I want to wait to respond to the original user until I get both results back. I am guessing I might have to use some sort of object lock so the response does not get sent back before the two calls are complete?
Based on the responses I decided to go with async controllers since I am already working with a MVC application.