Reindexing A Large SQL Server Database To Lucene?
Feb 24, 2011
We have a web service method which accepts some data and puts it in a Lucene index. We use it to index new and updated entries from our ASP.NET web app.
These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way is to retrieve chunks of data from a large table. Currently, we use the fact that the table has an auto-increment PK, so we fetch chunks of 1000 rows until we stop getting results. Kind of like (in pseudo language):
long i = 0;
int emptyBatches = 0;
while (emptyBatches < 20)                   // stop after 20 empty batches in a row
{
    // SELECT col1, col2, col3 FROM mytable WHERE pk BETWEEN @i AND @i + 999
    var rows = FetchChunk(i, i + 999);      // placeholder: runs the query above
    if (rows.Count == 0)
        emptyBatches++;
    else
    {
        emptyBatches = 0;
        SendToIndexingService(rows);        // placeholder: web service reindex call
    }
    i += 1000;
}
This way, we don't need a SELECT COUNT(*), which would be a big performance killer; we just move up the PK values until we stop getting any results. This has its downside: if there is a gap of more than 20,000 values somewhere in the table, it will stop indexing on the assumption that it reached the end, but that's a tradeoff we have to live with for now. Is there a more optimal way of getting data from a table to index? I would assume we are not the first ones facing this problem - search engines are widely used nowadays :)
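A minimal sketch of an alternative, assuming the PK is an indexed bigint: keyset pagination seeks past the last key seen, so a gap of any size is skipped in a single query instead of tripping the empty-batch counter. Table and column names are the ones from the pseudocode above; SendToIndexingService is the same placeholder, and connection is an open SqlConnection.
using System.Collections.Generic;
using System.Data.SqlClient;
long lastPk = 0;
while (true)
{
    var batch = new List<object[]>();
    using (var cmd = new SqlCommand(
        "SELECT TOP 1000 pk, col1, col2, col3 FROM mytable WHERE pk > @lastPk ORDER BY pk",
        connection))
    {
        cmd.Parameters.AddWithValue("@lastPk", lastPk);
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                lastPk = reader.GetInt64(0);   // remember the highest key seen so far
                batch.Add(new object[] { reader[1], reader[2], reader[3] });
            }
        }
    }
    if (batch.Count == 0) break;               // genuinely reached the end of the table
    SendToIndexingService(batch);              // placeholder: web service reindex call
}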
View 3 Replies
Similar Messages:
Feb 6, 2011
I want to generate 30,000 cards, and each card must be checked against the database for duplicates. Each card has two things: a serial no and a CardID. If a card already exists, I generate another CardID but keep the same serial no.
So what is a faster way to generate 30,000 cards with a duplicate check? The application I have made takes about 25 minutes to insert them.
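A hedged sketch of one approach, with assumed table and column names (Cards, SerialNo, CardID) and placeholder ID/serial schemes: load the existing CardIDs into a HashSet once, resolve duplicates in memory, then push all 30,000 rows in one SqlBulkCopy call instead of 30,000 single inserts.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
var existing = new HashSet<string>();
using (var cmd = new SqlCommand("SELECT CardID FROM Cards", connection))
using (var reader = cmd.ExecuteReader())
    while (reader.Read())
        existing.Add(reader.GetString(0));
var table = new DataTable();
table.Columns.Add("SerialNo", typeof(string));
table.Columns.Add("CardID", typeof(string));
var rng = new Random();
for (int i = 0; i < 30000; i++)
{
    string cardId;
    do { cardId = rng.Next(100000000, int.MaxValue).ToString(); }   // placeholder ID scheme
    while (!existing.Add(cardId));                                  // retry until unique
    table.Rows.Add("S" + i.ToString("D6"), cardId);                 // placeholder serial scheme
}
using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "Cards" })
    bulk.WriteToServer(table);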
View 33 Replies
Jun 7, 2010
I am developing a website with a huge amount of data to be stored in a SQL Server database. How should I optimize it to make it faster?
1. Using Stored procedures.
2. Functions / Views.
3. Any other methods
View 3 Replies
Aug 17, 2010
I have an Excel sheet that contains around 30,000 rows and 18 columns, and these numbers may increase in the future. I need to read all these records from the Excel sheet and insert them into a table in a SQL database. For reading the workbook I am using OleDb connections. The possible solutions I know of for inserting the data are:
1. Insert one record at a time, which makes 30,000 database hits. How will this affect performance?
2. Use linked servers - but this is not working for me; I do not have database permissions to use linked servers. So the only option I have is the first one.
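A hedged third option, assuming connection strings, a sheet named Sheet1, and a destination table dbo.MyTable: since the data is already coming through an OleDbDataReader, SqlBulkCopy can stream it straight into the table in batches, avoiding both per-row hits and linked servers.
using System.Data.OleDb;
using System.Data.SqlClient;
using (var excel = new OleDbConnection(excelConnectionString))
using (var sql = new SqlConnection(sqlConnectionString))
{
    excel.Open();
    sql.Open();
    var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", excel);
    using (var reader = cmd.ExecuteReader())
    using (var bulk = new SqlBulkCopy(sql) { DestinationTableName = "dbo.MyTable", BatchSize = 5000 })
        bulk.WriteToServer(reader);   // one round trip per batch of 5,000 rows, not per row
}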
View 2 Replies
Apr 17, 2010
I am trying to get a handle on SolrNet and interacting with a Solr server from an ASP.NET site. However, the sample app (in the code repository) is MVC-based; does anyone know of a version in plain vanilla ASP.NET?
View 1 Replies
May 26, 2010
-Edit- Important: I updated the code to not use obsolete functions. Now only the NoSuchDirectoryException issue remains.
Edit: NOTE I can bypass the NoSuchDirectoryException by creating the folder in a WinForms app and copying it over. However, I still have a LockObtainFailedException issue if I don't shut down properly. I have an issue with Lucene.Net 2.9.2
[https://svn.apache.org/repos/asf/lucene/lucene.net/tags/]. It throws a lock exception. After poking around I noticed these things.
My code below works in an app, but when calling it in Application_Start I get a NoSuchDirectoryException. If I don't close the writer (as my code below doesn't), I WILL get a LockObtainFailedException with the message
Lock obtain timed out: SimpleFSLock@<FULL_PATH> from either the app or ASP.NET.
These threads hinted that spawned threads get fewer permissions than the main thread (but my main thread has problems as well...), and one suggested solution is to impersonate IIS. I am using Visual Studio 2010; I am not sure how full-blown that is, but my attempt to impersonate failed.
So my question is: how do I have Lucene create the directory, and not throw an exception if the writer wasn't closed for some reason (such as the power going out)?
http://stackoverflow.com/questions/2341163/why-is-my-lucene-index-getting-locked/2499285#2499285
http://stackoverflow.com/questions/1123517/lucene-net-and-i-o-threading-issue/1123981#1123981
static IndexWriter writer = null;
static void lucene_init()
{
[code].....
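A minimal sketch of one way to handle both failure modes, assuming Lucene.Net 2.9.2 and a hypothetical index path: FSDirectory.Open doesn't create the folder, so create it first, and a stale write.lock left by an unclean shutdown can be cleared with IndexWriter.Unlock before opening the writer.
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
static IndexWriter writer = null;
static void lucene_init()
{
    // Hypothetical path; in ASP.NET use something like Server.MapPath("~/App_Data/index").
    var dirInfo = new DirectoryInfo(@"C:\myapp\index");
    if (!dirInfo.Exists)
        dirInfo.Create();                                   // avoids NoSuchDirectoryException
    var dir = FSDirectory.Open(dirInfo);
    if (IndexWriter.IsLocked(dir))
        IndexWriter.Unlock(dir);                            // clears a stale write.lock
    writer = new IndexWriter(dir,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
        !IndexReader.IndexExists(dir),                      // create only if no index exists yet
        IndexWriter.MaxFieldLength.UNLIMITED);
}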
View 1 Replies
Jan 13, 2010
I want to start a new project. I need performance as well as a neat and robust GUI.
About performance: I have around 2 million documents which I'd like to index with the help of Lucene installed on Linux, due to its performance and security.
About the GUI: I'd like to have a flexible, professional-looking website, and since I'm experienced with .NET I'd like to retrieve Lucene's results and show them in my own way.
I've heard about some RESTful services available around Lucene, but I don't have a clue about them or how to connect the two together.
How can I connect ASP.NET to Lucene?
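Lucene itself is just a library with no network interface; the RESTful service usually meant here is Solr, which wraps Lucene and exposes it over HTTP. A minimal sketch of querying a Solr instance on the Linux box from ASP.NET - host, port, core layout, and query are all placeholders:
using System.IO;
using System.Net;
string url = "http://linuxbox:8983/solr/select?q=title:asp.net&wt=json&rows=10";
var request = (HttpWebRequest)WebRequest.Create(url);
using (var response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string json = reader.ReadToEnd();   // parse and render the results in your own markup
}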
View 2 Replies
Oct 21, 2010
How do I sort my results in a random order? My code looks something like this at the moment:
Dim searcher As IndexSearcher = New IndexSearcher(dir, True)
Dim collector As TopScoreDocCollector = TopScoreDocCollector.create(100, True)
searcher.Search(query, collector)
Dim hits() As ScoreDoc = collector.TopDocs.scoreDocs
For Each sDoc As ScoreDoc In hits
'get doc and return
Next
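Lucene has no built-in random sort, but you can collect the top N as above and shuffle the hits yourself before rendering. A Fisher-Yates shuffle sketch in C# (it translates directly to VB); 'hits' is the ScoreDoc array from the code above:
using System;
var rng = new Random();
for (int i = hits.Length - 1; i > 0; i--)
{
    int j = rng.Next(i + 1);                    // pick a random index from 0..i
    var tmp = hits[i]; hits[i] = hits[j]; hits[j] = tmp;
}
// Iterate 'hits' in its new, shuffled order and load each doc as before.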
View 1 Replies
Sep 21, 2010
I have a database table that works as a file repository. Currently binaries are stored in there, and I want to pull the "large" ones out in chunks; some of these files are in excess of 500 MB. My business rules dictate that if a file is >5 MB I transmit it in chunks; <5 MB and I can load it into memory and pull it out directly. I got uploading in chunks to work, but how do I get it to pull data out of the DB in chunks?
Right now I'm getting hit with a 'System.OutOfMemory' exception, but when I create an empty byte array of the SAME size it doesn't break.
Download Chunks (DAL)
public byte[] getBytesByDataID(int chunkSize, string dataID)
{
    string query = "SELECT data.data " +
                   " FROM data " +
                   " WHERE dataID = @dataID";
    openConnection();
    cmd = new SqlCommand(query, myConnection);
    cmd.Parameters.AddWithValue("@dataID", dataID);
[code]....
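A hedged sketch of the chunked read, reusing the same query and the openConnection/cmd/myConnection members from the method above: passing CommandBehavior.SequentialAccess stops ADO.NET from buffering the entire row, and GetBytes then pulls one chunk at a time, so no 500 MB array is ever allocated. The chunks are written to an output stream (a file or the response) instead of being returned as one byte[].
using System.Data;
using System.Data.SqlClient;
using System.IO;
public void streamBytesByDataID(int chunkSize, string dataID, Stream output)
{
    string query = "SELECT data.data FROM data WHERE dataID = @dataID";
    openConnection();
    cmd = new SqlCommand(query, myConnection);
    cmd.Parameters.AddWithValue("@dataID", dataID);
    using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (!reader.Read()) return;
        var buffer = new byte[chunkSize];
        long offset = 0, read;
        while ((read = reader.GetBytes(0, offset, buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, (int)read);   // hand off each chunk as it arrives
            offset += read;
        }
    }
}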
View 3 Replies
Jun 11, 2010
I am looking for any open source application that uses lucene.net. I am working on a complicated web application and would like to see how others have implemented lucene.net.
View 2 Replies
Nov 4, 2010
I am using Lucene.Net 3.0 in my application, which has frequent updates to the index.
But when new data is posted on a forum, it's not available in the search; it takes a few minutes for the index to update.
How can I overcome this?
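A hedged sketch of the usual fix, assuming the 2.9/3.0-era API: instead of reopening a searcher from disk on a timer, take a near-real-time reader from the live IndexWriter. It sees newly added documents without waiting for a commit, and Reopen refreshes it cheaply between searches.
using Lucene.Net.Index;
using Lucene.Net.Search;
// 'writer' is the long-lived IndexWriter that the update code adds documents to.
IndexReader reader = writer.GetReader();        // near-real-time reader
var searcher = new IndexSearcher(reader);
// Before each search, refresh only if the index actually changed:
IndexReader newReader = reader.Reopen();
if (newReader != reader)
{
    reader.Close();
    reader = newReader;
    searcher = new IndexSearcher(reader);
}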
View 1 Replies
Jun 1, 2010
In my web application I am working with files. Some files are very large. I use Response.Write() to write the file to the browser. This goes well for smaller files, but for large files it can take a while and the bandwidth is fully used.
Is it possible to split large documents and send them piece by piece to the browser? Are there other ways to send the document to the browser more quickly? I hold the document as a property of an object.
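A hedged sketch of chunked output, assuming the document is already in memory as a byte[] property: turning off response buffering and flushing after each piece lets the browser start receiving immediately instead of waiting for the whole file to be written.
using System;
using System.Web;
public void WriteInChunks(HttpResponse response, byte[] document, string fileName)
{
    const int chunkSize = 64 * 1024;                       // 64 KB per write
    response.BufferOutput = false;                         // stream instead of buffering the lot
    response.ContentType = "application/octet-stream";
    response.AddHeader("Content-Disposition", "attachment; filename=" + fileName);
    response.AddHeader("Content-Length", document.Length.ToString());
    for (int offset = 0; offset < document.Length; offset += chunkSize)
    {
        if (!response.IsClientConnected) break;            // stop if the download was cancelled
        int count = Math.Min(chunkSize, document.Length - offset);
        response.OutputStream.Write(document, offset, count);
        response.Flush();
    }
}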
View 6 Replies
Aug 6, 2010
I want to upload some large files from a web page to an MS SQL Server database - I am aware this sounds weird.
File sizes are around 100MB.
I am having following settings,
SessionTimeOut period = 60 Mins,
Server Operation timeout = 60 Mins,
SQL Connection Timeout = 4 mins (Not sure if this is helping)
This page is going to be used by our client only once a week, and as they have a web farm environment we are avoiding storing these files on the file system.
Currently we are able to upload files of up to 8 MB successfully, but uploading a 100 MB file fails; for sure the operation takes a lot of time.
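The 8 MB ceiling points at ASP.NET's request limits rather than the timeouts above: maxRequestLength defaults to 4096 KB, and on IIS7+ maxAllowedContentLength defaults to about 30 MB. A hedged web.config sketch raising both to 200 MB (the values shown are assumptions to adjust):
<system.web>
  <!-- maxRequestLength is in KB; executionTimeout is in seconds -->
  <httpRuntime maxRequestLength="204800" executionTimeout="3600" />
</system.web>
<system.webServer>
  <security>
    <requestFiltering>
      <!-- maxAllowedContentLength is in bytes -->
      <requestLimits maxAllowedContentLength="209715200" />
    </requestFiltering>
  </security>
</system.webServer>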
View 4 Replies
May 14, 2010
Situation: I have an ASP.NET application that will search through docs using Lucene. I want to run the initial indexing (the index will be incremental after the initial run, so there won't be a need to index the whole directory again in the future). Currently, I have about 5 GB of docs (45,000 files).
Problem: My application times out before completing the process. I have altered the timeout like this:
HttpContext.Current.Server.ScriptTimeout = 200000;
but it still does not complete the process.
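Rather than stretching the request timeout, a hedged alternative is to run the one-off initial build off the request thread entirely; IndexDirectory below is a placeholder for whatever walks the 45,000 files and feeds them to Lucene. A console app or scheduled task is safer still, since IIS can recycle the app pool mid-run.
using System.Threading;
var worker = new Thread(() => IndexDirectory(@"D:\docs"))   // placeholder indexing routine
{
    IsBackground = true
};
worker.Start();   // the request returns immediately; indexing continues in the background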
View 1 Replies
Jul 28, 2010
I am working with VS 2010, Entity Framework, SQL Server 2005, and ASP.NET Web Forms. Currently, I am working on the data access layer library, which will soon be a web service, using Entity Framework in combination with design patterns like the repository pattern and some best practices posted on various blogs. I am also testing each repository using a unit testing project. Thumbs up! Working fine.
What I am worried about is: what is a good approach for retrieving data from a table that can contain 80-100k records?
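A hedged sketch of the standard answer: don't materialize the whole table through the repository; page at the database with Skip/Take so each call brings back only one screenful. The entity and context names (Order, MyEntities) are assumptions:
using System.Collections.Generic;
using System.Linq;
public List<Order> GetPage(MyEntities context, int pageIndex, int pageSize)
{
    return context.Orders
                  .OrderBy(o => o.OrderId)       // Skip requires a stable ordering
                  .Skip(pageIndex * pageSize)    // translated into SQL, not done in memory
                  .Take(pageSize)
                  .ToList();
}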
View 1 Replies
Apr 8, 2013
I am curious to know the following:
We are going to develop a website where we need to store a large number of files. The file size may be up to 50 MB. Which approach should we follow:
1) We should store the files in the database
2) We should have a directory and store all the files in that
3) We should get SAN storage and use this separate location to store the files
View 1 Replies
May 6, 2010
Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx, and xlsx)?
I require this to populate a lucene.net index.
I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
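In case nothing turns up, a hedged docx-only sketch using just System.IO.Packaging (reference WindowsBase.dll): a .docx is a ZIP package whose main part is /word/document.xml, and concatenating its text nodes is usually good enough for a Lucene index. pptx and xlsx keep their text in different parts, so they would each need their own variant.
using System;
using System.IO;
using System.IO.Packaging;
using System.Xml.Linq;
public static string ExtractDocxText(string path)
{
    using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
    {
        PackagePart part = package.GetPart(new Uri("/word/document.xml", UriKind.Relative));
        using (Stream stream = part.GetStream())
        {
            // Root.Value concatenates every text node in the document body.
            return XDocument.Load(stream).Root.Value;
        }
    }
}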
View 2 Replies
Jan 3, 2011
In my program, I need to update a field in the database. The field holds the content of a PDF file. I know, maybe I should save the PDF to the local file system and save only the file path in SQL, but right now it is the file content that is saved to SQL.
My program loads the file the user supplies using File.ReadAllBytes, then runs a stored procedure on the SQL server with the byte[] as a parameter to insert the file. It works fine when the file is small; however, as the file gets larger, say 200 MB, it sometimes throws an out-of-memory exception.
So I'm thinking maybe I can load part of the file, say 1 MB at a time, then update the database, and loop. That way, no matter how large the file is, it should not have any problem.
My question is: is there a standard way to do this? Should I not save file content to SQL at all, and instead save the file in the file system with only the path in SQL? Or is there another way to deal with this?
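There is a standard way for exactly this pattern: SQL Server 2005's varbinary(max) supports the UPDATE ... .WRITE clause, which appends to the stored value in place (passing NULL as the offset appends at the end). A hedged sketch with assumed table and column names (Files, Content, FileID) that streams the file 1 MB at a time, so only one chunk is ever in memory:
using System;
using System.Data.SqlClient;
using System.IO;
public static void UploadInChunks(SqlConnection conn, int fileId, string path)
{
    // Start from an empty (non-NULL) value so .WRITE has something to append to.
    using (var init = new SqlCommand("UPDATE Files SET Content = 0x WHERE FileID = @id", conn))
    {
        init.Parameters.AddWithValue("@id", fileId);
        init.ExecuteNonQuery();
    }
    const string sql = "UPDATE Files SET Content.WRITE(@chunk, NULL, NULL) WHERE FileID = @id";
    using (var file = File.OpenRead(path))
    {
        var buffer = new byte[1024 * 1024];      // 1 MB per round trip
        int read;
        while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] chunk = buffer;
            if (read < buffer.Length)            // last, partial chunk
            {
                chunk = new byte[read];
                Array.Copy(buffer, chunk, read);
            }
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@chunk", chunk);
                cmd.Parameters.AddWithValue("@id", fileId);
                cmd.ExecuteNonQuery();           // .WRITE(@chunk, NULL, NULL) appends
            }
        }
    }
}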
View 1 Replies
Jun 15, 2010
I'm using ASP .NET C# 3.5. I have a multiline textbox on my web form that allows for the input of up to 5,000 characters from the end-user. This text is a basic description of a training course. I need to display it out in a clearly formatted way. For example, I need there to be bullets and bold text.
What I did was I chose certain (not often used) characters and then used the .Replace method when displaying the text in an <asp:Label>. If the text in the database contains the character '~' then I replace that with a line break <br />. If it contains '`' I replace that with <b> and if it contains '^' I replace that with </b>.
Is there a better way of doing this? It is working properly and I'm displaying the text correctly, but I know the end-user is going to hate typing text like this for formatting. I do want this all to stay database-driven as well.
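One hedged refinement of the same Replace approach: HTML-encode the raw text first, so a user who types a literal < or & can't break (or inject markup into) the page, and only then substitute the formatting characters:
using System.Web;
public static string FormatDescription(string raw)
{
    // Encode first so user-typed markup stays harmless; then apply the formatting tokens.
    return HttpUtility.HtmlEncode(raw)
                      .Replace("~", "<br />")
                      .Replace("`", "<b>")
                      .Replace("^", "</b>");
}
Longer term, a lightweight markup such as BBCode or Markdown is friendlier for end-users to type than ~, ` and ^, and stays just as database-driven.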
View 4 Replies
Dec 18, 2010
I want to store a large amount of characters in one field of a table. I use text/nvarchar(max), but it shows me this exception:
(String or binary data would be truncated. The statement has been terminated.) I know that text and nvarchar(max) can store far more than 8,000 characters.
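That error usually means the truncation happens before the nvarchar(max) column is reached - for example, an ADO.NET parameter declared with a fixed size, or an intermediate variable or column still typed as a plain nvarchar. A hedged sketch (table and column names assumed) declaring the parameter with size -1, which marks it as nvarchar(max):
using System.Data;
using System.Data.SqlClient;
var cmd = new SqlCommand("UPDATE Articles SET Body = @body WHERE ArticleID = @id", connection);
cmd.Parameters.Add("@body", SqlDbType.NVarChar, -1).Value = longText;   // -1 = nvarchar(max)
cmd.Parameters.Add("@id", SqlDbType.Int).Value = articleId;
cmd.ExecuteNonQuery();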
View 7 Replies
Feb 9, 2010
I have a requirement to upload large files from .NET over FTP. Are there any open source programs available?
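Before reaching for a library, the framework's built-in FtpWebRequest may already be enough; a hedged sketch that streams a large file in 64 KB pieces (URL and credentials are placeholders):
using System.IO;
using System.Net;
public static void FtpUpload(string localPath, string ftpUrl, string user, string password)
{
    var request = (FtpWebRequest)WebRequest.Create(ftpUrl);   // e.g. "ftp://host/dir/file.zip"
    request.Method = WebRequestMethods.Ftp.UploadFile;
    request.Credentials = new NetworkCredential(user, password);
    request.UseBinary = true;
    using (var source = File.OpenRead(localPath))
    using (var target = request.GetRequestStream())
    {
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            target.Write(buffer, 0, read);                    // stream; never the whole file in memory
    }
    using (var response = (FtpWebResponse)request.GetResponse())
    {
        // response.StatusDescription carries the server's final reply, if you want to log it.
    }
}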
View 5 Replies
Feb 17, 2011
Handling a large number of images downloaded from a server.
View 4 Replies
Apr 20, 2010
I am building UI for a large product catalog (millions of products).
I am using Sql Server, FreeText search and ASP.NET MVC.
Tables are normalized and indexed. Most queries take less than a second to return.
The issue is this. Let's say user does the search by keyword. On search results page I need to display/query for:
Display 20 matching products on first page(paged, sorted)
Total count of matching products for paging
List of stores only of all matching products
List of brands only of all matching products
List of colors only of all matching products
Each query takes about 0.5 to 1 second, so altogether it is about 5 seconds.
I would like to get the whole page to load in under 1 second. There are several approaches:
1. Optimize queries even more. I already spent a lot of time on this one, so I'm not sure it can be pushed further.
2. Load products first, then load the rest of the information using AJAX. More like a workaround; it will need a revised UI.
3. Re-organize the data to be more report-friendly. I have already aggregated a lot of fields.
I checked out several similar sites, for example [URL]. Not only do they display the same information as I would like in under 1 second, but they also include statistics (the number of results in each category).
The following is the search for the keyword "white": [URL]. How do sites like Zappos and Amazon make their results, filters and stats appear almost instantly?
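One hedged way to collapse the page and count queries into a single round trip (SQL Server 2005+ syntax; table and column names are assumptions): ROW_NUMBER pages the results while COUNT(*) OVER () returns the total match count on every row, so the grid and the pager come back together.
-- Hypothetical schema: Products(ProductID, Name, BrandID, ...), full-text indexed.
;WITH Matches AS (
    SELECT p.ProductID, p.Name, p.BrandID,
           ROW_NUMBER() OVER (ORDER BY p.Name) AS RowNum,
           COUNT(*) OVER ()                    AS TotalCount
    FROM Products p
    WHERE CONTAINS(p.Name, @keyword)
)
SELECT ProductID, Name, BrandID, TotalCount
FROM Matches
WHERE RowNum BETWEEN @first AND @first + 19;   -- one page of 20, plus the total
The store/brand/color lists are facet counts; sites of that size typically serve them from a search engine's faceting (Solr/Lucene) or a cache rather than from live SQL aggregates, which also explains the near-instant filters.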
View 3 Replies
Dec 27, 2010
In my database, when I fire a query it takes 40 seconds on 10 million (1 crore) rows; similarly, when I use a join with another table it takes even more time. I have taken care of things like non-clustered indexes. But I still want to optimize my query; what other things do I need to look at, like buffer, disk size, etc.? I am not sure about this area.
View 11 Replies
Feb 17, 2011
I have a DTS script which transforms a tab-delimited text file into a table. I get the following error when trying to transform the data of a field that is longer than 255 characters:
[Code]....
I have seen this issue when importing from Excel with the Jet 4.0 engine; however, I am importing a text file.
View 3 Replies