Reindexing A Large SQL Server Database To Lucene?

Feb 24, 2011

We have a web service method which accepts some data and puts it in a Lucene index. We use it to index new and updated entries from our ASP.NET web app.

These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way is to retrieve chunks of data from a large table. Currently, we rely on the fact that the table has an auto-increment primary key, so we fetch chunks of 1000 rows until it starts to return nothing. Something like this (in pseudocode):

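i = 0
emptyChunks = 0
while (true)
{
    // half-open range: BETWEEN is inclusive, so it would read pk = i + 1000 twice
    rows = SELECT col1, col2, col3 FROM mytable WHERE pk >= i AND pk < i + 1000
    if rows is empty:
        emptyChunks = emptyChunks + 1
        if emptyChunks == 20, break    // assume we have reached the end of the table
    else:
        emptyChunks = 0
        send rows to the web service to reindex
    i = i + 1000
}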

This way, we don't need to SELECT COUNT(*), which would be a big performance killer, and we just move up the pk values until we stop getting any results. This has its drawback: if there is a hole of more than 20,000 values somewhere in the table, it will stop indexing, assuming it has reached the end, but that's a tradeoff we have to live with for now.

Is there a more optimal way of getting data from a table to index? I would assume we are not the first ones facing this problem; search engines are widely used nowadays :)
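
A common alternative to fixed key ranges is keyset (seek) pagination: ask for the next N rows whose key is greater than the last key already seen, ordered by the key. Holes in the key sequence stop mattering, and the loop ends when a batch comes back short, so neither COUNT(*) nor the "20 empty chunks" heuristic is needed. The sketch below uses Python's sqlite3 as a stand-in for SQL Server, where the query would be `SELECT TOP (@n) ... WHERE pk > @last ORDER BY pk` (typically one index seek per batch when pk is the clustered key). The table `mytable`, the columns and the `send_to_indexer` callback are assumptions for illustration.

```python
import sqlite3
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, str]


def iter_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[List[Row]]:
    """Yield rows in primary-key order, batch_size at a time (keyset pagination)."""
    last_pk = None
    while True:
        if last_pk is None:
            rows = conn.execute(
                "SELECT pk, col1 FROM mytable ORDER BY pk LIMIT ?", (batch_size,)
            ).fetchall()
        else:
            # Seek past the last key seen; holes in the key sequence are irrelevant.
            rows = conn.execute(
                "SELECT pk, col1 FROM mytable WHERE pk > ? ORDER BY pk LIMIT ?",
                (last_pk, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        if len(rows) < batch_size:
            return  # a short batch means we have reached the end of the table
        last_pk = rows[-1][0]


def reindex_all(conn: sqlite3.Connection, send_to_indexer: Callable[[List[Row]], None],
                batch_size: int = 1000) -> int:
    """Push every row to the indexer; returns the number of rows sent."""
    total = 0
    for batch in iter_batches(conn, batch_size):
        send_to_indexer(batch)  # hypothetical: call the reindexing web service here
        total += len(batch)
    return total
```

Rows inserted while the reindex runs may or may not be picked up, but new entries already go through the normal incremental indexing path.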

Similar Messages:

SQL Server :: Insert A Large Amount Of Data Into A SQL Server 2005 Database With A Duplicate Check Every Time?

Feb 6, 2011

I want to generate 30,000 cards, and each card must be checked against the database for duplicates. Each card has two fields: Serial No and CardID. If a card already exists, I generate another card ID but keep the same serial number.

How can I generate 30,000 cards with this duplicate check faster? The application I have built takes about 25 minutes to insert them.
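
The 25 minutes are probably dominated by one database round trip per duplicate check plus one per insert. A minimal sketch of a faster pattern, assuming the existing card IDs fit in memory: load them once into a set, check new IDs against that set, and insert the whole batch in a single transaction with a UNIQUE constraint as a safety net. It uses Python's sqlite3 as a stand-in for SQL Server; the `cards` table, its columns and the 12-digit `new_card_id` format are hypothetical.

```python
import secrets
import sqlite3
from typing import List, Tuple


def new_card_id(digits: int = 12) -> str:
    """Hypothetical card-ID format: a random zero-padded numeric string."""
    return str(secrets.randbelow(10 ** digits)).zfill(digits)


def generate_cards(conn: sqlite3.Connection, serial_numbers: List[str]) -> List[Tuple[str, str]]:
    # One query loads every existing card ID instead of one lookup per card.
    taken = {row[0] for row in conn.execute("SELECT card_id FROM cards")}
    cards = []
    for serial in serial_numbers:
        card_id = new_card_id()
        while card_id in taken:  # duplicate: new card ID, same serial number
            card_id = new_card_id()
        taken.add(card_id)
        cards.append((serial, card_id))
    # Single transaction; the UNIQUE constraint on card_id is the final guard.
    with conn:
        conn.executemany("INSERT INTO cards (serial_no, card_id) VALUES (?, ?)", cards)
    return cards
```

If another process can insert cards at the same time, the UNIQUE constraint turns that race into an error you can catch and retry. With many millions of existing IDs, bulk-loading the candidates into a staging table and checking them with one join may be preferable to holding the set in memory.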

C# - SQL Server Database Optimization For Large Database?

Jun 7, 2010

I am developing a website with a huge amount of data to be stored in a SQL Server database. How should I optimize it to make it faster? For example:

1. Using Stored procedures.

2. Functions / Views.

3. Any other methods

SQL Server :: Insert Large No Of Excel Sheet Rows Into Database Table?

Aug 17, 2010

I have an Excel sheet that contains around 30,000 rows and 18 columns. The number of rows and columns may increase in the future. I need to read all these records from the Excel sheet and insert them into a table in a SQL database. To read the Excel workbook I am using OleDbConnection. The possible solutions I know of for inserting the data are:

1. Insert one record at a time, which makes 30,000 database hits. How will this affect performance?
2. Use linked servers, but this is not working for me: I do not have the database permissions to use linked servers. So the only option I have is the first one.
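
Each single-row insert pays for a network round trip and, unless wrapped in a transaction, its own commit. A sketch of the usual remedy, sending the rows in batches inside one transaction, with sqlite3 standing in for SQL Server; the `sheet_rows` table is hypothetical and the rows are assumed to have been read from the sheet already. From .NET, `SqlBulkCopy` is the class normally used for this kind of load on SQL Server, and it does not involve linked servers.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Sequence


def insert_in_batches(conn: sqlite3.Connection, rows: Iterable[Sequence], n_cols: int,
                      batch_size: int = 1000) -> int:
    """Insert rows batch_size at a time inside a single transaction."""
    placeholders = ", ".join("?" * n_cols)  # "?, ?, ..., ?"
    sql = f"INSERT INTO sheet_rows VALUES ({placeholders})"
    it = iter(rows)
    total = 0
    with conn:  # one commit for the whole load (rolled back if anything fails)
        while True:
            batch = list(islice(it, batch_size))  # only one batch held in memory
            if not batch:
                break
            conn.executemany(sql, batch)
            total += len(batch)
    return total
```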

Lucene - Solrnet /.NET Sample Without MVC?

Apr 17, 2010

I am trying to get a handle on SolrNet and on integrating an ASP.NET site with a Solr server. However, the sample app (in the code repository) is MVC-based. Does anyone know of a version in plain vanilla ASP.NET?

.net - Lucene.net Create + Lock Errors In .NET?

May 26, 2010

-Edit- Important: I updated the code to not use obsolete functions. Now only the NoSuchDirectoryException issue remains.

Edit: NOTE: I can bypass the NoSuchDirectoryException by creating the folder in a WinForms app and copying it. However, I still get a LockObtainFailedException if I don't shut down properly.

I have an issue with Lucene.net 2.9.2 ([https://svn.apache.org/repos/asf/lucene/lucene.net/tags/]): it throws a lock exception. After poking around, I noticed these things.

My code below works in an app, but when calling it in Application_Start I get a NoSuchDirectoryException. If I don't close the writer (as my code below doesn't), I WILL get a LockObtainFailedException with the message

Lock obtain timed out: SimpleFSLock@<FULL_PATH> from either app or asp.net

These threads hinted that spawned threads get fewer permissions than I do (but my main thread has problems as well), and one solution is to impersonate IIS. I am using Visual Studio 2010. I am not sure how full-blown it is, but my attempt to impersonate it failed.

So my question is: how do I have Lucene create the directory, and not throw an exception if I don't close the writer for some reason (such as the power going out)?

http://stackoverflow.com/questions/2341163/why-is-my-lucene-index-getting-locked/2499285#2499285

http://stackoverflow.com/questions/1123517/lucene-net-and-i-o-threading-issue/1123981#1123981

static IndexWriter writer = null;
static void lucene_init()
{
[code].....

To Use Lucene (on Linux) And .net (on Windows) At The Same Time?

Jan 13, 2010

I want to start a new project where I need performance as well as a neat and robust GUI.

For performance: I have around 2 million documents, which I'd like to index with Lucene installed on Linux, due to its performance and security.

For the GUI: I'd like a flexible, professional-looking website, and since I'm experienced with .NET, I'd like to retrieve Lucene's results and display them my own way.

I've heard about some RESTful services available for Lucene, but I don't have any clue about them or how to connect these two together.

How can I connect ASP.NET to Lucene?
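
Lucene itself is a library rather than a server, so the RESTful option usually meant here is Apache Solr, which wraps Lucene and answers HTTP queries on its `/select` handler; the ASP.NET site then only has to make HTTP requests (or use a client such as SolrNet). The sketch below, in Python for brevity, builds such a request URL and reads the JSON reply. The host name and path are hypothetical; `q`, `start`, `rows` and `wt` are standard Solr query parameters.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def solr_select_url(base_url: str, query: str, start: int = 0, rows: int = 10) -> str:
    """Build a Solr /select URL that asks for a JSON response."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def parse_solr_response(body: str) -> Tuple[int, List[Dict[str, Any]]]:
    """Return (total hit count, documents on this page) from a Solr JSON response."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]
```

The web front end can then render `docs` however it likes and use `numFound` for paging, which keeps Lucene on the Linux box and the UI entirely in .NET.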

Random Sorting Results In Lucene.Net 2.4?

Oct 21, 2010

How do I sort my results in a random order? My code looks something like this at the moment:

Dim searcher As IndexSearcher = New IndexSearcher(dir, True)
Dim collector As TopScoreDocCollector = TopScoreDocCollector.create(100, True)
searcher.Search(query, collector)
Dim hits() As ScoreDoc = collector.TopDocs.scoreDocs
For Each sDoc As ScoreDoc In hits
'get doc and return
Next
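
One simple approach, sketched here rather than taken from Lucene.Net's API, is to collect the hits as usual and then shuffle the resulting array with a Fisher-Yates shuffle. The sketch is in Python (where `random.shuffle` already does exactly this); in the VB code the same loop would run over the `hits` array using `System.Random`. Note that it only randomizes the top 100 hits the collector kept.

```python
import random
from typing import MutableSequence, Optional, TypeVar

T = TypeVar("T")


def shuffle_hits(hits: MutableSequence[T], rng: Optional[random.Random] = None) -> MutableSequence[T]:
    """Fisher-Yates shuffle in place; with a uniform RNG every order is equally likely."""
    rng = rng or random.Random()
    for i in range(len(hits) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both ends, so j may equal i
        hits[i], hits[j] = hits[j], hits[i]
    return hits
```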

Web Forms :: Retrieve Large File From Database?

Sep 21, 2010

I have a database table that works as a file repository. Currently there are binaries stored in there, and I want to pull the "large" ones out in chunks. Some of these files are in excess of 500 MB. My business rules dictate that a file >5MB must be transmitted in chunks; a file <5MB I can load into memory and pull out in one go. I got uploading in chunks to work, but how do I pull the file out of the DB in chunks?

Right now I'm getting hit with a 'System.OutOfMemoryException'. But when I recreate a byte array of the SAME size (empty, though), it doesn't break.

Download Chunks (DAL)

public byte[] getBytesByDataID(int chunkSize, string dataID)
{
    string query = "SELECT data.data " +
                   " FROM data " +
                   " WHERE dataID = @dataID";
    openConnection();
    cmd = new SqlCommand(query, myConnection);
    cmd.Parameters.AddWithValue("@dataID", dataID);

[code]....
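
Selecting `data.data` in one go forces the whole value into memory at once. One way to read it in pieces is to ask the database for one slice at a time: on SQL Server, `SUBSTRING(data, @offset, @length)` works on `varbinary(max)`, and `SqlDataReader` with `CommandBehavior.SequentialAccess` plus `GetBytes` is the other common route. A minimal sketch of the slicing approach using sqlite3's `substr()`, reusing the question's `data`/`dataID` names but assuming an integer `dataID`:

```python
import sqlite3
from typing import Iterator


def iter_blob_chunks(conn: sqlite3.Connection, data_id: int,
                     chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield a stored blob slice by slice; only one slice is in memory at a time."""
    row = conn.execute("SELECT length(data) FROM data WHERE dataID = ?", (data_id,)).fetchone()
    if row is None or row[0] is None:
        return  # no such row, or a NULL value
    size = row[0]
    offset = 1  # substr() positions are 1-based
    while offset <= size:
        (chunk,) = conn.execute(
            "SELECT substr(data, ?, ?) FROM data WHERE dataID = ?",
            (offset, chunk_size, data_id),
        ).fetchone()
        yield chunk
        offset += chunk_size
```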

C# - Finding Open Source Applications That Use Lucene.net?

Jun 11, 2010

I am looking for any open source application that uses lucene.net. I am working on a complicated web application and would like to see how others have implemented lucene.net.

How To Handle Very Frequent Updates To A Lucene Index

Nov 4, 2010

I am using Lucene.net 3.0 in my application, which has frequent updates to the index.

But when new data is posted on a forum, it's not available in the search; it takes a few minutes to update the index.
How can I overcome this?

C# - How To Efficiently Send Large Files From The Database To The Browser

Jun 1, 2010

In my web application I am working with files. Some files are very large. I use Response.Write() to write the file to the browser. This goes well for the smaller files, but for large files this can take a while and the bandwidth is fully used.

Is it possible to split large documents and send them piece by piece to the browser? Are there other ways to send the document to the browser more quickly? I hold the document as a property of an object.
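
Splitting does not make the transfer itself faster when bandwidth is the bottleneck, but it keeps server memory flat and lets the browser start receiving data immediately. The core of it is a loop that reads and forwards a fixed-size piece at a time; a minimal Python sketch follows. In ASP.NET the equivalent, as far as this sketch is concerned, is setting `Response.BufferOutput = false` and writing each chunk to `Response.OutputStream` followed by `Response.Flush()`, or using `Response.TransmitFile` when the file is on disk.

```python
from typing import BinaryIO, Iterator


def stream_in_chunks(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the source in fixed-size pieces so the whole file is never held in memory."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk  # in ASP.NET: write this to Response.OutputStream, then Response.Flush()
```

This only helps if the document is not already fully loaded into the object property; the source should be a stream over the file or database value.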

DataSource Controls :: Storing Large Files In Database?

Aug 6, 2010

I want to upload some large files from a web page to an MS SQL Server database; I am sure this sounds weird.

File sizes are around 100MB.

I have the following settings:

SessionTimeOut period = 60 Mins,

Server Operation timeout = 60 Mins,

SQL Connection Timeout = 4 mins (Not sure if this is helping)

This page is going to be used by our client only once a week, and as they have a web farm environment, we are avoiding storing these files on the file system.

Currently we are able to upload files of up to 8 MB successfully, but uploading a 100 MB file fails; the operation certainly takes a lot of time.

C# - Running Long Process: Indexing 5GB Docs With Lucene?

May 14, 2010

Situation: I have an ASP.NET application that will search through docs using Lucene. I want to run the initial indexing (the index will be incremental after the initial run, so there won't be a need to index the whole directory again in the future). Currently, I have about 5GB of docs (45,000 files).

Problem: my application times out before completing the process. I have altered the timeout like this:

HttpContext.Current.Server.ScriptTimeout = 200000;

but it still does not complete the process.
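
A web request is a fragile host for a job this long, even with a larger `ScriptTimeout`. A minimal sketch of the alternative pattern, in Python: start the work on a background thread so the request can return at once, and let a status page poll the progress. `index_one` is a hypothetical callback that adds one document to the index. In ASP.NET, a thread inside the worker process can still be killed by an app-pool recycle, so a Windows service or a scheduled console app is often the more robust host for a one-off 45,000-file run.

```python
import threading
from typing import Callable, Iterable, List, Optional


class IndexingJob:
    """Runs a long indexing task on a background thread and reports progress."""

    def __init__(self, paths: Iterable[str], index_one: Callable[[str], None]) -> None:
        self.paths: List[str] = list(paths)
        self.index_one = index_one  # hypothetical: adds one document to the index
        self.done = 0
        self.error: Optional[Exception] = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        """Return immediately; the web request can respond while indexing continues."""
        self._thread.start()

    def _run(self) -> None:
        try:
            for path in self.paths:
                self.index_one(path)
                self.done += 1  # only this thread writes the counter; pollers just read it
        except Exception as exc:
            self.error = exc  # keep the failure so a status page can report it

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block up to `timeout` seconds; True if the job has finished."""
        self._thread.join(timeout)
        return not self._thread.is_alive()

    @property
    def progress(self) -> str:
        return f"{self.done}/{len(self.paths)}"
```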

How Much Entity Framework Is Good For Database That Contains Tables With Large Records

Jul 28, 2010

I am working with VS 2010, Entity Framework, SQL Server 2005 and ASP.NET Web Forms. Currently, I am working on a data access layer library, which will soon become a web service, using Entity Framework together with design patterns like the repository pattern and some best practices from various blogs. I am also testing each repository using a unit test project. Thumbs up! It's working fine.

What worries me is this: how well does it perform when retrieving data from a table that can contain 80-100k records?

Data Controls :: Where To Store Large Files (in Database Or In Folder)

Apr 8, 2013

I am curious to know the following:

We are going to develop a website where we need to store a large number of files. The file size may be up to 50 MB. What approach should we follow?

1) Store the files in the database
2) Have a directory and store all the files in it
3) Rent SAN storage and use this separate location to store the files

Lucene.net - Library To Extract Plain Text From Open XML File Formats?

May 6, 2010

Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx and xlsx files)?

I require this to populate a lucene.net index.

I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
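
Rather than a library recommendation, here is a standard-library sketch of why building your own is feasible: each Open XML file is a zip package, and its visible text sits in `t` elements of a few well-known XML parts (`word/document.xml` for docx, one `ppt/slides/slideN.xml` per slide for pptx, `xl/sharedStrings.xml` for xlsx). On .NET, Microsoft's Open XML SDK exposes the same parts. This sketch ignores headers, footers, comments and numeric spreadsheet cells.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List, Tuple, Union

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _paragraphs(xml_bytes: bytes, para_tag: str, text_tag: str) -> Iterator[str]:
    """Join the text runs inside each paragraph-like element."""
    root = ET.fromstring(xml_bytes)
    for para in root.iter(para_tag):
        text = "".join(t.text or "" for t in para.iter(text_tag))
        if text:
            yield text


def extract_text(package: Union[str, IO[bytes]]) -> str:
    """Return the plain text of a .docx, .pptx or .xlsx package, one paragraph per line."""
    with zipfile.ZipFile(package) as zf:
        names = zf.namelist()
        if "word/document.xml" in names:  # .docx: body text
            parts: List[str] = ["word/document.xml"]
            tags: Tuple[str, str] = (W + "p", W + "t")
        elif "xl/sharedStrings.xml" in names:  # .xlsx: shared strings only, no numbers
            parts, tags = ["xl/sharedStrings.xml"], (S + "si", S + "t")
        else:  # .pptx slides in lexicographic order; other packages yield ""
            parts = sorted(n for n in names
                           if n.startswith("ppt/slides/slide") and n.endswith(".xml"))
            tags = (A + "p", A + "t")
        lines: List[str] = []
        for part in parts:
            lines.extend(_paragraphs(zf.read(part), *tags))
        return "\n".join(lines)
```

Joining runs per paragraph matters because Word often splits one word across several `w:t` runs; for a Lucene index, slide order (slide10 sorts before slide2 here) usually does not.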

SQL Server :: How To Insert A Large Bytes

Jan 3, 2011

In my program, I need to update a field in the database. The field holds a PDF file's content. I know I should maybe save the PDF file to the local file system and save only the file path to SQL, but right now it is the file content that is saved to SQL.

My program loads the file chosen by the user with File.ReadAllBytes. Then it runs a stored procedure on SQL Server, using the byte[] as a parameter, to insert the file. It works fine when the file is small. However, as the file size grows, say to 200MB, it sometimes throws an out-of-memory exception.

So I'm thinking maybe I can load part of the file, say 1MB at a time, update the database, and loop. This way, no matter how large the file is, there should not be any problem.

My question is: is there a standard way to do this? Or should I not save file content to SQL at all, and instead save the file in the file system and only the file path in SQL? Or are there other ways to deal with this?
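
On SQL Server, a `varbinary(max)` column can be appended to piece by piece with `UPDATE ... SET col.WRITE(@chunk, NULL, NULL)` (the column must hold `0x` rather than NULL first). The sketch below swaps in a different, more portable design so it can run on sqlite3: one row per 1 MB chunk in a hypothetical `file_chunks` table, written and read back one chunk at a time, so neither direction ever holds the whole file in memory.

```python
import sqlite3
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MB per row, the piece size suggested in the question


def save_file_in_chunks(conn: sqlite3.Connection, file_id: int, source: BinaryIO,
                        chunk_size: int = CHUNK_SIZE) -> int:
    """Store a file as numbered chunk rows; only one chunk is in memory at a time."""
    seq = 0
    with conn:  # commit all chunks together, or roll everything back on error
        conn.execute("DELETE FROM file_chunks WHERE file_id = ?", (file_id,))
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            conn.execute(
                "INSERT INTO file_chunks (file_id, seq, data) VALUES (?, ?, ?)",
                (file_id, seq, chunk),
            )
            seq += 1
    return seq  # number of chunks written


def copy_file_to(conn: sqlite3.Connection, file_id: int, sink: BinaryIO) -> None:
    """Stream the chunks back out in order, again one chunk at a time."""
    rows = conn.execute(
        "SELECT data FROM file_chunks WHERE file_id = ? ORDER BY seq", (file_id,)
    )
    for (chunk,) in rows:
        sink.write(chunk)
```

The source should be a `FileStream` (or Python file object) rather than the result of `File.ReadAllBytes`, otherwise the whole file is still loaded up front.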

Forms Data Controls :: Formatting Text Within Large Comment Field Stored In SQL Database?

Jun 15, 2010

I'm using ASP .NET C# 3.5. I have a multiline textbox on my web form that allows for the input of up to 5,000 characters from the end-user. This text is a basic description of a training course. I need to display it out in a clearly formatted way. For example, I need there to be bullets and bold text.

What I did was I chose certain (not often used) characters and then used the .Replace method when displaying the text in an <asp:Label>. If the text in the database contains the character '~' then I replace that with a line break <br />. If it contains '`' I replace that with <b> and if it contains '^' I replace that with </b>.

Is there a better way of doing this? It is working properly, and I'm displaying the text properly, but I know the end-user is going to hate typing text like this for formatting. I do want this all to stay database-driven as well.
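
A sketch of a somewhat friendlier scheme, assuming a hypothetical Markdown-like syntax (`**bold**`, and lines starting with `- ` for bullets) stored as-is in the database: HTML-encode the text first, so anything the user typed as markup is shown literally, then convert only your own markers. Encoding first also closes the script-injection hole that character replacement on unencoded text in an `<asp:Label>` can leave open. In .NET, `HttpUtility.HtmlEncode` plays the role of `html.escape` here.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_description(text: str) -> str:
    """Turn the lightweight markup into HTML after escaping the raw user input."""
    out, items = [], []
    for line in html.escape(text).splitlines():
        line = BOLD.sub(r"<b>\1</b>", line)
        if line.startswith("- "):
            items.append(f"<li>{line[2:]}</li>")  # collect consecutive bullet lines
            continue
        if items:  # a normal line ends the current bullet list
            out.append("<ul>" + "".join(items) + "</ul>")
            items = []
        out.append(line + "<br />")
    if items:
        out.append("<ul>" + "".join(items) + "</ul>")
    return "".join(out)
```

Users type ordinary line breaks instead of `~`, and the same stored text can later be fed to a full Markdown library if the needs grow.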

SQL Server :: Store Large Amount Of Data In DB?

Dec 18, 2010

I want to store a large number of characters in one field of a table. I used text and nvarchar(max), but it shows me this exception:

(String or binary data would be truncated. The statement has been terminated.) I know that text and nvarchar can store 8000 characters.

Web Forms :: Upload Large Files From Browser To Server Using Ftp?

Feb 9, 2010

I have a requirement to upload large files in .NET using FTP. Are there any open-source programs available?

MVC :: Handling Large Number Of Images Downloaded From Server?

Feb 17, 2011

How To Large Product Catalog With Statistics - Alternatives To Sql Server

Apr 20, 2010

I am building UI for a large product catalog (millions of products).

I am using SQL Server, FreeText search and ASP.NET MVC.

Tables are normalized and indexed. Most queries take less than a second to return.

The issue is this. Let's say user does the search by keyword. On search results page I need to display/query for:

Display 20 matching products on first page(paged, sorted)
Total count of matching products for paging
List of stores only of all matching products
List of brands only of all matching products
List of colors only of all matching products

Each query takes about 0.5 to 1 second. Altogether it takes about 5 seconds.

I would like to get the whole page to load under 1 second. There are several approaches:

1. Optimize queries even more. I already spent a lot of time on this one, so I'm not sure it can be pushed further.
2. Load products first, then load the rest of the information using AJAX. This is more of a workaround and will require revising the UI.
3. Re-organize the data to be more report-friendly. I have already aggregated a lot of fields.

I checked out several similar sites, for example [URL]. Not only do they display the same information I would like in under 1 second, but they also include statistics (the number of results in each category).

The following is the search for the keyword "white": [URL]. How do sites like Zappos and Amazon make their results, filters and stats appear almost instantly?
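
One technique commonly used for this is computing all the filter counts in a single pass over the matching set (or letting a search engine's faceting do it) instead of running one query per list. A minimal in-memory sketch of that single pass, where each matching product is a hypothetical dict with `store`, `brand` and `color` keys. On SQL Server 2008 or later, `GROUP BY GROUPING SETS ((store), (brand), (color))` can return all three distributions from one query, and Solr exposes the same idea as faceting.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple

FACETS = ("store", "brand", "color")


def facet_counts(products: Iterable[Mapping[str, str]],
                 facets: Sequence[str] = FACETS) -> Tuple[int, Dict[str, Counter]]:
    """One pass over the matching products: total hit count plus per-facet counts."""
    counts: Dict[str, Counter] = {facet: Counter() for facet in facets}
    total = 0
    for product in products:
        total += 1
        for facet in facets:
            counts[facet][product[facet]] += 1
    return total, counts
```

The total doubles as the paging count, so one scan replaces four of the five queries; only the sorted first page still needs its own lookup.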

SQL Server :: Query Slow On Large Volume Of Data?

Dec 27, 2010

In my database, a query takes 40 seconds on 1 crore (10 million) rows, and when I join with another table it takes even longer. I have taken care of things like nonclustered indexes, but I still want to optimize my query. What other things do I need to consider, like buffers, disk size, etc.? I am not sure about this area.

SQL Server :: Data For The Column Xx Is Too Large For The Specified Buffer Size?

Feb 17, 2011

I have a DTS script which transforms a Tab delimited text file to a table. I get the following error when trying to transform the data of a field that is greater than 255 chars:

[Code]....

I have seen this issue when importing from Excel with the Jet 4.0 engine; however, I am importing a text file.


Copyrights 2005-15 www.BigResource.com, All rights reserved