Saturday, February 25, 2012

Data aggregation, syhcronization, and search

Hello All,
I know that this problem is not SQL Server specific but I decided to
post it here because it is a problem that I am certain many of you
expericenced (or at least thought about).
In my system, I have a central SQL Server Database and number of
remote "data providers". Some of these data providers are other SQL
Servers and others are other databases or other data provider services
that expose a database-like interface.
We provide two levels of search capability in the system. The user can
search either the central database, or search one specific data
provider. One of the new requirements, was to provide a global search,
which would search the central database and all the data providers at
once.
Performing a distributed is prohibitive in our scenario because we
have many data providers and they remotelly located. To minimize the
search time, my first reaction was to create a copy of the data from
the remote providers on the central database and keep it in synch with
the data in the remote data providers. By doing this, I would expect
to simplify the search by having simply a search in the central
database.
However, a number of new question concerning data synchronization
arise:
1. How can I keep the data in the central db in synch with the remote
data providers? (If they all were SQL Server databases, I would use
merger replication...but they aren't?)
2. How to handle disconnects? If the remote provider disconnets and
reconnects, some of the data might be stale. In pronciple, I could
take a new snapshot and keep the data in synch from that point.
However, this is very expensive!! Do you know of any other techniques
for data synchronization that would minimize network traffic?
3. Is this a good approach? What do you think?
Your opinion is gratly appreciated.
Kind regards
CD
Hi
"crbd98@.yahoo.com" wrote:

> Hello All,
> I know that this problem is not SQL Server specific but I decided to
> post it here because it is a problem that I am certain many of you
> expericenced (or at least thought about).
> In my system, I have a central SQL Server Database and number of
> remote "data providers". Some of these data providers are other SQL
> Servers and others are other databases or other data provider services
> that expose a database-like interface.
> We provide two levels of search capability in the system. The user can
> search either the central database, or search one specific data
> provider. One of the new requirements, was to provide a global search,
> which would search the central database and all the data providers at
> once.
> Performing a distributed is prohibitive in our scenario because we
> have many data providers and they remotelly located. To minimize the
> search time, my first reaction was to create a copy of the data from
> the remote providers on the central database and keep it in synch with
> the data in the remote data providers. By doing this, I would expect
> to simplify the search by having simply a search in the central
> database.
> However, a number of new question concerning data synchronization
> arise:
> 1. How can I keep the data in the central db in synch with the remote
> data providers? (If they all were SQL Server databases, I would use
> merger replication...but they aren't?)
> 2. How to handle disconnects? If the remote provider disconnets and
> reconnects, some of the data might be stale. In pronciple, I could
> take a new snapshot and keep the data in synch from that point.
> However, this is very expensive!! Do you know of any other techniques
> for data synchronization that would minimize network traffic?
> 3. Is this a good approach? What do you think?
> Your opinion is gratly appreciated.
> Kind regards
> CD
>
You don't say how you currently search the remote data providers! It may be
more acceptable for the users if the global search initially searched the
central database and returned the results and then searched the remote
databases which would return something to the user quicker and reduce the
need to speed up the remote searches.
If you held the data centrally, you would also need to know how much latency
would be acceptable for the data, if you could get away with uploading once a
day out of hours then this could be an easier solution to implement.
John
|||Answers inline.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
<crbd98@.yahoo.com> wrote in message
news:1176094164.164628.321030@.q75g2000hsh.googlegr oups.com...
> Hello All,
> I know that this problem is not SQL Server specific but I decided to
> post it here because it is a problem that I am certain many of you
> expericenced (or at least thought about).
> In my system, I have a central SQL Server Database and number of
> remote "data providers". Some of these data providers are other SQL
> Servers and others are other databases or other data provider services
> that expose a database-like interface.
> We provide two levels of search capability in the system. The user can
> search either the central database, or search one specific data
> provider. One of the new requirements, was to provide a global search,
> which would search the central database and all the data providers at
> once.
> Performing a distributed is prohibitive in our scenario because we
> have many data providers and they remotelly located. To minimize the
> search time, my first reaction was to create a copy of the data from
> the remote providers on the central database and keep it in synch with
> the data in the remote data providers. By doing this, I would expect
> to simplify the search by having simply a search in the central
> database.
> However, a number of new question concerning data synchronization
> arise:
> 1. How can I keep the data in the central db in synch with the remote
> data providers? (If they all were SQL Server databases, I would use
> merger replication...but they aren't?)
I would use transactional replication to replicate the remote SQL Servers to
the central location. For the non-SQL Server data providers depending on the
amount of changes, the volume of data and whether you can track changes you
should be able to write something that will bring the data locally.
> 2. How to handle disconnects? If the remote provider disconnets and
> reconnects, some of the data might be stale. In pronciple, I could
> take a new snapshot and keep the data in synch from that point.
> However, this is very expensive!! Do you know of any other techniques
> for data synchronization that would minimize network traffic?
You need to implement some method of change tracking so only the changes
will move each time rather than the entire data set.
> 3. Is this a good approach? What do you think?
>
The network hop is always problematic and moving data locally is one
approach to avoid it. Another option might be to move all providers locally
and then have the clients use a remote access solution to access the central
server/repository. This will work well if your links are stable and well
connected.

> Your opinion is gratly appreciated.
> Kind regards
> CD
>

No comments:

Post a Comment