Looks fine. There are options if you start to run into performance
issues, but if this is stable (finishes within the data-loading time
window, source systems aren't being drained of resources, etc.),
then I see no reason to change.
Some potential issues to keep an eye on:
1. Having 20+ full-cache Lookup transforms may pose a problem if
your dimensions increase in size, due to memory constraints on the
SSIS system; but since they are Type 1, I wouldn't worry.
2. Full-cache lookups "hydrate" (load their entire cache)
pre-execution; having 20+ of them may slow you down.
A common alternative (to what you have above) is to extract the
fact table data from the source system and land it in a staging
area before doing the dimension key lookups via a single SQL
statement. Some even keep a set of dimension key mapping tables in
the staging area specifically for this purpose. This reduces
locking/blocking on the source system if you have a lot of data
each load and would otherwise block the source system while you
extract the data and run it through those 20+ Lookup transforms.
A good staging area strategy becomes more important when you
have a large amount of data, large dimensions, complex key mappings
(usually due to multiple source systems), and short data-loading
time windows.
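The staging-area approach replaces the 20+ Lookup transforms with one set-based statement. A minimal sketch, assuming hypothetical table and column names (StagingSales, DimCustomer, DimProduct are illustrations, not from the original post):

```sql
-- Hypothetical names throughout; -1 is an assumed "unknown member" surrogate key.
INSERT INTO dbo.FactSales (CustomerKey, ProductKey, OrderDate, SalesAmount)
SELECT  ISNULL(c.CustomerKey, -1),
        ISNULL(p.ProductKey, -1),
        s.OrderDate,
        s.SalesAmount
FROM    staging.StagingSales AS s
LEFT JOIN dbo.DimCustomer AS c ON c.CustomerAltKey = s.CustomerId
LEFT JOIN dbo.DimProduct  AS p ON p.ProductAltKey  = s.ProductId;
```

The LEFT JOINs ensure fact rows are not lost when a dimension member is missing; unmatched rows map to the unknown member instead.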
Lookup tables for dimensions denote natural hierarchies in a
Snowflake schema.
The Lookup transformation can route unmatched data to an error
flow.
Incorrect: Try to use fully cached lookups when loading fact tables
to get the needed surrogate keys from dimension tables.
Incorrect: You can only change the SQL command of the Lookup task
when you are using property expressions for the data flow task that
has a Lookup task inside it.
Incorrect: Lookup is a synchronous transformation.
Check whether the Lookup transformation is fully cached and that
the query contains only the needed columns. The last step is to
optimize the buffer settings of the data flow.
We are currently working on styling SSRS reports using CSS style
sheets or templates. We need these styling changes to be reflected
in all the reports developed from this template by changing the
source template, without modifying each and every report.
In this regard, we would like to find out if anyone has expertise
in these functionalities. We would appreciate it if the respective
resources could get in touch with us at the earliest; any pointers
are also welcome.
MERGE JOIN:
This transformation has two inputs and one output. It does not
support an error output.
•Specify whether the join is a FULL, LEFT, or INNER join.
•Specify the columns the join uses.
•Specify whether the transformation handles null values as equal
to other nulls.
Note
If null values are not treated as equal values, the transformation
handles null values like the SQL Server Database Engine does.
NULL Handling
We can specify whether the transformation handles null values as
equal to other nulls by setting the TreatNullsAsEqual property to
True. If null values are not treated as equal values, the
transformation handles null values like any ANSI database does.
SSIS Merge Join - Replace Null Columns
I normally use a Conditional Split after the Merge Join, then use
the ISNULL function to direct the rows to a Derived Column
transformation, then simply replace the column with the value you
want.
After that, put a Union All transformation in and connect it from
both the Conditional Split and the Derived Column.
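The Conditional Split test and the Derived Column replacement described above can be sketched with SSIS expressions (the column name and replacement value here are hypothetical):

```
Conditional Split condition (routes NULL rows to the replacement path):
    ISNULL([SomeColumn])

Derived Column ("Replace 'SomeColumn'") expression:
    (DT_WSTR,50)"Unknown"
```

The Derived Column output then unions back with the non-NULL rows from the Conditional Split.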
Union All
This transformation has multiple inputs and one output. It does not
support an error output.
By including the Merge transformation in a data flow, you can
perform the following tasks:
•Merge data from two data sources, such as tables and files.
•Create complex datasets by nesting Merge transformations.
•Remerge rows after correcting errors in the data.
The Merge transformation is similar to the Union All
transformation. Use the Union All transformation instead of the
Merge transformation in the following situations:
•The transformation inputs are not sorted.
•The combined output does not need to be sorted.
•The transformation has more than two inputs.
==================================================================
--Monitor Data Flow
--Integration Services includes features and tools that you can use
to troubleshoot the data flow in a package during execution.
--Tapping the data flow during execution
DECLARE @execid bigint

--1. Create an execution instance of the package
--   (in SSMS, right-click the package, choose Execute, then click
--   Script instead of OK, and add the data taps to the script)
EXEC [SSISDB].[catalog].[create_execution]
    @folder_name = N'ETL Folder',
    @project_name = N'ETL Project',
    @package_name = N'Package.dtsx',
    @execution_id = @execid OUTPUT

--2. Add a data tap
EXEC [SSISDB].[catalog].add_data_tap
    @execution_id = @execid,
    @task_package_path = '\Package\Data Flow Task',
        /* SSMS --> data flow task --> Properties: PackagePath value */
    @dataflow_path_id_string = 'Paths[Flat File Source.Flat File Source Output]',
        /* the data path (arrow) between source and destination;
           arrow properties --> IdentificationString */
    @data_filename = 'output.txt'
        /* written to <Program Files>\Microsoft SQL Server\110\DTS\DataDumps */

--3. Start the execution instance of the package
EXEC [SSISDB].[catalog].[start_execution] @execid
==================================================================
--So you should handle rows which cause errors. This is when the
error output comes to help.
--Ignore Failure will ignore any errors during insertion of
records to the destination.
--Redirect Row will redirect any rows which cause an error during
insertion to the destination.
--Fail Component will cause the component to fail when insertion
hits any error.
===============================================================
there are some packages that were moved to production. 12:34 PM
later there were some changes that need to be made, and the
production support does not know how to change them.. 12:35 PM
how will you make it from the development env
2 and 1 more q 12:38 PM
which is better to use 12:38 PM
scd or lookup 12:38 PM
tell me a quick answer
yeah, but the corresponding event might not be available 5:11 PM
i.e., restart, if a task failure 5:11 PM
keep task failure/success status in one temp table 5:12 PM
and identify if a task failure status is available 5:12 PM
by using event handlers
You can have a table (TaskName, ExecutionStatus) that has all the
task names with execution status as 0. Then before executing a task
you need to read the execution status for that particular task into
a package variable, and based on the value you can decide whether
to execute the task (precedence constraint with expression). Once
the task executes successfully you need to update the table and set
the ExecutionStatus to 1.
ExecutionStatus 0 means the task needs to be executed and 1 means
that task has been executed.
Any reason why you are not using checkpoints? Nitesh Rai - Please
mark the post as answered if it answers your question
5:21 PM
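The execution-status table described above can be sketched in T-SQL (table and task names here are hypothetical):

```sql
-- Hypothetical restart-status table; names are illustrative only.
CREATE TABLE dbo.TaskExecutionStatus (
    TaskName        sysname NOT NULL PRIMARY KEY,
    ExecutionStatus tinyint NOT NULL DEFAULT 0  -- 0 = pending, 1 = done
);

-- Before a task: read the status into an SSIS variable (Execute SQL task),
-- then guard the task with a precedence constraint expression such as
-- @[User::Status] == 0
SELECT ExecutionStatus
FROM dbo.TaskExecutionStatus
WHERE TaskName = N'LoadDimCustomer';

-- After the task succeeds:
UPDATE dbo.TaskExecutionStatus
SET ExecutionStatus = 1
WHERE TaskName = N'LoadDimCustomer';
```

Resetting every row back to 0 at the start of a fresh run (as opposed to a restart) completes the pattern.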
1 more q is there.. 5:21 PM
in a shared location there are 5 flat files..they need to be
processed to staging..among them 1 flat file is missing..or there
is some corrupted data in a column 5:23 PM
how will you handle that?
A: How to check whether all the data is loading into staging from
flat files which are in a shared drive
Have a data flow task with the flat file source and a Row Count
transform to get the count of records in the file. Now connect this
data flow to the existing data flow. In the existing data flow task
have another Row Count transform, one before the OLE DB destination
and one for error records.
Now in the control flow check whether the initial row count matches
the count after the load, and whether there are any records in the
error output.
if they match then it loaded successfully; otherwise, if any rows
do not match, then we will send a mail 6:04 PM
ok
===================================================
http://stackoverflow.com/questions/32097020/adding-new-column-to-excel-source-in-ssis
1. Adding a new column to an Excel Source in SSIS
2. How to handle the double-precision float data type [DT_R8] in an
SSIS data flow task
Source: double-precision float [DT_R8]
Destination: varchar(50)
Result:
+-------------+----------------------------------+
| SourceValue | DestinationValue after execution |
+-------------+----------------------------------+
| 0.579       | 0.578999999999999965             |
+-------------+----------------------------------+
3. Connecting to an FTP server that has a certificate using SSIS
I am trying to download files from an FTP site inside my production
environment. The FTP site has been assigned a certificate (pfx).
The FTP port is 21.
Due to this certificate I am unable to download the content using
the FTP task in SSIS.
I tried using a Script task; however, I got the same error, that
FTP cannot accomplish the connection due to the certificate.
I also tried an SFTP component, but that gave me an error as well.
I googled the error and found out that SFTP does not use TLS/SSL
(neither implicit nor explicit); explicit/implicit TLS/SSL is used
with FTP.
Below is the workaround for this:
There are three SSL options when we create an FTP site in IIS:
1. Allow SSL
2. Require SSL
3. No SSL
The FTP site that I had had the Require SSL option selected. I
changed that option to Allow SSL, which still enables the SSL
feature for the FTP site but allows the FTP task to connect.
=======================================================
Report Layout
--What is Tablix?
--The best of Table merged with the best of Matrix
--Allows for fixed and dynamic columns and rows
--Introduces optional omission of row/column headers
--Innovative layout & grouping capabilities
--Support for richly formatted text
___________________________________________________________________
--Chart Data: Sum value, Category Group, Series Group
--Tablix Properties: Border, Data Only, Fill, Filter, General,
Localization, No Rows, Others, Position, Location, Size,
PublishedReportPart, Visibility
--General Tab: DatasetName, FixedColumnHeaders (F), FixedRowHeaders
(F), GroupsBeforeRowHeaders (0), KeepTogether (F), Name, PageBreak,
PageName, RepeatRowHeader, ToolTip
--No Rows Tab: NoRowsMessage: (Expression)
--Others: Custom Properties, Sort Expression
____________________________________________________________
RS dev Demo:
--Tablix – adjacent groups?
--Sparkline
--Bullet
--Sparkline Drillthrough
--Hide Columns or Rows Conditionally
Create tablix, sparkline, bullet, column chart for quarters
================================================
================================================
--1.SSIS is an in-memory pipeline, so ensure that all
transformations occur in memory.
--2.The following network perfmon counters can help you tune your
topology:
--•Network Interface / Current Bandwidth: this counter provides an
estimate of current bandwidth.
--•Network Interface / Bytes Total/sec: the rate at which bytes
are sent and received over each network adapter.
--•Network Interface / Transfers/sec
--3.Baseline source system extract speed.
--Execute the package from the command line (DTEXEC) and measure
the time it took for it to complete its task. Use the Integration
Services log output to get an accurate calculation of the time. You
want to calculate rows per second:
--Rows/sec = Row Count / Time
--To increase this Rows / sec calculation, you can do the
following:
--Improve drivers and driver configurations
--Start multiple connections
--Use multiple NIC cards
--Optimize the SQL data source, lookup transformations, and
destination.
--Data Access Mode: When you execute SQL statements within
Integration Services (as noted in the Data access mode dialog
box), whether to read a source, to perform a lookup transformation,
or to change tables, some standard optimizations significantly help
performance:
-- •Use the NOLOCK or TABLOCK hints to remove locking overhead.
-- •To optimize memory usage, SELECT only the columns you actually
need. If you SELECT all columns from a table (e.g., SELECT * FROM),
you will needlessly use memory and bandwidth to store and retrieve
columns that do not get used.
-- •If possible, perform your datetime conversions at your source
or target databases, as it is more expensive to perform within
Integration Services
-- •In SQL Server 2008 Integration Services, there is a new
feature, the shared lookup cache. When using parallel pipelines,
it provides a high-speed, shared cache.
--•If Integration Services and SQL Server run on the same server,
use the SQL Server destination instead of the OLE DB destination to
improve performance.
--•Commit size 0 is fastest on heap bulk targets, because only one
transaction is committed.
--If you cannot use 0, use the highest possible value of commit
size to reduce the overhead of multiple-batch writing.
-- Commit size = 0 is a bad idea if inserting into a Btree –
because all incoming rows must be sorted at once into the target
Btree—and if your memory is limited, you are likely to spill.
-- Batchsize = 0 is ideal for inserting into a heap. For an indexed
destination, I recommend testing between 100,000 and 1,000,000 as
batch size.
--•Use a commit size of <5000 to avoid lock escalation when
inserting; note that in SQL Server 2008 you can now enable/disable
lock escalation at the object level, but use this wisely.
--•Heap inserts are typically faster than using a clustered index.
--This means that you may want to drop indexes and rebuild if you
are changing a large part of the destination table;
--you will want to test your inserts both by keeping indexes in
place and by dropping all indexes and rebuilding, to validate
which is faster.
--•Use partitions and the partition SWITCH command; i.e., load a
work table that contains a single partition and SWITCH it into the
main table after you build the indexes and put the constraints on.
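The SWITCH pattern above can be sketched as follows (table names and the partition number are hypothetical):

```sql
-- Load into a work table that matches the main table's schema and sits
-- on the same filegroup as the target partition (names are illustrative).
INSERT INTO dbo.FactSales_Work WITH (TABLOCK)
SELECT * FROM staging.StagingSales;

-- Build the same indexes and constraints as the main table, then switch
-- the loaded data in as partition 5 (a metadata-only operation):
ALTER TABLE dbo.FactSales_Work
    SWITCH TO dbo.FactSales PARTITION 5;
```

Because SWITCH only updates metadata, the main table is unavailable for barely a moment, regardless of how many rows were loaded.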
--Use data types – yes, back to data types! – wisely.
--Of all the points on this top 10 list, this is perhaps the most
obvious. Yet, it is such an important point that it needs to be
made separately. Follow these guidelines:
-- •Make data types as narrow as possible so you will allocate less
memory for your transformation.
-- •Do not perform excessive casting of data types – it will only
degrade performance. Match your data types to the source or
destination and explicitly specify the necessary data type
casting.
-- •Watch precision issues when using the money, float, and decimal
types. Also, be aware that money is faster than decimal, and money
has fewer precision considerations than float.
-- declare a cursor and loop to update stats and recompile.
---OLE DB destination (insert into table): there are quite a few
strategies that can be adopted while loading data into a table.
In the OLE DB Destination component properties there are different
access modes; one of them is called OpenRowSet Using FastLoad.
When this option is set, one gets to work with the other FastLoad
properties:
FastLoadKeepIdentity
FastLoadKeepNulls
FastLoadOptions
FastLoadMaxInsertCommitSize
---In summary, when using the FastLoad option with the OLE DB
destination, one needs to be aware of what kind of load is being
performed in terms of the number of records, and then tune the fast
load options to work effectively.
To prevent the package from stopping, set the Maximum insert commit
size option to 2147483647.
----------------
OLE DB destination component, Table or View – Profiler results:
looks like we have a cursor on the go (bad news for performance in
most cases!)
OLE DB destination component, Table or View – Fast Load results: a
bulk insert. The real question is why not use Fast Load. I can't
think of many reasons; Fast Load will cause a table lock because it
does a BULK load, so if you're running this during the day you may
find that it's not suitable, but it's definitely worth testing!
----------
--Query Options toolbar --> General --> SET ROWCOUNT (0)
--> Advanced --> SET: NOCOUNT, ARITHABORT (checked), NOEXEC,
SHOWPLAN_TEXT, PARSEONLY, STATISTICS TIME,
CONCAT_NULL_YIELDS_NULL, STATISTICS IO
--Transaction ISOLATION LEVEL
===================================================================
Handling errors at row level: A. Ignore failure B. Fail
component C. Redirect row D. Delete row
Fuzzy components are limited to character data comparisons, whereas
DQS matching can compare numbers and dates as well.
Copying data from a table/view has no restriction on multiple data
flows, whereas copying by using a query restricts you, allowing
only a single data flow.
The Dev/Prod environments differ in the configuration of the data
store; Dev: same server (same db); platform: the SSIS version
differs between Prod and Dev, and the O/S also differs between Dev
and Prod.
A success constraint will only allow the process to continue to the
following task if the preceding task completed successfully,
whereas a completion constraint will allow the process to continue
as soon as the preceding task has completed, regardless of the
outcome.
The Merge transformation is similar to Union All, but with Merge,
the sources have to be sorted and the sort position is preserved.
The Lookup transformation does not need a sorted input; it is a
non-blocking transformation, and in cases when more matching rows
exist from the referenced dataset, only the first one will be
retrieved. This means that the transformation will never retrieve
more rows than exist in the input rows. With the Merge Join
transformation more rows can be retrieved, because all matching
data is retrieved.
Parameters are exposed to the caller, but variables are not. If a
particular property needs to be set dynamically but should not be
set by the administrator, use a variable; otherwise, use a
parameter.
Some log types provide better integration with the operating
system, whereas others provide better integration with the data in
a database. Not every log type is best suited for every intended
use.
Table object --> Indexes: New Index, Rebuild All, Reorganize All,
Disable All.
For an individual existing index object: Create/Disable Index, New
Index, Rebuild, Reorganize, Disable.
Index Properties, General tab: table name, index name, index type
(Clustered/Non-clustered/Primary/Spatial), a Unique checkbox, and
the index key columns listed in a grid.
Index Properties, Options tab: Rebuild index (default: unchecked),
Ignore duplicate values (disabled mode: unchecked), Automatically
recompute statistics (checked), Use row locks when accessing the
index (checked), Use page locks when accessing the index (checked).
Included columns: in addition to the index key, you can include
other table columns in the index; these are non-key columns to
include in the index.
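The included-columns option corresponds to the INCLUDE clause of CREATE INDEX. A minimal sketch (the table and column names are hypothetical):

```sql
-- Non-clustered index on OrderDate with two non-key columns included,
-- so queries selecting only these columns can be answered from the
-- index alone, without a lookup to the base table.
CREATE NONCLUSTERED INDEX IX_Sales_OrderDate
ON dbo.Sales (OrderDate)
INCLUDE (CustomerId, SalesAmount);
```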
■■ Control flows contain the definitions of data management
operations.
■■ Control flows determine the order and the conditions of
execution.
■■ Control flow is defined by precedence constraints that determine
the order and conditions of execution.
■■ Tasks representing logical units of work can be grouped in
containers.
■■ Loop containers allow a unit of work to be executed repeatedly.
Creating a Robust and Restartable Package
--■■ You can enlist a container or a task to be part of a
transaction.
The data flow task provides all three essential elements of complex
data movements—data extraction, data transformations, and data
loading. In addition, all three types of operations can be
performed in memory and—depending on the data provider used by the
destination connection manager—fast loading can also be used.
--bigint, 40 digits: The number
'1234567891234567891234567891234567891234' is out of the range for
numeric representation (maximum precision 38).
--int, if you pass 11 or more digits, gives an error like:
Arithmetic overflow error converting expression to data type int.
The statement has been terminated.
--int, 10 digits: allowed (up to 2147483647)
--Precedence Constraint editor
--Constraint options --> Evaluation operation: Constraint /
Expression / Expression and Constraint / Expression or Constraint
--Value: Success / Failure / Completion
--Expression
--Multiple constraints: choose only one (radio button)
--Logical AND: all constraints must evaluate to TRUE
--Logical OR: one constraint must evaluate to TRUE
--Select either Expression and Constraint or Expression or
Constraint to use an expression in the evaluation operation of the
precedence constraint.
--The expression that you use must evaluate to a Boolean.
--For example, a package could use an expression that evaluates
whether the constrained executable runs based on the amount of disk
space available, the number of rows a data flow inserted into a
table, or the day of the month.
--Let's look at some sample expressions. In the following
expression, the variable varRowCount, set by a Row Count
transformation in a data flow, is compared to the value 1000. You
could use this expression to determine whether the package should
run a task that backs up a database depending on the number of rows
inserted into the destination database. The prefix @ is required
when variables are used in expressions, to differentiate the
variables from column names. In most user interfaces, the @ is
added for you automatically.
--@varRowCount > 1000
--Likewise, in the following expression, the variable @varDisk, set
by a Script task, is compared to the value 20. You could use this
expression to determine whether to continue package execution.
--@varDisk >= 20
--In the following expression, the GETDATE function determines the
day. If the day is the first day in the month, the constrained
executable runs.
--DATEPART("day", GETDATE()) == 1
In your ETL process, a few values need to be retrieved from a
database at run time, based on another value available at run time,
and they cannot be retrieved as part of any data flow task. Which
task can you use in this case?
| Execute T-SQL Statement task
| Execute SQL task
| Expression task
| Execute Process task
|
| The Execute T-SQL Statement task cannot retrieve values from the
database, nor does it support parameters, both of which are needed
in this case.
| The Execute SQL task can be used in this case, because it allows
values to be passed as input parameters to the SQL statement and
retrieved as output parameters from the statement.
| The Expression task cannot be used to retrieve data from a data
source.
| Although arguments can be passed to the external process via the
Execute Process task, values cannot be retrieved this way.
|
Which task in the control flow supports transactions?
The File System task and the XML task do not provide a rollback
mechanism, so they cannot support transactions.
Which transaction isolation level does not lock the records being
read?
| Serializable
| Snapshot
| Chaos
| ReadUncommitted
|
| Serializable is the strictest isolation level.
| Snapshot isolation does not lock the data.
| Chaos is similar to ReadUncommitted, but it checks the isolation
level of other pending transactions during a write operation.
| ReadUncommitted is the least restrictive isolation level because
it ignores locks placed by other transactions.
|
| Non-blocking streaming transformations
| Partially blocking transformations
| Non-blocking row-based transformations
| Blocking transformations
|
| Streaming transforms operate on the same buffer from start to
finish.
| Partially blocking transforms might wait on other inputs, and
always produce a new buffer.
| Row-based transforms operate row by row, but can't work on the
same buffer simultaneously.
| Blocking transforms stop the pipeline until all rows are read,
and always produce a new buffer.
|
Both transformations accept one input and give us multiple outputs:
| Multicast
| Conditional Split
|
| Multicast directs each row of the source to every output.
| Conditional Split directs each row to a single output.
|
Both transformations can add new columns:
| Derived Column
| Copy Column
|
| Derived Column can add new columns without any help from existing
columns; Copy Column can add new columns only from existing
columns.
| Derived Column can assign different transformations and data
types to the new columns; Copy Column has no such options.
| Derived Column supports an error output, whereas Copy Column does
not.
|
| IN
| EXISTS
| JOIN
|
| IN returns the expected results.
| EXISTS returns the expected results.
| An inner join gives us more rows than we need, because it joins
to every child row.
|
We want to get all data in ta1 that doesn't have a child row in
ta2 (a NULL condition):
| NOT IN will not return the records when NULLs are present,
because NULL is unknown (three-valued logic: true/false/unknown).
| NOT EXISTS returns the null records as well.
| A LEFT JOIN with an IS NULL condition included in the WHERE
clause does too.
|
4. What is the difference between the Union All and the Merge
transformation?
The Merge transformation is similar to Union All, but with Merge,
the sources have to be sorted and the sort position is preserved.
5. What is the difference between the Lookup and Merge Join
transformations?
■ The Lookup transformation does not need a sorted input; it is a
non-blocking transformation, and in cases when more matching rows
exist in the referenced dataset, only the first one will be
retrieved. This means that the transformation will never retrieve
more rows than exist in the input rows. With the Merge Join
transformation more rows can be retrieved, because all matching
data is retrieved.
6. There is an important difference between lineage and other
columns: lineage columns are never exposed to end users and are
never shown on end users' reports.
To summarize, a dimension may contain the following types of
columns:
■■ Keys Used to identify entities
■■ Name columns Used for human names of entities
■■ Attributes Used for pivoting in analyses
■■ Member properties Used for labels in a report
■■ Lineage columns Used for auditing, and never exposed to end
users
Troubleshooting Tools for Package Execution
Integration Services includes features and tools that you can use
to troubleshoot packages when you execute them after they have been
completed and deployed.
At design time, SQL Server Data Tools (SSDT) provides breakpoints
to pause package execution, the Progress window, and data viewers
to watch your data as it passes through the data flow. However,
these features are not available when you are running packages that
have been deployed. The main techniques for troubleshooting
deployed packages are as follows:
- Catch and handle package errors by using event handlers.
- Capture bad data by using error outputs.
- Track the steps of package execution by using logging.
You can also use the following tips and techniques to avoid
problems with running packages:
· Help to ensure data integrity by using transactions
· Restart packages from the point of failure by using checkpoints
|
1.
Catch and Handle Package
Errors by Using Event Handlers
|
|
2.
Troubleshoot Bad Data by
Using Error Outputs
|
|
3.
Troubleshoot Package
Execution by Using Operations Reports
|
|
4.
Troubleshoot Package
Execution by Using SSISDB Views
|
|
5.
Troubleshoot Package
Execution by Using Logging
|
|
6.
Troubleshoot Package
Execution by Using Debug Dump Files
|
|
7.
Troubleshoot Run-time
Validation Issues
|
|
8.
Troubleshoot Run-time
Permissions Issues
|
|
9.
Troubleshoot 64-bit
Issues
|
|
10.
Troubleshoot Errors
without a Description
|
1. Catch and Handle Package Errors by Using Event Handlers: create
an event handler for the OnError event.
2. Troubleshoot Bad Data by Using Error Outputs:
You can use the error output available on many data flow components
to direct rows that contain errors to a separate destination for
later analysis.
1. Capture bad data by using error outputs.
2. Add friendly information to the error outputs.
3. Add the description of the error.
4. Add the name of the error column:
a. Create a lookup table of column names.
b. Add the column name to the output.
2. Troubleshoot Bad Data by Using Error Outputs --> Capture bad
data by using error outputs: send rows that contain errors to a
separate destination such as an error table or a text file.
The error output automatically adds two numeric columns that
contain the number of the error that caused the row to be rejected,
and the ID of the column in which the error occurred.
For more info, Error Handling in Data: errors typically fall into
one of the following categories:
1. Data conversion errors
2. Expression evaluation errors
3. Lookup errors, which occur if a lookup operation fails to locate
a match in the lookup table.
Configure the error output (the red-marked error flow):
a. Error and Truncation options: default error code and column ID.
b. Adding the error description: you can use the Script component
to include the error description in an additional column by using a
single line of script to call the GetErrorDescription method:

public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Row.ErrorDescription =
            this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
    }
}
3. Troubleshoot Package Execution by Using Operations Reports
Standard operations reports are available in SQL Server Management
Studio to help you monitor Integration Services packages that have
been deployed to the Integration Services catalog. These package
reports help you to view package status and history and, if
necessary, identify the cause of failures.
Troubleshoot Package Execution by Using Logging
You can track much of what occurs in your running packages by
enabling logging. Log providers capture information about the
specified events for later analysis, and save that information in a
database table, a flat file, an XML file, or another supported
output format.
a. Enable logging.
b. Select the package's Diagnostic event to troubleshoot provider
issues.
c. Enhance the default logging output.
Logging typically appends rows to the logging destination each time
that a package is run. Although each row of logging output
identifies the package by its name and unique identifier, and also
identifies the execution of the package by a unique ExecutionID,
the large quantity of logging output in a single list can become
difficult to analyze. The following approach is one suggestion for
enhancing the default logging output and making it easier to
generate reports:
1. Create a parent table that logs each execution of a package.
2. Add auditing information to the data flow.
3. Consider capturing row count data.
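Step 1, the parent table that logs each execution, can be sketched in T-SQL (all table and column names here are hypothetical):

```sql
-- Hypothetical parent audit table; one row per package execution.
CREATE TABLE dbo.PackageExecutionLog (
    ExecutionLogId int IDENTITY(1,1) PRIMARY KEY,
    PackageName    sysname          NOT NULL,
    ExecutionId    uniqueidentifier NOT NULL,  -- joins to the ExecutionID in the detail log rows
    StartTime      datetime2(0)     NOT NULL DEFAULT SYSDATETIME(),
    EndTime        datetime2(0)     NULL,
    RowsLoaded     int              NULL       -- captured via a Row Count transform
);
```

Reports can then group the detail log rows by ExecutionId instead of scanning one long flat list.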
INSTEAD OF Triggers vs. AFTER Triggers:
1. Each table or view can have one INSTEAD OF trigger for each
triggering action (UPDATE, DELETE, and INSERT), whereas a table can
have several AFTER triggers for each triggering action.
2. INSTEAD OF triggers fire in place of the triggering action and
before constraints are processed, whereas AFTER triggers fire after
the triggering action (INSERT, UPDATE, or DELETE) and after any
constraints are processed.
Clustered vs. Non-Clustered Indexes:
1. A clustered index reorders the physical storage of records in
the table, whereas a non-clustered index sorts and maintains a
separate storage.
2. There can be only one clustered index per table, whereas we can
have 249 non-clustered indexes on a table (999 as of SQL Server
2008).
3. In a clustered index the leaf nodes contain the data, whereas in
a non-clustered index the leaf nodes contain pointers to the data.
4. To create a clustered index, SQL Server requires more memory
because the leaf pages in the tree structure maintain the actual
data, whereas to create a non-clustered index it requires less
memory because the leaf pages contain only pointers to the actual
data.
5. Retrieving data by using a clustered index is faster than by
using a non-clustered index.
create synonym dbo.tsample for testtable
create synonym dbo.vsample for v1ci --works for views, functions, and stored procedures too
How do you search for synonyms of a word with the CONTAINS
predicate?
1. You have to use the CONTAINS(FTcolumn, 'FORMSOF(THESAURUS,
SearchWord1)') syntax.
1. Does a synonym store T-SQL or any data?
2. Can synonyms be altered?
Answer
1. No, a synonym is just a name. All that is stored with a synonym
is the object it refers to.
2. No, to change a synonym, you must drop and recreate it.
Synonyms cannot refer to other synonyms. They can only refer to
database objects such as tables, views, stored procedures, and
functions. In other words, synonym chaining is not allowed.
FAST LOAD:
The data flow task provides all three essential elements of complex
data movements—data extraction, data transformations, and data
loading. In addition, all three types of operations can be
performed in memory and—depending on the data provider used by the
destination connection manager—fast loading can also be used.
■■ Use Fast Load or Batch mode when inserting data by using an ODBC
or OLE DB destination adapter.
A quick comparison between loading a SQL Server OLE DB destination
with "Table and View" vs. "Table and View - Fast Load":
I've set up a simple data flow that loads data from a csv and
pushes it into SQL Server.
Table or View – Profiler results: looks like we have a cursor on
the go (bad news for performance in most cases!)
Table or View – Fast Load results: a bulk insert. The real question
is why not use Fast Load. I can't think of many reasons; Fast Load
will cause a table lock because it does a BULK load, so if you're
running this during the day you may find that it's not suitable,
but it's definitely worth testing!
Note:
OLE DB Destination
The OLE DB destination supports all of the bulk load options for
SQL Server. However, to support ordered bulk load, some additional
configuration is required. For more information, see "Sorted Input
Data". To use the bulk API, you have to configure this destination
for "fast load".
The OLE DB destination can use both TCP/IP and named pipes
connections to SQL Server. This means that the OLE DB destination,
unlike the SQL Server destination, can be run on a computer other
than the bulk load target. Because Integration Services packages
that use the OLE DB destination do not need to run on the SQL
Server computer itself, you can scale out the ETL flow with
workhorse servers.
Configuring the Integration Services OLE DB Destination for Sorted
Input
While the GUI does not directly expose it, it is possible to
configure the OLE DB Destination in Integration Services to support
a sorted input stream. This is done from the advanced properties of
the component.
Figure 6: Configuring the OLE DB Destination for sorted input
Add the ORDER hint, as illustrated above, to the FastLoadOptions
property.
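A sketch of what the FastLoadOptions property value might look like with the ORDER hint added (the sort column name is hypothetical, and TABLOCK is shown only as a typical companion hint):

```
TABLOCK, ORDER(OrderDate ASC)
```

The ORDER hint must match the actual sort order of the incoming data stream, or the bulk load will fail.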