Tuesday, 18 June 2024

Trigger connections in Talend: use cases

 

Trigger connections define the processing sequence, so no data is handled through these connections.

The connection in use creates a dependency between Jobs or subJobs, which are then triggered one after the other according to the nature of the trigger.



Trigger connections fall into two categories:

  • subJob triggers: On Subjob Ok, On Subjob Error and Run if,
  • component triggers: On Component Ok, On Component Error and Run if.



OnSubjobOK: This connection is used to trigger the next subJob on the condition that the main subJob completed without error. This connection is to be used only from the start component of a subJob.

These connections are used to orchestrate the subJobs forming the Job or to easily troubleshoot and handle unexpected errors.

OnSubjobError: This connection is used to trigger the next subJob when the first (main) subJob does not complete correctly. This "on error" subJob helps flag the bottleneck or handle the error if possible.

OnComponentOK and OnComponentError are component triggers. They can be used with any source component on the subJob.

OnComponentOK will only trigger the target component once the execution of the source component is complete without error. Its main use could be to trigger a notification subJob for example.

OnComponentError will trigger the subJob or component as soon as an error is encountered in the primary Job.

The main difference between OnSubjobOK and OnComponentOK lies in the execution order of the linked subJob.

·        With OnSubjobOK, the linked subJob starts only when the previous subJob completely finishes.

·        With OnComponentOK, the linked subJob starts when the previous component finishes.

The execution order of the subJobs linked by OnComponentOK is within the execution cycle of the previous subJob.
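
The ordering difference described above can be pictured with a small simulation. This is only a sketch in plain Java, not the code Talend generates: the class name, the event labels and the `componentTrigger` flag are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TriggerOrderSketch {

    // Simulates one subJob (components A and B) with a linked subJob.
    // componentTrigger = true  -> behaves like OnComponentOK on component A
    // componentTrigger = false -> behaves like OnSubjobOK on the subJob
    static List<String> run(boolean componentTrigger) {
        List<String> log = new ArrayList<>();
        Runnable linkedSubJob = () -> log.add("linked subJob");

        log.add("component A end");
        if (componentTrigger) {
            // OnComponentOK: fires inside the previous subJob's execution cycle
            linkedSubJob.run();
        }
        log.add("component B end");
        log.add("previous subJob end");
        if (!componentTrigger) {
            // OnSubjobOK: fires only after the whole previous subJob has finished
            linkedSubJob.run();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("OnComponentOK: " + run(true));
        System.out.println("OnSubjobOK:    " + run(false));
    }
}
```

In the first run the linked subJob appears before "previous subJob end"; in the second it appears after it.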

Run if connection settings

About this task

In the Basic settings view of a Run if connection, you can write the condition that triggers the subJob as a Java expression.


In the following example, a message is triggered if the input file contains 0 rows of data.



Procedure

1.     Create a Job and drop three components to the design workspace: a tFileInputDelimited, a tLogRow, and a tMsgBox.

2.     Connect the components as follows:

o   Right-click the tFileInputDelimited component, select Row > Main from the contextual menu, and click the tLogRow component.

o   Right-click the tFileInputDelimited component, select Trigger > Run if from the contextual menu, and click the tMsgBox component.

3.     Configure the tFileInputDelimited component so that it reads a file that contains no data rows.

4.     Select the Run if connection between the tFileInputDelimited component and the tMsgBox component, and click the Component view. In the Condition field on the Basic settings tab, press Ctrl+Space to access the variable list, and select the NB_LINE variable of the tFileInputDelimited component. Edit the condition as follows:

((Integer)globalMap.get("tFileInputDelimited_1_NB_LINE"))==0

5.     Go to the Component view of the tMsgBox component, and enter a message, "No data is read from the file" for example, in the Message field.

6.     Save and run the Job. You should see the message you defined in the tMsgBox component.
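
The condition from step 4 can be tried outside the Studio with a plain `HashMap` standing in for Talend's `globalMap`. This is a minimal sketch: the class and method names are hypothetical, and the null check is an addition of this sketch, not part of the original condition. The original expression throws a NullPointerException if the variable has not been set.

```java
import java.util.HashMap;
import java.util.Map;

public class RunIfSketch {

    // Null-safe variant of the Run if condition: true only when the
    // row counter exists and equals 0.
    static boolean noRowsRead(Map<String, Object> globalMap) {
        Integer nbLine = (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
        return nbLine != null && nbLine == 0;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend Job fills at run time.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputDelimited_1_NB_LINE", 0);

        if (noRowsRead(globalMap)) {
            System.out.println("No data is read from the file");
        }
    }
}
```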

 



Thursday, 27 May 2021

Talend - Out of Memory Error and Java Heap Space Error

 

The Out of Memory Error and the Java Heap Space Error are two common errors in Talend jobs that handle a large volume of data. They can be avoided to an extent by following some design guidelines.

(1) Keep in mind that tMap is a heavy component. Minimize its use in your jobs.

·                     Avoid tMap if you need just simple transformations like trimming the string values, replacing null numbers by zeroes, etc. In its place you can use tJavaRow component.

·                     If you want to get only a small set of columns from a huge collection, avoid using a tMap. For that you can use a lighter component: tFilterColumns.

·                     Similarly, to filter rows you can use tFilterRow instead of a tMap.
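
The simple transformations mentioned above (trimming strings, replacing null numbers with zeroes) amount to a few lines of Java per row. The sketch below is self-contained so it can run outside Talend: the `Row` class and its `name` and `amount` fields are hypothetical stand-ins for the `input_row` and `output_row` structures of a tJavaRow.

```java
public class RowCleanupSketch {

    // Hypothetical row structure; in a tJavaRow the schema columns play this role.
    static class Row {
        String name;
        Integer amount;

        Row(String name, Integer amount) {
            this.name = name;
            this.amount = amount;
        }
    }

    // Trims the string column and replaces a null number with zero.
    static Row clean(Row in) {
        String name = (in.name == null) ? null : in.name.trim();
        Integer amount = (in.amount == null) ? Integer.valueOf(0) : in.amount;
        return new Row(name, amount);
    }

    public static void main(String[] args) {
        Row out = clean(new Row("  Ann ", null));
        System.out.println("[" + out.name + "] " + out.amount);
    }
}
```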

(2) Use store on disk option whenever necessary.

          This option is available in tMap, tUniqRow, tSortRow, etc.

·                     tMap

When the store on disk option is used in tMap, the directory for the temporary data is created automatically. This data is not deleted or replaced on subsequent runs of the job, so it is advisable to delete the temporary directory from within the job using a tFileDelete component. You can attach it through the On Subjob Ok trigger of a tPostJob component.

 

·                     tUniqRow

In the case of tUniqRow, the temporary directory must be created manually before the job runs, or it can be created within the job. If the temporary directory is not available, the tUniqRow component throws a FileNotFoundException.

 

·                     tSortRow

In the case of tSortRow, the temporary directory is created automatically.
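
For the tUniqRow case above, one way to handle the directory within the job is to create it in Java before the component runs, for example from a tJava. This is a minimal sketch using the standard `java.nio.file` API; the class name, method name and directory name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TempDirSketch {

    // Creates the directory (and any missing parents); does nothing if it already exists.
    static Path ensureDirectory(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location; use the path configured in the tUniqRow component.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "talend_uniq_tmp");
        System.out.println("Temporary directory ready: " + ensureDirectory(dir));
    }
}
```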

 

 

(3) The JVM arguments can be modified as and when needed.

-Xms256M - initial memory size available to JVM is 256 MB

-Xmx1024M - maximum memory size available to JVM is 1024 MB
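
To check which limits a running job actually received, the JVM can be asked directly. This is a small sketch using the standard `Runtime` API; the class and helper names are hypothetical, and the reported maximum can be slightly lower than the -Xmx value depending on the JVM.

```java
public class HeapCheckSketch {

    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit; totalMemory() is what is currently allocated.
        System.out.println("Max heap (MB): " + toMegabytes(rt.maxMemory()));
        System.out.println("Allocated heap (MB): " + toMegabytes(rt.totalMemory()));
    }
}
```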

TALEND COMPARE DATE FUNCTION EXAMPLES

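compareDate expects Date values, so the strings are parsed first, and each string must match the parse pattern.

1) If the first date is earlier than the second, it returns -1:

TalendDate.compareDate(TalendDate.parseDate("yyyy-MM-dd","2016-12-01"), TalendDate.parseDate("yyyy-MM-dd","2016-12-20"), "yyyy-MM-dd");

2) If the dates are equal, it returns 0:
TalendDate.compareDate(TalendDate.parseDate("yyyy-MM-dd","2016-12-01"), TalendDate.parseDate("yyyy-MM-dd","2016-12-01"), "yyyy-MM-dd");

3) If the first date is later than the second, it returns 1 (the pattern lets you compare only part of the date):
TalendDate.compareDate(TalendDate.parseDate("yyyy-MM-dd","2016-12-15"), TalendDate.parseDate("yyyy-MM-dd","2016-12-01"), "yyyy-MM-dd");

Working code:
Var.start : TalendDate.parseDate("yyyy-MM-dd","2016-12-01")

Var.End : TalendDate.parseDate("yyyy-MM-dd","2016-12-01")

TalendDate.compareDate(Var.start,Var.End,"yyyy-MM-dd");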

COMPARE DATE() SUMMARY:

Date1 < Date2    :         Returns -1
Date1 = Date2    :         Returns 0
Date1 > Date2    :         Returns 1
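
The -1 / 0 / 1 contract summarized above can be reproduced outside Talend with the standard `SimpleDateFormat` class. This is only a sketch of the behaviour, not the TalendDate implementation; the class and method names are hypothetical, and numeric dates are used so that they match the "yyyy-MM-dd" pattern.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class CompareDateSketch {

    // Parses both strings with the same pattern and returns -1, 0 or 1.
    static int compare(String first, String second, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject values that do not really match the pattern
        try {
            Date d1 = format.parse(first);
            Date d2 = format.parse(second);
            return Integer.signum(d1.compareTo(d2));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Value does not match pattern " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compare("2016-12-01", "2016-12-20", "yyyy-MM-dd"));
        System.out.println(compare("2016-12-01", "2016-12-01", "yyyy-MM-dd"));
        System.out.println(compare("2016-12-15", "2016-12-01", "yyyy-MM-dd"));
    }
}
```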

Monday, 27 May 2019

Put File Mask on tFileList Component dynamically in Talend

How can I set the file mask of the tFileList component in Talend so that it recognizes the date automatically and picks up only the files for the desired date?

There are two ways of doing it.
1.    Create a context variable and use it in the file mask.
2.    Use TalendDate.getDate() or any other date function directly in the file mask.
Both are described below.
1st Method
·         Create a context variable named dateFilter of type String.
·         Assign it a value: context.dateFilter=TalendDate.getDate("yyyy-MM-dd");
·         Suppose the file name is "EMP_2015-06-19.txt".
·         In the tFileList file mask, use the variable as follows.
"EMP_"+context.dateFilter+".*"

2nd Method
·         In the tFileList file mask, use the date function as follows.
"EMP_"+TalendDate.getDate("yyyy-MM-dd")+".*"


These are the two most direct ways; adjust the file mask to suit your file names.
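
To see which file names a mask built this way would accept, it can be tested in plain Java. This is a sketch under assumptions: the JDK's glob matcher stands in for tFileList's own mask handling, a fixed date replaces TalendDate.getDate(), and the class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FileMaskSketch {

    // Builds the same kind of mask as "EMP_"+TalendDate.getDate("yyyy-MM-dd")+".*".
    static String maskFor(LocalDate date) {
        return "EMP_" + date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")) + ".*";
    }

    // Checks a file name against the mask using the JDK glob syntax.
    static boolean matches(String mask, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String mask = maskFor(LocalDate.of(2015, 6, 19));
        System.out.println(mask);
        System.out.println(matches(mask, "EMP_2015-06-19.txt"));
        System.out.println(matches(mask, "EMP_2015-06-20.txt"));
    }
}
```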

Thursday, 23 May 2019

Difference between tJava,tJavaRow and tJavaFlex component

There are three Java components in the Custom Code family: tJava, tJavaRow and tJavaFlex

  1. tJava
  • The tJava component is used to integrate your custom Java code into a Talend Job.
  • It applies exclusively to the start part of the generated code of the subjob.
  • It is executed first, but only once, in the subjob.
  • The tJava component has no input or output data flow and is used as a separate subjob.
  • Common uses of tJava include setting global or context variables before the main data processing stages, and printing log messages, status messages and variable values.

  2. tJavaRow
  • The tJavaRow code applies exclusively to the main part of the generated code of the subjob.
  • The Java code inserted through the tJavaRow is executed for each row.
  • The tJavaRow component is used as an intermediate component, where you can access the input flow and transform the data.

  3. tJavaFlex

The tJavaFlex has three Java code parts (start, main, end) that enable you to enter personalized code for different purposes.
  • The Start code is executed before any rows are processed, so it is used to initialize variables. It is executed first, but only once, in the subjob.
  • The Main code is executed for every row. You can access the input flow and modify the data. The source data is processed at runtime by the tJavaFlex.
  • The End code is executed once, after all the rows have been processed.


The tJavaFlex component is similar to the tJavaRow component, in that it is included into a flow. The difference between the two components is that the tJavaFlex component has pre and post processes that are performed before and after the individual rows are processed.
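
The start / main / end split described above maps onto an ordinary loop. The sketch below is plain Java for illustration, not Talend's generated code: the class name and the list of integers standing in for the row flow are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class JavaFlexSketch {

    static int sumRows(List<Integer> rows) {
        // Start code: runs once, before any row is processed.
        int total = 0;

        for (Integer row : rows) {
            // Main code: runs once per row (this is also where tJavaRow code would sit).
            total += row;
        }

        // End code: runs once, after the last row.
        System.out.println("Processed " + rows.size() + " rows, total = " + total);
        return total;
    }

    public static void main(String[] args) {
        sumRows(Arrays.asList(1, 2, 3));
    }
}
```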

Talend Interview Questions Part 2:-

1) What are flat files? How can we handle flat files in Talend?
2) What is a nested column in a file (XML file)?
3) What is the port number of your Oracle server? How can you extract data from Oracle?
4) What are the default user IDs in Oracle and SQL?
5) What is the size of your databases? What is the maximum number of records you have handled?
6) How can you migrate 30 years of data? What process did you follow? There is a huge volume of data in the source and I need to pull it two or three times a month; how can you process this volume of data, and what process will you follow?
7) Are you handling the jobs alone? How can you justify that you are gathering the requirements? What exactly do you do in your team?
8) Which version of Oracle are you using? What are the ranges of the data in your database?
9) What is the top scenario you have worked on? What are your biggest achievements? What are the biggest challenges you have faced?
10) What kind of data do you have in your database: normalized or denormalized? What kind of data will you have in the data warehouse: normalized or denormalized?
11) What are the orchestration components?
12) How can you do performance tuning in Talend? How can you do performance tuning in SQL?

Performance tuning in Talend

1. tParallelize:-
When a Job consists of several subJobs (tRunJob components), we might want to execute some of them in parallel and then synchronize the execution of the other subJobs once the parallel execution has ended.
To achieve that, we simply use the tParallelize component to orchestrate all the subJobs to be executed.
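
The "run in parallel, then synchronize" pattern can be illustrated with the standard Java concurrency API. This is an analogy only, not how tParallelize is implemented: the class name, the method name and the string results standing in for subJobs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelizeSketch {

    // Runs all subJobs in parallel and returns only when every one has finished.
    static List<String> runInParallel(List<Callable<String>> subJobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subJobs.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks are complete (the synchronization point).
            for (Future<String> future : pool.invokeAll(subJobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> subJobs = new ArrayList<>();
        subJobs.add(() -> "subJob A done");
        subJobs.add(() -> "subJob B done");
        System.out.println(runInParallel(subJobs));
        System.out.println("synchronized subJob starts after both have finished");
    }
}
```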
 

2. Configuring the JVM parameters of an individual Job

In Talend Open Studio it is possible to specify how much memory the JVM can use when running a Job. The default is a minimum of 256 MB and a maximum of 1024 MB, but we can increase these values if more memory is available.
Follow the steps below to increase the JVM memory:
1.      In the Run view, open the Advanced settings tab.
2.      Select the Use specific JVM arguments check box to enable the configuration.
3.      Double-click the individual arguments and set the values as needed.
-Xms sets the initial (minimum) heap size and -Xmx sets the maximum heap size.





3. Multi-thread execution:-

When the Use project settings check box is selected, the Multi-thread execution check box may be grayed out and unavailable. In this situation, clear the Use project settings check box to activate the Multi-thread execution check box.

4. Removing columns not used in the ETL process

Removing unused columns from database components and other input sources reduces the time needed to load the data. When multiple database components are used as lookups in a tMap, this has a significant effect on how fast the data is loaded.

5. DB components:-
tOracleOutput: increasing the batch size and commit size improves Job performance.
tOracleInput: increasing the cursor size improves Job performance.
Using the bulk load options also improves performance.
6. Store on disk usage:-

Some components provide a store on disk option. Enabling it on those components can improve Job performance when memory is the bottleneck.
These components include tSortRow, tUniqRow and tMap.

tMap:-

Lookup Model: Load Once gives good performance in a Talend Job.
