
Data Collection: Tools and Technologies to Ingest Data From Secondary Sources

By Arpit Choudhury • Issue #4
👋 Hey there, welcome to the astorik newsletter that helps you keep pace with the data space!
In this issue, I am covering tools and technologies to collect data from secondary or third-party data sources.
This is a continuation of the previous issue where I covered tools and technologies that enable tracking data from primary data sources. If you haven’t already, I highly recommend checking out issue #3 first.

First, let’s look at why you need to collect data from secondary or third-party data sources (tools used for sales, marketing, advertising, and support) and how this data is different from and complementary to the data collected from primary data sources (website, web app, and mobile apps).

Disclaimer: astorik is vendor-neutral and the tools mentioned here are not necessarily the best tools in the categories they operate in, nor are they the only ones.
Primary vs Secondary Data Sources
Users perform events by interacting with a website or app (primary source) and therefore the data tracked in the process is called event data. It is also often referred to as behavioural data as it helps understand user behaviour. [1] 
On the other hand, data from third-party apps (secondary source) is referred to as object data since data is stored as objects (contacts, leads, messages, campaigns, etc).
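To make the distinction concrete, here is a minimal sketch of what the two shapes might look like. The class names, field names, and sample values are all made up for illustration and are not taken from any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """Event data: something a user did at a point in time (primary source)."""
    user_id: str
    name: str
    timestamp: datetime
    properties: dict = field(default_factory=dict)


@dataclass
class Contact:
    """Object data: a record held in a third-party tool such as a CRM."""
    contact_id: str
    email: str
    account: str
    lifecycle_stage: str


# Hypothetical sample records
event = Event(
    user_id="u_123",
    name="Signed Up",
    timestamp=datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc),
    properties={"plan": "free"},
)
contact = Contact(
    contact_id="c_456",
    email="jane@example.com",
    account="Acme",
    lifecycle_stage="lead",
)
```

An event is append-only and describes an action, whereas an object describes the current state of a thing and gets updated over time.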
Here are some common use cases of collecting data from third-party apps: 
  • Data from the CRM provides context about customers and the accounts they belong to 
  • Data from sales, marketing, advertising, and support tools helps understand how users engage with your brand via multiple channels across multiple touchpoints 
  • Data from payment processing services provides insights into how users transact with your product 
Event data from first-party apps and object data from third-party tools are complementary since a combination of the two is needed to derive insights and drive action based on those insights (topic for the next issue).
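As a rough illustration of why the two are complementary, the sketch below joins event data with CRM object data to count an event per account. It uses an in-memory SQLite database as a stand-in for a data warehouse, and the table names, column names, and rows are assumptions made up for this example.

```python
import sqlite3

# In-memory SQLite stands in for a data warehouse here
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_email TEXT, event_name TEXT);
CREATE TABLE crm_contacts (email TEXT, account TEXT);

-- Event data tracked from a first-party app (hypothetical rows)
INSERT INTO events VALUES
    ('jane@example.com', 'Report Exported'),
    ('jane@example.com', 'Report Exported'),
    ('raj@example.com',  'Report Exported');

-- Object data ingested from a CRM (hypothetical rows)
INSERT INTO crm_contacts VALUES
    ('jane@example.com', 'Acme'),
    ('raj@example.com',  'Globex');
""")

# Neither table alone can answer "which accounts export the most reports?"
rows = conn.execute("""
    SELECT c.account, COUNT(*) AS exports
    FROM events e
    JOIN crm_contacts c ON c.email = e.user_email
    WHERE e.event_name = 'Report Exported'
    GROUP BY c.account
    ORDER BY c.account
""").fetchall()
```

The events table knows what users did but not which account they belong to, while the CRM table knows the account but not the behaviour.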
Data Collection and Warehousing
Data Ingestion
Collecting data is one thing, but you might be wondering where this data is stored in the first place. It depends on the technology you decide to use, but typically both event data and object data are stored in a data warehouse such as Snowflake, AWS Redshift, or Google BigQuery.
Using a data warehouse is highly recommended and is a key ingredient for deriving insights and driving action.
Okay, now back to the main topic of this issue – tools and technologies to ingest data from secondary or third-party data sources. 
ELT-based Integration Tools
ELT stands for Extract, Load, Transform, which refers to extracting data from the data sources, loading the data into a data warehouse, and then transforming or wrangling the data to derive insights and drive action.
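The three steps can be sketched in a few lines. This is only an illustration of the pattern, not how any of the tools below work internally: the `extract` function returns hard-coded records in place of a real API call, an in-memory SQLite database stands in for the warehouse, and all table and column names are hypothetical.

```python
import sqlite3


def extract():
    """Extract: stand-in for pulling records from a third-party tool's API."""
    return [
        {"id": "1", "email": "Jane@Example.com ", "amount_cents": "1200"},
        {"id": "2", "email": "raj@example.com", "amount_cents": "800"},
    ]


def load(conn, records):
    """Load: write the records into the warehouse as-is, without cleaning."""
    conn.execute(
        "CREATE TABLE raw_payments (id TEXT, email TEXT, amount_cents TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_payments VALUES (:id, :email, :amount_cents)", records
    )


def transform(conn):
    """Transform: clean and reshape inside the warehouse, using SQL."""
    conn.execute("""
        CREATE TABLE payments AS
        SELECT CAST(id AS INTEGER)                   AS id,
               LOWER(TRIM(email))                    AS email,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount
        FROM raw_payments
    """)


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, extract())
transform(conn)
result = conn.execute(
    "SELECT id, email, amount FROM payments ORDER BY id"
).fetchall()
```

Because the raw table is kept in the warehouse, the transformation can be changed and re-run later without extracting the data again.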
Popular ELT tools include Fivetran, Stitch, and Matillion, as well as open-source alternatives like Meltano and Airbyte.
ELT tools support a wide range of data sources including a plethora of third-party SaaS tools as well as databases (like MySQL and PostgreSQL) and cloud storage services (like Amazon S3 and Google Cloud Storage).
The tools mentioned here all have certain strengths and weaknesses especially in regard to the connectors or data sources they support. Additionally, their pricing models vary and what you end up paying might be different even with the same number of data sources and the same volume of data. 
It’s also helpful to know that ELT tools are often referred to as ETL tools since ETL (Extract, Transform, Load) is the older paradigm under which data had to be transformed before being loaded into a data warehouse. 
And even though ELT (the new paradigm) has largely displaced ETL (the old paradigm), many ELT tools continue to be referred to as ETL tools, while many ETL tools now call themselves ELT tools.
In an attempt to keep things simple, Fivetran calls itself a data integration tool. However, this further complicates the matter because data integration goes beyond ETL or ELT and encompasses Reverse ETL, iPaaS, as well as CDP. [2] 
CDP or Customer Data Platforms
Since I had covered CDPs in the previous issue, I am keeping it short here. CDPs like Segment (Personas) or mParticle have the ability to extract data from a variety of cloud applications and store the data in a data warehouse.
However, CDPs are not purpose-built to ingest data into data warehouses and are more suited to building user segments or audiences by combining data from multiple sources and further syncing those segments to third-party tools.
ELT tools, on the other hand, are purpose-built to ingest data into data warehouses and therefore offer more robust integrations, faster syncing capabilities, and other advanced functionality. [3]
iPaaS-based Integration Tools
iPaaS (integration platform as a service) solutions such as Tray, Workato, Integromat or Zapier can also be used to extract data from third-party SaaS tools and load the data into data warehouses. 
However, iPaaS tools are designed to perform actions (such as loading data) based on a trigger (such as a new contact being created in the CRM). They are therefore better suited to automating workflows than to ingesting data into a warehouse, which is typically done in batches.
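The difference between trigger-based and batch loading can be sketched as follows. This is a simplified illustration rather than the behaviour of any specific product: the function names, the `contacts` table, and the sample records are all assumptions, and an in-memory SQLite database again stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
conn.execute("CREATE TABLE contacts (id TEXT PRIMARY KEY, email TEXT)")


def on_contact_created(contact):
    """Trigger-based (iPaaS style): runs once per event, writes one row."""
    conn.execute(
        "INSERT OR REPLACE INTO contacts VALUES (:id, :email)", contact
    )


def sync_batch(contacts):
    """Batch (ELT style): runs on a schedule, writes many rows in one pass."""
    conn.executemany(
        "INSERT OR REPLACE INTO contacts VALUES (:id, :email)", contacts
    )


# One new contact fires the trigger once
on_contact_created({"id": "c1", "email": "jane@example.com"})

# A scheduled sync picks up everything that changed since the last run
sync_batch([
    {"id": "c2", "email": "raj@example.com"},
    {"id": "c3", "email": "li@example.com"},
])

count = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
```

Both paths end with the same rows in the table; the difference is in how often they run and how many records each run handles.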
It took 3 issues to cover Data Collection and I’ve had to condense a lot of information into a few words which is, quite frankly, harder than writing a 2000-word piece on this topic.
Resources to dig deeper:
Know someone who will benefit from this content? Share this issue with them and help us reach more people! 💗 
Arpit Choudhury

Tips on adopting a modern data stack to fuel growth!
