3PDS is a service that collects engagement data from various social platforms for User-Generated Content (UGC) created by JA users, for use in client reporting. Engagement data includes views, upvotes, shares, reposts, impressions and similar metrics.
3PDS internally consists of two parts: the Orchestrator and the Workers. Here's a general overview of how the system works.
        listUnlockedUGC every 5 minutes
Orchestrator -------> ugc <---*
     |                         \
 push job(s)                    *-- ugc_links
     v                                  |
BullMQ queue                            v
     ^                     references post & community
  listens
     |            Platform Scraper
  Worker ---- requests ------> YouTube Data API V3 ---> YouTube data
     |
stores data in
     |
     v
  ugc_data
When a user posts a link to a piece of original content from a third-party platform on Just About, that link is inserted into the ugc table for future processing. The ugc table looks like this:
link : the link to the UGC on an external platform that the user created
iteration_index : how many times the UGC has had its engagement data scraped
iteration_started_at : when the scraping started for the previous iteration_index
iteration_ended_at : when the scraping ended for the previous iteration_index
next_iteration_date : when the data should next be scraped
status : dictates the state of the UGC
  pending : data is waiting to be fetched
  locked : data is currently being fetched
  errored : getting the data failed for whatever reason
  dead : link is dead, no statistics can be pulled
  skipped : link does not have a provider

The 3PDS Orchestrator manages the timing of when UGC should have its data collected and dispatches jobs to a queue to be processed by workers. UGC typically follows an exponential drop-off in viewership over time, so the frequency of data collection should drop off in the same way, to avoid needlessly spending API calls on third-party services.
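The drop-off reasoning can be made concrete with a backoff schedule. The following is a minimal sketch, assuming the wait between scrapes doubles after each iteration, starting from one hour and capped at 30 days. The base interval, growth factor, cap and the helper name nextIterationDate are all illustrative assumptions, not values taken from 3PDS.

```javascript
// Hypothetical schedule parameters; the real values used by 3PDS are not stated here.
const BASE_INTERVAL_MS = 60 * 60 * 1000;          // assumed: 1 hour after the first scrape
const GROWTH_FACTOR = 2;                          // assumed: the wait doubles each iteration
const MAX_INTERVAL_MS = 30 * 24 * 60 * 60 * 1000; // assumed: never wait longer than 30 days

// Returns the Date at which a UGC row should next be scraped, given the
// iteration that has just finished (1 = first scrape) and when it ended.
const nextIterationDate = (iterationIndex, iterationEndedAt) => {
  const interval = Math.min(
    BASE_INTERVAL_MS * GROWTH_FACTOR ** (iterationIndex - 1),
    MAX_INTERVAL_MS
  );
  return new Date(iterationEndedAt.getTime() + interval);
};
```

Under these assumptions a new link is re-scraped after 1 hour, then 2, 4, 8 hours and so on, so most API calls are spent while the content is still gaining views.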
The Orchestrator checks every 5 minutes for rows in the ugc table with a pending status and a next_iteration_date that has passed. Jobs for the returned rows are inserted into a Redis queue (which uses BullMQ) to be processed by the 3PDS Workers.
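The selection step described above could be sketched as a pure function over rows already read from the ugc table. This is an illustration only: the id column name, the job name and the payload shape are assumptions, and the database read and the queue write are deliberately left out.

```javascript
// Returns the rows the Orchestrator would dispatch at time `now`:
// status is 'pending' and next_iteration_date is at or before `now`.
const selectDueUGC = (rows, now) =>
  rows.filter(row => row.status === 'pending' && row.next_iteration_date <= now);

// Builds one job description per due row.
// The job name and the fields inside `data` are hypothetical.
const buildJobs = (rows, now) =>
  selectDueUGC(rows, now).map(row => ({
    name: 'collect-engagement',
    data: {
      ugcId: row.id, // assumed primary key column on the ugc table
      link: row.link,
      iterationIndex: row.iteration_index + 1,
    },
  }));
```

An array of { name, data } objects like this is the shape that BullMQ's Queue.addBulk is expected to accept, so the result could plausibly be handed to the queue in one call. The dispatched rows would also need their status set to locked so that the next 5-minute check does not pick them up again.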
3PDS is designed to be horizontally scalable from the start: the number of workers can be scaled up as queue congestion dictates, to handle a large volume of jobs.
The 3PDS worker takes an arbitrary link and, through Platform Scrapers (detailed below), collects data relating to the link and stores that data in ugc_data.
The ugc_data table looks like this:
ugc_id : the ID of the row in the ugc table that this data is associated with
iteration_index : correlates to the iteration of the data scraping, 1 = 1st scraping, 2 = 2nd and so forth
collected_at : when the data collection started
job_id : the job from the queue that caused this data to be collected
views : correlates to a view / impression on a 3rd party platform
upvotes : correlates to an upvote / heart / like on a 3rd party platform
downvotes : correlates to a downvote / dislike on a 3rd party platform
shares : correlates to an external share on a 3rd party platform
reposts : correlates to a repost / re-tweet / reblog on a 3rd party platform
replies : correlates to a direct reply on a post on a 3rd party platform

A Platform Scraper (PS) is composed of two parts: the matcher, which identifies that a link can have its data scraped by this PS, and the fetcher, which contains the instructions by which data can be scraped for that link.
A matcher is a function that returns a boolean indicating whether a particular UGC link belongs to this platform, e.g.
const isYouTubeURL = url => url.includes('youtube.com');
// tumblr.com/video/1234 - false
// youtube.com?w=1234 - true
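A substring check also returns true for links that merely contain the text youtube.com, such as notyoutube.com or a query string on another site. A stricter matcher could compare the parsed hostname instead. The sketch below is one way to do that; the host list and the name matchesYouTubeHost are assumptions for illustration.

```javascript
// Hosts treated as YouTube; this list is an assumption for illustration.
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com', 'youtu.be']);

const matchesYouTubeHost = link => {
  try {
    // The example links above omit the scheme, so add one before parsing.
    const withScheme = /^https?:\/\//i.test(link) ? link : `https://${link}`;
    const { hostname } = new URL(withScheme);
    return YOUTUBE_HOSTS.has(hostname.toLowerCase());
  } catch {
    return false; // not parseable as a URL at all
  }
};

// matchesYouTubeHost('tumblr.com/video/1234')          - false
// matchesYouTubeHost('youtube.com?w=1234')             - true
// matchesYouTubeHost('notyoutube.com/watch')           - false
// matchesYouTubeHost('example.com/?ref=youtube.com')   - false
```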
A fetcher is a function that takes that same link, queries a third-party API for data, and returns some engagement data, e.g.
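const YouTubeDataAPIV3Fetcher = async url => {
  const videoID = extractVideoIDFromYoutubeURL(url);
  // Illustrative endpoint: query by the extracted video ID, not the full URL.
  const response = await fetch(`api.youtube.com/video/${videoID}`).then(r => r.json());
  // Translate platform-specific names into the common engagement format.
  return {
    views: response.views,
    upvotes: response.likes,
  };
};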
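Putting the two parts together, a worker needs some way to find the scraper whose matcher accepts a link and then run its fetcher. The following is a minimal sketch of such a registry. The object shape, the names findScraper and collectEngagement, and the outcome values are assumptions, and the fetcher is a stand-in that returns fixed data rather than calling a real API.

```javascript
// Each Platform Scraper pairs a matcher with a fetcher.
// The fetcher here is a stand-in that returns fixed numbers.
const scrapers = [
  {
    name: 'youtube',
    matches: link => link.includes('youtube.com'),
    fetchEngagement: async () => ({ views: 100, upvotes: 10 }),
  },
];

// Returns the first scraper whose matcher accepts the link, or undefined.
const findScraper = (link, registry = scrapers) =>
  registry.find(scraper => scraper.matches(link));

// Runs the matching scraper's fetcher and reports what happened.
// 'skipped' mirrors the "link does not have a provider" status and
// 'errored' mirrors a failed fetch; 'collected' is a hypothetical label.
const collectEngagement = async (link, registry = scrapers) => {
  const scraper = findScraper(link, registry);
  if (!scraper) return { outcome: 'skipped' };
  try {
    const data = await scraper.fetchEngagement(link);
    return { outcome: 'collected', platform: scraper.name, data };
  } catch (error) {
    return { outcome: 'errored', reason: String(error) };
  }
};
```

Supporting another platform then means adding one more entry to the array, without touching the worker logic.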
Not all platforms use the same terminology (views, impressions, upvotes, likes, hearts etc.), yet all describe a similar intent. As such, each platform's forms of engagement are translated into a consistent format to simplify reporting.
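One way to express this translation is a per-platform lookup from the platform's metric names to the ugc_data columns. The sketch below assumes two made-up sets of raw field names; neither the platform keys nor the raw names are taken from a real API response, and normalizeEngagement is a hypothetical helper.

```javascript
// Maps platform-specific metric names onto the shared ugc_data columns.
// The platform keys and raw field names are illustrative assumptions.
const FIELD_MAPS = {
  video: { views: 'views', likes: 'upvotes', dislikes: 'downvotes', comments: 'replies' },
  microblog: { impressions: 'views', hearts: 'upvotes', reblogs: 'reposts', replies: 'replies' },
};

const normalizeEngagement = (platform, raw) => {
  const fieldMap = FIELD_MAPS[platform];
  if (!fieldMap) throw new Error(`No field map for platform: ${platform}`);
  const normalized = {};
  for (const [rawName, column] of Object.entries(fieldMap)) {
    // Only copy metrics the platform actually reported; missing ones stay unset.
    if (raw[rawName] !== undefined) normalized[column] = Number(raw[rawName]);
  }
  return normalized;
};
```

Converting with Number is a defensive choice, since some APIs return counts as strings. Metrics a platform does not expose are left out rather than stored as zero, so reports can tell "not available" apart from "none".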
ugc_links provides the ability to filter UGC by post or community, which is helpful in reporting, e.g. answering the question "How many views were caused by UGC in community X over Y time period?". The ugc_links table looks like this:
post_id : the id of the post that this UGC exists in
community_id : the id of the community that this UGC's post was created in
ugc_id : the id of the ugc that this post relates to
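As an illustration of the reporting question above, the sketch below joins ugc_links to ugc_data in memory and sums the views gained inside a time window. It assumes that each ugc_data row stores a cumulative view count as of collected_at, which is not confirmed here, and viewsForCommunity is a hypothetical helper; a production report would more likely run as a database query.

```javascript
// Views gained by a community's UGC between `from` (inclusive) and `to` (exclusive).
// Assumes each ugc_data row holds the cumulative view count at collected_at.
const viewsForCommunity = (ugcLinks, ugcData, communityId, from, to) => {
  // A piece of UGC can appear in several posts, so de-duplicate by ugc_id.
  const ugcIds = new Set(
    ugcLinks.filter(l => l.community_id === communityId).map(l => l.ugc_id)
  );
  let total = 0;
  for (const ugcId of ugcIds) {
    const samples = ugcData
      .filter(d => d.ugc_id === ugcId && d.collected_at < to)
      .sort((a, b) => a.collected_at - b.collected_at);
    if (samples.length === 0) continue;
    const latest = samples[samples.length - 1];
    // Baseline is the last sample taken before the window opened, if any.
    const before = samples.filter(d => d.collected_at < from);
    const baseline = before.length ? before[before.length - 1].views : 0;
    total += latest.views - baseline;
  }
  return total;
};
```

When no sample exists before the window, the baseline falls back to zero, so views accrued before the first scrape are attributed to the window. Whether that is acceptable depends on how the report is meant to be read.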