BigQuery and the Gdelt Project. Beyond the dreams of marketing analysts.
Can you imagine a dataset, updated every 15 minutes, that contains all the information in world news?
Can you imagine running sentiment analysis on a company or product in any location, or worldwide, regardless of the language?
Well, the first question is answered by the breathtaking Gdelt Project, a titanic project that covers far more than any single project could use, storing information in 65 languages.
The second question is answered by Google. The whole Gdelt dataset is available in BigQuery, a powerful environment covering every stage of working with data: collecting data, creating datasets, filtering them with advanced SQL queries (with further features expected, such as loops and conditionals within the query), using machine learning algorithms, and visualizing the results by exporting them to Google Data Studio.
The study shown below is for a company dedicated to education in digital business (Data Analytics, Web Development, UX/UI and more), with several campuses worldwide.


Glancing at the features, we can confirm the following:
- The general perception of this company is really good, as shown by tone.
- arf shows that the text found is slightly non-neutral (maybe due to personal publications, or emphatic writing). Polarity is moderate, suggesting that the texts found were not highly emotionally charged.
- sg_rf is moderate, which suggests there is no strong sense of group belonging.
- Within the studied interval, pos_score remains above neg_score, and both are consistent with tone. For some reason, September and November 2019 were the roughest months for this company. This could be due to a lack of presence on the internet; nevertheless, neg_score holds steady, which implies that the good perception people have simply decreased in those months.
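All of the features above come from a single comma-separated field, V2Tone, which the query in this article splits into seven values. As a rough sketch of that same logic outside BigQuery, the Python below parses a V2Tone string and averages pos_score and neg_score per month. The function names `parse_v2tone` and `monthly_means` are hypothetical, the field names simply reuse the query's aliases, and the sample rows are made up for illustration; they are not real Gdelt data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Field names reuse the aliases from the query in this article.
TONE_FIELDS = ("tone", "pos_score", "neg_score", "polarity", "arf", "sg_rf", "wc")


def parse_v2tone(v2tone: str) -> dict:
    """Split a comma-separated V2Tone string into named float fields."""
    values = [float(part) for part in v2tone.split(",")]
    return dict(zip(TONE_FIELDS, values))


def monthly_means(rows):
    """Average pos_score and neg_score per month.

    rows: iterable of (gdelt_date, v2tone) pairs, where gdelt_date is an
    integer in YYYYMMDDHHMMSS form (the format the query parses).
    """
    buckets = defaultdict(list)
    for gdelt_date, v2tone in rows:
        month = datetime.strptime(str(gdelt_date), "%Y%m%d%H%M%S").strftime("%Y-%m")
        buckets[month].append(parse_v2tone(v2tone))
    return {
        month: {
            "pos_score": mean(r["pos_score"] for r in records),
            "neg_score": mean(r["neg_score"] for r in records),
        }
        for month, records in sorted(buckets.items())
    }


# Made-up sample rows, for illustration only (not real Gdelt data).
SAMPLE_ROWS = [
    (20190903120000, "2.0,4.0,2.0,6.0,20.0,1.0,500"),
    (20190917080000, "1.0,3.0,2.0,5.0,22.0,0.0,300"),
    (20191005093000, "4.0,5.0,1.0,6.0,18.0,2.0,400"),
]

if __name__ == "__main__":
    for month, scores in monthly_means(SAMPLE_ROWS).items():
        print(month, scores)
```

Comparing the two monthly averages side by side is one simple way to spot months where the positive score dips while the negative score stays flat.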
This was done as a brief and simple sketch. The possibilities of BigQuery itself, or of BigQuery with the Gdelt Project, will overflow the expectations of any analyst by far. 35 GB of plain text were processed in this study.
The interactive graph can be found here.
The query itself is the following:
SELECT
  EXTRACT(DATE FROM PARSE_TIMESTAMP('%Y%m%d%H%M%S', CAST(date AS STRING))) AS Date,
  CAST(SPLIT(V2Tone, ",")[OFFSET(0)] AS FLOAT64) AS tone,
  CAST(SPLIT(V2Tone, ",")[OFFSET(1)] AS FLOAT64) AS pos_score,
  CAST(SPLIT(V2Tone, ",")[OFFSET(2)] AS FLOAT64) AS neg_score,
  CAST(SPLIT(V2Tone, ",")[OFFSET(3)] AS FLOAT64) AS polarity,
  CAST(SPLIT(V2Tone, ",")[OFFSET(4)] AS FLOAT64) AS arf,
  CAST(SPLIT(V2Tone, ",")[OFFSET(5)] AS FLOAT64) AS sg_rf,
  CAST(SPLIT(V2Tone, ",")[OFFSET(6)] AS FLOAT64) AS wc
FROM
  `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE
  DATE(_PARTITIONTIME) >= "2018-01-01"
  AND LOWER(DocumentIdentifier) LIKE '%ironhack%'

I really love the Gdelt Project. Here you can find several side projects built around this outstanding tool:
https://github.com/albertovpd/analysing_world_news_with_Gdelt
Hope you liked it.
Alberto.
This article was written as a member of the Data Team at Labelium Spain.




