Advanced Queries#

Where Attribute#

The where attribute is the most used attribute to select content by applying different conditions to the available fields. Below there are some code snippets that help building a where statement, and notes that help modify the syntax according to the need.

Range of dates#

It works with the standard SQL syntax. If updates will be collected in the future, avoid using an end-date. Alternative fields are: modification_datetime and ingestion_datetime.

publication_datetime >= '2010-01-01 00:00:00' AND publication_datetime <= '2020-06-30 23:59:59'

Filter by source_codes#

The following clause is useful to select sources by their individual code.

UPPER(source_code) IN ('AASFNE', 'HTACCF', 'NLADLW', 'ADVTSR', 'AFNROL', 'AGEEOL', 'AGEE', 'HNASNI', 'APRS', 'ASXTEX', 'AUSTOL')

In case sources will be selected by their category or source family, a better option is using restrictor_codes. This field is not in the documentation, but the CSE or Integration Team can provide more information like source family codes

REGEXP_CONTAINS(restrictor_codes, r'(?i)(^|,)(jpost|nytf|wp|latm|j)($|,)')

Filter by subject_codes#

REGEXP_CONTAINS(subject_codes, r'(?i)(^|,)(mcat|ccat|ecat|gglobe|ghea|ghnwi|gcns|gpir|gdatap|greest|grisk|gsci|gspace|gtrans)($|,)')

Filtering by the region where the source is headquarted#

REGEXP_CONTAINS(region_of_origin, r'(?i)(aust|spain|italy|usa|uk)')

Filtering by language#

LOWER(language_code) IN ('en', 'es', 'it')

Filtering by company codes#

This is applicable to any company-related fields (about, occur or company_codes and other combinations with identifiers - ISIN, CUSIP…).

REGEXP_CONTAINS(company_codes, r'(?i)(^|,)(agbpet|agip|agphng|agpnme|agzgi|altgaz|bbor|brnene|distrg|eenivm|egapg|enichm|enie|enimnt)($|,)')

In case the interest is to ensure at least one company is tagged (the field is not empty), the expressions looks like this

LENGTH(company_codes) > 2

Filtering for content with at least 1 relevant company

LENGTH(company_codes_about) > 0

Filtering by Industry code#

REGEXP_CONTAINS(industry_codes, r'(?i)(^|,)(i1|i25121|i2567)($|,)')

Filtering by Executive codes#

REGEXP_CONTAINS(LOWER(person_codes), r'(?i)(^|,)(76064380|2349856)($|,)')

Filtering by the region the article is about#

REGEXP_CONTAINS(region_codes, r'(?i)(^|,)(aust|spain|italy|usa|uk)($|,)')