Join two datasets and count how many times a host appears, even if zero

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Join two datasets and count how many times a host appears, even if zero

L0 Member

Let's say I have a dataset with list of hostnames, for ex:

 

cphost cpcluster_name cphost_role
one.example.com Test Subscriber
two.example.com Test Subscriber
three.example.com Test Subscriber
four.example.com Test Subscriber

 

And another dataset with RADIUS auth logs, for example


_raw_log
"sourcetype":"syslog_clearpass" "host":"" hostname:one.example.com <135>2025-04-01 16:19:41,65 CPPM_Dashboard_Summary 38932186 1 0 session_id=,req_source=RADIUS,user_name=,service_name=Free-WiFi -MAC-Auth - Service,alerts_present=0,nas_ip=,nas_port=0,conn_status=Unknown,login_status=ACCEPT,error_code=0,mac_address=,timestamp=2025-04-01 16:17:57.076+00,write_timestamp=2025-04-01 16:17:58.899283+00
"sourcetype":"syslog_clearpass" "host":"" hostname:one.example.com <135>2025-04-01 16:19:41,78 CPPM_Dashboard_Summary 38932506 1 0 session_id=--,req_source=RADIUS,user_name=,service_name=Free-WiFi -MAC-Auth - Service,alerts_present=0,nas_ip=,nas_port=0,conn_status=Unknown,login_status=ACCEPT,error_code=0,mac_address=,timestamp=2025-04-01 16:18:15.564+00,write_timestamp=2025-04-01 16:18:16.022775+00
"sourcetype":"syslog_clearpass" "host":"" hostname:two.example.com <135>2025-04-01 16:19:41,79 CPPM_Dashboard_Summary 38932516 1 0 session_id=,req_source=RADIUS,user_name=,service_name=Free-WiFi -MAC-Auth - Service,alerts_present=0,nas_ip=,nas_port=0,conn_status=Unknown,login_status=ACCEPT,error_code=0,mac_address=,timestamp=2025-04-01 16:18:16.294+00,write_timestamp=2025-04-01 16:18:18.058162+00
"sourcetype":"syslog_clearpass" "host":"" hostname:three.example.com <135>2025-04-01 16:19:41,80 CPPM_Dashboard_Summary 38932551 1 0 session_id=,req_source=RADIUS,user_name=,service_name=Free-WiFi -MAC-Auth - Service,alerts_present=0,nas_ip=,nas_port=0,conn_status=Unknown,login_status=ACCEPT,error_code=0,mac_address=,timestamp=2025-04-01 16:18:18.329+00,write_timestamp=2025-04-01 16:18:20.146301+00

 

And I want to join and count the total number of auths for each cphost. So the output would look like this with the data above

 

cphost total_auths
one.example.com 2
two.example.com 1
three.example.com 1
four.example.com 0

 

I know something like this will provide a count of cphosts that show up in the logs

 

dataset = prod_dpa_syslog_raw
| filter _raw_log contains "syslog_clearpass*login*Free-WiFi*login_status*ACCEPT"
| alter cphost = arrayindex(regextract(_raw_log, "hostname:(\S+)"), 0)
| comp count() by cphost as total auths

 

output:

cphost total_auths
one.example.com 2
two.example.com 1
three.example.com 1

 

But I also want to display cphosts that have zero auths. What is a good approach of accomplishing that? I've been messing around with "join" but haven't been able to come up with a good solution.

 

For example I tried:

dataset = prod_dpa_syslog_raw
| filter _raw_log contains "syslog_clearpass*login*Free-WiFi*login_status*ACCEPT"
| alter cphost = arrayindex(regextract(_raw_log, "hostname:(\S+)"), 0)
| join type = left (dataset = cphost_cluster_role) as role role.cphost = cphost
| comp count() as TotalAccepts by cphost

 

But that just ends up providing total count of all and no cphost breakdown

 

output:

totalaccepts cphost
4

 

Any suggestions on how to get the output to look something like this?


cphost total_auths
one.example.com 2
two.example.com 1
three.example.com 1
four.example.com 0

 

Cortex XDR 

1 accepted solution

Accepted Solutions

L6 Presenter

Hello @CesarSaucedo ,

 

To achieve this, perform a left join between the cphost_cluster_role dataset and the aggregated authentication counts from the prod_dpa_syslog_raw dataset. Here's a conceptual query:

dataset = cphost_cluster_role
| fields cphost
| join type = left (
dataset = prod_dpa_syslog_raw
| filter _raw_log contains "syslog_clearpass*login*Free-WiFi*login_status*ACCEPT"
| alter cphost = arrayindex(regextract(_raw_log, "hostname:(\S+)"), 0)
| comp count() as total_auths by cphost
) as auth_counts on cphost = auth_counts.cphost
| alter total_auths = coalesce(auth_counts.total_auths, 0)
| fields cphost, total_auths
| sort asc cphost

 

If you feel this has answered your query, please let us know by clicking like and  on "mark this as a Solution". Thank you.

Ashutosh Patil

View solution in original post

2 REPLIES 2

L6 Presenter

Hello @CesarSaucedo ,

 

To achieve this, perform a left join between the cphost_cluster_role dataset and the aggregated authentication counts from the prod_dpa_syslog_raw dataset. Here's a conceptual query:

dataset = cphost_cluster_role
| fields cphost
| join type = left (
dataset = prod_dpa_syslog_raw
| filter _raw_log contains "syslog_clearpass*login*Free-WiFi*login_status*ACCEPT"
| alter cphost = arrayindex(regextract(_raw_log, "hostname:(\S+)"), 0)
| comp count() as total_auths by cphost
) as auth_counts on cphost = auth_counts.cphost
| alter total_auths = coalesce(auth_counts.total_auths, 0)
| fields cphost, total_auths
| sort asc cphost

 

If you feel this has answered your query, please let us know by clicking like and  on "mark this as a Solution". Thank you.

Ashutosh Patil

Thank you, I was able to get it working with your suggestions.  

  • 1 accepted solution
  • 574 Views
  • 2 replies
  • 0 Likes
Like what you see?

Show your appreciation!

Click Like if a post is helpful to you or if you just want to show your support.

Click Accept as Solution to acknowledge that the answer to your question has been provided.

The button appears next to the replies on topics you’ve started. The member who gave the solution and all future visitors to this topic will appreciate it!

These simple actions take just seconds of your time, but go a long way in showing appreciation for community members and the LIVEcommunity as a whole!

The LIVEcommunity thanks you for your participation!