Use NDJSON for consistency with natively exported log data #110

Closed

philhagen opened this issue Dec 3, 2024 · 8 comments

Comments

@philhagen (Contributor) commented Dec 3, 2024

(This issue is a dependency for philhagen/sof-elk#274)

The current JSON output does not match the format that is natively exported from Azure. The native output is NDJSON, i.e.:

{"id":"cc9fdede-0000-1111-ace6-67ffff5cccc1","createdDateTime":"2024-10-12T11:18:01Z",...}
{"id":"cc9fdede-1111-2222-3333-f3fff0000cc1","createdDateTime":"2024-10-12T11:18:02Z",...}

The field names should also match the case used in the native output (initial letter lowercased, camelCase for the remainder of the field name). There appear to be other differences between the output of this suite of tools and the natively exported format; for example, there is no "category" field set in the SignInLogs sample I received. However, as long as the fields are consistent between both export processes, everything should be fine.

Output in this format would allow the UAL data to be handled in the same manner as natively exported logs. While this is specifically noted as a blocker for SOF-ELK, it would benefit any tooling that parses the UAL output from this tool.

@JoeyInvictus (Collaborator) commented Dec 3, 2024

Hi,

There are some known differences in field names depending on the export method (GUI vs. Graph, beta vs. non-beta cmdlets, etc.). This is something Microsoft controls. We simply retrieve the logs and leave them in their default format, without modifying the field names. Attempting to specify and map all the fields would quickly become complex and messy, especially since we’d need to constantly stay updated on any new fields added by Microsoft.

Regarding the sign-in logs, we offer two methods for acquiring them. One method uses Graph, as shown below:

$apiUrl = "https://graph.microsoft.com/beta/auditLogs/signIns"
$response = Invoke-MgGraphRequest -Uri $apiUrl -Method Get -ContentType "application/json; odata.metadata=minimal; odata.streaming=true;" -OutputType Json

We simply write the $response variable to a JSON file without modifying any fields. Additionally, we currently use the -OutputType Json parameter, as NDJSON is not supported in this context.
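
(For illustration only: if NDJSON were needed from that response, one post-processing sketch, with a placeholder output path, would be to split the value array into one compact object per line.)

# $response is the JSON string returned by Invoke-MgGraphRequest above;
# Graph wraps the records in a "value" array. The output path is a placeholder.
$parsed = $response | ConvertFrom-Json
$parsed.value | ForEach-Object { $_ | ConvertTo-Json -Depth 20 -Compress } | Set-Content '.\SignInLogsGraph.ndjson'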

Using the Azure AD PowerShell module as an alternative, we can retrieve the sign-in logs by running:
Get-AzureADAuditSignInLogs
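
A comparable sketch for this path (assuming the AzureAD/AzureADPreview module that provides the cmdlet is installed and a session is open via Connect-AzureAD; the output file name is illustrative only):

# Retrieve the sign-in logs via the AzureAD module and write them out
# unmodified as JSON (module, session, and file name are assumptions).
$signIns = Get-AzureADAuditSignInLogs
$signIns | ConvertTo-Json -Depth 20 | Set-Content '.\SignInLogsAzureAD.json'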

When comparing the output of the two methods, Graph outputs field names with a lowercase initial letter (e.g., createdDateTime), whereas Azure AD outputs them with an uppercase initial letter (e.g., CreatedDateTime).

In both cases, we do not modify the fields. To achieve consistent output between the two, we would need to alter the default output provided by Microsoft.

Regarding the NDJSON format, we need to discuss internally how we want to handle this. We must evaluate the potential impact of changing the output from JSON to NDJSON on the tools built around our output (example: https://github.com/evild3ad/Microsoft-Analyzer-Suite) and on the workflows of everyone who loads data produced by this tool.

We have never experienced issues ourselves, nor have we heard from others that tools such as Splunk, ELK, or Data Explorer have difficulty loading the default JSON output. However, I’m not too familiar with SOF-ELK and how it processes its data, so I will need to look into this further.

We’re currently working on a "big" update, and one of the new features will be the -sofelk parameter for the Unified Audit Log scripts to ensure the output is in NDJSON. This could serve as an alternative if we decide not to change the default output from JSON to NDJSON.

@JoeyInvictus (Collaborator) commented:

Accepted a great pull request from @cirosec, adding an output option for Sof-elk in the Get-UAL scripts. I'll push it to the PowerShell Gallery with the next update. If you want to use it now, you can clone the GitHub repository.

@0xffr (Contributor) commented Dec 5, 2024

Thank you @JoeyInvictus for quickly testing and merging my pull request.

With the version currently on the main branch, it is now possible to obtain UAL logs in a format that can be parsed by sof-elk, using the following command:

Get-UALAll -StartDate 2024-12-01 -Output JSON-ELK -MergeOutput

I am currently working on implementing a similar fix for the Get-ADSignInLogsGraph cmdlet so that the data can be seamlessly imported into sof-elk. However, this will require an additional patch in the sof-elk repository, as the parsing of the AD Graph SignIn log format is currently either outdated or not properly supported.

I hope that I can publish the pull requests required for that in the coming days.

@philhagen (Contributor, Author) commented:

I'd suggest changing the output name from JSON-ELK to SOF-ELK, to avoid the appearance that the output is broadly applicable to the Elastic Stack overall rather than tailored to the specific parsing expectations of the SOF-ELK implementation.

Could you provide a sample of this output (private via email is fine) so I can test as well?

@0xffr (Contributor) commented Dec 13, 2024

Hey @philhagen, I agree; it makes sense to rename it to SOF-ELK, as that makes clear which tool this output is intended for.
Maybe @JoeyInvictus can just replace all occurrences of JSON-ELK with SOF-ELK.

The pull request (#115), which just got merged, should now also allow the logs from the Get-ADSignInLogsGraph function to be imported into sof-elk, once the accompanying pull request in the sof-elk repo (philhagen/sof-elk#342) is merged.

I hope that I can supply you with some test data next week.

@JoeyInvictus (Collaborator) commented:

@0xffr Yes, I've already replaced everything with SOF-ELK for the next update. I'll add the same logic to the Entra Audit Logs via Graph as well. Not sure if there’s a parser ready in SOF-ELK, but at least the data will be ready when there is.

I'm working on V3.0.0; once it's live, I will push all changes to the PowerShell Gallery as well.

@JoeyInvictus (Collaborator) commented:

Hi, we just published the new update, so SOF-ELK should be supported from our side now by using -Output SOF-ELK! Feel free to reopen the issue or create a new one if it’s not working as expected.
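
Based on the earlier example in this thread, an invocation would look like the sketch below (illustrative only; see the module documentation for the exact parameters):

# Collect the full UAL, merge the output, and write it in the SOF-ELK
# (NDJSON) format; the start date is carried over from the earlier example.
Get-UALAll -StartDate 2024-12-01 -Output SOF-ELK -MergeOutput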

@philhagen (Contributor, Author) commented Jan 25, 2025

I don't have permission to reopen this issue, but would like to request that it be reopened because two of the formats do not parse. They are not consistent with the native Azure tool/console/API output that is used in the FOR509 course data (which I'm using as a reference point).

  • The UAL from the new tool seems to parse fine. When it is placed in the /logstash/microsoft365/ directory, parsing happens quickly and (mostly?) completely. 1032 of the 1086 records in the sample @invictus-korstiaan sent me parse and appear complete. I'll have to defer to you as the experts on whether the parsed records are aligned with those exported from other paths, but it looks OK to my untrained eye.
  • The SignInLogs output does not parse. The records this tool creates appear to be almost like the "properties" subfield object from the records in FOR509 Lab 1.4, but it's not a perfect match and doesn't fully correspond to anything in the Lab 1.4 records.
  • The AuditLogs output does not load at all. Here again, it appears the MSES tool outputs the top-level "properties" field from the FOR509 Lab 1.4 evidence, but it's not a perfect match.
