GLM-4.7-Flash on Nvidia GB10

Published by jeff

So I have gotten into running my own local LLM for privacy reasons. I like to use it to assist with incident response tasks and collecting OSINT, so I want to keep everything local and only share my searches with the “public”; for this I have a SearXNG proxy set up to help anonymize the web traffic (but that is all separate).

I ended up using Ollama, simply because vLLM was not working when I tried it; I’m hoping that in a few weeks, after some updates, I will be able to swap out Ollama for vLLM. Hopefully this post helps whoever finds it, or a bot / search engine. The rest of the post was written by the model itself. I don’t claim it as mine; I just want to share what eventually worked for me to help others.

If you’ve just taken delivery of an NVIDIA DGX Spark and want to run the latest GLM-4.7-Flash model at full F16 precision using Ollama, this post covers everything we had to figure out the hard way. Hardware: GB10, compute capability sm_121a, CUDA 13.1, 119.6 GiB unified memory, aarch64 Ubuntu 24.04.

What is GLM-4.7-Flash and Why Does It Matter?

GLM-4.7-Flash is a 47-billion parameter Mixture-of-Experts model from Zhipu AI’s zai-org. Despite the “Flash” name suggesting a cut-down variant, this is a full-weight, production-grade MoE model with:

  • 47B total parameters (MoE architecture)
  • 131,072 token context window (131K)
  • Native tool calling support
  • Released under MIT licence
  • BF16 weights at approximately 59GB on disk (48 safetensor shards)

Running it at full F16 precision on a single machine requires roughly 115GB of GPU memory. The DGX Spark, with its 119.6 GiB unified memory pool, is one of the very few consumer/prosumer platforms that can actually do this without quantisation degradation.

The difference matters for complex reasoning tasks. Q4_K_M quantisation compresses each weight from 16 bits to ~4 bits — a 75% reduction. For simple queries this is largely invisible. For multi-step reasoning chains, technical precision (CVE numbers, exact API signatures, structured outputs), and reliable tool call formatting, full precision is noticeably better.


The Hardware: NVIDIA DGX Spark (GB10)

Quick specs relevant to this guide:

  • GPU: NVIDIA GB10 — compute capability 12.1 (sm_121a)
  • Memory: 119.6 GiB unified CPU/GPU memory
  • CUDA: 13.1
  • Architecture: aarch64 (ARM64)
  • OS: Ubuntu 24.04

The GB10 shipped in early 2026 and at the time of writing the open source ecosystem (vLLM, SGLang) hadn’t fully caught up to sm_121a. This guide uses Ollama, which works reliably on the GB10 today.


Why Not vLLM or SGLang?

We spent considerable time trying both before landing on Ollama. Here’s why they didn’t work at the time:

vLLM 0.14.0

GLM-4.7 uses a glm4_moe_lite model type with MLA (Multi-head Latent Attention). vLLM 0.14.0 doesn’t recognise this architecture — it fails at model load with a weight shape mismatch. This isn’t fixable with config patches; the model architecture support simply isn’t there yet.

SGLang (stock Docker)

SGLang supports GLM-4.7 architecture and has a --tool-call-parser glm47 flag. However, the stock lmsysorg/sglang:latest Docker image bundles a version of Triton whose ptxas compiler doesn’t recognise sm_121a. The GB10 is new enough that the GPU architecture wasn’t known to PyTorch’s published builds at the time of writing.

We got further using scitrera/dgx-spark-vllm:0.14.0rc2-t4 as a base image (which includes NVIDIA’s custom PyTorch 2.10.0-rc6 and Triton 3.5.1 with sm_121a support), but ultimately hit a wall with sgl_kernel: the PyPI binary is compiled for sm100 (H100), not sm121. Building from source on the live machine was too unstable — the CUDA kernel compilation during docker build takes 20-30 minutes on aarch64, pegs available memory, and caused SSH sessions to drop mid-build.

Bottom line: Ollama works today. SGLang on the GB10 will likely be straightforward once the ecosystem catches up — watch this space.


The Working Solution: Ollama with a Custom F16 Modelfile

Step 1: Install Ollama

If you haven’t already:

curl -fsSL https://ollama.com/install.sh | sh

Confirm it’s running:

ollama list
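
If you want to double-check that the HTTP API is answering as well (this is the endpoint everything below talks to), a quick sanity check:

curl -s http://localhost:11434/api/version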

Step 2: Set Ollama Environment Variables

Edit the Ollama systemd service to configure parallel inference and keep models loaded permanently:

sudo systemctl edit ollama

Add the following in the override file:

[Service]
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_KEEP_ALIVE=-1"
Environment="OLLAMA_REQUEST_TIMEOUT=600"

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart ollama

OLLAMA_NUM_PARALLEL=4 allows up to 4 concurrent requests against the same model. OLLAMA_KEEP_ALIVE=-1 keeps the model permanently loaded in memory (no reload penalty between requests). OLLAMA_REQUEST_TIMEOUT=600 is important — complex reasoning tasks on large context windows can take several minutes and the default timeout is too short.
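
To confirm the overrides actually took effect (a misplaced drop-in file fails silently), you can ask systemd what environment the service sees:

systemctl show ollama --property=Environment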

Step 3: Download the BF16 Weights from HuggingFace

Ollama’s model registry doesn’t carry the full BF16 version of GLM-4.7-Flash. We’ll pull the weights directly from HuggingFace and import them into Ollama.

Install the HuggingFace CLI if you don’t have it:

pip install huggingface-hub --break-system-packages
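
Before pulling roughly 59GB of weights, it’s worth confirming free space. A quick check; the second path assumes Ollama’s default systemd install location for its model store:

# ~59GB of safetensors now, plus ~115GB later for the converted Ollama model
df -h ~ /usr/share/ollama 2>/dev/null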

Download the model (approximately 59GB, 48 safetensor shards):

huggingface-cli download zai-org/GLM-4.7-Flash \
  --local-dir ~/.cache/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/main \
  --local-dir-use-symlinks False

The --local-dir-use-symlinks False flag is critical. Ollama’s security policy refuses to follow symlinks when importing a model. Without this flag, the HF CLI creates a symlinked structure that Ollama will reject with an “insecure path” error.

Once downloaded, note the actual snapshot path:

ls ~/.cache/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/

You’ll see a directory named something like 7dd20894a642a0aa287e9827cb1a1f7f91386b67. Use that exact hash in the next step.
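
If you’d rather not copy the hash by hand, a small sketch that captures it into a shell variable (this assumes exactly one snapshot directory exists):

SNAPSHOT=$(ls ~/.cache/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/ | head -n1)
echo "Snapshot hash: $SNAPSHOT"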

Step 4: Create the Modelfile

Replace <SNAPSHOT_HASH> with the hash from the previous step:

cat > ~/Modelfile-glm-f16 << 'EOF'
FROM /home/<YOUR_USER>/.cache/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/<SNAPSHOT_HASH>
PARAMETER num_gpu 999
PARAMETER num_ctx 131072
PARAMETER temperature 0.3
PARAMETER num_predict 16384
PARAMETER stop "<|user|>"
PARAMETER stop "<|observation|>"
PARAMETER stop "<|endoftext|>"
RENDERER glm-4.7
PARSER glm-4.7

TEMPLATE """[gMASK]<sop>{{ if .System }}<|system|>
{{ .System }}{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|user|>
{{ .Content }}{{ else if eq .Role "assistant" }}<|assistant|>
{{ .Content }}{{ else if eq .Role "tool" }}<|observation|>
{{ .Content }}{{ end }}{{ end }}<|assistant|>
"""

SYSTEM You are a helpful AI assistant.
EOF

A few important notes on this Modelfile:

  • RENDERER glm-4.7 and PARSER glm-4.7 — these two lines are essential and easy to miss. They tell Ollama that this model supports GLM-4.7 tool calling format. Without them, Ollama will refuse tool call requests with a “does not support tools” 400 error, even though the model weights absolutely do support it.
  • num_ctx 131072 — sets the full 131K context window. The model supports this natively.
  • num_predict 16384 — maximum output tokens per response. Increase if you need longer generations.
  • The TEMPLATE — GLM-4.7 uses its own chat template format with [gMASK]<sop> tokens. This must be specified explicitly.
  • SYSTEM — replace the placeholder with your actual system prompt. If you’d rather script the placeholder substitution, see the sketch below.
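
If you captured the snapshot hash into $SNAPSHOT earlier, here is a hedged one-liner that fills in both placeholders (it assumes the Modelfile still contains the literal placeholder strings, which it will, since the quoted heredoc above stops the shell expanding them):

# substitute the current user and the captured snapshot hash into the Modelfile
sed -i "s|<YOUR_USER>|$USER|; s|<SNAPSHOT_HASH>|$SNAPSHOT|" ~/Modelfile-glm-f16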

Step 5: Build the Ollama Model

This step converts the 48 BF16 safetensor shards into GGUF format and imports them into Ollama’s model store. It will take 10-15 minutes and the output model will occupy approximately 115GB in Ollama’s storage.

Run it in a tmux session so it survives any SSH disconnection:

tmux new-session -s f16build
ollama create glm-forensics-f16 -f ~/Modelfile-glm-f16

Detach safely with Ctrl+B then D. Do not press Ctrl+C — this will cancel the build.

To check progress:

tmux attach -t f16build

When complete you’ll see output like:

success
Model 'glm-forensics-f16' created successfully

Confirm the model is listed:

ollama list

You should see glm-forensics-f16:latest at approximately 59GB on disk (115GB when loaded into unified memory).

Step 6: Test the Model

Quick sanity check via curl:

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-forensics-f16",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }' | python3 -m json.tool

Test tool calling is working (the RENDERER/PARSER lines are what make this work):

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-forensics-f16",
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
    "messages": [{"role": "user", "content": "What is the weather in London?"}]
  }' | python3 -m json.tool

You should see a finish_reason: "tool_calls" in the response with a properly structured tool call object.
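
To verify the full round trip, you can feed a tool result back as a tool-role message and check that the model produces a final answer. This is a sketch with a fabricated weather payload, and exact tool-message handling may vary between Ollama versions:

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-forensics-f16",
    "messages": [
      {"role": "user", "content": "What is the weather in London?"},
      {"role": "assistant", "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"London\"}"}}]},
      {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"conditions\": \"light rain\"}"}
    ]
  }' | python3 -m json.tool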


Checking Memory Usage

When GLM-4.7-Flash F16 is loaded, it occupies approximately 115GB of the 119.6GB unified memory pool. Confirm what’s loaded:

ollama ps

Example output:

NAME                     ID            SIZE      PROCESSOR    UNTIL
glm-forensics-f16:latest abc123def456  115 GiB   100% GPU     Forever

With OLLAMA_KEEP_ALIVE=-1 the model stays loaded permanently. This is the right setting for a dedicated inference machine — there’s no cold start penalty when the model is needed.
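
Because the CPU and GPU share one pool on the GB10, ordinary system tools reflect the memory pressure too; a quick cross-check (how precisely the unified pool is accounted for may vary by driver version):

free -h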


Context Window and Memory Considerations

The 131K context window is one of the most valuable features of this setup. However, there’s an important trade-off to understand:

KV cache (key-value cache, used to store conversation history for attention) grows linearly with context length. At 131K tokens, the KV cache for a single conversation can consume 10-30GB depending on the model’s configuration. Since the model itself uses ~115GB of the 119.6GB available, there is limited headroom.

In practice, Ollama handles this gracefully — it will start offloading KV cache to system RAM if GPU memory runs short, which slows generation speed. For most use cases, keeping individual conversations under 50-60K tokens keeps everything comfortably in unified memory.

If you’re running shorter conversations but want true maximum throughput, you can set a lower context in the Modelfile:

PARAMETER num_ctx 65536

This halves the maximum context to 65K tokens but leaves significantly more headroom in the memory pool.
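
Alternatively, Ollama’s native API accepts a per-request num_ctx override via the options field, so you can keep the 131K Modelfile and request smaller contexts when you don’t need them. Note this is the native /api/chat endpoint, not the OpenAI-compatible one:

curl -s http://localhost:11434/api/chat \
  -d '{
    "model": "glm-forensics-f16",
    "messages": [{"role": "user", "content": "One-line summary of KV cache trade-offs?"}],
    "options": {"num_ctx": 65536},
    "stream": false
  }'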


Common Issues and Fixes

400 Error: “model does not support tools”

This means the RENDERER glm-4.7 and PARSER glm-4.7 lines are missing from your Modelfile. Recreate the model with those lines added. The model name you’re calling against also needs to be the one you created with those lines — not the base glm-4.7-flash:latest pulled from Ollama’s registry.

“insecure path” error during ollama create

You downloaded the HuggingFace weights with symlinks. The HF CLI creates a blob cache with symlinked snapshot directories, which Ollama refuses to import for security reasons. Re-download with --local-dir-use-symlinks False.

Ollama connection drops / timeout during long generations

Two causes:

  1. Add Environment="OLLAMA_REQUEST_TIMEOUT=600" to the Ollama systemd service as described in Step 2.
  2. Your client-side timeout may be shorter. Ensure whatever is calling Ollama (your application, curl, etc.) has a timeout of at least 600 seconds for complex tasks; a curl example follows.
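
For curl specifically, the relevant flag is --max-time:

# give long reasoning tasks up to 10 minutes before the client gives up
curl -s --max-time 600 http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-forensics-f16", "messages": [{"role": "user", "content": "..."}]}'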

ollama create fails immediately with “pull model manifest: file does not exist”

Your FROM path is wrong. Check the exact snapshot hash:

ls ~/.cache/huggingface/hub/models--zai-org--GLM-4.7-Flash/snapshots/

Copy that hash exactly into the Modelfile path.

Model is slow / generating at low tokens/sec

The DGX Spark’s unified memory architecture has lower memory bandwidth (~273 GB/s) compared to discrete GPU VRAM. GLM-4.7-Flash F16 generates at roughly 10-25 tokens/second depending on context length — noticeably slower than a quantised model on a discrete GPU with high bandwidth memory, but producing higher quality output. This is the trade-off of running full precision on this hardware.
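
To measure your actual rate rather than guessing, ollama run prints timing statistics (including an eval rate line) when given the --verbose flag:

ollama run glm-forensics-f16 --verbose "Explain MLA attention in two sentences."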


Comparing Q4 vs F16 in Practice

We ran both versions side by side for several weeks. The honest summary:

  • Simple Q&A, coding, summarisation: Very little difference. Q4 is faster and good enough.
  • Multi-step reasoning chains: F16 maintains coherence better over long chains. Q4 occasionally loses track of earlier context.
  • Technical precision (specific CVEs, API details, structured outputs): F16 hallucinates less. The difference is noticeable when accuracy matters.
  • Tool call reliability: F16 makes fewer formatting errors and chooses tools more consistently.
  • Long document analysis (>30K tokens): F16 draws better inferences from long contexts.

For any workload where reasoning quality matters more than raw speed, F16 is the right choice if your hardware can support it. The DGX Spark is one of the very few platforms where this is even possible with GLM-4.7-Flash.


What’s Next

A few things worth watching:

  • SGLang support for sm_121a: When this lands, it will unlock significantly higher throughput via continuous batching and PagedAttention. The Dockerfile is essentially working (see the SGLang issues above) — it just needs a stable build environment.
  • GLM-4.5 and GLM-5: Zhipu AI has released larger models in the GLM family. GLM-4.5 Air (106B total / 12B active MoE) would require two GB10 boxes at F16. GLM-5 (744B total / 40B active) is API-only for most hardware configurations.
  • Two-box setups: A second DGX Spark connected via NVLink or high-speed networking opens up the larger models. GLM-4.5 Air at F16 requires approximately 212GB — right in the sweet spot for two GB10s combined.

Summary: The Minimal Steps

  1. Install Ollama and configure OLLAMA_NUM_PARALLEL=4, OLLAMA_KEEP_ALIVE=-1, and OLLAMA_REQUEST_TIMEOUT=600 in systemd
  2. Download BF16 weights: huggingface-cli download zai-org/GLM-4.7-Flash --local-dir-use-symlinks False
  3. Create a Modelfile with the correct FROM path, RENDERER glm-4.7 and PARSER glm-4.7, the GLM chat template, and stop tokens
  4. Run ollama create glm-forensics-f16 -f ~/Modelfile-glm-f16 in tmux
  5. Test with a tool call to confirm the RENDERER/PARSER lines are working

The step that trips most people up is the RENDERER/PARSER omission — models load and respond fine without those lines, but silently fail on any tool call request. Don’t skip them.

Investigating Teams Logs

Published by jeff

Microsoft Teams logs contain information about various user activities within the Teams platform, such as messaging, meetings, calls, and other interactions. These logs can be accessed through the Microsoft 365 Compliance Center’s Audit log search or by using the Office 365 Management Activity API.

Here’s a list of some important fields available in Microsoft Teams logs, along with a brief description of what they represent:

  1. CreationTime: The date and time (UTC) when the event occurred.
  2. UserId: The ID of the user who performed the action.
  3. UserKey: The user key of the user who performed the action. It can be a user’s Azure AD ID or an external user’s email address.
  4. UserType: Indicates whether the user is internal or external to the organization.
  5. UserAgent: Information about the device, operating system, or client app used by the user who performed the action.
  6. Operation: The type of action performed by the user, such as “TeamCreated”, “ChannelDeleted”, “MeetingStart”, or “CallRecorded”.
  7. Workload: The Microsoft 365 service associated with the event. For Teams logs, this will be “MicrosoftTeams”.
  8. ResultStatus: The result of the action, such as “Succeeded” or “Failed”.
  9. ClientIP: The IP address of the user who performed the action.
  10. CorrelationId: The unique identifier for the event, which can be used to correlate multiple related events in the log.
  11. ObjectId: The ID of the object affected by the action, such as a team, channel, or message.
  12. TargetUserId: The ID of the user affected by the action, such as the recipient of a message or the user added to a team.
  13. TeamGuid: The unique identifier for the team associated with the event.
  14. ChannelGuid: The unique identifier for the channel associated with the event.
  15. MessageGuid: The unique identifier for the message associated with the event.
  16. MeetingGuid: The unique identifier for the meeting associated with the event.
  17. CallGuid: The unique identifier for the call associated with the event.
  18. ItemName: The name of the object affected by the action, such as a team or channel name.
  19. ItemType: The type of object affected by the action, such as “Team”, “Channel”, “Message”, “Meeting”, or “Call”.
  20. CustomProperties: Additional custom properties specific to the event, such as the meeting title, call duration, or message content.

These fields provide detailed information about the user activities within Microsoft Teams, allowing administrators and security professionals to monitor and analyse events for auditing, compliance, and security purposes.
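
As a practical illustration: if you export audit records to JSON (via the Audit log search download or the Management Activity API), a short jq sketch can project just the fields above for Teams events. The filename here is a placeholder, and the filter assumes a top-level JSON array:

# keep only Teams events and project a handful of the fields listed above
jq '.[] | select(.Workload == "MicrosoftTeams") | {CreationTime, UserId, Operation, ClientIP, ObjectId}' audit_export.json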

Ransomware and Incident Response: Dealing with Cyber Threats

Published by jeff

Ransomware is a major threat to businesses, governments, and individuals. It is a type of malware that targets computer systems and encrypts the files on them. The attackers then demand payment, usually in the form of cryptocurrency, for the decryption keys. Ransomware attacks have become increasingly common, and they can have serious consequences if not addressed quickly and effectively. In this blog post, we will explore the dangers posed by ransomware and the importance of incident response in dealing with cyber threats.

What is Ransomware?

Ransomware is a type of malware that encrypts the files on a computer system and demands payment for the decryption keys. There are different types of ransomware, but they all work in a similar way: once the malware infects a system, it encrypts the files and displays a message on the victim’s screen, demanding payment in exchange for the decryption keys. In many cases, the attackers threaten to delete the files if the ransom is not paid.

Ransomware attacks can be devastating for organizations and individuals. They can cause major disruptions to business operations, resulting in financial losses and reputational damage. In some cases, they can also result in the loss of sensitive data, which can have legal and regulatory implications.

How Does Ransomware Spread?

Ransomware can spread in a variety of ways, including through phishing emails, malicious websites, and infected software. It often exploits vulnerabilities in outdated software or operating systems. Once ransomware infects a system, it can quickly spread to other connected devices or network resources.

Why Is Incident Response Important?

Incident response is the process of responding to cyber threats and minimizing their impact. It involves a coordinated effort between IT professionals, security teams, and other stakeholders to detect, contain, and mitigate the damage caused by a cyber attack.

An effective incident response plan is critical for dealing with ransomware attacks. It can help organizations minimize the impact of an attack and reduce the time it takes to recover from it. A good incident response plan should include the following steps:

1. Detection: The first step in incident response is detecting the attack. This can be done with the help of security tools, monitoring systems, and user reports.

2. Containment: Once an attack has been detected, the next step is to contain it. This involves isolating the infected systems or devices to prevent the attack from spreading further.

3. Investigation: After the attack has been contained, the next step is to investigate it. This involves identifying the type of ransomware, how it entered the system, and what files have been encrypted.

4. Recovery: Once the investigation is complete, the next step is to recover from the attack. This involves restoring the affected systems or devices from backups, decrypting files, and patching vulnerabilities that were exploited by the attackers.

5. Post-incident analysis: The final step is to conduct a post-incident analysis to identify areas for improvement in the incident response plan.

Conclusion

Ransomware is a serious threat to organizations and individuals. It can cause significant financial and reputational damage, as well as the loss of sensitive data. Incident response is critical for dealing with ransomware attacks and minimizing their impact.

To protect against ransomware, organizations should take a proactive approach to cybersecurity. This includes keeping software up-to-date, training employees on how to recognize phishing attacks, and implementing security measures such as firewalls and antivirus software.

In conclusion, ransomware attacks are here to stay, and the best defense is a good offense. By being prepared and having an effective incident response plan in place, organizations can reduce the risk of a successful attack and minimize its impact if one does occur.

Nokoyawa Ransomware

Published by jeff

In recent months, a new ransomware family called Nokoyawa has been discovered, and it has become a considerable threat to businesses worldwide. Nokoyawa targets Windows operating systems and propagates through the network via remote execution protocols, which enables the ransomware to impact a large number of systems with minimal exposure.

Nokoyawa encrypts files on infected systems and appends the “.nokoyawa” extension to filenames. It then creates a ransom note named “HOW_TO_RECOVER_YOUR_FILES.html” in all encrypted directories, with instructions on how to pay the ransom and obtain the decryption key. The ransom note also serves as proof of the successful encryption of files.

Nokoyawa has multiple communication channels with its command and control infrastructure. The malware sends information about the infected system to the remote server, receives instructions from the server, and sends the necessary logs and user credentials back to the server. In this way, the ransomware makes it almost impossible to track down the attacker’s location.

To identify the presence of Nokoyawa ransomware, we have observed several indicators of compromise (IOCs) on infected systems. The IOCs are as follows, with a quick log-sweep example after the list:

  • Network traffic to IP 103.205.134.171 on port 443
  • Network traffic to IP 45.145.95.244 on port 80
  • IOCs in PowerShell command-lines such as Base64-encoded strings, file paths, processes, and registry keys
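
A minimal way to sweep proxy or firewall logs for the two C2 addresses; the log paths are placeholders for wherever your environment keeps them:

# flag any connections to the published Nokoyawa C2 IPs
grep -Eh '103\.205\.134\.171|45\.145\.95\.244' /var/log/proxy/*.log /var/log/firewall/*.log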

Organizations can mitigate the risks of Nokoyawa by implementing proper security measures such as data backup and recovery systems, file and folder permission policies, email filters, and antivirus programs. Additionally, keeping systems and software up to date by applying security patches can also help to prevent the spread of Nokoyawa ransomware.

In conclusion, Nokoyawa ransomware is a significant threat to businesses and organizations. Recognizing the IOCs and applying preventive measures can help organizations safeguard against this malicious software. Maintaining updated security standards and being vigilant about suspicious network activity are essential components of a proactive security strategy.


https://securelist.com/nokoyawa-ransomware-attacks-with-windows-zero-day/109483/

Steps to respond to ransomware

Published by jeff

In the face of increasing ransomware attacks, it has become essential to understand the necessary steps to respond to such threats effectively. If you suspect ransomware on your system, it’s imperative to take prompt action and follow the appropriate response steps to minimize the impact and recover data. Firstly, it’s crucial to disconnect the infected system from the internet to prevent further propagation of the ransomware throughout the network. Next, you must identify the type of ransomware via its extension or ransom note left on the system. It’s important to gather as much information as possible about the ransomware to determine the appropriate response.
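
On a Linux file server or a mounted share, one quick, hedged way to spot the ransomware’s extension is to tally extensions among recently modified files; the mount point and time window here are assumptions:

# tally file extensions among files modified in the last two hours;
# an unfamiliar extension at the top of the list is a strong hint
find /mnt/share -type f -mmin -120 -name '*.*' | awk -F. '{print $NF}' | sort | uniq -c | sort -rn | head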


If adequate backups of the affected data are available, it’s essential to restore them immediately. Ensure that you verify their integrity and perform a scan for any remaining traces of the ransomware. If backups aren’t available, consult with IT security professionals for possible decryption tools or approaches. However, using decryption tools can be risky and may result in additional system damage, so it should only be attempted under expert guidance.


If ransom payment is considered, it is strongly advised to consult law enforcement and IT security experts before proceeding. Ransom payment may not guarantee the safe recovery of data and can incentivize further ransomware attacks. After recovery, it’s essential to assess and improve system security to prevent future ransomware threats. Regularly updating software, implementing firewalls and antivirus programs, and educating employees on best cybersecurity practices can significantly reduce the risk of ransomware attacks.


To sum it up, responding to ransomware requires a quick response, identifying the ransomware type, restoring backups, consulting IT security professionals for decryption, considering legal and expert advice before making ransom payment, and implementing improved system security measures. Taking these steps can ensure an effective response to ransomware attacks and protect data from future threats.

Soc in a Box

Published by jeff

Well, not really, but I’m going to write a series of posts that will all tie together into something very useful for anyone interested in having a security home lab, or even for a new or established security operations centre.

I am going to be using open source software, showing how the pieces can be used together to create a pretty awesome environment that, in my opinion, rivals if not beats many of the paid and expensive tools in the security industry.

Over the next few weeks and months, I will create guides for the following.

Cuckoo

The Hive

MISP

Security Onion

Elastic Stack

Google Rapid Response

I’m not necessarily going to create guides in the order listed above; however, I will be starting with Cuckoo.

Cuckoo is a fun place to start, as you can get a pretty awesome malware sandbox analysis tool up and running in a fairly short amount of time, and see real results and benefits from it. There are so many ways you can customise it and get it working how you want in your own environment. Why pay a 3rd party for your malware analysis when you can have a free and powerful version of your own?

Anyhow, enough jibber jabbing.  Time for the first update!

Boozallen Report on Petya

Published by jeff

I came across this write-up by Booz Allen yesterday, and found it had some very interesting thoughts and insight into how and what happened.

 

https://www.boozallen.com/content/dam/boozallen_site/sig/pdf/white-paper/telebots-group-and-petya.pdf

 

1. Four VirusTotal users uploaded the compiled VBS backdoors along with other malicious files, including the TeleBots telegram-based backdoor, PowerShell post-exploitation scripts, Mimikatz, and other tools. For each user, these uploads occurred within the same one- to two-day time period.

2. In most cases, these files were uploaded several months prior to the 27 June Petya incident.

3. Booz Allen Cyber4Sight also determined that in several cases, these submitters also uploaded files associated with the MEDoc update utility to VirusTotal. This shows that these submitters were also likely users of the MEDoc software, and the inclusion of these files with the files identified in number 1 (above) demonstrates that MEDoc-related processes may have facilitated the installation vector for this software.

 

These past few months have been quite interesting. The scale and ease of WannaCry and the more recent Petya/NotPetya attacks have created greater awareness among individuals outside of the security world. Major news outlets are interested in these events as they transpire, and this can only be a good thing. I still believe we are many years away from individuals and businesses truly changing their mindsets and realising that just reacting to these events is not enough, and that more time and effort must be spent on how these applications are designed and how we approach security. We need to try harder to make applications and hardware secure by design, and not rely on 3rd party products afterwards to make the product “secure”.

We are going to have several more large-scale events like this until the mindset changes. Humans are stubborn and we do not like to change; however, this is something we must do.

 

 

Talos Update on M.E.Doc

Published by jeff

http://blog.talosintelligence.com/2017/07/the-medoc-connection.html?m=1

Summary

The Nyetya attack was a destructive ransomware variant that affected many organizations inside of Ukraine and multinational corporations with operations in Ukraine. In cooperation with Cisco Advanced Services Incident Response, Talos identified several key aspects of the attack. The investigation found a supply chain-focused attack at M.E.Doc software that delivered a destructive payload disguised as ransomware. By utilizing stolen credentials, the actor was able to manipulate the update server for M.E.Doc to proxy connections to an actor-controlled server. Based on the findings, Talos remains confident that the attack was destructive in nature. The effects were broad reaching, with Ukraine Cyber police confirming over 2000 affected companies in Ukraine alone.

This is another good article and write-up by Talos. It gives a lot more useful insight into how this happened; another good read. It will be interesting to see how this continues to develop over the next few days and weeks.

Backdoor in M.E.Doc Application

Published by jeff

I came across an interesting article today with regard to the Petya / NotPetya cyber attack from last week. It is a very good write-up and analysis of how the organisation M.E.Doc appears to have been compromised and used to spread the malware through a series of updates for the software it produces.

This demonstrates how devastating these types of compromises can be and as a defender can make it very difficult to identify and stop this type of attack from happening, if you happen to be the target of said attack.

I suggest you read this very good article!

 

https://www.welivesecurity.com/2017/07/04/analysis-of-telebots-cunning-backdoor/

Analysis of TeleBots’ cunning backdoor

On the 27th of June 2017, a new cyberattack hit many computer systems in Ukraine, as well as in other countries. That attack was spearheaded by the malware ESET products detect as Diskcoder.C (aka ExPetr, PetrWrap, Petya, or NotPetya). This malware masquerades as typical ransomware: it encrypts the data on the computer and demands $300 in bitcoins for recovery. In fact, the malware authors’ intention was to cause damage, so they did all that they could to make data decryption very unlikely.

 

Another good write up by bleeping computer that contains more information.

 

https://www.bleepingcomputer.com/news/security/ukrainian-police-seize-servers-from-where-notpetya-outbreak-first-spread/

Conspiracy theories

Last week, a blog post from a Ukrainian web developer went viral, after it hinted that the real culprit behind the hacked server could have been M.E.Doc’s web host, Wnet, a company that has been accused of having ties to Russia’s intelligence service (FSB).

An investigation into the man’s accusations revealed that the SBU had raided the web host on June 1, for “illegal traffic routing to Crimea in favor of Russian special services.”

 

View of someone who was impacted by Petya

Published by jeff

http://colsec.blogspot.de/2017/06/petya-outbreak-june-27th.html

 

My machine –

Domain joined Windows 10 Enterprise 64bit running McAfee AV + Encrypted HDD. Fully patched with June’s updates and manually disabled/removed SMBv1.

Hit at 12:40 UK time with a BSoD. Reboot “Please install operating system – no boot device”.

 

And the follow up

http://colsec.blogspot.de/2017/06/petyaa-infection-summary-of-events.html

 

I’ll just put this up here to summarise what happened and how.

We assume one PC was infected, and that machine provided the virus with some credentials. It could have been a workstation admin’s account, giving the virus admin rights to all PCs in the local area. Over time, it must have picked up Domain Admin rights as it spread, then hitting Domain Controllers and all other Windows servers with its PSEXEC/WMIC code. The rest is history. We lost PCs that were encrypted with McAfee Disk Encryption due to a corrupted MBR; PCs that were not encrypted with McAfee showed the ransom message.

 

This is a good demonstration of why everything needs to be 100% patched, not nearly patched. It is difficult to keep older machines patched and updated in an enterprise environment; however, when these systems are designed and implemented, we should be thinking about and taking into consideration how we are going to update them and keep them secure. Otherwise we will have to deal with the events described above, again and again.