Pixlane

AI · 214 File Classes

File Type Detector

Detect a file's true type from its content, not its extension. Powered by Google Magika, a roughly 3 MB neural model that runs entirely in your browser via WebAssembly and ONNX Runtime. It identifies 214 file categories, including source code, documents, images, audio, video, archives, executables, and cryptographic keys.

How to Use File Type Detector in 3 Steps

  1. Drop files. Drag one or more files onto the drop zone, or click to open the file picker. Detection runs locally — nothing leaves your device.
  2. Review detected types. Each file's detected label, group (code, document, image, archive, executable…), MIME type, and confidence score are listed in the results table.
  3. Export results. Copy the table as JSON or CSV to triage unknown downloads, validate content against extensions, or feed scripted pipelines.
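As an illustration of validating content against extensions, here is a minimal Python sketch that flags files whose detected label contradicts their extension. The export shape (a list of objects with `name`, `label`, and `score` fields) and the `EXPECTED` mapping are assumptions made for this example, not the documented Pixlane export format.

```python
import json
from pathlib import PurePath

# Hypothetical export: a list of objects with "name", "label", and "score".
EXPORT = """[
  {"name": "notes.txt", "label": "zip", "score": 0.99},
  {"name": "photo.jpg", "label": "jpeg", "score": 0.98},
  {"name": "README", "label": "markdown", "score": 0.91}
]"""

# Hypothetical mapping from an extension to the labels consistent with it.
EXPECTED = {
    ".txt": {"txt"},
    ".jpg": {"jpeg"},
    ".zip": {"zip"},
}


def find_mismatches(rows):
    """Return names of files whose detected label contradicts the extension."""
    mismatches = []
    for row in rows:
        ext = PurePath(row["name"]).suffix.lower()
        allowed = EXPECTED.get(ext)
        # A missing or unmapped extension cannot contradict anything.
        if allowed is not None and row["label"] not in allowed:
            mismatches.append(row["name"])
    return mismatches


rows = json.loads(EXPORT)
print(find_mismatches(rows))  # ['notes.txt']
```

Adjust the field names and the mapping to whatever the actual export contains.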

Why File Type Detector on Pixlane

File extensions lie. A .txt can be a ZIP archive, a .jpg can be a PHP shell, and many formats have no extension at all. Magika — Google's content-aware classifier — reads only the first and last 4 KB of a file, packs them into a 2048-token tensor, and runs a tiny neural network (214 output classes) to return a robust label. Pixlane ships this exact model inside its WebAssembly bundle so detection is fast, offline, and private.
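The packing step can be sketched roughly as follows. This is an illustration, not Magika's or Pixlane's actual code: the even split of 1024 tokens from each end, the padding value 256, and the whitespace trimming are assumptions chosen so that the result has 2048 entries.

```python
import string

BLOCK_SIZE = 4096     # bytes considered at each end, per the text above
BEG_SIZE = 1024       # assumed number of tokens kept from the head
END_SIZE = 1024       # assumed number of tokens kept from the tail
PADDING_TOKEN = 256   # assumed padding value, outside the 0..255 byte range

WHITESPACE = string.whitespace.encode("ascii")


def pack_features(content: bytes) -> list[int]:
    """Turn file content into a fixed-length list of 2048 integer tokens."""
    # Head: drop leading whitespace, keep the first BEG_SIZE bytes.
    head = content[:BLOCK_SIZE].lstrip(WHITESPACE)[:BEG_SIZE]
    # Tail: drop trailing whitespace, keep the last END_SIZE bytes.
    tail = content[-BLOCK_SIZE:].rstrip(WHITESPACE)[-END_SIZE:]
    # Pad the head on the right and the tail on the left for short inputs.
    beg = list(head) + [PADDING_TOKEN] * (BEG_SIZE - len(head))
    end = [PADDING_TOKEN] * (END_SIZE - len(tail)) + list(tail)
    return beg + end


features = pack_features(b"\n\nabc")
print(len(features))   # 2048
print(features[:4])    # [97, 98, 99, 256]
```

Using a padding value outside the byte range lets a model tell "no data here" apart from a genuine zero byte.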

Frequently Asked Questions

What is Magika?

Magika is an open-source file-type detector by Google (Apache-2.0). It uses a small neural network trained on roughly 100 million samples and reaches about 99% average accuracy across 200+ content types, including cases where file(1) and libmagic fail. Pixlane embeds the standard_v3_3 ONNX model (3.1 MB) directly in the browser.

Do my files get uploaded?

No. The entire detection pipeline (reading the first and last 4 KB, preprocessing, inference, and postprocessing) runs inside your browser's WebAssembly sandbox. You can drop files of any size; the detector reads only the first and last 4 KB of each.
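To show why file size does not matter, here is a small Python analogue of reading only the two ends of a file with a seek. It is a sketch of the idea rather than Pixlane's browser code, and the function name `read_head_and_tail` is made up for this example.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # 4 KB, per the text above


def read_head_and_tail(path: str, block_size: int = BLOCK_SIZE) -> tuple[bytes, bytes]:
    """Read at most block_size bytes from each end of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(block_size)
        # Jump straight to the final block; the middle is never read.
        f.seek(max(0, size - block_size))
        tail = f.read(block_size)
    return head, tail


# Demo on a throwaway 1 MB file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * (1024 * 1024))
head, tail = read_head_and_tail(tmp.name)
os.unlink(tmp.name)
print(len(head), len(tail))  # 4096 4096
```

For files shorter than 8 KB the two slices overlap, which is harmless here.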

How does it handle tiny or empty files?

It follows the Python reference implementation. A zero-byte file returns empty. A file with fewer than 8 meaningful (non-whitespace) bytes skips the model and goes through a strict UTF-8 validity check instead: it returns txt if the bytes decode cleanly and unknown otherwise.
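The small-file rule can be sketched in a few lines of Python. This is an illustration under one assumption: "meaningful bytes" is taken to mean the bytes left after trimming leading and trailing whitespace. The function name `classify_small_file` is hypothetical.

```python
from typing import Optional

MIN_MEANINGFUL_BYTES = 8  # threshold stated in the text above


def classify_small_file(content: bytes) -> Optional[str]:
    """Return a label for empty or tiny inputs, or None to defer to the model."""
    if len(content) == 0:
        return "empty"
    # Assumption: count what remains after trimming surrounding whitespace.
    if len(content.strip()) < MIN_MEANINGFUL_BYTES:
        try:
            content.decode("utf-8", errors="strict")
            return "txt"
        except UnicodeDecodeError:
            return "unknown"
    return None  # enough content: the neural model would run


print(classify_small_file(b""))              # empty
print(classify_small_file(b"hi\n"))          # txt
print(classify_small_file(b"\xff\xfe\x00"))  # unknown
```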

Which score should I trust?

The score column is the raw softmax probability. For standard labels, Magika applies a per-label confidence threshold; below it, results fall back to txt (text-looking) or unknown (binary), matching Magika's HIGH_CONFIDENCE mode.
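The fallback logic can be sketched like this. The threshold values, the default threshold, and the set of text-like labels are invented for the example; Magika's real per-label thresholds live in its model configuration.

```python
import math

# Hypothetical values for illustration only.
THRESHOLDS = {"python": 0.80, "zip": 0.50}
DEFAULT_THRESHOLD = 0.50
TEXT_LABELS = {"python", "markdown", "txt"}


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_label(labels, logits):
    """Pick the top label, then fall back if its score is under the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label, score = labels[best], probs[best]
    if score >= THRESHOLDS.get(label, DEFAULT_THRESHOLD):
        return label, score
    # Below threshold: text-looking guesses become txt, the rest unknown.
    fallback = "txt" if label in TEXT_LABELS else "unknown"
    return fallback, score


print(final_label(["python", "zip"], [0.2, 0.0])[0])         # txt
print(final_label(["python", "zip"], [0.0, 3.0])[0])         # zip
print(final_label(["python", "zip", "elf"], [0, 0.1, 0])[0])  # unknown
```

In this sketch the returned score is always the raw probability, so a row can read txt or unknown while still showing the score of the label the model originally favoured.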

Can it replace libmagic?

For most modern workflows, yes. Magika is stronger on polyglot formats, truncated files, unknown extensions, and ambiguous text types. Some legacy niche formats (obscure mainframe binaries, very old OS artifacts) still need libmagic's rule database.
