| path (string) | c (int8) | s (int8) | ss (int8) | d (int8) | size (uint64, bytes) |
|---|---|---|---|---|---|
| .git/modules/astronomy_submodules/shard_00/python-skyfield/objects/pack/pack-1a302566e865f73a2004537882fd8dc9ca95604f.pack | 25 | 37 | -20 | 26 | 85,581,351 |
| .git/modules/astronomy_submodules/shard_00/astropy/objects/pack/pack-a11ca99ebc6e1ec0260f730bceb95877a6bccdad.pack | 29 | 20 | -22 | 40 | 7,326,966 |
| .git/modules/astronomy_submodules/shard_00/open_exoplanet_catalogue/objects/pack/pack-b237b087bc41064b8a607abc81114184546011f2.pack | -56 | -21 | 30 | 13 | 3,469,639 |
| .git/modules/astronomy_submodules/shard_00/poliastro/objects/pack/pack-355869f9faae550611b79744efec4d93373f2f81.pack | -1 | 3 | 26 | -12 | 11,157,966 |
| .git/modules/astronomy_submodules/shard_00/sunpy/objects/pack/pack-9dae7207f021d98bffb2ab74da552d596d10bc11.pack | -3 | 46 | 16 | -20 | 4,677,655 |
| .git/modules/astronomy_submodules/shard_00/apod-api/objects/pack/pack-6b5ec97c21cfe7043e6b5cfd7a882ca0b45ec4b6.pack | 46 | -17 | 2 | -8 | 478,601 |
| .git/modules/astronomy_submodules/shard_00/astroML/objects/pack/pack-9f343d8c463fc3dea1e28853ba415e623cc51ebc.pack | -11 | 13 | 33 | -19 | 582,820 |
| .git/modules/erdfa-namespace/objects/pack/pack-9ce0df87083151dbb62b6fcd7d2c7e3267fe7b78.pack | 47 | 57 | -24 | -8 | 2,162,177 |
| .git/modules/zkmame/objects/pack/pack-8612ed132b74efbf18d1b7a9232882fce120c56d.pack | -63 | 25 | 46 | -3 | 1,837,675,554 |
| .git/modules/shard06-terpsichore/choreomaster/objects/pack/pack-472d3522279a3647ebd5fdfdb690e8100ccf4843.pack | 10 | 58 | 15 | 38 | 3,048 |
| .git/modules/shard06-terpsichore/rpemotes-reborn/objects/pack/pack-0c2971264d609ec46dd72417b027de6a848f82e9.pack | 51 | -3 | -11 | 29 | 208,853,494 |
| .git/modules/shard06-terpsichore/mrm-dances/objects/pack/pack-b082dc65c67964b96ea37dd05f6805063d4fb9fe.pack | 42 | -20 | 36 | -34 | 165,127 |
| .git/modules/shard06-terpsichore/dpemotes/objects/pack/pack-5d560a8c2de1fcf932fb4c3d74b05b2d5693d26f.pack | 43 | 0 | -31 | 35 | 11,725,470 |
| .git/modules/shard0/pipelight/objects/pack/pack-be869ac1a360cafacec5b8e485adb9f08901ca53.pack | 26 | -23 | -3 | -25 | 9,840,222 |
| .git/modules/monster/objects/pack/pack-809a0bfc171655f6fc5b0eb80a427f9115e3f512.pack | -62 | -7 | 28 | 26 | 264,474,814 |
| .git/modules/monster/objects/pack/pack-2e2d9968885a1bbf8244b865460f754f4f1bfc63.pack | -1 | 0 | 0 | -30 | 34,470,586 |
| .git/modules/monster/objects/pack/pack-7e91e5f6ce97b5539d929e5ba274f251d4598264.pack | 59 | -48 | 46 | -8 | 267,577,180 |
| .git/modules/shard58/harbor/objects/pack/pack-a84ec52695d5b40396d2603f8fdbf6a82c0ad83f.pack | -3 | -49 | -29 | -24 | 2,924,248 |
| .git/objects/pack/pack-a632347c2ec1edec7a90baa044a9ff9afbf6670b.pack | 48 | -21 | 1 | -25 | 1,387,376,426 |
| .git/objects/pack/pack-b4be5f1e94cf57fa19cadf8dc269585bc5d9cfac.pack | -4 | 22 | -39 | 22 | 1,060,750,128 |
| .git/objects/pack/pack-333b8fe81ae8f5c0c1077883715b24cfd6b8abd3.pack | -18 | -12 | 10 | -34 | 53,544 |

Introspector 4D Pack Sharding Dataset

Making the Monster group tractable through 71-cap, Gödel encoding, and automorphic introspection

🎯 Motivation

Git repositories are getting massive. The introspector repo contains 5.2 GB across 21 pack files, ranging from tiny astronomy libraries to a 1.84 GB zkmame submodule. How do you efficiently distribute, query, and reason about this data?

Enter 4D hierarchical sharding.

Instead of treating git packs as opaque blobs, we map each pack to a 4-dimensional coordinate system using prime moduli:

71 × 59 × 47 × 41 = 8,072,203 unique addresses

This isn't arbitrary - it aligns with the CICADA-71 framework where:

  • 71 shards map to Monster group operations
  • Prime moduli ensure balanced distribution
  • 4D structure enables hierarchical queries
  • Gödel encoding connects proofs to coordinates
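
Because the four moduli are distinct primes, the Chinese Remainder Theorem says that a tuple of residues identifies exactly one value modulo their product. The sketch below illustrates that round trip. The helper names `to_residues` and `from_residues` are hypothetical and not part of the dataset tooling, and the example address is arbitrary.

```python
from math import prod

MODULI = (71, 59, 47, 41)  # the four prime moduli named above


def to_residues(value: int) -> tuple[int, ...]:
    """Reduce an integer to one non-negative residue per modulus."""
    return tuple(value % m for m in MODULI)


def from_residues(residues: tuple[int, ...]) -> int:
    """Rebuild the unique value modulo prod(MODULI) via the CRT."""
    total = prod(MODULI)
    value = 0
    for r, m in zip(residues, MODULI):
        partial = total // m           # product of the other three moduli
        inverse = pow(partial, -1, m)  # modular inverse (Python 3.8+)
        value += r * partial * inverse
    return value % total


address = 1_234_567  # arbitrary illustrative address
coords = to_residues(address)
assert from_residues(coords) == address
```

Signed coordinates work as well, since a negative residue such as -63 is congruent to 8 modulo 71 and the final reduction normalises the result.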

🚀 Why This Matters

For Distributed Systems

  • Load balancing: No shard gets overloaded (max 3 packs per shard)
  • Locality: Related packs (same module) cluster together
  • Scalability: 8M address space handles massive repos
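
The load-balancing claim can be checked against the top-level coordinate. This is a minimal sketch: the 21 `c` values are copied by hand from the dataset rows rather than read from the Parquet file, and the variable names are illustrative.

```python
from collections import Counter

# Top-level coordinate c for each of the 21 packs, in dataset order.
C_VALUES = [25, 29, -56, -1, -3, 46, -11, 47, -63, 10, 51,
            42, 43, 26, -62, -1, 59, -3, 48, -4, -18]

packs_per_shard = Counter(C_VALUES)
busiest_shard, busiest_count = packs_per_shard.most_common(1)[0]

print(f"{len(packs_per_shard)} distinct shards for {len(C_VALUES)} packs")
print(f"busiest shard c={busiest_shard} holds {busiest_count} packs")
```

Counting packs says nothing about bytes per shard. Weighting each entry by `size` would show whether the largest packs also spread out.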

For AI/ML

  • Feature engineering: 4D coordinates as input features
  • Similarity search: Find related packs by coordinate distance
  • Anomaly detection: Outliers in coordinate space

For Cryptography

  • ZK proofs: Prove pack membership without revealing content
  • Merkle trees: 4D coordinates as leaf nodes
  • Sharded verification: Parallel proof checking across shards
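
To make the Merkle-tree idea concrete, here is a generic binary Merkle root over coordinate tuples. It is an illustration only, not the construction used by the dataset authors. The `leaf_hash` encoding and the 0x00/0x01 domain-separation prefixes are assumptions chosen for the sketch.

```python
import hashlib
import struct


def leaf_hash(coords: tuple[int, int, int, int]) -> bytes:
    """Hash one (c, s, ss, d) tuple; the 0x00 prefix marks a leaf."""
    return hashlib.sha256(b"\x00" + struct.pack("4b", *coords)).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = leaves
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


# Three coordinate tuples taken from the dataset rows.
coords = [(-63, 25, 46, -3), (25, 37, -20, 26), (29, 20, -22, 40)]
root = merkle_root([leaf_hash(c) for c in coords])
print(root.hex())
```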

For Mathematics

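  • Formal verification: Theorems proven in Lean 4, with a constraint model in MiniZinc
  • Monster group: The top-level modulus 71 is the largest prime dividing the order of the Monster sporadic group
  • Number theory: The moduli are distinct primes, hence pairwise coprime, so the Chinese Remainder Theorem applies to the coordinate tuple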

📊 Dataset Description

  • Repository: meta-introspector/introspector
  • Packs: 21 git pack files
  • Total Size: 5.2 GB
  • Sharding Algorithm: SHA256 → (mod 71, mod 59, mod 47, mod 41)
  • Collision Probability: < 0.003% (proven)
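
The collision figure can be sanity-checked with a birthday-problem estimate. The sketch below assumes the 21 packs land uniformly and independently on an address space equal to the product of the four moduli. That is a modelling assumption, not the Lean proof itself, and `collision_probability` is a hypothetical helper name.

```python
from math import prod

MODULI = (71, 59, 47, 41)


def collision_probability(n_items: int, n_addresses: int) -> float:
    """Probability that at least two of n_items share an address."""
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (n_addresses - i) / n_addresses
    return 1.0 - p_all_distinct


p = collision_probability(21, prod(MODULI))
print(f"{p:.6%}")
```

Under these assumptions the estimate comes out at roughly 0.0026%, which is consistent with the stated bound.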

🧮 4D Coordinate System

Each pack is mapped to 4D coordinates using a SHA256 hash. The coordinates are stored as signed values, so negative entries are valid:

  • c (mod 71): Container shard - Monster group alignment
  • s (mod 59): Subcontainer shard - 59th prime (277)
  • ss (mod 47): Sub-subcontainer shard - 47th prime (211)
  • d (mod 41): Detail shard - 41st prime (179)

Example:

zkmame pack (1.84 GB) → [-63, 25, 46, -3]
python-skyfield (85.6 MB) → [25, 37, -20, 26]
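
A mapping of this shape could look like the sketch below. The card does not say which bytes are hashed or how the digest is reduced, so everything here is an assumption: the input bytes, the choice of the first 8 digest bytes, and the sign-preserving remainder (picked because the published coordinates include negative values). It will not reproduce the published coordinates.

```python
import hashlib

MODULI = (71, 59, 47, 41)  # c, s, ss, d


def truncated_rem(value: int, modulus: int) -> int:
    """Remainder whose sign follows `value`, as `%` behaves in C or Rust."""
    rem = abs(value) % modulus
    return rem if value >= 0 else -rem


def shard_coords(data: bytes) -> tuple[int, int, int, int]:
    """Map bytes to (c, s, ss, d) via SHA256 and the four prime moduli."""
    digest = hashlib.sha256(data).digest()
    # ASSUMPTION: read the first 8 digest bytes as a signed big-endian integer.
    hashed = int.from_bytes(digest[:8], "big", signed=True)
    c, s, ss, d = (truncated_rem(hashed, m) for m in MODULI)
    return (c, s, ss, d)


print(shard_coords(b"example pack contents"))
```

With a sign-preserving remainder each coordinate has absolute value strictly below its modulus, so every value fits in an int8 column.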

📁 Files

  • introspector_4d.parquet - Apache Parquet format (3.8KB, LFS)
  • introspector_4d.json - JSON format (3.3KB)

📐 Schema

path: string   - Relative path to pack file
c: int8        - Container coordinate (-71 to 71)
s: int8        - Subcontainer coordinate (-59 to 59)
ss: int8       - Sub-subcontainer coordinate (-47 to 47)
d: int8        - Detail coordinate (-41 to 41)
size: uint64   - Pack file size in bytes

💻 Usage

Python

import numpy as np
import pyarrow.parquet as pq

# Load dataset
table = pq.read_table('introspector_4d.parquet')
df = table.to_pandas()

# Find packs in Monster shard 25
shard_25 = df[df['c'] == 25]

# Largest packs (potential hotspots)
largest = df.nlargest(5, 'size')

# Coordinate distance (similarity) between two rows, e.g. df.iloc[0] and df.iloc[1]
def coord_distance(p1, p2):
    # Cast to int first: the columns are int8, and squared differences overflow that range
    return np.sqrt(sum((int(p1[k]) - int(p2[k])) ** 2 for k in ('c', 's', 'ss', 'd')))
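
As a worked example of the distance metric, the snippet below uses the standard library's `math.dist` on two coordinate tuples copied from the dataset rows. It sidesteps the Parquet file entirely, so it only illustrates the arithmetic.

```python
import math

# (c, s, ss, d) copied from the dataset rows for two packs.
zkmame = (-63, 25, 46, -3)
skyfield = (25, 37, -20, 26)

# Squared differences: 88^2 + 12^2 + 66^2 + 29^2 = 13085
distance = math.dist(zkmame, skyfield)
print(f"{distance:.2f}")  # 114.39
```

The first squared term alone (7744) is far outside the int8 range of -128 to 127, which is why coordinates read from the int8 columns should be widened before squaring.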

DuckDB

-- Shard distribution
SELECT c, COUNT(*) as pack_count, SUM(size) as total_size
FROM 'introspector_4d.parquet'
GROUP BY c
ORDER BY total_size DESC;

-- Find clusters (packs within distance 20)
SELECT a.path, b.path,
       SQRT(POW(a.c-b.c,2) + POW(a.s-b.s,2) + 
            POW(a.ss-b.ss,2) + POW(a.d-b.d,2)) as distance
FROM 'introspector_4d.parquet' a, 'introspector_4d.parquet' b
WHERE a.path < b.path AND distance < 20;

Rust

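use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};
use parquet::record::RowAccessor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("introspector_4d.parquet")?;
    let reader = SerializedFileReader::new(file)?;

    for row in reader.get_row_iter(None)? {
        // Recent versions of the parquet crate yield Result<Row> here
        let row = row?;
        let c = row.get_byte(1)?;     // column 1: c, stored as int8
        let size = row.get_ulong(5)?; // column 5: size, stored as uint64
        println!("Shard {}: {} bytes", c, size);
    }
    Ok(())
}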

🔬 Formal Verification

This sharding algorithm is mathematically proven correct:

  • Lean 4: 7 theorems proven (primes, bounds, collision probability)
  • MiniZinc: Constraint satisfaction model (uniqueness, balance, locality)

See formal verification docs.

🌐 Related

📜 License

This dataset is dual-licensed:

Open Source (Default)

AGPL-3.0 - GNU Affero General Public License v3.0

This ensures that any network service using this data must also be open source.

Commercial License (Available for Purchase)

MIT or Apache-2.0 - For entities that wish to use this dataset without AGPL-3.0 copyleft requirements.

ZK hackers gotta eat! 🍕

Contact: [email protected]

For commercial licensing inquiries, custom data formats, or enterprise support.

🙏 Citation

@dataset{introspector_4d_2026,
  title={Introspector 4D Pack Sharding Dataset},
  author={Meta-Introspector},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/introspector/introspector_4d}
}