Post

Infobahn Ctf 2025 Disthis (rev)

Infobahn Ctf 2025 Disthis (rev)

Hey there!

This is a writeup of the “disthis” CTF challenge, a Reverse Engineering chall.

Challenge Overview

  • Name: disthis
  • Category: Reverse Engineering
  • Handout: output.pyc (10.7 MB)
  • Description: An obfuscated Python 3.13 pyc file that performs complex checks on an input file.

بِسْمِ اللهِ الرَّحْمٰنِ الرَّحِيْمِ

Part 1: The Failure of Static Analysis

The first step was to analyze the provided handout, output.pyc. The file size was a massive 10.7 MB, which is orders of magnitude larger than a typical script. This hinted at obfuscation, maybe control flow flattening.

Running file confirmed the version:

1
2
$ file output.pyc
output.pyc: python 3.13 byte-compiled

Since Python 3.13 is new, the standard toolkit immediately broke down.

  1. Decompilers: Tools like uncompyle6 and decompyle3 crashed instantly. They rely on specific bytecode mappings, and Python 3.13 changed the opcode mapping and introduced an adaptive interpreter.
  2. pycdc: Even the robust C++ decompiler pycdc outputted garbage instructions and then segfaulted due to changes like the removal of STORE_FAST in favor of optimized localized storage.

I Decided to spin up a Docker container for Python 3.13 and run python3’s dis that way:

1
python3 -m dis output.pyc

This resulted in an IndexError: tuple index out of range. This most likely meant that the instructions were referencing constants that didn’t exist in the file’s metadata. I decided to move on to Dynamic analysis, as static analysis didn’t prove to work.

Part 2: Pivoting to Dynamic Instrumentation

If I couldn’t decompile the code, I had to see it run. Python is dynamic, allowing us to monkey-patch almost everything at runtime.

I made a loader script that used marshal to load the code object, then I replaced the global __builtins__ with my own.The biggest part of the technique was a Tracked class. This class wrapped standard integers, storing the value (so the program wouldn’t crash) but also maintaining a string label representing the history of operations.

I patched every method (__add__, __xor__, __eq__, etc.) to make sure i caught everything needed/

Part 3: The “One Equation, One Byte” Failure

I ran the initial harness feeding it a dummy file of the letter “a” repeated. The logs output 40,000 lines of equations like:

1
[COMPARE] (((int(196)^97)+int(71))&int(255)) == int(148)

(in the Equation, 97 represents our byte, “a” in ASCII.) I assumed a linear relationship: Equation N checks Byte N. I wrote a solver to brute-force the bytes based on this.

This recovered a file that was 39,317 bytes long, but the expected size was 39,944 bytes. The PNG header was valid, but the image data was truncated.

The failure happened because the obfuscator did not check bytes linearly. Some bytes were checked via equations that had multiple solutions (e.g., (x & 0xF0) == 0x30 allows 16 values). My solver was discarding these equations because it didn’t know which byte was being checked, preventing me from combining multiple constraints.

Part 4: Per-Byte Index Tracking

To fix this, I needed to know that Equation A applies to Byte[0] and Equation B also applies to Byte[0], so I could intersect their solution sets.

I upgraded the instrumentation with two key classes:

  1. TrackedInt: A subclass of int that carried an .index attribute.
  2. FakeFile: A custom file object that, when read, returned TrackedInt objects tagged with their file offset.

When the obfuscator called int() on the file data, I detected the TrackedInt and updated the label to b[index]. The logs transformed into this:

1
2
[COMPARE] ((b[123] << 4) | (b[123] >> 4)) == int(20)
[COMPARE] (b[123] ^ 0xFF) == int(235)

Now I could see that b[123] was constrained by two equations. Even if the first equation had 16 solutions, the intersection of the solutions resulted in exactly one valid byte.

Part 5: The Final Solution

The final script combines the instrumentation harness and the solver. It runs the bytecode, captures the constraints, solves them per-byte by intersecting valid values, and reconstructs the image.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
#!/usr/bin/env python3
import argparse, ast, builtins, io, marshal, operator, os, re, types
from collections import defaultdict
from tqdm import tqdm



compare_lines: list[str] = []
REAL_INT = int   # keep original int for internal use



class Tracked:
    def __init__(self, value, label=None):
        self.value = value
        self.label = label or repr(value)

    def __repr__(self):
        return f"<Tracked {self.label}={self.value!r}>"

    # int-like
    def __int__(self):
        return REAL_INT(self.value)
    def __index__(self):
        return REAL_INT(self.value)

    # arithmetic
    def __add__(self, other): return Tracked(self.value + getv(other), f"({self.label}+{getl(other)})")
    def __sub__(self, other): return Tracked(self.value - getv(other), f"({self.label}-{getl(other)})")
    def __mul__(self, other): return Tracked(self.value * getv(other), f"({self.label}*{getl(other)})")
    def __floordiv__(self, other): return Tracked(self.value // getv(other), f"({self.label}//{getl(other)})")

    # comparison logger
    def _cmp(self, op, opname, other):
        l, r = self.value, getv(other)
        result = op(l, r)
        msg = f"[COMPARE] {self.label} {opname} {getl(other)} -> {result}"
        print(msg)
        compare_lines.append(msg)
        return result

    def __eq__(self, other): return self._cmp(operator.eq, "==", other)
    def __ne__(self, other): return self._cmp(operator.ne, "!=", other)
    def __lt__(self, other): return self._cmp(operator.lt, "<", other)
    def __le__(self, other): return self._cmp(operator.le, "<=", other)
    def __gt__(self, other): return self._cmp(operator.gt, ">", other)
    def __ge__(self, other): return self._cmp(operator.ge, ">=", other)

    # bitwise
    def __xor__(self, other): return Tracked(self.value ^ getv(other), f"({self.label}^{getl(other)})")
    def __and__(self, other): return Tracked(self.value & getv(other), f"({self.label}&{getl(other)})")
    def __or__(self, other):  return Tracked(self.value | getv(other), f"({self.label}|{getl(other)})")
    def __invert__(self):     return Tracked(~self.value, f"(~{self.label})")

    # reflected arithmetic
    def __radd__(self, other): return Tracked(getv(other) + self.value, f"({getl(other)}+{self.label})")
    def __rsub__(self, other): return Tracked(getv(other) - self.value, f"({getl(other)}-{self.label})")
    def __rmul__(self, other): return Tracked(getv(other) * self.value, f"({getl(other)}*{self.label})")
    def __rfloordiv__(self, other): return Tracked(getv(other) // self.value, f"({getl(other)}//{self.label})")
    def __rtruediv__(self, other):  return Tracked(getv(other) / self.value, f"({getl(other)}/{self.label})")

    # shifts
    def __lshift__(self, other): return Tracked(self.value << getv(other), f"({self.label}<<{getl(other)})")
    def __rshift__(self, other): return Tracked(self.value >> getv(other), f"({self.label}>>{getl(other)})")
    def __rlshift__(self, other): return Tracked(getv(other) << self.value, f"({getl(other)}<<{self.label})")
    def __rrshift__(self, other): return Tracked(getv(other) >> self.value, f"({getl(other)}>>{self.label})")

    __rxor__ = __xor__
    __rand__ = __and__
    __ror__  = __or__


def getv(x): return x.value if isinstance(x, Tracked) else x
def getl(x): return x.label if isinstance(x, Tracked) else repr(x)


# --------------------------------------------------------------------
# Per-byte tracking: TrackedInt, TrackedBytes, FakeFile
# --------------------------------------------------------------------

class TrackedInt(int):
    """Represents a single byte at position `index`."""
    def __new__(cls, value, index: int):
        obj = int.__new__(cls, value)
        obj.index = index
        return obj

    def __repr__(self):
        # IMPORTANT: index is a plain int, so this prints b[123]
        return f"b[{self.index}]"


class TrackedBytes(bytes):
    """Bytes object that remembers offset and yields TrackedInt on indexing."""
    def __new__(cls, data: bytes, offset: int):
        obj = bytes.__new__(cls, data)
        obj._offset = offset
        return obj

    def __getitem__(self, idx):
        # idx can be a Tracked, or int, or slice.
        if isinstance(idx, slice):
            start = 0 if idx.start is None else REAL_INT(idx.start)
            sub = bytes.__getitem__(self, slice(start, idx.stop, idx.step))
            return TrackedBytes(sub, self._offset + start)

        idx = REAL_INT(idx)  # force plain int, avoid nested Tracked indices
        if idx < 0:
            idx += len(self)
        val = bytes.__getitem__(self, idx)
        return TrackedInt(val, self._offset + idx)

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]


class FakeFile(io.BytesIO):
    """Fake file filled with 'a', returning TrackedBytes on read()."""
    def __init__(self, length: int):
        super().__init__(b"a" * length)  # 0x61 = 97

    def read(self, n=-1):
        pos = self.tell()
        raw = super().read(n)
        return TrackedBytes(raw, pos)




def run_tracked_pyc_collect(pyc_path: str, fake_len: int) -> list[str]:
    global compare_lines
    compare_lines = []

    with open(pyc_path, "rb") as f:
        f.read(16)  # skip pyc header
        code = marshal.load(f)

    # Keep original builtins.int handy
    real_int = REAL_INT

    def fake_input(prompt=None):
        return "flag.txt"

    def fake_open(path, *a, **kw):
        return FakeFile(fake_len)

    def fake_exit(code=0):
        raise StopIteration

    def tracked_int(x=0, base=10):
        # If this int() is being called on a TrackedInt (file byte),
        # wrap it as a Tracked with label b[index].
        if isinstance(x, TrackedInt):
            v = real_int(x)
            return Tracked(v, label=f"b[{x.index}]")
        # Otherwise just make a Tracked constant
        v = real_int(x, base) if isinstance(x, str) else real_int(x)
        return Tracked(v, label=f"int({x})")

    b = dict(vars(builtins))
    b.update({
        "input": fake_input,
        "open":  fake_open,
        "exit":  fake_exit,
        "int":   tracked_int,
    })

    g = {"__builtins__": b}
    func = types.FunctionType(code, g)

    try:
        func()
    except StopIteration:
        print("[+] stopped cleanly")
    except Exception as e:
        print("[!] exception", e)

    return compare_lines


# --------------------------------------------------------------------
# AST evaluator: evaluate expressions with a single variable x
# --------------------------------------------------------------------

def eval_ast(node: ast.AST, x: int):
    if isinstance(node, ast.Constant):
        return node.value

    if isinstance(node, ast.Name):
        if node.id == "x":
            return x
        raise ValueError(f"Unknown name {node.id!r}")

    if isinstance(node, ast.UnaryOp):
        v = eval_ast(node.operand, x)
        if isinstance(node.op, ast.Invert): return ~v
        if isinstance(node.op, ast.UAdd):   return +v
        if isinstance(node.op, ast.USub):   return -v
        raise ValueError("Unsupported unary op")

    if isinstance(node, ast.BinOp):
        L = eval_ast(node.left, x)
        R = eval_ast(node.right, x)
        op = node.op
        if   isinstance(op, ast.Add):    return L + R
        elif isinstance(op, ast.Sub):    return L - R
        elif isinstance(op, ast.BitXor): return L ^ R
        elif isinstance(op, ast.BitAnd): return L & R
        elif isinstance(op, ast.BitOr):  return L | R
        elif isinstance(op, ast.LShift): return L << R
        elif isinstance(op, ast.RShift): return L >> R
        else:
            raise ValueError("Unsupported binop")

    if isinstance(node, ast.Compare):
        L = eval_ast(node.left, x)
        for op, comp in zip(node.ops, node.comparators):
            R = eval_ast(comp, x)
            if   isinstance(op, ast.Eq):  ok = (L == R)
            elif isinstance(op, ast.NotEq): ok = (L != R)
            elif isinstance(op, ast.Lt):    ok = (L <  R)
            elif isinstance(op, ast.LtE):   ok = (L <= R)
            elif isinstance(op, ast.Gt):    ok = (L >  R)
            elif isinstance(op, ast.GtE):   ok = (L >= R)
            else:
                raise ValueError("Unsupported cmp op")
            if not ok:
                return False
            L = R
        return True

    if isinstance(node, ast.Call):
        if not isinstance(node.func, ast.Name):
            raise ValueError("Unsupported call")
        func = node.func.id
        args = [eval_ast(a, x) for a in node.args]
        if func == "int":
            if len(args) == 0:
                return 0
            if len(args) == 1:
                return int(args[0])
            if len(args) == 2:
                return int(args[0], args[1])
            raise ValueError("int() with >2 args not supported")
        raise ValueError(f"Unsupported function: {func}")

    raise ValueError(f"Unsupported AST node: {ast.dump(node)}")


def solve_expr_for_index(expr: str, idx_label: str) -> set[int]:
    """
    Treat every occurrence of the given label (like 'b[17463]' or
    'b[<Tracked (...)=17463>]') as variable x and return all x in 0..255
    that make the expression True.
    """
    # Replace that exact label with x everywhere
    expr_sub = expr.replace(idx_label, "x")
    try:
        root = ast.parse(expr_sub, mode="eval").body
    except Exception:
        return set()

    sols = set()
    for xv in range(256):
        try:
            if eval_ast(root, xv):
                sols.add(xv)
        except Exception:
            # if evaluation blows up, treat as no information
            return set()
    return sols


# --------------------------------------------------------------------
# Group equations by byte index, intersect candidate sets
# --------------------------------------------------------------------

def extract_equation(line: str) -> str | None:
    m = re.search(r'\[COMPARE\]\s*(.*?)\s*->', line)
    return m.group(1).strip() if m else None


def parse_indices_with_label(expr: str):
    """
    Return pairs (index:int, label:str) for each distinct b[...] occurrence.

    Supports both:
      - b[123]
      - b[<Tracked (0+int(17463))=17463>]
    """
    out = []
    # match the entire b[...] token
    for m in re.finditer(r"b\[(.*?)\]", expr):
        raw_inside = m.group(1)
        label = m.group(0)  # full: "b[...]"
        # try to find trailing '=NUMBER>' pattern
        m_num = re.search(r"=(\d+)>$", raw_inside)
        if m_num:
            idx = int(m_num.group(1))
        else:
            # maybe it's plain number
            if raw_inside.isdigit():
                idx = int(raw_inside)
            else:
                # give up if we can't parse
                continue
        out.append((idx, label))
    # unique by (idx,label)
    return list({(i, l) for (i, l) in out})


def reconstruct_bytes(exprs: list[str], length: int) -> bytes:
    # group: index -> list of (expr, label_for_that_index_in_expr)
    by_idx = defaultdict(list)

    for expr in exprs:
        idx_list = parse_indices_with_label(expr)
        if len(idx_list) != 1:
            # skip eqs that either don't involve b[...] or involve more than one index
            continue
        idx, label = idx_list[0]
        if 0 <= idx < length:
            by_idx[idx].append((expr, label))

    # start with all values 0..255 for every byte
    allowed = {i: set(range(256)) for i in range(length)}

    for idx, eqs in tqdm(by_idx.items(), desc="Solving per-byte", unit="byte"):
        cur = allowed[idx]
        for expr, label in eqs:
            sols = solve_expr_for_index(expr, label)
            if not sols:
                # this equation carries no useful info (or our eval doesn't support it)
                continue
            cur &= sols
            if not cur:
                # contradictory constraints (shouldn't happen in a valid challenge)
                break
        allowed[idx] = cur

    data = bytearray(length)
    bad = []

    for i in range(length):
        opts = allowed[i]
        if len(opts) == 1:
            data[i] = next(iter(opts))
        elif len(opts) == 0:
            data[i] = 0
            bad.append((i, "no-solution"))
        else:
            # multiple possibilities; pick one but record ambiguity
            data[i] = min(opts)
            bad.append((i, f"multi:{len(opts)}"))

    if bad:
        print(f"[!] {len(bad)} byte positions not uniquely determined.")
        print("    First few:", bad[:10])
    else:
        print("[+] All bytes uniquely solved.")

    return bytes(data)


# --------------------------------------------------------------------
# File type helpers + main
# --------------------------------------------------------------------

def detect_filetype(data: bytes) -> str:
    sigs = {
        b"\x89PNG\r\n": "png",
        b"\xFF\xD8\xFF": "jpg",
        b"PK\x03\x04":   "zip",
        b"GIF89a":       "gif",
        b"%PDF":         "pdf",
    }
    for magic, ext in sigs.items():
        if data.startswith(magic):
            return ext
    return "bin"


def printable(b: int) -> str:
    return chr(b) if 32 <= b < 127 else '.'


def main(argv=None):
    ap = argparse.ArgumentParser(
        description="Recover the PNG byte-by-byte from output.pyc using all constraints per index."
    )
    ap.add_argument("pyc", help="input compiled Python file (output.pyc)")
    ap.add_argument("-o", "--output", default="out.bin",
                    help="output base name (extension auto-detected)")
    ap.add_argument("--length", type=int, default=39944,
                    help="expected file length (default 39944)")
    ap.add_argument("--save-log", default=None,
                    help="optional filepath to save raw [COMPARE] lines")
    args = ap.parse_args(argv)

    print(f"[+] Running {args.pyc} under Tracked sandbox...")
    lines = run_tracked_pyc_collect(args.pyc, fake_len=args.length)

    if args.save_log:
        with open(args.save_log, "w", encoding="utf-8") as fp:
            fp.write("\n".join(lines))
        print(f"[+] Saved comparison log to {args.save_log}")

    exprs = []
    for line in lines:
        eq = extract_equation(line)
        if eq:
            exprs.append(eq)

    print(f"[+] Extracted {len(exprs)} comparison expressions.")
    data = reconstruct_bytes(exprs, length=args.length)

    ext = detect_filetype(data)
    base = args.output
    outfile = base if base.endswith(f".{ext}") else f"{os.path.splitext(base)[0]}.{ext}"
    with open(outfile, "wb") as f:
        f.write(data)

    print("\n✅ Done!")
    print("   Saved:", outfile)
    print("   Bytes:", len(data))
    print("   Hex head:", data[:16].hex())
    print("   ASCII head:", ''.join(printable(b) for b in data[:16]))
    print("   Detected type:", ext)


if __name__ == "__main__":
    raise SystemExit(main())

Conclusion

The “disthis” challenge was a great exercise in modern Python internals. It demonstrated that when static analysis tools haven’t updated to support versions like Python 3.13, you can always use the dynamic nature of the language itself. By building a execution harness, we were able to completely bypass the obfuscation without ever actually “reading” the code.

Upon solving, the script outputted out.png, which contained the flag:

infobahn{this_is_by_far_the_worst_obfuscator_ive_had_the_displeasure_of_writing_i_dont_even_know_how_the_code_ran_e891ac534881}

This post is licensed under CC BY 4.0 by the author.