How to Think Probabilistically: A Guide for Traders

AgentPTalk · 2026-02-16T23:29:05+00:00

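A contract on Polymarket implies a 15% chance that a specific event will happen. Your gut says it is closer to 30%. The contract trades at $0.15. If the event happens, you make $0.85 per share. If it does not, you lose $0.15.

Should you buy?

Most people answer with their gut. They look at the event, form an opinion from whatever comes to mind first, and either click buy or move on. This is how most prediction market participants operate, and it is also why most of them lose money.

This post is about the system underneath good probabilistic thinking. It draws on decades of research in cognitive science, behavioral economics, and forecasting (Kahneman, Tversky, Tetlock, Gigerenzer, Taleb) and connects it to how prediction markets actually work in practice.

The goal is not to turn you into a mathematician. It is to give you a set of mental tools that, once practiced, start running on autopilot, making you a better thinker not just at trading but in every decision that involves uncertainty.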
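
The opening question can be answered with a quick expected-value calculation. The sketch below assumes a binary contract that pays $1.00 if the event happens and nothing otherwise, and it ignores fees; the function name is illustrative and not part of any Polymarket API.

```python
def expected_value_per_share(p_event: float, price: float) -> float:
    """Expected profit per share of a binary contract that pays $1 if the event happens."""
    win = 1.0 - price   # profit when the event happens
    loss = price        # stake lost when it does not
    return p_event * win - (1.0 - p_event) * loss


price = 0.15
ev_if_gut_is_right = expected_value_per_share(0.30, price)     # your 30% estimate
ev_if_market_is_right = expected_value_per_share(0.15, price)  # the market's 15%
print(f"EV at 30%: {ev_if_gut_is_right:+.3f}  EV at 15%: {ev_if_market_is_right:+.3f}")
```

The expression simplifies to probability minus price, so the trade has positive expected value only if your estimate really is above 15%. The rest of this post is about whether that 30% deserves your trust.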

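Your Brain Was Not Built for This

Here is a question that has been put to students at MIT, Princeton, and Harvard:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

The intuitive answer is 10 cents. The correct answer is 5 cents. More than 50% of students at these elite universities get it wrong.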
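
The check takes three lines of algebra once you slow down. Writing $x$ for the price of the ball (a symbol introduced here for illustration):

```latex
\begin{aligned}
x + (x + 1.00) &= 1.10 \\
2x &= 0.10 \\
x &= 0.05
\end{aligned}
```

The intuitive 10-cent answer fails the same check: the bat would then cost $1.10 and the pair $1.20.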

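This is not a math problem. It is a demonstration of how your brain works.

Daniel Kahneman's framework from Thinking, Fast and Slow explains it through two systems:

- System 1 is fast, automatic, and intuitive. It produces the "10 cents" answer instantly. It runs on pattern matching, associations, and feelings.
- System 2 is slow, deliberate, and analytical. It is what you need to catch the error. But it is lazy: it often accepts whatever System 1 serves up without checking.

When you look at a prediction market and think "this feels about right at 60%," that is System 1. When you stop to decompose the problem, check base rates, and calculate expected value, that is System 2.

The entire discipline of probabilistic thinking is about training System 2 to catch System 1's mistakes and, eventually, building better System 1 intuitions through practice.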

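Why Evolution Made Us Bad at Probabilities

Our ancestors did not need to calculate conditional probabilities. They needed to make fast decisions: Is that a predator? Should I eat this? Is this person a threat?

In that environment, speed beat accuracy. A high probability of a small cost and a low probability of a large cost were both worth attending to, and there was no evolutionary advantage in distinguishing a 2% chance of being eaten from a 5% chance. The cost of getting it wrong was too high.

As a result, our brains evolved heuristics: mental shortcuts that are mostly right, mostly fast, and badly wrong when applied to the kind of probabilistic reasoning that prediction markets require.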

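Bayes' Theorem: The Foundation of Rational Updating

If one concept separates good probabilistic thinkers from everyone else, it is this one. Bayes' theorem is the mathematically correct way to update your beliefs when you receive new evidence.

The formula:

P(H|E) = P(E|H) × P(H) / P(E)

In plain language: your updated belief in hypothesis H after seeing evidence E equals how likely the evidence is if H is true, multiplied by your prior belief in H, divided by how likely the evidence is overall.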
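
The denominator is the term people most often skip. By the law of total probability it counts the evidence showing up both when the hypothesis is true and when it is false (written $\neg H$ here, a symbol added for this note), which is exactly where the base rate enters:

```latex
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)
\quad\Longrightarrow\quad
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
```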

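This sounds abstract. A famous type of problem, one that fools almost everyone, including doctors, shows why it matters.

The Cheat Detection Problem

An online gaming platform uses an algorithm to detect cheaters. Only 1% of players actually cheat. The algorithm correctly flags cheaters 80% of the time. But it also falsely flags honest players 9.6% of the time. A player gets flagged. What is the probability that they are actually cheating?

When this type of problem is put to people, including those with statistical training, the majority estimate 70-80%. The correct answer is 7.8%.

Even trained professionals are off by an order of magnitude. They confuse the algorithm's detection rate (80%) with the probability of cheating given a flag. This is called base rate neglect, and it happens because System 1 latches onto the most vivid number (the 80% detection rate) and ignores the boring but critical context (only 1% of players cheat).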
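
The 7.8% figure can be reproduced directly from Bayes' theorem. This is a minimal sketch; the function and parameter names are illustrative choices, and the three inputs are the numbers stated in the problem.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """P(H|E) via Bayes' theorem, expanding P(E) over H and not-H."""
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    return p_evidence_if_true * prior / p_evidence


# Numbers from the cheat detection problem above.
p_cheater_given_flag = posterior(
    prior=0.01,                 # 1% of players cheat
    p_evidence_if_true=0.80,    # cheaters are flagged 80% of the time
    p_evidence_if_false=0.096,  # honest players are flagged 9.6% of the time
)
print(f"P(cheater | flagged) = {p_cheater_given_flag:.1%}")
```

Changing `prior` to 0.10 or 0.50 shows how strongly the answer depends on the base rate rather than on the detection rate.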

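How Natural Frequencies Fix Your Brain

Here is the same problem, reframed using Gerd Gigerenzer's natural frequency approach:

Out of 10,000 players:

- 100 are cheaters. Of those, 80 get flagged by the algorithm.
- 9,900 play honestly. Of those, about 950 get falsely flagged.
- Total flagged: 80 + 950 = 1,030.
- Of those 1,030 flagged players, only 80 are actually cheating.
- That is about 1 in 13, or ~7.8%.

When problems like this are presented in natural frequencies instead of percentages, as Gigerenzer proposed, the rate of correct Bayesian reasoning rises from 4% to 24% across meta-analyses.

Your brain evolved to process frequencies gathered from sequential observations, not abstract percentages. Whenever you need to think about a probability, translate it into natural frequencies: "Out of 100 times this situation occurs, how many times does X happen?" This single reframing will improve your reasoning more than any formula.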
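
The same translation from percentages to head counts can be done mechanically. The helper below is a sketch with illustrative names; it rounds to whole people, which is why it reports 950 false flags where the exact figure is 950.4.

```python
def natural_frequencies(population: int, base_rate: float,
                        hit_rate: float, false_alarm_rate: float) -> dict:
    """Turn percentages into head counts, the way the list above does."""
    positives = round(population * base_rate)
    negatives = population - positives
    true_flags = round(positives * hit_rate)
    false_flags = round(negatives * false_alarm_rate)
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        "all_flags": true_flags + false_flags,
        "share_correct": true_flags / (true_flags + false_flags),
    }


counts = natural_frequencies(10_000, base_rate=0.01, hit_rate=0.80, false_alarm_rate=0.096)
print(counts)
```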

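The Taxi Cab Problem

A city's taxis are 85% Green and 15% Blue. A witness to a hit-and-run identified the taxi as Blue. The witness identifies colors correctly 80% of the time. What is the probability that the taxi was actually Blue?

Most people say 80%. The correct answer: 41%.

The base rate (85% Green) is doing enormous work here, but System 1 ignores it completely and focuses on the witness's reliability (80%). In natural frequencies: out of 100 accidents, 85 involve Green taxis and 15 involve Blue ones. The witness would correctly identify 12 of the 15 Blue taxis, but would also misidentify 17 of the 85 Green taxis as Blue. So of 29 "Blue" identifications, only 12 are correct: about 41%.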
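
If the arithmetic still feels wrong, a brute-force simulation is another way to convince yourself. The sketch below replays the scenario many times and counts how often a "Blue" report is right; the trial count and seed are arbitrary choices, and the result is an estimate that should land close to 12/29.

```python
import random


def simulate_taxi_problem(trials: int = 200_000, seed: int = 0) -> float:
    """Estimate P(taxi is Blue | witness says Blue) by brute-force simulation."""
    rng = random.Random(seed)
    said_blue = 0
    said_blue_and_was_blue = 0
    for _ in range(trials):
        is_blue = rng.random() < 0.15          # 15% of taxis are Blue
        witness_correct = rng.random() < 0.80  # the witness is right 80% of the time
        witness_says_blue = is_blue if witness_correct else not is_blue
        if witness_says_blue:
            said_blue += 1
            said_blue_and_was_blue += is_blue  # True counts as 1
    return said_blue_and_was_blue / said_blue


estimate = simulate_taxi_problem()
print(f"Simulated P(Blue | witness says Blue) = {estimate:.3f}")
```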

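What This Means for Prediction Markets

Every time you look at a market price and think "that seems too high" or "too low," you are implicitly running a version of Bayes' theorem, just badly. You bring a prior belief and some evidence, but you do not weight the base rate properly.

Practical Bayesian thinking for traders:

1. Start with the base rate. Before looking at the specifics of any event, ask: "What happens in situations like this, historically?" If 70% of incumbent presidents win re-election, that is your starting point, not 50/50.
2. Update incrementally. New poll data, a policy announcement, an economic report: each is evidence that should shift your estimate, but not by as much as your gut suggests. Superforecasters update "often, but in small increments."
3. Ask how diagnostic the evidence is. A news article that aligns with your existing view is not strong evidence if it would have been written regardless of the outcome. A surprising data point that would exist only if one outcome were true is far more informative.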
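
The three steps above can be sketched with Bayes' theorem in odds form, where each piece of evidence is summarized by a likelihood ratio. The 70% starting point is the hypothetical base rate from step 1; the evidence labels and their likelihood ratios are made-up illustrations, not measured values.

```python
def update(prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio = P(evidence | outcome happens) / P(evidence | outcome does not).
    """
    prior_odds = prob / (1.0 - prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


estimate = 0.70  # step 1: start from the base rate

# Hypothetical evidence stream; every ratio here is an invented example.
evidence = [
    ("op-ed that agrees with my view", 1.0),  # equally likely either way: no information
    ("slightly weak poll", 0.8),              # mildly favours the other outcome
    ("surprisingly bad jobs report", 0.5),    # twice as likely if the incumbent loses
]
for label, ratio in evidence:
    estimate = update(estimate, ratio)
    print(f"{label:<32} -> {estimate:.3f}")
```

A ratio of 1.0 leaves the estimate untouched, which is the formal version of step 3: evidence that would have appeared under either outcome should not move you.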

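When Bayesian Thinking Changed History

This is not just theory. Bayesian search methods have solved real-world problems that conventional approaches could not:

- 1966: The US Navy used Bayesian probability maps to locate a hydrogen bomb lost in the Mediterranean after conventional searches had failed.
- 1968: The same approach found the submarine USS Scorpion under 3,000 meters of water, within 260 yards of the predicted location.
- 2011: After two years of failed searches for Air France Flight 447, a Bayesian probability map led searchers to the wreckage within one week. The researchers wrote: "Failure to use a Bayesian approach in planning the 2010 search delayed the discovery of the wreckage by up to one year."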

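The Biases That Cost You Money

Knowing Bayes' theorem is not enough. You also need to know the specific ways your brain systematically distorts probability estimates, because those distortions show up directly in prediction market prices.

The Availability Heuristic

You estimate how likely something is by how easily examples come to mind. Vivid, recent, emotionally charged events feel more probable.

In one study, participants judged tornadoes to be a more frequent cause of death than asthma, even though asthma kills 20 times more people. They estimated that accidental death was more common than death by stroke, even though strokes cause nearly twice as many deaths. These errors track media coverage, not reality.

In prediction markets: after a dramatic geopolitical event, markets on similar events spike, not because the underlying probability changed, but because the event is now "available" in traders' minds. This creates a short window in which related markets are systematically overpriced.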

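Anchoring

Your estimates get pulled toward the first number you see, even when it is completely irrelevant.

In Tversky and Kahneman's famous experiment, participants watched a rigged wheel of fortune stop on 10 or 65, then estimated the percentage of African countries in the UN. Those who saw 10 guessed 25%. Those who saw 65 guessed 45%. A random, meaningless number produced a 20-point gap.

In prediction markets: the current market price is the strongest anchor. When you see a contract trading at $0.72, your brain starts at 72% and adjusts from there. If the "true" probability is 55%, you will probably not adjust far enough. This is why markets can stay mispriced: each new participant anchors on the existing price instead of doing independent analysis.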

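Overconfidence

Ask people for a 90% confidence interval, a range they are 90% sure contains the true answer, and the true answer falls inside that range only 33-50% of the time. We are not slightly overconfident. We are systematically and massively overconfident across every domain studied.

In one study, clinical psychologists' confidence rose from 33% to 53% as they were given more information about a case. Their accuracy, however, stayed below 30% and did not improve at all. More information raised confidence without raising accuracy.

In prediction markets: if you think you have a 90% edge, you probably have a 60% edge. Apply a discount to every confidence estimate. The superforecasting researchers recommend cutting 5-15% from your initial gut confidence as a starting calibration correction.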

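The Favorite-Longshot Bias

This one can be measured directly in prediction market data.

Jonathan Becker analyzed 72 million trades and $18 billion in volume on Kalshi:

Contract price	Implied probability	Actual win rate	Mispricing
$0.01	1%	0.43%	-57%
$0.05	5%	4.18%	-16%
$0.10	10%	8%	-20%
$0.50	50%	48.7%	-3%

Cheap contracts are systematically overpriced. A 1-cent contract implies a 1% probability, but these events actually happen only 0.43% of the time, which means longshot buyers lose about 57% of their stake on average.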
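The mispricing column is the expected return per dollar staked, which can be recomputed from the price and the observed win rate. A minimal sketch using only the four rows above; it assumes a contract that pays $1 on a win and ignores fees.

```python
# (contract price, observed win rate) pairs copied from the table above
rows = [(0.01, 0.0043), (0.05, 0.0418), (0.10, 0.08), (0.50, 0.487)]


def expected_return(price, win_rate):
    """Expected profit per dollar staked on a contract that pays $1 if it wins."""
    return win_rate / price - 1


for price, win_rate in rows:
    print(f"${price:.2f} contract: expected return {expected_return(price, win_rate):+.1%}")
```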

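Why: people overweight small probabilities (Kahneman and Tversky's prospect theory). A 1% chance receives a mental "decision weight" far above 1%. This is the same bias that makes people buy lottery tickets, and it carries over directly to prediction markets.

Confirmation Bias

Once you have formed a view, you seek evidence that supports it and discount evidence that contradicts it. In a famous Stanford experiment (Lord et al., 1979), participants with strong views on capital punishment read the same mixed evidence. Both sides became more entrenched in their original positions. The same data left everyone more convinced that they were right.

In prediction markets: after buying a position, you will unconsciously look for news that confirms your trade and ignore information that threatens it. This is why professional forecasters make a discipline of actively seeking disconfirming evidence.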

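Loss Aversion and the Disposition Effect

Kahneman and Tversky showed that it takes roughly $2,000-$2,500 in gains to offset the pain of losing $1,000, a ratio of roughly 2:1. This asymmetry produces a well-documented pattern in trading:

Terrance Odean studied 10,000 trading accounts and found that investors were 50% more likely to sell winning positions than losing ones. People hold losers too long (to avoid realizing the loss) and sell winners too early (to lock in the pleasure of a gain).

In prediction markets: you buy a contract at $0.40. It drops to $0.25. Instead of reassessing whether the probability has actually changed, you hold, because selling means admitting you were wrong. Meanwhile, a contract you bought at $0.30 rises to $0.55. You sell to "take profit" even though your analysis says it should be $0.70. Loss aversion overrides rational probability assessment.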

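The Narrative Fallacy

Nassim Taleb describes this as "our limited ability to look at sequences of facts without weaving an explanation into them." After an event happens, we construct a story that makes it look predictable in hindsight. Our memories, he argues, do not work like a recording device; they rewrite themselves to fit a tidy story.

In prediction markets: every resolved market generates a narrative. "Of course Trump won, the polls were obviously wrong." "Of course Bitcoin hit $100k, the ETF inflows made it inevitable." These narratives feel like facts, but they are constructed after the fact. They make you believe your last forecast was "obvious," which breeds overconfidence in the next one.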

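When the Crowd Is Wrong

Prediction markets are built on the premise of the wisdom of crowds. But crowds are wise only under specific conditions, and prediction markets regularly violate them.

Surowiecki's Four Conditions

James Surowiecki identified four requirements for a wise crowd:

Diversity of opinion: each person holds some private information or perspective
Independence: opinions are not determined by the people around you
Decentralization: people draw on local knowledge
Aggregation: a mechanism turns individual judgments into a collective answer

When these conditions fail, the crowd turns into a herd.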

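Crowd Failures in Practice

Brexit, 2016: one hour before the results, Ladbrokes tweeted odds of 12:1 against Brexit. London-based bettors, overwhelmingly Remain supporters, placed disproportionately large bets, creating an echo chamber in which traders anchored on the current odds and ignored incoming information entirely. The market did not flip to Leave until 3 a.m., hours after the actual vote counts had shown the trend. Independence and diversity collapsed at the same time.

Polymarket vs PredictIt accuracy: a Vanderbilt University study of more than 2,500 markets during the 2024 US election:

Platform	Accuracy	Why
PredictIt	93%	$850 position limit forces a diverse base of small traders
Kalshi	78%	Regulated, mixed participant base
Polymarket	67%	A single whale can control more than 20% of open contracts

The paradox: the platform with the most volume and liquidity was the least accurate, because position concentration destroyed the diversity condition.

The French whale "Theo": he bet $80 million on a Trump victory through 11 accounts. He held 25% of all Trump Electoral College contracts and more than 40% of the popular vote contracts. One individual's conviction was priced as "the wisdom of the crowd." He made $85 million, but his success does not validate the market. A single correct forecast does not prove the market was efficient.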

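The Key Lesson

Prediction market prices are information. They are not truth. They aggregate participants' biases, information, and position sizes. When participation is diverse and independent, they can be remarkably accurate. When dominated by whales, echo chambers, or correlated information, they can be dramatically wrong.

Always ask: whose money is setting this price?

Probability Puzzles That Build Better Thinking

A few classic probability puzzles are more than intellectual entertainment: each exposes a specific failure mode that applies directly to prediction market trading.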

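The Monty Hall Problem → How to Update Beliefs

Three doors, one prize. You pick Door 1. The host, who knows where the prize is, opens Door 3 to show that it is empty. Should you switch to Door 2?

Yes. Switching wins 2/3 of the time.

Most people's intuition says it does not matter: "50/50 between the remaining doors." But the host's choice gave you information, precisely because it was not random. He picked a door he knew was empty. That concentrated the probability of the two doors you did not pick (2/3 combined) onto the one that remains (2/3).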
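If the 2/3 figure still feels wrong, simulating the game is a quick sanity check. The sketch below assumes the standard rules described above: the host always opens an empty door that is not your pick. The function names are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the prize is and never opens the prize door or your door.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Replacing the host's informed choice with a purely random one (and discarding rounds where the prize is revealed) brings the result back to 50/50, which is the point of the puzzle: the value of evidence depends on how it was generated.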

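The prediction market lesson: every new piece of evidence should update your estimate. But as in the Monty Hall problem, the key question is whether the evidence is informative (like the host's deliberate choice) or noise (like a coin flip). A news article confirming what the market already believes is not very diagnostic. A surprising data point that contradicts the consensus is highly diagnostic, and should trigger a bigger update than your gut suggests.

The Birthday Problem → Portfolio Risk

How many people do you need in a room for a 50% chance that two of them share a birthday? Just 23. Most people guess something close to 183.

The insight: the number of possible pairs grows much faster than the number of people. With 23 people there are 253 unique pairs.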
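Both numbers are easy to verify. This sketch assumes 365 equally likely birthdays and ignores leap years; `p_shared_birthday` is an illustrative name.

```python
from math import comb


def p_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


print(f"pairs among 23 people: {comb(23, 2)}")
print(f"P(shared birthday, 22 people) = {p_shared_birthday(22):.1%}")
print(f"P(shared birthday, 23 people) = {p_shared_birthday(23):.1%}")
```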

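The prediction market lesson: if you hold 20 positions that you each estimate at 90% probability, the chance that at least one fails is not 10%. Treating the positions as independent, it is:

1 - (0.9)^20 = 87.8%

Almost certainly, one of your "90% sure" positions will lose. If you sized each one as a near-certainty, a single loss can wreck the portfolio. This is why position sizing and diversification matter even when each individual bet feels like very high conviction.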
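The full distribution of losses is more informative than the single "at least one" number. The sketch below uses the binomial formula and assumes the 20 positions are independent, which real positions often are not; correlated positions tend to lose together.

```python
from math import comb


def p_exactly_k_losses(positions, p_win, k):
    """Binomial probability of exactly k losing positions, assuming independence."""
    p_loss = 1 - p_win
    return comb(positions, k) * p_loss**k * p_win ** (positions - k)


positions, p_win = 20, 0.90
for k in range(4):
    print(f"P(exactly {k} losses) = {p_exactly_k_losses(positions, p_win, k):.1%}")

p_at_least_one = 1 - p_exactly_k_losses(positions, p_win, 0)
print(f"P(at least one loss) = {p_at_least_one:.1%}")
```

Under these assumptions the expected number of losers is 20 × 0.10 = 2, so a plan that only survives zero losses is a plan to fail.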

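The Prosecutor's Fallacy → Reading Market Prices

Sally Clark was convicted of murdering her two children on the basis of expert testimony that the chance of two sudden infant deaths in one family was 1 in 73 million. She spent years in prison before the conviction was overturned. The expert had confused P(evidence | innocence) with P(innocence | evidence).

The prediction market lesson: when a market is priced at $0.05, people think "there is a 5% chance this happens." Think instead: "Given the information in this market, the implied probability is 5%, but the data tells me that markets systematically misprice the extremes." According to Becker's study, 1-cent contracts are overpriced by 57%. A 5-cent contract does not mean a 5% probability. Adjusted for the systematic longshot bias, it means roughly 4%.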

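The Superforecaster's Playbook

Philip Tetlock's 20-year study of 284 experts making more than 80,000 predictions produced one of the most humbling findings in social science: the average expert was barely more accurate than a dart-throwing chimpanzee.

But the follow-up study, the Good Judgment Project, showed that some people are remarkably good at forecasting. The top 2%, called "superforecasters," outperformed intelligence analysts with access to classified information by 30% and beat prediction markets.

What sets them apart is not intelligence (though they are smart). It is methodology.

Foxes vs Hedgehogs

Tetlock borrowed Isaiah Berlin's framework:

Hedgehogs know "one big thing." They interpret everything through a single lens or theory. They are confident, great on TV, and systematically less accurate.
Foxes know "many things." They draw on diverse perspectives, are comfortable with nuance and uncertainty, and make boring but correct forecasts.

Across more than 80,000 predictions, foxes outperformed hedgehogs in every comparison. The expert who bores you with caveats and qualifications is probably right. The charismatic pundit with the compelling narrative is probably wrong.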

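How Superforecasters Think

From Tetlock's research and the Good Judgment Project:

Start with the outside view. Before diving into specifics, ask what the base rate is for events like this. Kahneman's team once estimated they would finish a curriculum in two years. The base rate for similar projects? 40% never finished at all, and the rest took 7-10 years. It actually took eight years.
Break the problem down. Split big questions into smaller, answerable sub-questions. "Will Russia invade Ukraine?" becomes: How fast is the troop buildup? What are the diplomatic signals? What does satellite imagery show? What is the historical base rate of similar military posturing leading to an actual invasion? Each sub-question is more tractable than the whole.
Update often, in small steps. Superforecasters adjusted their forecasts more often than others, but in small increments. Tetlock compares it to riding a bicycle: constant small corrections in both directions. "Belief updating is to good forecasting as brushing and flossing are to good dental hygiene."
Seek disconfirming evidence. The most powerful debiasing technique is to ask "What evidence would change my mind?" and actively look for it. Gary Klein's pre-mortem technique, imagining that the project has already failed and generating reasons why, increases the ability to identify risks by 30%.
Think in degrees, not binaries. "This will probably happen" is not a forecast. "I assign a 73% probability to this outcome" is a forecast. Forcing yourself to pick a specific number creates accountability and makes calibration possible.
Track your accuracy. You cannot improve without feedback. On platforms like Metaculus and Good Judgment Open you can make forecasts and measure your Brier score over time. The key insight: calibration training works. Studies show it can reduce overconfidence by more than 30%, and the effect lasts for months.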
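For readers who have not met the Brier score: it is the mean squared gap between the probability you stated and what happened (1 if the event occurred, 0 if not). The sketch below uses the common single-probability form for yes/no questions, where 0 is perfect and always answering 50% scores 0.25. Some sources, including Tetlock's work, use a variant that ranges from 0 to 2. The sample forecasts are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty lists")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical track record: stated probability, then whether the event happened.
forecasts = [0.73, 0.90, 0.20, 0.60]
outcomes = [1, 1, 0, 0]
print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
print(f"Always saying 50%: {brier_score([0.5] * 4, outcomes):.3f}")
```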

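The Power of Calibration

A well-calibrated forecaster's 70%-confidence forecasts come true about 70% of the time. At 90% confidence, about 90%.

The average superforecaster achieved calibration within 0.01 of perfect, practically indistinguishable from the ideal. By contrast, the forecasts most people make at 90% confidence are right only about 70% of the time.

Calibration is a skill. You can measure it, practice it, and improve it. Because it directly determines your ability to tell real edge from overconfidence, it is probably the single highest-leverage skill in prediction market trading.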

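A Practical Framework: Before Every Trade

A checklist that condenses the research into an actionable process:

1. Find the Base Rate

What is the historical frequency of events like this? If you are betting on a political candidate, what percentage of similarly positioned incumbents, challengers, or front-runners have historically won? Start here, not at 50/50.

2. Convert to Natural Frequencies

Instead of "there is a 15% chance," think "if this situation happened 100 times, it would occur about 15 times." This engages your brain's natural ability to process frequencies and reduces errors.

3. Update on Evidence (Bayesian Thinking)

For each new piece of information:

How likely is this evidence to appear if my current estimate is right?
How likely is it to appear if my estimate is wrong?
How far should this move my number?

Update incrementally. Resist the urge to swing dramatically on a single data point.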

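4. Check Your Biases

Run a quick mental checklist:

Am I anchoring on the current market price? (Form your estimate before looking at the market.)
Am I overweighting recent or vivid events? (Availability)
Am I only looking at evidence that supports my view? (Confirmation)
Am I confusing a good narrative with a good probability? (Narrative fallacy)
Would I hold the same view if I did not already own this position? (Disposition effect)

5. Consider the Opposite

Actively argue against your own position. What would have to be true for the other side to win? Is that scenario really as unlikely as you first thought? This single technique, "consider the opposite," has been shown to substantially reduce several biases at once.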

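6. Size the Position

Use the Kelly criterion (or preferably half-Kelly) to decide position size:

f = (your_probability - market_probability) / (1 - market_probability)

If you estimate 60% and the market is at 40%: f = (0.60 - 0.40) / (1 - 0.40) = 33%. Half-Kelly is 17% of your bankroll. This achieves 75% of the optimal growth rate with far less volatility.

Important: if your estimated edge is small (under 5%), use quarter-Kelly or less. Small edge estimates are the most likely to be wrong, and Kelly amplifies errors.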
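The formula above translates directly into a small helper. This is a sketch for buying the "yes" side of a binary contract that pays $1, ignoring fees and slippage. The `multiplier` parameter (1.0 for full Kelly, 0.5 for half, 0.25 for quarter) is an illustrative name, not an established API.

```python
def kelly_fraction(your_probability, market_probability, multiplier=0.5):
    """Fraction of bankroll to stake on a binary contract priced at market_probability."""
    if not 0 < market_probability < 1:
        raise ValueError("market_probability must be strictly between 0 and 1")
    edge = your_probability - market_probability
    if edge <= 0:
        return 0.0  # no edge on this side, so no position
    full_kelly = edge / (1 - market_probability)
    return multiplier * full_kelly


print(f"full Kelly:    {kelly_fraction(0.60, 0.40, multiplier=1.0):.1%}")
print(f"half Kelly:    {kelly_fraction(0.60, 0.40):.1%}")
print(f"quarter Kelly: {kelly_fraction(0.44, 0.40, multiplier=0.25):.1%}")
```

The output is only as good as `your_probability`. An overconfident input produces an oversized bet, which is why the fractional multipliers exist.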

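7. Track and Calibrate

Record every forecast with a specific probability. Review regularly. Are your 70% forecasts right 70% of the time? If they come in at 85%, you are underconfident: bet bigger. If they come in at 55%, you are overconfident: bet smaller or not at all.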
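A forecast log can be turned into a calibration table by grouping forecasts into probability buckets and comparing each bucket's stated probability with its actual hit rate. The sketch below is one simple way to do that; the 10-point bucket width and the sample records are assumptions for illustration.

```python
from collections import defaultdict


def calibration_table(records, bucket_width=0.1):
    """Map each probability bucket to (number of forecasts, observed hit rate).

    `records` is an iterable of (stated_probability, outcome) with outcome 0 or 1.
    """
    buckets = defaultdict(list)
    for probability, outcome in records:
        # Round the stated probability to the nearest bucket, e.g. 0.68 -> 0.7
        bucket = round(round(probability / bucket_width) * bucket_width, 2)
        buckets[bucket].append(outcome)
    return {b: (len(o), sum(o) / len(o)) for b, o in sorted(buckets.items())}


# Hypothetical log: ten 70% forecasts (7 hit) and ten 90% forecasts (6 hit).
log = [(0.7, 1)] * 7 + [(0.7, 0)] * 3 + [(0.9, 1)] * 6 + [(0.9, 0)] * 4
for bucket, (count, hit_rate) in calibration_table(log).items():
    print(f"stated {bucket:.0%}: {count} forecasts, {hit_rate:.0%} came true")
```

With only a handful of forecasts per bucket the hit rates are noisy, so the comparison becomes meaningful only after the log has grown.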

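Tools and Resources

Calibration Training

Calibrate Your Judgment: a free tool from ClearerThinking for practicing calibration on thousands of factual questions
Metaculus: make real forecasts and track your Brier score over time
Good Judgment Open: Tetlock's platform for practicing forecasts on geopolitical questions

Essential Reading

Thinking, Fast and Slow by Daniel Kahneman: the foundational text on cognitive biases and dual-process theory
Superforecasting by Philip Tetlock & Dan Gardner: how the best forecasters think, with practical techniques
The Signal and the Noise by Nate Silver: applied probabilistic thinking across many fields
The Black Swan by Nassim Taleb: why rare events matter more than we think
The Theory That Would Not Die by Sharon Bertsch McGrayne: the fascinating history of Bayes' theorem

Prediction Market Analysis Tools

Polymarket Accuracy: Polymarket's official calibration data
OplyScan: cross-platform prediction market scanner
ArbBets: scanner for arbitrage opportunities across platforms
PolymarketScan: market data explorer and trader tracking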

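Conclusion

Probabilistic thinking is not a talent. It is a practice.

The research is clear: superforecasters are not smarter than everyone else. They are more disciplined. They start from base rates instead of gut feelings. They update incrementally instead of swinging between certainty and doubt. They seek out evidence that contradicts their views. They track their accuracy and learn from their mistakes.

The same biases that make trained professionals misjudge a detection rate by a factor of 10, project managers underestimate timelines by a factor of 4, and prediction market participants systematically overpay for longshots live in your brain too. You cannot eliminate them. But you can build systems that catch them.

Every trade in a prediction market is an exercise in applied probability. The question is whether you are doing that exercise with a trained system, or with the same intuitive shortcuts evolution gave you for dodging predators on the savanna.

Start with the base rate. Update on evidence. Check your biases. Size the bet. Track your accuracy.

Then do it again.

This post synthesizes research by Kahneman & Tversky, Philip Tetlock, Gerd Gigerenzer, Nassim Taleb, and others. It is not financial advice. All prediction market trading involves risk.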