v0.9.23 — Sections-First Company Map

1 stock code →
full company story

Korean DART + US SEC EDGAR filings, pre-structured.
Datasets hosted on Hugging Face. One line of Python.

>>> dartlab.Company("005930").review()
Datasets · Hugging Face
eddmpython/dartlab-data
dart/docs · dart/finance · dart/report · dart/scan · edgar/docs · edgar/finance
What's Already Done

Already Structured, Ready to Use

Korean DART
Every filing pre-structured
US EDGAR
Same interface, different country
Account resolution
One revenue, not dozens of variations
Section canonicals
Every variant mapped to the same topic
Auto-structuring
Raw filings to DataFrame. No cleaning.
Ratios precomputed
Profitability, liquidity, valuation — ready on load
Live Visualization

See how listed companies
are connected — in 10 seconds

Every listed company · every industry · every supply-chain relationship on one screen. Click a bubble to drill into an industry, select a company for 5-year financials, supply-chain HHI, AI analysis, and deep-dive reports — all in a single card. Automatic detection of sudden changes this fiscal year.

dartlab visualizes disclosure and financial data. Not investment advice. Always cross-check with original DART filings and analyst reports.
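
The supply-chain HHI on the company card is a concentration measure. A minimal sketch of how a Herfindahl-Hirschman Index is usually computed, assuming it is applied to each counterparty's share of a company's sales or purchases. The function name and the sample figures are illustrative only; this is not DartLab's implementation or data.

```python
def hhi(amounts):
    """Herfindahl-Hirschman Index on a 0-10,000 scale.

    `amounts` holds one non-negative figure per counterparty
    (hypothetical input, e.g. sales to each customer).
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("amounts must sum to a positive number")
    # Square each counterparty's percentage share and add the squares up.
    return sum((100.0 * a / total) ** 2 for a in amounts)


# Illustrative numbers only: one dominant customer and three small ones.
concentrated = hhi([70, 10, 10, 10])
# Four equally sized customers.
diversified = hhi([25, 25, 25, 25])
print(round(concentrated), round(diversified))  # prints: 5200 2500
```

A higher value means the company depends on fewer counterparties.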

dartlab.io/map

The Truth About Every Company Is Already Public

It's just unreadable. Annual reports are 200+ pages, quarterly filings pile up, and the data you need is buried across documents, formats, and years.

The reality today
Scroll through 200-page PDFs to find one section
Financial numbers in one tool, text in another
Comparing last year's risk factors? Manual copy-paste
DART tools don't work on EDGAR (and vice versa)
Feeding filings to AI? Hours of prompt engineering
Screening 2,700 companies? Build it yourself
With DartLab
Every section of every filing, already structured
Text + numbers + reports in one company object
5 years side-by-side with `diff()` — one line
Same `Company` interface for DART and EDGAR
Structured sections go straight into LLM context
Market-wide scans across all listed companies
Without DartLab
# 1. Download PDF from DART
pdf = download_report("005930", "2024")
# 2. Extract tables from PDF
tables = parse_pdf_tables(pdf)
# 3. Manual account mapping
mapped = manual_map(tables, my_schema)
# 4. Repeat for each quarter...
# 5. Repeat for each company...
# 6. Hope the formats match
With DartLab
import dartlab

c = dartlab.Company("005930")
c.BS       # standardized balance sheet
c.ratios   # 47 financial ratios
c.diff()   # 5 years of changes
Standardization

34,249 Account Mappings. Zero Manual Work.

Every company files with different XBRL account IDs. DartLab normalizes them through a 4-step pipeline so cross-company comparison works automatically.

1. Strip Prefixes (4 prefixes)
   ifrs-full_, dart_, ifrs_, ifrs-smes_
   ifrs-full_Revenue --> Revenue
2. ID Synonyms (59 rules)
   English account ID normalization
   NetIncome --> ProfitLoss
3. Name Synonyms (104 rules)
   Korean account name unification
   영업수익 (operating revenue) --> 매출액 (sales revenue)
4. Learned Map (34,249 mappings)
   Accumulated mapping table
   ProfitLoss --> net_income
Before: Raw XBRL
Samsung    ifrs-full_Revenue
SK Hynix   dart_Revenue
LG Energy  Revenue
3 companies, 3 different account IDs for the same concept
After: Standardized
Samsung    revenue
SK Hynix   revenue
LG Energy  revenue
All resolve to revenue, so cross-company comparison just works
~97% mapping rate
3,143 standard accounts
34,249 XBRL mappings
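
The four steps above can be sketched in a few lines. This is a simplified illustration, not DartLab's code: the function name and the tiny lookup tables are assumptions, filled only with the examples shown on this page (plus an assumed 매출액 to revenue entry), whereas the real rule sets hold 59, 104, and 34,249 entries.

```python
PREFIXES = ("ifrs-full_", "ifrs-smes_", "ifrs_", "dart_")

# Tiny illustrative tables; the real rule sets are far larger.
ID_SYNONYMS = {"NetIncome": "ProfitLoss"}
NAME_SYNONYMS = {"영업수익": "매출액"}
LEARNED_MAP = {
    "ProfitLoss": "net_income",
    "Revenue": "revenue",
    "매출액": "revenue",  # assumed entry, for illustration
}


def normalize_account(raw):
    """Map a raw XBRL account ID or Korean account name to a standard key."""
    key = raw
    # Step 1: strip a known taxonomy prefix (at most one).
    for prefix in PREFIXES:
        if key.startswith(prefix):
            key = key[len(prefix):]
            break
    # Step 2: unify English account IDs.
    key = ID_SYNONYMS.get(key, key)
    # Step 3: unify Korean account names.
    key = NAME_SYNONYMS.get(key, key)
    # Step 4: look up the accumulated mapping table; None means unmapped.
    return LEARNED_MAP.get(key)


raw_ids = ["ifrs-full_Revenue", "dart_Revenue", "Revenue", "NetIncome", "영업수익"]
normalized = [normalize_account(r) for r in raw_ids]
print(normalized)
```

Returning `None` for unmapped accounts keeps the gap visible, which is one way a mapping rate below 100% could be measured.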
Sections

From Vertical Filings to One Horizontal Map

The real product is not a parser list. It is the map.

Section Alignment
2023:   companyOverview | business | risk
2024Q1: companyOverview | business
2024Q2: companyOverview | business
2024:   companyOverview | business | risk
Same topic row, different period coverage. Missing periods stay empty instead of breaking the map.
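
A rough sketch of the alignment idea, using only the standard library. It assumes each filing has already been split into (period, topic, text) records; the function name and the placeholder texts are hypothetical, and the actual `sections` frame is built by DartLab itself.

```python
def horizontalize(blocks):
    """Pivot (period, topic, text) records into one row per topic.

    Periods in which a topic does not appear are filled with None
    instead of being dropped, so every row has the same columns.
    """
    periods, topics, cells = [], [], {}
    for period, topic, text in blocks:
        if period not in periods:
            periods.append(period)
        if topic not in topics:
            topics.append(topic)
        cells[(topic, period)] = text
    return {
        topic: {period: cells.get((topic, period)) for period in periods}
        for topic in topics
    }


# Same coverage pattern as the example above; texts are placeholders.
blocks = [
    ("2023", "companyOverview", "overview 2023"),
    ("2023", "business", "business 2023"),
    ("2023", "risk", "risk 2023"),
    ("2024Q1", "companyOverview", "overview 2024Q1"),
    ("2024Q1", "business", "business 2024Q1"),
    ("2024Q2", "companyOverview", "overview 2024Q2"),
    ("2024Q2", "business", "business 2024Q2"),
    ("2024", "companyOverview", "overview 2024"),
    ("2024", "business", "business 2024"),
    ("2024", "risk", "risk 2024"),
]
board = horizontalize(blocks)
print(board["risk"])
```

The `risk` row keeps all four period columns, with `None` for the two quarters that have no risk section.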
Source-Aware Merge
docs             finance  report
companyOverview  BS       audit
risk text        IS       dividend
retrievalBlocks  ratios   employee
──→ merged on same spine
`show(...)` and `trace(...)` sit on top of the same company spine instead of inventing a second structure.
Number → Source
samsung.trace("revenue")
primarySource: finance
fallback: docs.sections
block: Q4 2024 · IS table
Every number reveals which filing, section, and block it came from. No black-box numbers.
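
One way to picture source-aware merging with provenance is a priority lookup that returns the value together with the source that supplied it. The sketch below is an assumption about the general technique, not DartLab's internals: the priority table, the `resolve` helper, the result keys, and the sample values are all hypothetical.

```python
# Hypothetical authority order per kind of topic.
SOURCE_PRIORITY = {
    "numeric": ["finance", "docs"],
    "report": ["report", "docs"],
    "narrative": ["docs"],
}


def resolve(topic, kind, sources):
    """Return the winning value for a topic plus where it came from."""
    for rank, name in enumerate(SOURCE_PRIORITY[kind]):
        value = sources.get(name, {}).get(topic)
        if value is not None:
            return {
                "topic": topic,
                "value": value,
                "source": name,
                "usedFallback": rank > 0,  # True when the primary source had nothing
            }
    return {"topic": topic, "value": None, "source": None, "usedFallback": False}


# Made-up values, for illustration only.
sources = {
    "finance": {"revenue": 300.0},
    "docs": {"revenue": 299.5, "dividend": "dividend paragraph"},
    "report": {},
}
revenue = resolve("revenue", "numeric", sources)
dividend = resolve("dividend", "report", sources)
print(revenue["source"], dividend["source"], dividend["usedFallback"])
```

Because the source name travels with the value, a caller can always answer "where did this number come from?" without a second lookup.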
Period → Change
samsung.diff("riskManagement")
2024 → 2023
+ added supply chain concentration
~ modified FX exposure paragraph
= unchanged audit opinion
Every narrative reveals what the company quietly rewrote between periods. Diff the text, not your eyeballs.
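
A paragraph-level diff of this kind can be approximated with Python's standard `difflib`. The sketch below is not how `diff()` is implemented; the helper name and the sample paragraphs are invented for illustration, and the labelling is coarse: any paragraph that falls in a rewritten slot is marked `~`, even if part of that slot is brand new.

```python
from difflib import SequenceMatcher


def diff_paragraphs(old, new):
    """Label paragraph-level changes between two versions of a section."""
    changes = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes += [("=", p) for p in new[j1:j2]]
        elif tag == "insert":
            changes += [("+", p) for p in new[j1:j2]]
        elif tag == "delete":
            changes += [("-", p) for p in old[i1:i2]]
        else:  # "replace": the paragraphs in this slot were rewritten
            changes += [("~", p) for p in new[j1:j2]]
    return changes


# Invented sample paragraphs.
old = ["FX exposure is hedged with forwards.", "Audit opinion: unqualified."]
new = [
    "FX exposure is hedged with forwards and options.",
    "Audit opinion: unqualified.",
    "Supply chain is concentrated in two vendors.",
]
for mark, paragraph in diff_paragraphs(old, new):
    print(mark, paragraph)
```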
Architecture

One Company, Four Namespaces

The map stays the same. Only the source responsibility changes.

docs

Structural Spine

Owns section boundaries, narrative payloads, retrieval blocks, and the raw evidence layer that keeps the company map grounded in the filings.

`sections` as canonical spine
`retrievalBlocks` and `contextSlices`
Narrative and detail topics
Released
finance

Authoritative Numbers

Owns balance sheet, income statement, cash flow, ratios, and comparable time series. When numbers are available here, they should win.

Normalized statements
Quarterly comparable series
Source authority over numeric topics
Released
report

Structured Disclosure

Owns structured disclosure APIs such as audit, dividend, employees, executives, and similar periodic report payloads where docs should not be the first authority.

Periodic report APIs
Structured governance payloads
Source authority over report topics
Released
profile

Merged Company Layer

What the user sees by default: one company surface built on the same sections spine, ready for Python workflows now and AI interfaces next.

`c.sections`
`c.show(...)`
`c.trace(...)`
Current Default
How it works

From Stock Code to Company Map

No parser inventory first. Start from the company board.

01

Install

One line — uv add dartlab. No separate data preparation needed.

$ uv add dartlab
02

Create Company

Start from the public entrypoint. Missing data is auto-downloaded from Hugging Face.

c = dartlab.Company("005930")
03

sections = the company

One DataFrame with every topic and every period. show, diff, trace are just views on top.

c.sections # that's it
Code → Result

sections is the whole company

One DataFrame. Every topic. Every period. Here's what you actually get.

sections.py
from dartlab import Company

samsung = Company("005930")
board = samsung.sections

board # canonical company map
board.shape # (329, 106)
Output
shape: (329, 106)
chapter  topic             blockType  2024              2023
I        companyOverview   text       Founded in 1969…  Founded in 1969…
II       businessOverview  text       Semiconductors…   Semiconductors…
II       businessOverview  table      Revenue (5×3)     Revenue (5×3)
III      riskManagement    text       FX risk…          FX risk…
V        auditOpinion      text       Unqualified       Unqualified
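
To make "views on top of one table" concrete, here is a small stand-in that treats the output above as a list of rows and implements a `show`-like filter over it. The row layout mirrors the table; the helper and its parameters are hypothetical, and the real `c.show(...)` signature may differ.

```python
# Rows copied from the sample output above.
ROWS = [
    {"chapter": "I", "topic": "companyOverview", "blockType": "text",
     "2024": "Founded in 1969…", "2023": "Founded in 1969…"},
    {"chapter": "II", "topic": "businessOverview", "blockType": "text",
     "2024": "Semiconductors…", "2023": "Semiconductors…"},
    {"chapter": "II", "topic": "businessOverview", "blockType": "table",
     "2024": "Revenue (5×3)", "2023": "Revenue (5×3)"},
    {"chapter": "III", "topic": "riskManagement", "blockType": "text",
     "2024": "FX risk…", "2023": "FX risk…"},
    {"chapter": "V", "topic": "auditOpinion", "blockType": "text",
     "2024": "Unqualified", "2023": "Unqualified"},
]


def show(rows, topic, block_type=None):
    """Return the rows for one topic, optionally limited to one block type."""
    return [
        row for row in rows
        if row["topic"] == topic
        and (block_type is None or row["blockType"] == block_type)
    ]


business = show(ROWS, "businessOverview")
business_tables = show(ROWS, "businessOverview", block_type="table")
print(len(business), business_tables[0]["2024"])
```

Nothing is copied into a second structure; the view is just a selection of rows from the same board.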
AI Analysis

No Code Required. Just Ask.

DartLab structures the data and feeds it to the LLM. You ask questions in plain language — from your terminal or Python.

Terminal
$ dartlab ask "삼성전자 재무건전성 분석해줘"   # "Analyze Samsung Electronics' financial health"

Analyzing Samsung Electronics (005930)...
Loading financials, ratios, insights...

Samsung Electronics shows strong financial health:
• Debt ratio: 31.8% (sector avg: 45.2%)
• Current ratio: 258.6% — well above safety
• ROE recovery: 1.6% → 10.2% over 4 quarters
• Interest coverage: 22.1x
Python
import dartlab

# one-liner — streams to stdout
dartlab.ask("AAPL risk analysis")

# with options
dartlab.ask(
    "삼성전자 배당 분석",   # "Samsung Electronics dividend analysis"
    provider="openai",
    include=["dividend", "IS"],
)

# 5 providers: ollama, openai,
# oauth-codex, codex, custom
AI is not read-only — engine assumptions are negotiable
you: Run a DCF for Samsung Electronics
ai:  c.analysis("valuation") implied WACC 18.2% looks high for this profile
you: Recalculate with WACC at 9%
ai:  c.analysis("valuation", overrides={"wacc": 0.09})
     revised fair value band: 91.2k ~ 115.4k KRW / share
The AI is not a read-only summarizer. It can override engine assumptions — WACC, growth rate, peer group, discount period — and recompute inside the same session. You stay in the driver's seat.
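
To illustrate what overriding an assumption and recomputing means, here is a toy discounted-cash-flow function with an `overrides` dictionary. It is a generic textbook sketch, not DartLab's valuation engine: the default growth rate, the horizon, the cash-flow input, and the function name are assumptions, and only the 18.2% and 9% WACC figures come from the exchange above.

```python
DEFAULTS = {"wacc": 0.182, "growth": 0.02, "years": 5}  # growth and years are assumed


def fair_value(fcf, overrides=None):
    """Toy DCF: grow `fcf` for `years`, then add a Gordon terminal value."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown assumptions: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}  # overrides win over defaults
    wacc, growth, years = params["wacc"], params["growth"], params["years"]
    if wacc <= growth:
        raise ValueError("wacc must exceed growth for the terminal value")

    value, cash = 0.0, fcf
    for year in range(1, years + 1):
        cash *= 1 + growth
        value += cash / (1 + wacc) ** year  # discount each year's cash flow
    terminal = cash * (1 + growth) / (wacc - growth)
    return value + terminal / (1 + wacc) ** years


base = fair_value(100.0)                    # engine defaults
revised = fair_value(100.0, {"wacc": 0.09})  # user override
print(revised > base)  # a lower discount rate gives a higher value
```

Rejecting unknown keys is a deliberate choice in this sketch, so that a misspelled assumption fails loudly instead of being silently ignored.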

The 2-tier architecture feeds structured company data to any LLM. Basic analysis works with every provider. Tool-calling providers go deeper.
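
For the basic tier, feeding structured data to an LLM can be as simple as rendering the selected topics into the prompt. The sketch below shows that idea only; the helper name, the bracketed layout, and the placeholder texts are assumptions, and DartLab's actual prompt format is not documented on this page.

```python
def build_context(company, sections, include, max_chars=500):
    """Render selected structured sections as a plain-text context block."""
    parts = [f"Company: {company}"]
    for topic in include:
        body = sections.get(topic)
        if body is None:
            continue  # skip topics this company has no data for
        parts.append(f"[{topic}]\n{body[:max_chars]}")
    return "\n\n".join(parts)


# Placeholder section texts, for illustration only.
sections = {
    "dividend": "Placeholder dividend text.",
    "IS": "Placeholder income statement rows.",
}
question = "Is the dividend sustainable?"
context = build_context("005930", sections, include=["dividend", "IS", "audit"])
prompt = context + "\n\nQuestion: " + question
print(prompt)
```

Because the context is plain text, the same prompt can be sent to any provider, with or without tool calling.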

Real Data

Samsung Electronics — Actual Output

What you get from Company("005930") out of the box

python
The entire company map — 329 topics horizontalized across periods
>>> samsung.sections
chapter  topic             blockType  2024                      2023                      2022
I        companyOverview   text       Founded in 1969…          Founded in 1969…          Founded in 1969…
II       businessOverview  text       Semiconductors, display…  Semiconductors, display…  Semiconductors, display…
II       businessOverview  table      Revenue mix (5×3)         Revenue mix (5×3)         Revenue mix (5×3)
III      riskManagement    text       FX risk exposure…         FX risk exposure…
V        auditOpinion      text       Unqualified               Unqualified               Unqualified
shape: (329, 106) — 329 topics × 106 periods
Module Catalog

42 Modules, One Structure

All modules sit on the same sections spine. No separate schemas.

Narrative structure, section boundaries, retrieval blocks

1   sections          topic × period horizontalization (the company map)
2   retrievalBlocks   RAG-ready text blocks
3   contextSlices     Evidence layer slices
4   companyOverview   Company overview
5   businessOverview  Business description
6   riskManagement    Risk management
7   auditOpinion      Audit opinion
8   segments          Segment information
9   salesOrder        Sales performance
10  notes             K-IFRS notes wrapper
42 modules on one sections spine
Coverage

DART + EDGAR — Same Interface

Korean DART and US SEC EDGAR through one Company interface

Same code, different markets
Korea (DART)
c = Company("005930")
c.sections
c.show("businessOverview")
c.BS
c.ratios
c.diff("businessOverview")
c.insights.grades()
US (EDGAR)
c = Company("AAPL")
c.sections
c.show("business")
c.BS
c.ratios
c.diff("10-K::item7Mdna")
c.insights.grades()
DART
Korean Electronic Disclosure
Stable
Sections mapping: 99.95%
Finance mapping: 97.07%
Verified companies: 283
Feature coverage: 14/14
EDGAR
SEC Filing (10-K, 10-Q)
Beta
Sections mapping: 100.00%
Verified companies: 974
10-K / 10-Q pairs: 6
Feature coverage: 10/14
Features compared across DART and EDGAR
sections horizontalization
show(topic)
trace(topic)
diff(topic)
BS · IS · CF normalization
ratios time series
timeseries
report API (28 types)
insights (7-area grading)
sector classification
market ranking
AI company analysis
Excel export
Desktop GUI

Company("005930") for DART · Company("AAPL") for EDGAR — same interface, same methods
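
One plausible way a single entrypoint can serve both markets is to route on the shape of the identifier. This is only a guess at the technique: the function name and the rules (six digits for a Korean stock code, letters for a US ticker) are assumptions, and DartLab's actual routing may work differently.

```python
import re


def detect_market(code):
    """Guess the filing system from the identifier's shape (assumed rule)."""
    code = code.strip()
    if re.fullmatch(r"\d{6}", code):
        return "DART"   # six-digit Korean stock code, e.g. "005930"
    if re.fullmatch(r"[A-Za-z][A-Za-z.\-]{0,9}", code):
        return "EDGAR"  # US ticker, e.g. "AAPL"
    raise ValueError(f"unrecognized identifier: {code!r}")


print(detect_market("005930"), detect_market("AAPL"))  # prints: DART EDGAR
```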

Performance

Fast Because It's Simple

Polars + Parquet + one structure = no unnecessary conversion

Response Time (Samsung Electronics)
Company creation  ~2s     Data load + sections build
sections query    <100ms  329 topics × 106 periods instant
show(topic)       <50ms   Single topic block extraction
BS / IS / CF      <100ms  Normalized financial statements
ratios series     <200ms  TTM-based trailing calculation
diff(topic)       <300ms  Period-over-period text comparison
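
"TTM-based" means each point in a ratio series uses the sum of the latest four quarters instead of a single quarter. A minimal sketch of that trailing calculation, assuming the inputs are single-quarter flow values in chronological order; the function name and the numbers are illustrative, not DartLab output.

```python
def ttm(quarterly):
    """Trailing-twelve-month sums for single-quarter flow values."""
    out = []
    for i in range(len(quarterly)):
        if i < 3:
            out.append(None)  # fewer than four quarters of history
        else:
            out.append(sum(quarterly[i - 3 : i + 1]))
    return out


# Illustrative quarterly figures, oldest first.
net_income = [10, 12, 11, 13, 15]
revenue = [100, 100, 100, 100, 100]

ttm_income = ttm(net_income)
ttm_revenue = ttm(revenue)
# Net margin on a trailing basis; None where history is too short.
ttm_margin = [
    None if n is None else n / r
    for n, r in zip(ttm_income, ttm_revenue)
]
print(ttm_income)  # prints: [None, None, None, 46, 51]
```

Summing four quarters smooths out seasonality, which is why trailing ratios are easier to compare across periods than single-quarter ones.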
Tech Stack
Runtime: Polars (not pandas)
Data format: Parquet (columnar)
Auto download: Hugging Face Datasets
Incremental: mtime-based delta sync
Cache: Company object reuse
Python: 3.12+ required
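
An mtime-based delta sync typically compares each remote file's modification time with the local copy and fetches only what is missing or newer. The sketch below shows that comparison with the standard library; the helper name, the file names, and the shape of the remote index are hypothetical, and the download step itself is left out.

```python
import os
import tempfile


def files_to_fetch(remote_mtimes, local_dir):
    """Return remote file names that are missing locally or newer than the local copy."""
    stale = []
    for name, remote_mtime in sorted(remote_mtimes.items()):
        path = os.path.join(local_dir, name)
        if not os.path.exists(path) or os.path.getmtime(path) < remote_mtime:
            stale.append(name)
    return stale


with tempfile.TemporaryDirectory() as cache_dir:
    # Two files already cached, both stamped with modification time 1000.
    for name in ("a.parquet", "c.parquet"):
        path = os.path.join(cache_dir, name)
        with open(path, "wb"):
            pass  # empty placeholder file
        os.utime(path, (1000, 1000))  # (access time, modification time)

    # Hypothetical remote index: file name -> modification time.
    remote = {"a.parquet": 1000, "b.parquet": 500, "c.parquet": 2000}
    stale = files_to_fetch(remote, cache_dir)

print(stale)  # prints: ['b.parquet', 'c.parquet']
```

Here `a.parquet` is up to date, `b.parquet` is missing locally, and `c.parquet` has a newer remote copy, so only the last two would be downloaded.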
Why It's Fast

One structure. All queries run on sections — no data conversion needed.

Polars. 5-10x faster DataFrame operations than Pandas.

Parquet. Columnar format reads only the columns you need.

Stability & Roadmap

Transparent Stability Tiers

Clear about what's stable and what's experimental

Stable
  • Company facade
  • DART sections / show / trace / diff
  • DART docs / finance / report
  • search / listing
  • BS · IS · CF · ratios · timeseries
Beta
  • EDGAR Company (sections, finance)
  • insights (7-area grading)
  • rank / sector
  • Excel export
  • Server API (FastAPI 40+ endpoints)
  • MCP server (60 tools)
Experimental
  • AI analysis (7 providers)
  • AI GUI (Desktop)
  • Network scanner (new)

Roadmap

Now v0.6
  • sections text structure
  • EDGAR sections 100%
  • Network scanner
Next v0.7
  • profile.sections merged view
  • TopicView implementation
  • show() completion
Later v0.8+
  • EDINET engine
  • AI GUI improvements
  • Rust pipeline (sections)
What You Can Do

Questions DartLab Answers

Every question starts from the same company map. No glue code, no context switching.

"What's their real credit risk?"

Independent credit evaluation (dCR) rebuilt from disclosure data alone — repayment capacity, capital structure, liquidity, cash-flow quality, disclosure risk. No agency rating, no black box. One call: c.credit().

"Show me the numbers in context"

Financial statements alone miss half the story. DartLab puts BS/IS/CF next to the narrative that explains them — same company, same timeline, same object.

"Screen the entire market"

Scan all 2,700 listed companies by governance quality, workforce trends, capital returns, or debt risk. One call: dartlab.governance("all"). Filter, rank, compare.

"Build a research dataset"

Standardized text + financial data across hundreds of companies. Ready for NLP, ML training, or academic research. No cleaning, no alignment — already done.

"Let AI analyze with real evidence"

Feed structured company context to any LLM — not raw PDFs. 7 providers supported. The AI reasons over actual disclosure data, not hallucinated summaries.

"One tool for Korea and US"

Same Company interface for Korean DART and US EDGAR. Learn it once, apply it to both markets. Compare Samsung and Apple with the same API.

Get Started

Installation

Start analyzing right after install

uv
$ uv add dartlab
AI analysis
$ uv add "dartlab[ai]" && uv run dartlab ai
Auto Download

No separate data preparation needed. Pass a stock code and missing data is automatically downloaded from Hugging Face.

from dartlab import Company
c = Company("005930")   # auto-downloads
Quick Start

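Three Lines Is All It Takes

With a single stock code, the company name, disclosure status, and financial statements for every quarter come along automatically.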

python
import dartlab

c = dartlab.Company("005930")   # Samsung Electronics
c.show("IS")                    # income statement, every quarter
c.show("CF")                    # cash flow statement: just change the string
Hands-on practice is in the Notebooks section

Colab · Molab · local marimo: run the same code right away through any of the three.

Go to Notebooks
Notebooks

Hands-on Notebooks

11 topics on Colab, Molab, or local marimo, all runnable with the same code.

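01  Company   Financials and disclosures from a single stock code
02  gather    Prices · trading flows · macro · news
03  scan      Cross-section of all listed stocks
04  quant     25 indicators + 9 signals
05  analysis  14 axes + outlook · valuation
06  macro     Cycles · interest rates · assets
07  credit    dCR 7-axis rating
08  review    Structured report
09  ai        ask · chat
10  search    Disclosure search
11  listing   Company and disclosure lists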

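Editing in Local marimo

marimo notebooks are fastest to edit locally. Run the command below and the editor opens in your browser automatically. Change the file name to open any other notebook the same way.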

shell
$ uv run marimo edit notebooks/marimo/01_company.py

The marimo notebooks contain code only; the Colab notebooks pair Markdown explanations with code. The same content is maintained in both formats.


Start Reading Companies, Not PDFs

One stock code. Every filing structured. Every period comparable.

One line of Python gives you what used to take days of PDF reading.