Technical · March 12, 2026 · 7 min read

A Developer’s Guide to the SEC EDGAR API

EDGAR exposes a surprising amount of structured data through its public APIs, but the documentation is sparse and the gotchas are real. This guide covers the architecture, endpoints, rate limits, XML parsing, and practical patterns for building reliable data pipelines against EDGAR.

This article is for informational and educational purposes only and does not constitute financial, legal, investment, or tax advice.

Key Takeaways
  • EDGAR serves all SEC filings through a static file architecture—most data is pre-rendered and served from flat files, not a traditional REST API.
  • The EFTS full-text search endpoint at efts.sec.gov/LATEST/search-index accepts query-string parameters and returns JSON results, but it was not designed as a public developer API.
  • The SEC enforces a strict fair access policy: 10 requests per second maximum, a declared User-Agent header, and no anonymous scraping.
  • Form D filings submitted after September 2008 are structured XML, which makes them machine-readable but introduces schema versioning and encoding quirks.
  • SPV Flow handles all of this complexity under the hood—ingesting, parsing, normalizing, and indexing every Form D filing so you can query clean data through the analytics dashboard.

EDGAR Architecture Overview

EDGAR is not a conventional API. It is better understood as a massive static file server with a few search endpoints bolted on. When a company submits a filing, EDGAR processes it and publishes the result as a set of files in a predictable directory structure under sec.gov/Archives/edgar/data/{CIK}/. Each filing gets its own subdirectory containing the primary document, any exhibits, and an index file listing everything in the submission.
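As a rough illustration of that directory layout, the sketch below builds the archive URL for one filing. It assumes the common convention that the folder name is the accession number with its dashes removed and that the CIK appears without leading zeros; the function name is hypothetical.

```python
from typing import Union


def filing_directory_url(cik: Union[int, str], accession_number: str) -> str:
    """Return the archive directory URL for a single filing."""
    # Assumption: archive paths use the CIK without leading zeros.
    cik_part = str(int(cik))
    # Assumption: the folder name is the accession number minus its dashes.
    accession_part = accession_number.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik_part}/{accession_part}/"


if __name__ == "__main__":
    # Illustrative, made-up identifiers.
    print(filing_directory_url("0001234567", "0001234567-26-000123"))
```

The directory returned here is where the primary document, exhibits, and index file for that submission would be listed.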

This architecture has implications for developers. There is no /api/v1/filings endpoint that returns paginated JSON. Instead, you fetch filing indices, parse them, and then retrieve individual documents. The SEC also publishes daily index files at sec.gov/Archives/edgar/daily-index/ and quarterly full index files at sec.gov/Archives/edgar/full-index/, both organized by year and quarter. These index files are flat text, either pipe-delimited (master.idx) or fixed-width (form.idx and company.idx), and list every filing submitted during that period.
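A minimal sketch of reading one of these index files, assuming the pipe-delimited master.idx variant with the columns CIK|Company Name|Form Type|Date Filed|Filename. The sample rows are invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, List

FIELDS = ("cik", "company_name", "form_type", "date_filed", "filename")

# Invented sample in the assumed master.idx layout.
SAMPLE = """Description: Master Index of EDGAR Dissemination Feed
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1234567|Example Fund I LP|D|2026-01-15|edgar/data/1234567/0001234567-26-000001.txt
7654321|Example Corp|10-K|2026-01-15|edgar/data/7654321/0007654321-26-000002.txt
"""


def parse_master_index(text: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited index text into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have five fields and a numeric CIK; this skips the
        # descriptive header, the column header, and the dashed separator.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def form_d_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only original Form D filings and their amendments."""
    return [row for row in rows if row["form_type"] in ("D", "D/A")]


if __name__ == "__main__":
    for row in form_d_rows(parse_master_index(SAMPLE)):
        print(row["cik"], row["form_type"], row["filename"])
```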

The CIK (Central Index Key) is the primary identifier for entities in EDGAR. Every filer gets a unique CIK, and all of their filings are grouped under it. The SEC maintains a company_tickers.json file that maps CIK numbers to ticker symbols and company names, but for private entities filing Form D, there is no ticker—only the CIK and the entity name as submitted.
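The lookup can be sketched as follows. The JSON shape used here (an object of numbered entries, each with cik_str, ticker, and title) is an assumption about company_tickers.json and should be checked against the live file; the helper names are hypothetical.

```python
import json
from typing import Dict, Optional, Union

# Assumed shape of company_tickers.json, reduced to a single entry.
SAMPLE_JSON = '{"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}}'


def ticker_by_cik(raw_json: str) -> Dict[int, str]:
    """Build a CIK -> ticker mapping from the assumed JSON shape."""
    data = json.loads(raw_json)
    return {int(entry["cik_str"]): entry["ticker"] for entry in data.values()}


def lookup_ticker(cik: Union[int, str], mapping: Dict[int, str]) -> Optional[str]:
    """Return the ticker for a CIK, or None when the filer has no ticker."""
    # int() tolerates zero-padded CIK strings such as "0000320193".
    return mapping.get(int(cik))


if __name__ == "__main__":
    mapping = ticker_by_cik(SAMPLE_JSON)
    print(lookup_ticker("0000320193", mapping))
    print(lookup_ticker(1234567, mapping))  # a private Form D filer: no ticker
```

For Form D work, expect the None branch to be the common case, so downstream code should key on the CIK rather than a ticker.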

EFTS Full-Text Search API

The EDGAR Full-Text Search System (EFTS) is the closest thing EDGAR has to a query API. The endpoint at efts.sec.gov/LATEST/search-index accepts GET requests with query parameters and returns JSON. Key parameters include q (search query), forms (comma-separated filing types), dateRange (filing date range; with dateRange=custom, startdt and enddt give the bounds), and from / size for pagination.

A typical request to find recent Form D filings mentioning “real estate” looks like:

GET https://efts.sec.gov/LATEST/search-index?q=%22real+estate%22&forms=D&dateRange=custom&startdt=2026-01-01&enddt=2026-03-12

The response includes a hits array with accession numbers, filing dates, entity names, and file URLs. However, EFTS has limitations that matter for programmatic use. The size parameter caps at 100 results per page, and deep pagination beyond 10,000 results is unreliable. The search is full-text against the rendered filing content, not against structured fields—so filtering by offering amount or exemption type is not possible through EFTS alone.
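The request and its pagination limits can be sketched like this. The parameter names come from the example request above; the 100-per-page cap and the 10,000-result depth are the figures quoted in this guide rather than documented guarantees, and the function names are hypothetical.

```python
from urllib.parse import urlencode

EFTS_URL = "https://efts.sec.gov/LATEST/search-index"
MAX_PAGE_SIZE = 100   # page-size cap quoted in this guide
MAX_DEPTH = 10_000    # beyond this, pagination is described as unreliable


def build_search_url(query: str, forms: str, start: str, end: str,
                     offset: int = 0, size: int = MAX_PAGE_SIZE) -> str:
    """Build an EFTS search URL for an exact-phrase query in a date window."""
    params = {
        "q": f'"{query}"',            # quote the phrase for an exact match
        "forms": forms,
        "dateRange": "custom",
        "startdt": start,
        "enddt": end,
        "from": offset,
        "size": min(size, MAX_PAGE_SIZE),
    }
    return f"{EFTS_URL}?{urlencode(params)}"


def page_offsets(total_hits: int, size: int = MAX_PAGE_SIZE) -> range:
    """Offsets to request, stopping at the depth where results get unreliable."""
    return range(0, min(total_hits, MAX_DEPTH), size)


if __name__ == "__main__":
    for offset in page_offsets(250):
        print(build_search_url("real estate", "D", "2026-01-01", "2026-03-12", offset))
```

When a query reports more hits than the depth limit, one option is to split the date window into smaller ranges and page through each one separately.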

For a more user-oriented walkthrough of EFTS, see the SEC EDGAR search guide.

Rate Limits and Fair Access Policy

The SEC publishes a fair access policy that every developer working with EDGAR must follow. The rules are straightforward but strictly enforced:

  • 10 requests per second maximum. Exceeding this rate will get your IP address temporarily blocked, and repeated violations can result in a permanent ban.
  • Declare a User-Agent header that identifies your company and a contact email. Requests without a proper User-Agent may be throttled or rejected. The format the SEC expects is: CompanyName AdminEmail (e.g., AcmeCorp admin@acme.com).
  • Access during off-peak hours when possible. The SEC recommends bulk downloads between 9 PM and 6 AM Eastern to reduce load on the system during business hours.
  • No automated scraping without identification. Bots that do not identify themselves or that hammer the servers are treated as abuse.
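One way to make the identification rule hard to forget is to build every request through a single helper. This is a sketch using only the standard library; the helper name and its validation rules are illustrative assumptions, not an SEC requirement.

```python
import urllib.request


def build_request(url: str, company: str, email: str) -> urllib.request.Request:
    """Create a request that carries a declared User-Agent."""
    # Illustrative sanity checks so a blank identity fails fast.
    if not company.strip() or "@" not in email:
        raise ValueError("a company name and contact email are required")
    headers = {"User-Agent": f"{company.strip()} {email.strip()}"}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    request = build_request(
        "https://www.sec.gov/Archives/edgar/full-index/",
        "AcmeCorp",
        "admin@acme.com",
    )
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
```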

In practice, the 10 req/s limit is the binding constraint. A backfill job that needs to fetch 50,000 filings at one document per request needs at least 50,000 / 10 = 5,000 seconds, or roughly 83 minutes, of pure fetch time. Pipeline design must account for this: parallelizing beyond the rate limit is not an option.
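A minimal client-side limiter, plus the backfill arithmetic, might look like the sketch below. It spaces requests evenly rather than allowing bursts, which is one reasonable design among several; the class and function names are hypothetical, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1/max_per_second apart (single-threaded use)."""

    def __init__(self, max_per_second: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def min_fetch_minutes(n_requests: int, max_per_second: float = 10.0) -> float:
    """Lower bound on fetch time, ignoring latency and retries."""
    return n_requests / max_per_second / 60


if __name__ == "__main__":
    limiter = RateLimiter(max_per_second=9.0)  # small safety margin under 10
    for _ in range(3):
        limiter.wait()  # a real pipeline would issue one request here
    print(f"{min_fetch_minutes(50_000):.1f} minutes minimum for 50,000 requests")
```

Running the limiter slightly under the ceiling, as in the usage above, leaves room for clock jitter and for other processes that share the same IP address.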

XML Feeds and the Filing Index

Beyond search, EDGAR provides several feeds for discovering new filings. The most useful for Form D developers are:

  • Daily index files: Published at sec.gov/Archives/edgar/daily-index/, these list every filing submitted on a given business day. The form index for each day is a fixed-width text file with columns for form type, company name, CIK, date filed, and the file name (a path relative to sec.gov/Archives/).
  • Full index files: Found at sec.gov/Archives/edgar/full-index/{year}/QTR{n}/, these aggregate an entire quarter of filings into a single index. Useful for historical backfills.
  • RSS feeds: The SEC publishes RSS feeds for recent filings by form type. The Form D RSS feed provides near-real-time discovery of new filings, typically appearing within minutes of acceptance.
  • XBRL filing feeds: Some structured data is published through the SEC’s XBRL viewer, but Form D data is not part of the XBRL taxonomy—it uses its own XML schema.

For a production system that needs to track new Form D filings, the daily index files and RSS feeds are the primary discovery mechanisms. Polling the daily index once an hour on business days catches filings reliably, and the RSS feed provides faster notification for time-sensitive use cases.
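Because the same filing can show up in more than one poll (and in both the index and the RSS feed), discovery needs deduplication. The sketch below pulls accession numbers out of index file paths and returns only the ones not seen before. The in-memory set stands in for whatever durable store a real pipeline would use, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Set

# Accession numbers look like 0001234567-26-000123 (10-2-6 digits).
ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")


def new_accessions(filenames: Iterable[str], seen: Set[str]) -> List[str]:
    """Return accession numbers not seen before, recording them in `seen`."""
    fresh = []
    for name in filenames:
        match = ACCESSION_RE.search(name)
        if match is None:
            continue  # not a filing path; ignore it
        accession = match.group(0)
        if accession not in seen:
            seen.add(accession)
            fresh.append(accession)
    return fresh


if __name__ == "__main__":
    seen: Set[str] = set()
    first_poll = ["edgar/data/1234567/0001234567-26-000001.txt"]
    second_poll = first_poll + ["edgar/data/1234567/0001234567-26-000002.txt"]
    print(new_accessions(first_poll, seen))
    print(new_accessions(second_poll, seen))  # only the newly listed filing
```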

Parsing Form D XML

Form D filings submitted after September 2008 are structured XML documents conforming to the SEC’s Form D XML schema. The root element is <edgarSubmission>, and the data is organized into sections that mirror the paper form: issuer information, offering details, related persons, recipients, and signature blocks.

Key fields developers typically extract include:

  • Issuer: issuerName, issuerCik, issuerStateOrCountry, yearOfIncorporation, entityType
  • Offering: industryGroupType, investmentFundType, revenueRange, federalExemptionsExclusions (this tells you if it is a 506(b) or 506(c) offering)
  • Financial: totalOfferingAmount, totalAmountSold, totalRemaining, totalNumberAlreadyInvested
  • Related persons: Names, titles, and relationships of directors, officers, and promoters listed on the filing

The XML is generally well-formed, but you need to handle namespace declarations, optional elements that may be absent entirely (not just empty), and encoding of special characters. Amendments (Form D/A filings) resubmit the entire form, so you must track accession numbers and filing dates to identify the most current version of an offering.
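A defensive extraction routine might look like the following sketch. It matches elements by local name so that namespace declarations do not matter, and it returns None for elements that are absent. The sample document is invented: its flat layout, its namespace URI, and the tag names (taken from the field list above) are assumptions that should be verified against real filings and the SEC's schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Invented sample; real filings nest these fields in sections.
SAMPLE_XML = """<edgarSubmission xmlns="http://example.invalid/hypothetical-form-d">
  <issuerName>Example Fund I &amp; Co LP</issuerName>
  <issuerCik>0001234567</issuerCik>
  <totalAmountSold>250000</totalAmountSold>
</edgarSubmission>"""

WANTED = ("issuerName", "issuerCik", "totalOfferingAmount", "totalAmountSold")


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def first_text(root: ET.Element, name: str) -> Optional[str]:
    """Text of the first element with this local name, or None if absent/empty."""
    for element in root.iter():
        if local_name(element.tag) == name and element.text and element.text.strip():
            return element.text.strip()
    return None


def extract_fields(xml_text: str) -> Dict[str, Optional[str]]:
    """Pull the wanted fields, leaving missing ones as None."""
    root = ET.fromstring(xml_text)
    return {name: first_text(root, name) for name in WANTED}


if __name__ == "__main__":
    print(extract_fields(SAMPLE_XML))
```

The parser decodes entities such as &amp;amp; on its own, so string-level unescaping on top of it would corrupt names that legitimately contain an ampersand.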

Data Quirks and Gotchas

Working with EDGAR data at scale exposes several issues that are not obvious from the documentation:

  • Indefinite offering amounts: Issuers can check “Indefinite” for the total offering amount, which means the XML field will contain a boolean flag rather than a dollar value. Your parser must handle both cases.
  • Entity name inconsistency: The same fund manager may file under “ABC Capital Management LLC”, “ABC Capital Mgmt, LLC”, and “ABC Capital Management, L.L.C.” across different filings. EDGAR does not normalize these names.
  • CIK reuse is rare but possible: In edge cases, CIK numbers have been reassigned. Do not assume a CIK is a permanent, immutable identifier without cross-referencing filing history.
  • Date formatting varies: Filing dates in index files use YYYY-MM-DD, but dates within XML documents may use different formats depending on the schema version. Validate dates defensively.
  • Pre-2008 filings are unstructured: Form D filings before the XML schema transition are plain text or HTML documents with no machine-readable structure. Extracting data from these requires heuristic parsing or OCR for scanned documents.
  • Amendment chains: A single offering may have a chain of D, D/A, D/A, D/A filings. Each amendment replaces the previous version in full. You need to reconstruct the timeline to get the current state.
  • Stale test filings: EDGAR occasionally contains test filings submitted during system testing. These have real CIK numbers but nonsensical data. Filtering them requires pattern matching on issuer names and offering details.
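Two of these quirks lend themselves to small helpers. The sketch below parses an offering amount that may be a number or an indefinite marker, and reduces entity-name variants to a comparable key. The marker strings and the abbreviation table are illustrative assumptions, not an exhaustive or official list.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

# Assumed spellings of the indefinite marker; verify against real filings.
INDEFINITE_MARKERS = {"indefinite", "true"}
# Illustrative abbreviation table; a real one would be much longer.
ABBREVIATIONS = {"mgmt": "management"}


def parse_offering_amount(raw: Optional[str]) -> Tuple[Optional[Decimal], bool]:
    """Return (amount, is_indefinite); amount is None when absent or indefinite."""
    if raw is None or not raw.strip():
        return None, False
    cleaned = raw.strip().replace(",", "").lstrip("$")
    if cleaned.lower() in INDEFINITE_MARKERS:
        return None, True
    try:
        return Decimal(cleaned), False
    except InvalidOperation:
        raise ValueError(f"unrecognized offering amount: {raw!r}")


def normalize_entity_name(name: str) -> str:
    """Lowercase, collapse 'L.L.C.', drop punctuation, expand abbreviations."""
    lowered = name.lower().replace("l.l.c.", "llc")
    lowered = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(ABBREVIATIONS.get(token, token) for token in lowered.split())


if __name__ == "__main__":
    print(parse_offering_amount("1,500,000"))
    print(parse_offering_amount("Indefinite"))
    for variant in ("ABC Capital Management LLC",
                    "ABC Capital Mgmt, LLC",
                    "ABC Capital Management, L.L.C."):
        print(normalize_entity_name(variant))
```

A normalized key like this is only a first pass at entity matching; it groups obvious variants but will not catch renamed or differently structured entities.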

Building a Production Pipeline

A reliable EDGAR data pipeline for Form D typically has four stages:

  1. Discovery: Poll the daily index files and RSS feeds to identify new Form D and D/A filings. Store accession numbers in a queue for processing. For historical backfills, iterate through the quarterly full-index files.
  2. Fetching: Download the primary XML document for each filing, respecting the 10 req/s rate limit. Implement exponential backoff for transient errors (EDGAR returns 503s during high load). Persist raw XML to durable storage before any processing.
  3. Parsing: Extract structured fields from the Form D XML. Handle schema variations, missing optional fields, and the indefinite-amount flag. Map exemption codes to human-readable labels. Normalize entity names and addresses.
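  4. Loading: Write the parsed data into your database or search index. Implement upsert logic keyed on CIK + accession number so that re-processing a filing updates its existing row instead of creating a duplicate. Each amendment arrives under its own accession number, so link it to the original offering when resolving the current state. Build indexes on the fields you query most: filing date, exemption type, state, industry, and offering amount.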
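The retry behavior in the fetching stage can be sketched as below. The fetch function is passed in so the retry logic stays independent of any HTTP library, and TransientError is a hypothetical exception that such a function would raise for retryable failures like a 503. The delay schedule is an illustrative choice.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical error raised by a fetch function for retryable failures."""


def fetch_with_backoff(fetch: Callable[[str], bytes], url: str,
                       max_attempts: int = 5, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch(url), doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_fetch(url: str) -> bytes:
        # Simulated server: fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientError("503 Service Unavailable")
        return b"<edgarSubmission/>"

    body = fetch_with_backoff(flaky_fetch, "https://example.invalid/doc.xml",
                              sleep=lambda seconds: None)
    print(body, "after", attempts["count"], "attempts")
```

Retries still count against the rate limit, so in a full pipeline each attempt would pass through the same limiter as first-time requests.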

The critical design decision is idempotency. EDGAR data does not change after publication (aside from occasional SEC corrections), but your pipeline will inevitably re-process filings during backfills, error recovery, or schema migrations. Every stage should be safe to re-run without creating duplicate records or corrupting state.
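For the loading stage, idempotency usually comes down to a stable key and an upsert. The sketch below uses SQLite from the standard library with a primary key on CIK plus accession number; the table and column names are hypothetical, and a production system would likely use a different database.

```python
import sqlite3
from typing import Dict


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) filings table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS filings (
            cik TEXT NOT NULL,
            accession_number TEXT NOT NULL,
            form_type TEXT NOT NULL,
            date_filed TEXT NOT NULL,
            total_amount_sold TEXT,
            PRIMARY KEY (cik, accession_number)
        )
        """
    )


def upsert_filing(conn: sqlite3.Connection, row: Dict[str, str]) -> None:
    """Insert a filing, or overwrite the existing row with the same key."""
    conn.execute(
        """
        INSERT OR REPLACE INTO filings
            (cik, accession_number, form_type, date_filed, total_amount_sold)
        VALUES
            (:cik, :accession_number, :form_type, :date_filed, :total_amount_sold)
        """,
        row,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    row = {"cik": "1234567", "accession_number": "0001234567-26-000001",
           "form_type": "D", "date_filed": "2026-01-15",
           "total_amount_sold": "250000"}
    upsert_filing(conn, row)
    upsert_filing(conn, row)  # re-processing the same filing is harmless
    print(conn.execute("SELECT COUNT(*) FROM filings").fetchone()[0])
```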

How SPV Flow Handles EDGAR Data

SPV Flow runs exactly this kind of pipeline in production. Every Form D and Form D/A filing is ingested within hours of publication, parsed against the full XML schema, and loaded into a normalized data store. The pipeline handles all of the quirks described above: indefinite amounts, amendment chains, entity name variations, and pre-2008 unstructured filings.

The result is the SPV Flow analytics dashboard, where you can filter and search the complete Form D dataset by exemption type, offering amount, industry, state, entity type, and date range. The platform resolves amendment chains automatically, showing you the latest state of each offering rather than a list of raw filings. Entity matching links filings from the same fund manager even when names vary across submissions.

For developers building their own tools, SPV Flow eliminates the need to build and maintain an EDGAR ingestion pipeline. Instead of spending weeks solving the parsing, rate-limiting, and data-quality problems described in this guide, you can work directly with clean, structured Form D data. If you are evaluating whether to build or buy this infrastructure, the answer usually depends on whether EDGAR data is your core product or a supporting input—if the latter, the engineering cost of maintaining a reliable pipeline rarely justifies the investment.

Frequently Asked Questions

Does the SEC provide an official API for EDGAR?

Not in the traditional sense. EDGAR is primarily a static file server, and the EFTS search endpoint functions like an API but is not versioned, documented with OpenAPI specs, or supported as a developer product. The SEC provides bulk data downloads and index files, but there is no official REST API with authentication tokens, SDKs, or rate-limit headers. Developers reverse-engineer the endpoints and must follow the fair access policy independently.

What happens if I exceed the EDGAR rate limit?

The SEC will temporarily block your IP address, typically for 10 to 15 minutes. Repeated violations can result in longer blocks or a permanent ban. There is no formal appeals process—if your IP is permanently blocked, you need to contact the SEC’s webmaster. The best approach is to implement a client-side rate limiter that enforces the 10 req/s ceiling with a small safety margin.

Can I access historical Form D filings through EDGAR?

Yes. EDGAR’s full-index archive goes back to 1993, and Form D filings are available from the system’s early years. However, filings before September 2008 are unstructured text or HTML rather than XML, which makes them significantly harder to parse programmatically. The quarterly full-index files are the most efficient way to discover historical filings in bulk.
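A historical backfill can start from a simple list of quarterly index URLs. The sketch below follows the full-index/{year}/QTR{n}/ layout described earlier and assumes the master.idx file name; the function name and its parameters are hypothetical.

```python
from typing import Iterator


def full_index_urls(start_year: int, end_year: int, end_quarter: int = 4,
                    filename: str = "master.idx") -> Iterator[str]:
    """Yield one quarterly index URL per quarter, oldest first."""
    for year in range(start_year, end_year + 1):
        # Only the final year may be cut short at end_quarter.
        last_quarter = end_quarter if year == end_year else 4
        for quarter in range(1, last_quarter + 1):
            yield (f"https://www.sec.gov/Archives/edgar/full-index/"
                   f"{year}/QTR{quarter}/{filename}")


if __name__ == "__main__":
    # Every quarter from 2008 through the first quarter of 2009.
    for url in full_index_urls(2008, 2009, end_quarter=1):
        print(url)
```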

How do I distinguish a Form D from a Form D/A amendment?

In the EDGAR index files and EFTS search results, the form type field will show “D” for original filings and “D/A” for amendments. Within the XML document itself, the <isAmendment> element indicates whether the filing is an amendment. Amendments resubmit the full form, so you should treat the most recently filed version as the current state of the offering. See the Form D filing guide for more on how amendments work.
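Picking the current version out of an amendment chain can be sketched as follows. The offering_key field is hypothetical: EDGAR does not hand you a ready-made key that ties amendments to their original filing, so how you derive it is a design decision for your pipeline. Dates are assumed to be YYYY-MM-DD strings, which sort correctly as text.

```python
from typing import Dict, Iterable


def latest_per_offering(filings: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each offering_key to its most recently filed D or D/A."""
    latest: Dict[str, Dict[str, str]] = {}
    for filing in filings:
        key = filing["offering_key"]  # hypothetical grouping key
        current = latest.get(key)
        # Compare by filing date, using the accession number as a tiebreaker.
        rank = (filing["date_filed"], filing["accession_number"])
        if current is None or rank > (current["date_filed"], current["accession_number"]):
            latest[key] = filing
    return latest


if __name__ == "__main__":
    chain = [
        {"offering_key": "fund-1", "form_type": "D",
         "date_filed": "2025-03-01", "accession_number": "0001234567-25-000001"},
        {"offering_key": "fund-1", "form_type": "D/A",
         "date_filed": "2026-02-10", "accession_number": "0001234567-26-000004"},
    ]
    current = latest_per_offering(chain)["fund-1"]
    print(current["form_type"], current["date_filed"])
```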

Is there a sandbox or test environment for EDGAR?

No. The SEC does not provide a sandbox environment for EDGAR. All API calls and file fetches hit the production system, which means your development and testing traffic counts against the rate limit. For development, consider caching responses locally and replaying them during testing. Alternatively, use SPV Flow for structured access to Form D data without managing EDGAR infrastructure directly.
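The caching approach can be as simple as a directory of files keyed by a hash of the URL. In the sketch below the real fetch function is passed in, so tests can replay cached bytes without touching EDGAR; the function name, cache layout, and file extension are illustrative assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return the cached body for url, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.bin"
    if path.exists():
        return path.read_bytes()
    body = fetch(url)
    path.write_bytes(body)
    return body


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        # Stand-in for a real HTTP call during development.
        print("fetching", url)
        return b"<edgarSubmission/>"

    cache = Path(".edgar_cache_demo")
    cached_fetch("https://example.invalid/doc.xml", fake_fetch, cache)
    cached_fetch("https://example.invalid/doc.xml", fake_fetch, cache)  # served from disk
```

Since published filings rarely change, this cache does not expire entries; index files that are still being appended to would need a different policy.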

Disclaimer

The information provided in this article is for general informational and educational purposes only. Nothing in this article constitutes financial, legal, investment, or tax advice, nor does it create an attorney-client or advisory relationship. SPV Flow is a data platform that aggregates and presents publicly available information from SEC EDGAR filings. While we strive for accuracy, we make no representations or warranties about the completeness, accuracy, or timeliness of the information presented. SEC filings and regulations are subject to change. Always consult with a qualified attorney, financial advisor, or tax professional before making investment decisions, filing with the SEC, or taking any action based on information in this article. Past performance and filing data do not guarantee future results.
