legal-expand: How to convert AEAT, IVA, IRPF into text your client can understand

TypeScript, Python, NPM, PyPI, Legal Tech, NLP

Spanish legal texts are riddled with acronyms that only lawyers understand: AEAT, IRPF, LEC, CGPJ, BOE. Your client reads a ruling and understands nothing. This library automatically expands 646 legal acronyms verified by the RAE, adding the meaning in parentheses. It is available on NPM and PyPI, with semantic HTML support and structured output.

Your client receives a tax notification. They read: “La AEAT, en aplicación del art. 108 de la LGT, requiere la presentación de la declaración del IRPF correspondiente al ejercicio fiscal, conforme al RD 1065/2007 y la LIRPF”. They call asking what that means. You spend ten minutes explaining that AEAT is the Tax Agency, that LGT is the General Tax Law, that IRPF is income tax, that RD is Royal Decree, and that LIRPF is the Income Tax Law.

This scenario repeats constantly in law firms, tax consultancies, and legal departments. Spanish legal texts use acronyms for everything: institutions, laws, taxes, courts, procedures. A two-page document can have thirty different acronyms. For a legal professional they’re obvious. For everyone else they’re hieroglyphics.

The problem isn’t just client comprehension. It also affects automated systems. If you use an LLM to analyze legal documents, the model may not know what CGPJ or TSJC means. If you index rulings for search, unexpanded acronyms make it harder to find relevant results. If you generate automatic summaries, acronyms are copied as-is without context.

The solution: expand acronyms automatically

The idea is simple. You take a text with acronyms and transform it into a text where each acronym has its meaning in parentheses:

Input:

La AEAT notifica el IVA

Output:

La AEAT (Agencia Estatal de Administración Tributaria) notifica el IVA (Impuesto sobre el Valor Añadido)

The text remains the same but now anyone can understand it. The client reads “AEAT” and next to it has “(Agencia Estatal de Administración Tributaria)”. No need to search Google or ask you what it means.

legal-expand does exactly this. It’s a library containing 646 Spanish legal acronyms verified against official sources. You pass it a text, and it returns the text with the acronyms expanded. It is available on NPM (TypeScript/JavaScript) and PyPI (Python).

Installation

For Node.js projects:

npm install legal-expand

For Python projects:

pip install legal-expand

Both versions offer the same functionality and an equivalent API.

Basic usage

In TypeScript/JavaScript:

import { expandirSiglas } from 'legal-expand';

const texto = 'La AEAT notifica el IVA';
const resultado = expandirSiglas(texto);
// 'La AEAT (Agencia Estatal de Administración Tributaria) notifica el IVA (Impuesto sobre el Valor Añadido)'

In Python:

from legal_expand import expandir_siglas

texto = 'La AEAT notifica el IVA'
resultado = expandir_siglas(texto)
# 'La AEAT (Agencia Estatal de Administración Tributaria) notifica el IVA (Impuesto sobre el Valor Añadido)'

The function detects acronyms in the text, looks up their meaning in the dictionary, and adds the expansion in parentheses after each acronym.
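
To make the detect, look up, append cycle concrete, here is a minimal standalone sketch. It is not the library's implementation: the two-entry dictionary and the name expandir_basico are made up for illustration, and it only handles plain uppercase acronyms.

```python
import re

# Tiny illustrative dictionary (an assumption; the library ships its own, much larger one).
SIGLAS = {
    "AEAT": "Agencia Estatal de Administración Tributaria",
    "IVA": "Impuesto sobre el Valor Añadido",
}

# One precompiled pattern that matches any known acronym as a whole word.
PATRON = re.compile(r"\b(" + "|".join(map(re.escape, SIGLAS)) + r")\b")


def expandir_basico(texto: str) -> str:
    """Append '(meaning)' after every known acronym found in texto."""
    return PATRON.sub(lambda m: f"{m.group(1)} ({SIGLAS[m.group(1)]})", texto)


print(expandir_basico("La AEAT notifica el IVA"))
```

The word boundaries keep the pattern from firing inside longer words, so a name such as IVAN is left alone.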

Intelligent variant detection

Acronyms in real texts appear written in different ways. Sometimes with periods, sometimes without, sometimes in uppercase, sometimes with only the first letter capitalized. The library recognizes all variants:

expandirSiglas('La AEAT requiere...');    // Detects AEAT
expandirSiglas('La A.E.A.T. requiere...'); // Detects A.E.A.T.
expandirSiglas('La A.E.A.T requiere...');  // Detects A.E.A.T
expandirSiglas('La Aeat requiere...');     // Detects Aeat

All these variants expand correctly to the same meaning. You don’t need to normalize the text before processing.
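
One plausible way to treat these variants as the same acronym is to normalize each candidate before the dictionary lookup. The sketch below assumes that approach; normalizar and buscar_variante are hypothetical helpers, not functions exported by the library.

```python
from typing import Optional

# Illustrative one-entry dictionary keyed by the canonical uppercase form.
SIGLAS = {"AEAT": "Agencia Estatal de Administración Tributaria"}


def normalizar(variante: str) -> str:
    """Reduce 'A.E.A.T.', 'A.E.A.T' or 'Aeat' to the canonical key 'AEAT'."""
    return variante.replace(".", "").upper()


def buscar_variante(variante: str) -> Optional[str]:
    """Return the meaning for any spelling variant, or None if unknown."""
    return SIGLAS.get(normalizar(variante))


for forma in ("AEAT", "A.E.A.T.", "A.E.A.T", "Aeat"):
    print(forma, "->", buscar_variante(forma))
```

A production matcher has to be stricter than this: uppercasing every token risks matching ordinary words that happen to spell an acronym, and a trailing period may simply be the end of the sentence.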

Output formats

The library supports three output formats depending on the use case.

Plain text (default)

Adds the expansion in parentheses after each acronym. It’s the simplest format and works for any context.

expandirSiglas('El BOE publica la LEC');
// 'El BOE (Boletín Oficial del Estado) publica la LEC (Ley de Enjuiciamiento Civil)'

Semantic HTML

Uses <abbr> tags with the title attribute. In browsers, users can hover over the acronym and see the meaning in a native tooltip.

expandirSiglas('El BOE publica la LEC', { format: 'html' });
// 'El <abbr title="Boletín Oficial del Estado">BOE</abbr> (Boletín Oficial del Estado) publica la <abbr title="Ley de Enjuiciamiento Civil">LEC</abbr> (Ley de Enjuiciamiento Civil)'

This format is ideal for web applications where you want to keep the text visually clean but offer additional information to users.
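
For readers curious how such markup can be produced safely, the following standalone sketch escapes the input first and then wraps known acronyms in abbr tags. It is an illustration under assumptions, not the library's code: a_html and the two-entry dictionary are hypothetical, and unlike the output shown above it emits only the tag, without the trailing parenthetical.

```python
import html
import re

# Illustrative dictionary (an assumption, not the library's data).
SIGLAS = {
    "BOE": "Boletín Oficial del Estado",
    "LEC": "Ley de Enjuiciamiento Civil",
}
PATRON = re.compile(r"\b(" + "|".join(map(re.escape, SIGLAS)) + r")\b")


def a_html(texto: str) -> str:
    """Escape texto, then wrap each known acronym in <abbr title="...">."""
    seguro = html.escape(texto)  # neutralizes <, >, & and quotes in the input

    def envolver(m: re.Match) -> str:
        sigla = m.group(1)
        titulo = html.escape(SIGLAS[sigla], quote=True)
        return f'<abbr title="{titulo}">{sigla}</abbr>'

    return PATRON.sub(envolver, seguro)


print(a_html("El BOE publica la LEC"))
```

Escaping before substitution matters when the result is injected into a page as raw HTML, because any markup present in the source text is rendered inert.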

Structured output

Returns an object with the expanded text, the list of acronyms found, and processing statistics. Useful for analysis or when you need programmatic access to the detected acronyms.

const resultado = expandirSiglas('El BOE publica la LEC', { format: 'structured' });

// resultado.expandedText: 'El BOE (...) publica la LEC (...)'
// resultado.acronyms: [
//   { acronym: 'BOE', expansion: 'Boletín Oficial del Estado', position: 3 },
//   { acronym: 'LEC', expansion: 'Ley de Enjuiciamiento Civil', position: 18 }
// ]
// resultado.stats: { total: 2, expanded: 2, skipped: 0 }

In Python, structured output returns a StructuredOutput object with equivalent properties.
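
As a rough picture of what a structured result involves, this standalone sketch collects each acronym together with its offset. The Hallazgo dataclass and localizar function are hypothetical names, and the sketch assumes position means the zero-based character offset in the original text; check the library's own types before relying on that.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative dictionary (an assumption, not the library's data).
SIGLAS = {
    "BOE": "Boletín Oficial del Estado",
    "LEC": "Ley de Enjuiciamiento Civil",
}
PATRON = re.compile(r"\b(" + "|".join(map(re.escape, SIGLAS)) + r")\b")


@dataclass
class Hallazgo:
    acronym: str
    expansion: str
    position: int  # zero-based offset in the original text (assumed meaning)


def localizar(texto: str) -> List[Hallazgo]:
    """Return one record per known acronym, in order of appearance."""
    return [
        Hallazgo(m.group(1), SIGLAS[m.group(1)], m.start(1))
        for m in PATRON.finditer(texto)
    ]


for hallazgo in localizar("El BOE publica la LEC"):
    print(hallazgo)
```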

Expand only the first occurrence

In long documents, expanding every appearance of an acronym produces repetitive text that is tiring to read. If a twenty-page document mentions “AEAT” fifty times, you don’t want to see “(Agencia Estatal de Administración Tributaria)” fifty times.

The expandOnlyFirst option expands only the first appearance of each acronym:

const texto = 'La AEAT notifica. La AEAT requiere. La AEAT informa.';

expandirSiglas(texto, { expandOnlyFirst: true });
// 'La AEAT (Agencia Estatal de Administración Tributaria) notifica. La AEAT requiere. La AEAT informa.'

The first time AEAT appears it gets expanded. Subsequent appearances are left as-is. This is standard behavior in technical and legal texts where acronyms are defined once at the beginning.
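
The logic behind this option can be sketched with a set of acronyms already seen during a single pass. This is a standalone illustration with a made-up function name (expandir_primera), not the library's code.

```python
import re

# Illustrative dictionary (an assumption, not the library's data).
SIGLAS = {"AEAT": "Agencia Estatal de Administración Tributaria"}
PATRON = re.compile(r"\b(" + "|".join(map(re.escape, SIGLAS)) + r")\b")


def expandir_primera(texto: str) -> str:
    """Expand each acronym only the first time it appears in texto."""
    vistas = set()  # acronyms already expanded during this call

    def reemplazar(m: re.Match) -> str:
        sigla = m.group(1)
        if sigla in vistas:
            return sigla  # later occurrences stay untouched
        vistas.add(sigla)
        return f"{sigla} ({SIGLAS[sigla]})"

    return PATRON.sub(reemplazar, texto)


print(expandir_primera("La AEAT notifica. La AEAT requiere. La AEAT informa."))
```

Keeping the set local to the call means every document starts fresh, so the first mention in each new text is always expanded.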

Exclude and include specific acronyms

Sometimes you want to control which acronyms get expanded. Maybe your client already knows what IVA is but doesn’t know the rest. Or maybe you’re processing a text where certain acronyms have context-specific meanings.

To exclude acronyms:

expandirSiglas('La AEAT notifica el IVA', { exclude: ['IVA'] });
// 'La AEAT (Agencia Estatal de Administración Tributaria) notifica el IVA'

To expand only specific acronyms:

expandirSiglas('La AEAT notifica el IVA y el IRPF', { include: ['AEAT'] });
// 'La AEAT (Agencia Estatal de Administración Tributaria) notifica el IVA y el IRPF'
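
Both filters boil down to computing the set of acronyms that may be expanded. The standalone sketch below assumes that exclude wins when an acronym appears in both lists; that precedence, like the name expandir_filtrado, is an assumption for illustration and not documented library behavior.

```python
import re
from typing import Iterable, Optional

# Illustrative dictionary (an assumption, not the library's data).
SIGLAS = {
    "AEAT": "Agencia Estatal de Administración Tributaria",
    "IVA": "Impuesto sobre el Valor Añadido",
    "IRPF": "Impuesto sobre la Renta de las Personas Físicas",
}
PATRON = re.compile(r"\b(" + "|".join(map(re.escape, SIGLAS)) + r")\b")


def expandir_filtrado(
    texto: str,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> str:
    """Expand only the allowed acronyms; exclude takes precedence (assumed)."""
    permitidas = set(SIGLAS) if include is None else set(include) & set(SIGLAS)
    permitidas -= set(exclude or ())

    def reemplazar(m: re.Match) -> str:
        sigla = m.group(1)
        return f"{sigla} ({SIGLAS[sigla]})" if sigla in permitidas else sigla

    return PATRON.sub(reemplazar, texto)


print(expandir_filtrado("La AEAT notifica el IVA", exclude=["IVA"]))
print(expandir_filtrado("La AEAT notifica el IVA y el IRPF", include=["AEAT"]))
```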

Acronyms with multiple meanings

Some acronyms have more than one meaning. DGT can be “Dirección General de Tributos” (Tax Directorate) or “Dirección General de Tráfico” (Traffic Directorate). AN can be “Audiencia Nacional” (National Court) or “Acuerdo de Nación” (National Agreement). Context determines which is correct, but the library can’t infer it automatically.

By default, acronyms with multiple meanings don’t get expanded to avoid errors:

expandirSiglas('La DGT comunica...');
// 'La DGT comunica...' (not expanded because DGT has multiple meanings)

You can query what meanings an acronym has:

import { buscarSigla } from 'legal-expand';

const info = buscarSigla('DGT');
// {
//   found: true,
//   meanings: [
//     'Dirección General de Tributos',
//     'Dirección General de Tráfico'
//   ],
//   hasDuplicates: true
// }

To manually resolve which meaning to use:

expandirSiglas('La DGT comunica...', {
  duplicateResolution: {
    'DGT': 'Dirección General de Tráfico'
  }
});
// 'La DGT (Dirección General de Tráfico) comunica...'

You can also enable automatic resolution, which uses the most common meaning based on usage frequency:

expandirSiglas('La DGT comunica...', { autoResolveDuplicates: true });
// Expands using the most frequent meaning
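
The three behaviors described in this section (skip by default, manual choice, automatic choice) can be summarized in one small decision function. This is a standalone sketch: resolver and its parameters are hypothetical, and it assumes the meanings of each acronym are stored most frequent first, as the buscarSigla output above suggests.

```python
from typing import Dict, List, Optional

# Illustrative dictionary; each list is assumed to be ordered by usage frequency.
SIGLAS: Dict[str, List[str]] = {
    "AEAT": ["Agencia Estatal de Administración Tributaria"],
    "DGT": ["Dirección General de Tributos", "Dirección General de Tráfico"],
}


def resolver(
    sigla: str,
    resolucion: Optional[Dict[str, str]] = None,
    auto: bool = False,
) -> Optional[str]:
    """Return the meaning to use, or None if the acronym should stay as-is."""
    significados = SIGLAS.get(sigla, [])
    if not significados:
        return None  # unknown acronym
    if len(significados) == 1:
        return significados[0]  # unambiguous
    if resolucion and sigla in resolucion:
        return resolucion[sigla]  # the caller decided explicitly
    if auto:
        return significados[0]  # most frequent meaning (assumed ordering)
    return None  # ambiguous and unresolved: do not expand


print(resolver("DGT"))
print(resolver("DGT", resolucion={"DGT": "Dirección General de Tráfico"}))
print(resolver("DGT", auto=True))
```

Checking the explicit mapping before the automatic fallback lets a caller enable auto-resolution globally and still override individual acronyms.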

Global configuration

If your application always uses the same options, you can configure them globally instead of passing them in each call:

import { configurarGlobalmente } from 'legal-expand';

configurarGlobalmente({
  enabled: true,
  defaultOptions: {
    format: 'html',
    expandOnlyFirst: true,
    exclude: ['art.', 'núm.', 'apdo.']
  }
});

// Now all calls use these options by default
expandirSiglas('La AEAT notifica...');

To reset to original configuration:

import { resetearConfiguracion } from 'legal-expand';

resetearConfiguracion();

Special context protection

The library detects and protects contexts where acronyms shouldn’t be expanded:

URLs: If the text contains https://aeat.es, it doesn’t try to expand “aeat” as an acronym.

Email addresses: info@aeat.es is left intact.

Code blocks: Content between triple backticks isn’t processed.

const texto = 'Visita https://aeat.es para más información sobre la AEAT';
expandirSiglas(texto);
// 'Visita https://aeat.es para más información sobre la AEAT (Agencia Estatal de Administración Tributaria)'

Only the AEAT in normal text gets expanded, not the one appearing in the URL.
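
One common way to implement this kind of protection is to let a single regular expression match protected spans first and acronyms second, returning protected spans unchanged. The sketch below assumes that technique; the patterns are deliberately simplified, expandir_protegido is a hypothetical name, and nothing here is taken from the library's source.

```python
import re

# Illustrative dictionary (an assumption, not the library's data).
SIGLAS = {"AEAT": "Agencia Estatal de Administración Tributaria"}

# Protected spans come first in the alternation, so they win over acronyms.
# Simplified patterns: fenced code, http(s) URLs, email addresses.
PATRON = re.compile(
    r"(?P<protegido>`{3}.*?`{3}|https?://\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|\b(?P<sigla>" + "|".join(map(re.escape, SIGLAS)) + r")\b",
    re.DOTALL | re.IGNORECASE,  # case-insensitive so 'aeat' would match if unprotected
)


def expandir_protegido(texto: str) -> str:
    """Expand acronyms except inside URLs, emails, and fenced code."""

    def reemplazar(m: re.Match) -> str:
        if m.group("protegido") is not None:
            return m.group(0)  # leave the protected span untouched
        sigla = m.group("sigla")
        return f"{sigla} ({SIGLAS[sigla.upper()]})"

    return PATRON.sub(reemplazar, texto)


print(expandir_protegido("Visita https://aeat.es para más información sobre la AEAT"))
```

Because the regex engine consumes the whole URL or address as one match, the acronym pattern never gets a chance to look inside it.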

The 646 acronyms in the dictionary

The dictionary contains verified acronyms from official sources:

Taxes and levies (45 acronyms): AEAT, IVA, IRPF, IS, ISD, ITP, AJD, IAE, IBI, IIVTNU, ICIO, IGIC, IPSI…

Laws and regulations (80+ acronyms): CC (Civil Code), CCom (Commercial Code), CE (Spanish Constitution), CP (Criminal Code), LEC (Civil Procedure Law), LECrim (Criminal Procedure Law), LGT (General Tax Law), LIRPF, LIVA, LIS, LOPJ, LOPD, LOPDGDD…

Institutions (60+ acronyms): AN (National Court), BOE, CGPJ (General Council of the Judiciary), CNMV, DGT, DGRN, INEM, INSS, SEPE, TGSS, TS (Supreme Court), TC (Constitutional Court), TSJ, TSJC, TSJM…

Common abbreviations (30+ acronyms): art. (article), apdo. (section), cfr. (compare), núm. (number), pág. (page), ss. (following), vid. (see)…

Company types (15+ acronyms): S.A., S.L., S.L.U., S.Coop., S.Com., UTE, AIE, SICAV…

Other (30+ acronyms): pyme, I+D, ERE, ERTE, LAU, LAR, LPH, RAE, DPEJ…

Sources include the RAE’s Libro de Estilo de la Justicia, the Diccionario Panhispánico del Español Jurídico (DPEJ), the BOE, and current legislation. All acronyms are normalized according to RAE criteria, sorted by usage frequency, and verified to avoid incorrect duplicates.

Auxiliary functions

To list all available acronyms:

import { listarSiglas } from 'legal-expand';

const siglas = listarSiglas();
// ['AEAT', 'IVA', 'IRPF', 'BOE', 'LEC', ...]

To get dictionary statistics:

import { obtenerEstadisticas } from 'legal-expand';

const stats = obtenerEstadisticas();
// {
//   totalAcronyms: 646,
//   withDuplicates: 23,
//   categories: { taxes: 45, laws: 80, institutions: 60, ... }
// }

Framework integration

React

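import { expandirSiglas } from 'legal-expand';

// Renders the expanded text as HTML. Only pass trusted or sanitized input:
// dangerouslySetInnerHTML injects the string into the page as markup.
function LegalText({ texto }: { texto: string }) {
  const expandido = expandirSiglas(texto, { format: 'html' });
  return <div dangerouslySetInnerHTML={{ __html: expandido }} />;
}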

Next.js (Server Components)

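import { expandirSiglas } from 'legal-expand';

// getSentencia stands for your own data-access function.
export default async function SentenciaPage({ params }) {
  const sentencia = await getSentencia(params.id);
  // Plain-text expansion runs on the server and is rendered as ordinary text.
  const textoExpandido = expandirSiglas(sentencia.texto);

  return <article>{textoExpandido}</article>;
}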

FastAPI (Python)

from fastapi import FastAPI
from legal_expand import expandir_siglas, ExpansionOptions

app = FastAPI()

@app.post("/expandir")
async def expand(texto: str, formato: str = "plain"):
    resultado = expandir_siglas(texto, ExpansionOptions(format=formato))
    return {"resultado": resultado}

Django (Python)

from django.http import JsonResponse
from legal_expand import expandir_siglas

def expandir_view(request):
    texto = request.POST.get('texto', '')
    resultado = expandir_siglas(texto)
    return JsonResponse({'resultado': resultado})

LLM integration

When using language models to analyze legal texts, expanding acronyms before sending the text can improve results. The model has more context and doesn’t have to infer what each acronym means.

import { expandirSiglas } from 'legal-expand';
import OpenAI from 'openai';

const openai = new OpenAI();

async function analizarSentencia(texto: string) {
  // Expand acronyms before sending to LLM
  const textoExpandido = expandirSiglas(texto, { expandOnlyFirst: true });

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'You are a legal assistant specialized in Spanish law.' },
      { role: 'user', content: `Analyze this ruling:\n\n${textoExpandido}` }
    ]
  });

  return response.choices[0].message.content;
}

The LLM receives “AEAT (Agencia Estatal de Administración Tributaria)” instead of just “AEAT”, so it no longer has to guess the meaning, which should make its analysis more reliable.

Performance

The library is optimized to process long texts. Optimizations include precompiled regular expressions, an indexed dictionary with O(1) lookup, and zero external dependencies, so no third-party code adds overhead.

The NPM package size is approximately 4KB gzipped.

Compatibility

NPM (legal-expand):

PyPI (legal-expand):

The code is on GitHub

The NPM package is published as legal-expand. Source code is at github.com/686f6c61/legal-expand.

The Python package is published as legal-expand. Source code is at github.com/686f6c61/pypi-legal-expand.

Both projects are MIT licensed. If you find missing acronyms or incorrect meanings, open an issue or send a pull request. The dictionary is kept updated with new acronyms as they appear in legislation and legal practice.