ecnf-format.txt

﻿Notes on database entries for elliptic curves over number fields (other than Q)

In the existing (mongodb) Database elliptic_curves there is currently
a Collection called curves.  We have added *one* new collection,
called nfcurves. This will hold one "document" (mongo-ese for a single
database record) for each curve.

Each document has keys which we group for convenience here into two
parts: field keys and curve keys.  The field keys identify the field K
over which the curve E is defined, with enough detail so that searches
can be done for curves over specific fields or (for example) over real
quadratic fields.  The curve fields identify the curve and contains
various invariants and properties of the curve.  In many cases some of
these will not be known, in which case they are simply omitted; and
the code which deals with these documents and creates web pages will
have to be aware of this partial data.

I have given these keys long names here for clarity.  Harald
recommends using short names to save space, as in our code we can
easily define longer pseudonyms (e.g. conductor_norm = "cn").

FIELD KEYS (Mandatory keys marked with *)

field_label  *   string          2.2.5.1
degree       *   int             2
signature    *   [int,int]       [2,0]
abs_disc     *   int             5

CURVE KEYS (Mandatory keys marked with *)

label              *     string (see below)
short_label        *     string
conductor_label    *     string
iso_label          *     string (letter code of isogeny class)
conductor_ideal    *     string
conductor_norm     *     int
number             *     int    (number of curve in isogeny class, from 1)

Here,

 label = “%s-%s” % (field_label, short_label)
 short_label = “%s-%s%s” % (conductor_label, iso_label, str(number))

We do not insist on a common format for the conductor_label or
conductor_ideal.  The conductor_ideal should contain enough
information to uniquely construct the conductor as an ideal.  The
conductor_label need not, it could be something like "17.1" for the
first ideal of norm 17 in some order.  For each type of field we
include (real quadratic, imaginary quadratic, etc) these label formats
will need to be specified somewhere and functions provided in the code
to parse them.  See the file quadratic_ideals.txt for the quadratic
case.

All number field elements (NFelt) will be represented in the database
as a list of length d where d=[K:Q], each component a string which
represents the rational number num/den.  Originally we stored
rationals as pairs of integers [num,den] but this failed as soon as we
had a curve who coefficients were loo large for Mongo to manage (it
has a limitation of 8-byte integers).  The d rationals in the list are
the coefficients of the number field element with respect to the power
basis on the generator of K whose minimal polynomial is stored in the
number field database.  The a-invariants will be a list of 5 of these
lists of strings.  Points will be lists of three of these, the
projective coordinates, and the gens and torsion_gens fields are lists
of points.

CURVE KEYS continued. (Mandatory keys marked with *)

ainvs                  *         5-list of NFelts (Weierstrass coefficients)
jinv                   *         NFelt
cm                     *         int (a negative discriminant, or 0)
base_change            *         False or label of curve over Q
Q-curve                *         boolean (True, False)
rank                             int
rank_bounds                      2-list of ints
analytic_rank                    int
torsion_order                    int
torsion_structure                {0,1,2}-list of ints
gens                             list of 3-lists of NFelts
torsion_gens                     list of 3-lists of NFelts
sha_an                           int

There is no need for the Weierstrass coefficients to have any
properties such as minimal / reduced model, but they should be
integral.  Obviously one would not have both rank_bounds and rank (and
similar).

N.B. base_change should be not False if and only if E is isomorphic
over K to an elliptic curve E0 defined over Q.  For this it is
necessary but not sufficient that j(E) lie in Q!  If not False, the
value is the label of the elliptic curve over Q.  This field depends
on the curve, not just its isogeny class.  By contrast, Q-curve is
True iff the curve is isogenous to all its Galois conjugates, which is
an isogeny-invariant property.  If any curve in the class is a
base-change then all curves in the class are Q-curves (including those
defined over Q) but it is possible for an isogeny class of Q-curves
not to contain any individual curves defined over Q.

Raw data files for input / upload into the database

We all agree to create plain text files with one line per curve
containing some of the above fields.  See the file sample-nfcurve for
the details.  There is no need for contributors to provide all of the
fields for two reasons:

1. There is no need to include j-invariants as these can be computed
on the fly when uploading (similarly for torsion, and other "easy"
invariants);

2. We may not have complete data for all curves, e.g. no generators or
no rank.  We have common wrtten parsing and uploads scripts, similar
to those in import_ec_data.py, for curves and curve_data files (see
later).

3. We initially asked for torsion structure and gens to be supplied,
but later decided that it was easy and quick to complete these on the
fly.  We keep the number of data fields per input line constant (or
predictable), by putting "?" into the raw data files when a field is
missing (either unknown, or redundant -- e.g. rank bounds not needed
if rank known).

The import scripts can handle more than one set of files, each of
which has exactly one line per curve, provided that in each set of
files there is enough labelling to uniquely identify each curve.
These common fields for every file are the following 4:

1. field_label
2. conductor_label
3. iso_label
4. number

since the unique label is then obtained by concatenating these, joined
as field_label + "-" + conductor_label + "-" + iso_label + "." +
number.  (I have tried to make this compatible with HMF labels, so I
hope I have it correct.  A typical HMF complete label is
2.2.5.1-31.1-a where ‘2.2.5.1’ is the field_label and ‘31.1’ the
conductor_label and ‘a’ the iso_label.)

In the input file NFelts are formatted as strings consisting of d
coefficients separated by “,”, each an integer or rational, with no
embedded spaces.

Currently we need 1 or 2 sets of files: curves (obligatory) and
curve_data (optional) with the following fields after the above 4.
The file names should be of the form “curves.xyz” and “curve_data.xyz”
with the same suffix (here “xyz”) to identify the dataset in some way.
So far we have xyz=sample for the sample curves files, xyz=qsqrt5 for
Alyson’s data and cubic-23 for Paul and Dan’s, but if they provide
additional curves with (say) different conductor ranges then we would
need to change these suffices to be more specific.  The intention is
that these raw data files should be kept under revision control, by
their creators and/or John, who has created the git repository
https://github.com/JohnCremona/ecnf-data for this.

Additional data fields in curves.xyz:

5: conductor_ideal
6: conductor_norm
7...11: ainvs (in 5 fields separated by space)
12: cm (0 or a negative discriminant)
13: Q_curve (use 0 and 1 for False and True)

Sample line: 2.2.5.1 31.1 a 1 [31,18,1] 31  1,0 -1,-1 0,1 0,0 0,0 0 0

Additional data fields in curve_data.xyz:

5: rank                   int or ?
6: rank_bounds      [int,int] or ?
7: analytic_rank          int or ?
8: ngens (number of generators of infinite order which follow) int
9...8+ngens: gens (number of fields equals ngens or absent if ngens=0) point-string
9+ngens:  sha_an          int or ?

For points the syntax will be [x:y:z] with x, y, z all NFelts as
strings and no embedded spaces.

Sample line:

2.2.5.1 31.1 a 1 ? ? 0 0 ?

Some of the database fields (e.g. short_label, jinv) are trivially
obtained from these, and the torsion structure and gens will also be
computed during the upload.  I divided up the two raw data files so
that if you only know the basics about curves (its equation, and cm &
Q_curve flags) then you only need provide the curves file while if
you know more about their Mordell-Weil group then you can provide
curve_data files too.  The upload scripts will check for consistency
that the conductor is correct. (This has already revealed some
glitches in the files, so was definitely a good idea!).  The
base_change field in the database will be computed on upload (for
curves for which Q_curve is True) by checking that the j-invariant is
in QQ and using Sage's E.descends_to(QQ) function.

If all contributors agree to keep to this then we will be able to use
the uniform upload scripts, which have been written (by John), based
on the elliptic curve upload scripts.  See lmfdb/ecnf/import_ecnf_data.py.

TODO (this is being changed…)
A mockup of a typical mongoDB entry:

{
 “_id” : <chosen by mongodb and rather random>,
 “field” : [ <label>, <degree>, [ <sig1>, <sig2> ], <abs_disc> ],
 “label” : <curve label>,
 “

the class structure (mapping the database entries to python classes)

ECNF:
* NField (it’s own class)
   * label
   * ...
* label
* ...