diff --git a/AUTHORS b/AUTHORS index ebf4d166..73c6a841 100644 --- a/AUTHORS +++ b/AUTHORS @@ -1,10 +1,11 @@ -Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz - Ordered by date of first contribution: Alberto Paro + George Sakkis sandymahalo andrei - Tavis Aitken + Tavis Aitken Richard Boulton matterkkila Matt Chu + +Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz diff --git a/Changelog b/Changelog index 4def732c..6e260cc0 100644 --- a/Changelog +++ b/Changelog @@ -1,209 +1,379 @@ -Changelog -========= -v. 0.19.0: - - Use default_indices instead of hardcoding ['_all'] (gsakkis) - - Complete rewrite of connection_http (gsakkis) - - Don't collect info on creation of ES object (patricksmith) - - Add interval to histogram facet. (vrachil) +================ + Change history +================ - Improved connection string construction and added more flexibility. (ferhatsb) +.. contents:: - Fixed pickling DotDict. +.. _version-0.19.1: - Fixed a bug in Decoder. +0.19.1 +====== - Added execution to TermsFilter. Fixed missing _name attribute in serialized object +News +---- - Added _cache and _cache_key parameters to filters. +- Create Manager to manage API action grouped as Elasticsearch. - Added scope, filter and global parameters to facets. closes #119 - - Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis) +- This allows to simplify ES object and to move grouped functionality in manager. We are following the ElasticSearch +- grouping of actions. For now we are adding: - Allow partial_fields to be passed in the Search class. (tehmaze) - - Propagated parameters to bulker. + - Indices Manager: to manage index operation - Support params for analyze. (akheron) + - Cluster Manager: to manage index operation - Added LimitFilter. +- Renamed field_name in name in ScriptFields - Fixed support for query as dict in Search object. +- Got docs building on readthedocs.org (Wraithan - Chris McDonald) - Added ListBulker implementation and create_bulker method. +- Added model and scan to search. - Moved imports to absolute ones. +- So one can pass custom object to be created - Removed inused urllib3 files and added timeout to connection_http. +- Added document exists call, to check is a document exists. - Add NotFilter as facet filter (junckritter) +Deprecated +---------- - Add terms facet filter +Using manager, a lot of es methods are refactored in the managers. This is the list of moved methods: -v. 0.18.7-rc1: +- .aliases -> .indices.aliases - Tested against 0.18.7, with all tests passing +- .status -> .indices.status - Added support for index_stats +- .create_index -> .indices.create_index -v. 0.17.0: +- .create_index_if_missing -> .indices.create_index_if_missing - API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw") +- .delete_index -> .indices.delete_index - API BREAKING: renamed indexes in indices. To be complaint to ES documentation. +- .exists_index -> .indices.exists_index - Tests refactory. - - Add model object to objetify a dict. - -v. 0.16.0: +- .delete_index_if_exists -> .indices.delete_index_if_exists - Updated documentation. +- .get_indices -> .indices.get_indices - Added TextQuery and some clean up of code. +- .get_closed_indices -> .indices.get_closed_indices - Added percolator (matterkkila). +- .get_alias -> .indices.get_alias - Added date_histogram facet (zebuline). 
+- .change_aliases -> .indices.change_aliases - Added script fields to Search object, also add "fields" to TermFacet (aguereca). +- .add_alias -> .indices.add_alias - Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline). +- .delete_alias -> .indices.delete_alias - Add ScriptFields object used as parameter script_fields of Search object (aguereca). +- .set_alias -> .indices.set_alias - Add IdsQuery, IdsFilter and delete_by_query (aguereca). +- .close_index -> .indices.close_index - Bulk delete (acdha). +- .open_index -> .indices.open_index -v. 0.15.0: +- .flush -> .indices.flush - Only require simplejson for python < 2.6 (matterkkila) +- .refresh -> .indices.refresh - Added basic version support to ES.index and Search (merrellb) +- .optimize -> .indices.optimize - Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb) +- .analyze -> .indices.analyze - Added GeoPointField to mapping types (merrellb) +- .gateway_snapshot -> .indices.gateway_snapshot - Disable thrift in setup.py. +- .put_mapping -> .indices.put_mapping - Added missing _routing property in ObjectField +- .get_mapping -> .indices.get_mapping - Added ExistsFilter +- .cluster_health -> .cluster.cluster_health - Improved HasChildren +- .cluster_state -> .cluster.state - Add min_similarity and prefix_length to flt. +- .cluster_nodes -> .cluster.nodes_info - Added _scope to HasChildQuery. (andreiz) +- .cluster_stats -> .cluster.node_stats - Added parent/child document in test indexing. Added _scope to HasChildFilter. +- .index_stats -> .indices.stats - Added MissingFilter as a subclass of TermFilter +- .delete_mapping -> .indices.delete_mapping - Fixed error in checking TermsQuery (merrellb) +- .get_settings -> .indices.get_settings - If an analyzer is set on a field, the returned mapping will have an analyzer +- .update_settings -> .indices.update_settings - Add a specific error subtype for mapper parsing exceptions (rboulton) - Add support for Float numeric field mappings (rboulton) +Fixes +----- - ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton) +- Fixed ResultSet slicing. - Allow dump_curl to be passed a filehandle (or still a filename), don't for filenames to be in /tmp, and add a basic test of it. +- Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9. - Add alias handling (rboulton) +- Added documentation links. - Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton) +- Renamed scroll_timeout in scroll. - Handle errors produced by deleting a missing document, and add a test for it. (rboulton) +- Renamed field_name in name in ScriptFields. - Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton) +- Added routing to delete document call. -v. 0.14.0: Added delete of mapping type. +- Removed minimum_number_should_match parameter.It is not supported by ElasticSearch and causes errors when using a BoolFilter. (Jernej Kos) - Embedded urllib3 to be buildout safe and for users sake. +- Improved speed json conversion of datetime values - Some code cleanup. +- Added boost argument to TextQuery. (Jernej Kos) - Added reindex by query (usable only with my elasticsearch git branch). 
+- Go back to urllib3 instead of requests. (gsakkis) - Added contrib with mailman indexing. +- Enhance Twitter River class. (thanks @dendright) - Autodetect if django is available and added related functions. +- Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches) - Code cleanup and PEP8. +- HasChildFilter expects a Query. (gsakkis) - Reactivated the morelikethis query. +- Fixed _parent being pulled from _meta rather than the instance itself. (merrellb) - Fixed river support plus unittest. (Tavis Aitken) +- Add support of all_terms to TermFacet. (mouad) - Added autorefresh to sync search and write. - Added QueryFilter. +0.19.0 +====== - Forced name attribute in multifield declaration. - Added is_empty to ConstantScoreQuery and fixed some bad behaviour. +- Use default_indices instead of hardcoding ['_all'] (gsakkis) - Added CustomScoreQuery. +- Complete rewrite of connection_http (gsakkis) - Added parent/children indexing. +- Don't collect info on creation of ES object (patricksmith) - Added dump commands in a script file "curl" way. +- Add interval to histogram facet. (vrachil) - Added a lot of fix from Richard Boulton. +- Improved connection string construction and added more flexibility. (ferhatsb) -v. 0.13.1: Added jython support (HTTP only for now). +- Fixed pickling DotDict. -v. 0.13.0: API Changes: errors -> exceptions. +- Fixed a bug in Decoder. - Splitting of query/filters. +- Added execution to TermsFilter. Fixed missing _name attribute in serialized object - Added open/close of index. +- Added _cache and _cache_key parameters to filters. - Added the number of retries if server is down. +- Added scope, filter and global parameters to facets. closes #119 - Refactory Range query. (Andrei) +- Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis) - Improved HTTP connection timeout/retries. (Sandymahalo) +- Allow partial_fields to be passed in the Search class. (tehmaze) - Cleanup some imports. (Sandymahalo) +- Propagated parameters to bulker. -v. 0.12.1: Added collecting server info. +- Support params for analyze. (akheron) - Version 0.12 or above requirement. +- Added LimitFilter. - Fixed attachment plugin. +- Fixed support for query as dict in Search object. - Updated bulk insert to use new api. +- Added ListBulker implementation and create_bulker method. - Added facet support (except geotypes). +- Moved imports to absolute ones. - Added river support. +- Removed inused urllib3 files and added timeout to connection_http. - Cleanup some method. +- Add NotFilter as facet filter (junckritter) - Added default_indexes variable. +- Add terms facet filter - Added datetime deserialization. +0.18.7-rc1 +========== - Improved performance and memory usage in bulk insert replacing list with StringIO. - Initial propagation of elasticsearch exception to python. +- Tested against 0.18.7, with all tests passing -v. 0.12.0: added http transport, added autodetect of transport, updated thrift interface. +- Added support for index_stats -v. 0.10.3: added bulk insert, explain and facet. +0.17.0 +====== -v. 0.10.2: added new geo query type. +- API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw") -v. 0.10.1: added new connection pool system based on pycassa one. +- API BREAKING: renamed indexes in indices. To be complaint to ES documentation. -v. 0.10.0: initial working version. +- Tests refactory. + +- Add model object to objetify a dict. + +0.16.0 +====== + +- Updated documentation. 
+ +- Added TextQuery and some clean up of code. + +- Added percolator (matterkkila). + +- Added date_histogram facet (zebuline). + +- Added script fields to Search object, also add "fields" to TermFacet (aguereca). + +- Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline). + +- Add ScriptFields object used as parameter script_fields of Search object (aguereca). + +- Add IdsQuery, IdsFilter and delete_by_query (aguereca). + +- Bulk delete (acdha). + + +0.15.0 +====== + + +- Only require simplejson for python < 2.6 (matterkkila) + +- Added basic version support to ES.index and Search (merrellb) + +- Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb) + +- Added GeoPointField to mapping types (merrellb) + +- Disable thrift in setup.py. + +- Added missing _routing property in ObjectField + +- Added ExistsFilter + +- Improved HasChildren + +- Add min_similarity and prefix_length to flt. + +- Added _scope to HasChildQuery. (andreiz) + +- Added parent/child document in test indexing. Added _scope to HasChildFilter. + +- Added MissingFilter as a subclass of TermFilter + +- Fixed error in checking TermsQuery (merrellb) + +- If an analyzer is set on a field, the returned mapping will have an analyzer + +- Add a specific error subtype for mapper parsing exceptions (rboulton) + +- Add support for Float numeric field mappings (rboulton) + +- ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton) + +- Allow dump_curl to be passed a filehandle (or still a filename), don't for filenames to be in /tmp, and add a basic test of it. + +- Add alias handling (rboulton) + +- Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton) + +- Handle errors produced by deleting a missing document, and add a test for it. (rboulton) + +- Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton) + +0.14.0 +====== + + +- Added delete of mapping type. + +- Embedded urllib3 to be buildout safe and for users sake. + +- Some code cleanup. + +- Added reindex by query (usable only with my elasticsearch git branch). + +- Added contrib with mailman indexing. + +- Autodetect if django is available and added related functions. + +- Code cleanup and PEP8. + +- Reactivated the morelikethis query. + +- Fixed river support plus unittest. (Tavis Aitken) + +- Added autorefresh to sync search and write. + +- Added QueryFilter. + +- Forced name attribute in multifield declaration. + +- Added is_empty to ConstantScoreQuery and fixed some bad behaviour. + +- Added CustomScoreQuery. + +- Added parent/children indexing. + +- Added dump commands in a script file "curl" way. + +- Added a lot of fix from Richard Boulton. + +0.13.1 +====== + +- Added jython support (HTTP only for now). + +0.13.0 +====== + +- API Changes: errors -> exceptions. + +- Splitting of query/filters. + +- Added open/close of index. + +- Added the number of retries if server is down. + +- Refactory Range query. (Andrei) + +- Improved HTTP connection timeout/retries. (Sandymahalo) + +- Cleanup some imports. (Sandymahalo) + +0.12.1 +====== + +- Added collecting server info. + +- Version 0.12 or above requirement. + +- Fixed attachment plugin. + +- Updated bulk insert to use new api. 
+
+- Added facet support (except geotypes).
+
+- Added river support.
+
+- Cleanup some method.
+
+- Added default_indexes variable.
+
+- Added datetime deserialization.
+
+- Improved performance and memory usage in bulk insert replacing list with StringIO.
+
+- Initial propagation of elasticsearch exception to python.
+
+0.12.0
+======
+
+- Added http transport, added autodetect of transport, updated thrift interface.
+
+0.10.3
+======
+
+- Added bulk insert, explain and facet.
+
+0.10.2
+======
+
+- Added new geo query type.
+
+0.10.1
+======
+
+- Added new connection pool system based on pycassa one.
+
+0.10.0
+======
+
+- Initial working version.
diff --git a/FAQ b/FAQ
index 4758df66..945f1730 100644
--- a/FAQ
+++ b/FAQ
@@ -1,3 +1,5 @@
+.. _faq:
+
 ============================
  Frequently Asked Questions
 ============================
@@ -5,4 +7,29 @@
 .. contents::
    :local:
 
-TO be written
\ No newline at end of file
+.. _faq-general:
+
+General
+=======
+
+.. _faq-when-to-use:
+
+What connection type should I use?
+----------------------------------
+
+For general usage I suggest using an HTTP connection to your server.
+
+For faster performance, mainly when indexing, I suggest using Thrift, because its latency is lower.
+
+How can I return a plain dict from a ResultSet?
+-----------------------------------------------
+
+A ResultSet yields ElasticSearchModel objects by default; to change this behaviour you need to pass a
+callable that receives a connection and a dict object.
+
+To return plain dict objects, pass a model parameter to the search call:
+
+.. code-block:: python
+
+    model=lambda x,y:y
+
diff --git a/README.rst b/README.rst
index a252a09f..546f5cc6 100644
--- a/README.rst
+++ b/README.rst
@@ -36,21 +36,61 @@ http://pyes.readthedocs.org/en/latest/
 Changelog
 =========
 
-v. 0.18.7-rc1:
+v. 0.19.1:
 
-    Tested against 0.18.7, with all tests passing
+    Renamed field_name to name in ScriptFields
 
-    Added support for index_stats
+    Fixed ResultSet slicing.
 
-v. 0.17.0:
+    Created Managers to manage API actions, grouped as in ElasticSearch.
+
+    Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9.
+
+    Added documentation links
+
+    Got docs building on readthedocs.org (Wraithan - Chris McDonald)
+
+    Renamed scroll_timeout to scroll
+
+    Moved FacetFactory include
+
+    Renamed field_name to name in ScriptFields
+
+    Using only thrift_connect to manage thrift existence
+
+    Added model and scan to query
+
+    Added exists document call
+
+    Added routing to delete
+
+    Removed minimum_number_should_match parameter. It is not supported by ElasticSearch and causes errors when using a BoolFilter. (Jernej Kos)
+
+    Improved speed of json conversion of datetime values
+
+    Add boost argument to TextQuery
+
+    Added boost argument to TextQuery. (Jernej Kos)
+
+    Go back to urllib3 instead of requests. (gsakkis)
+
+    Enhance Twitter River class. (thanks @dendright)
+
+    Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches)
+
+    HasChildFilter expects a Query. (gsakkis)
+
+    Fixed _parent being pulled from _meta rather than the instance itself. (merrellb)
+
+    Add support of all_terms to TermFacet. (mouad)
 
-    API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw")
 
-    Tests refactory.
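The model parameter mentioned in the FAQ above can be passed directly to a search call. A minimal sketch (the index name, field and query below are illustrative, and a running server on the default HTTP port is assumed):

.. code-block:: python

    # Illustrative names: "test-index" and the "name" field are assumed to exist.
    import pyes
    from pyes.query import TermQuery

    conn = pyes.ES('127.0.0.1:9200')
    results = conn.search(TermQuery("name", "joe"), indices=["test-index"],
                          model=lambda connection, doc: doc)
    for doc in results:
        print type(doc)  # plain dict instead of ElasticSearchModel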
TODO ---- +- add ORM to manage objects +- much more documentation - add coverage - add jython native client protocol diff --git a/docs/guide/appendix/glossary.rst b/docs/guide/appendix/glossary.rst index 44141b89..5ca3d45e 100644 --- a/docs/guide/appendix/glossary.rst +++ b/docs/guide/appendix/glossary.rst @@ -4,226 +4,210 @@ Glossary ======== -glossary: -- - id: analysis - text: > - Analysis is the process of converting full text_ to terms_. - Depending on which analyzer is used, these phrases: "**FOO BAR**", - "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**" - and "**bar**". These terms are what is actually stored in the index. - - - A full text query (not a term_ query) for "**FoO:bAR**" will - also be analyzed to the terms "**foo**","**bar**" and will thus match - the terms stored in the index. - - - It is this process of analysis (both at index time and at search time) - that allows elasticsearch to perform full text queries. - - - Also see text_ and term_. -- - id: cluster - text: > - A cluster consists of one or more nodes_ which share the same - cluster name. Each cluster has a single master node which is - chosen automatically by the cluster and which can be replaced if - the current master node fails. - -- - id: document - text: > - A document is a JSON document which is stored in elasticsearch. It is - like a row in a table in a relational database. Each document is - stored in an index_ and has a type_ - and an id_. +.. _glossary-analysis: +analysis + Analysis is the process of converting full :ref:`text ` to :ref:`terms `. + Depending on which analyzer is used, these phrases: "**FOO BAR**", + "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**" + and "**bar**". These terms are what is actually stored in the index. - A document is a JSON object (also known in other languages - as a hash / hashmap / associative array) which contains zero or more - fields_, or key-value pairs. + A full text query (not a :ref:`term ` query) for "**FoO:bAR**" will + also be analyzed to the terms "**foo**","**bar**" and will thus match + the terms stored in the index. + It is this process of analysis (both at index time and at search time) + that allows elasticsearch to perform full text queries. - The original JSON document that is indexed will be stored in the - **_source** field_, which is returned by default - when getting or searching for a document. + Also see :ref:`text ` and :ref:`term `. -- - id: id - text: > - The ID of a document_ identifies a document. The - **index/type/id** of a document must be unique. If no ID is provided, - then it will be auto-generated. (also see routing_) +.. _glossary-cluster: -- - id: field - text: > - A document_ contains a list of fields, or key-value pairs. - The value can be a simple (scalar) value (eg a string, integer, date), - or a nested structure like an array or an object. A field is similar - to a column in a table in a relational database. +cluster + A cluster consists of one or more :ref:`nodes ` which share the same + cluster name. Each cluster has a single master node which is + chosen automatically by the cluster and which can be replaced if + the current master node fails. +.. _glossary-document: - The mapping_ for each field has a field 'type' - (not to be confused with document type_) which indicates the - type of data that can be stored in that field, eg - **integer**, **string**, **object**. 
- The mapping also allows you to define (amongst other things) how the - value for a field should be analyzed. +document + A document is a JSON document which is stored in elasticsearch. It is + like a row in a table in a relational database. Each document is + stored in an :ref:`index ` and has a :ref:`type ` + and an :ref:`id `. -- - id: index - text: > - An index is like a 'database' in a relational database. It has a - mapping_ which defines multiple - types_. + A document is a JSON object (also known in other languages + as a hash / hashmap / associative array) which contains zero or more + :ref:`fields `, or key-value pairs. + The original JSON document that is indexed will be stored in the + **_source** :ref:`field `, which is returned by default + when getting or searching for a document. - An index is a logical namespace which maps to one or more - primary shards_ and can have zero or more - replica shards_. +.. _glossary-id: -- - id: mapping - text: > - A mapping is like a 'schema definition' in a relational database. - Each index_ has a mapping, which defines each - type_ within the index, plus a number of - index-wide settings. +id + The ID of a :ref:`document ` identifies a document. The + **index/type/id** of a document must be unique. If no ID is provided, + then it will be auto-generated. (also see :ref:`routing `) +.. _glossary-field: - A mapping can either be defined explicitly, or it will be generated - automatically when a document is indexed. -- - id: node - text: > - A node is a running instance of elasticsearch which belongs to a - cluster_. Multiple nodes can be started on a single - server for testing purposes, but usually you should have one node - per server. - - - At startup, a node will use unicast (or multicast, if specified) - to discover an existing cluster with the same cluster name and will - try to join that cluster. +field + A :ref:`document ` contains a list of fields, or key-value pairs. + The value can be a simple (scalar) value (eg a string, integer, date), + or a nested structure like an array or an object. A field is similar + to a column in a table in a relational database. -- - id: primary shard - text: > - Each document is stored in a single primary shard_. When you - index a document, it is indexed first on the primary shard, then - on all replicas_ of the primary shard. + The :ref:`mapping ` for each field has a field 'type' + (not to be confused with document :ref:`type `) which indicates the + type of data that can be stored in that field, eg + **integer**, **string**, **object**. + The mapping also allows you to define (amongst other things) how the + value for a field should be analyzed. +.. _glossary-index: - By default, an index_ has 5 primary shards. You can specify fewer - or more primary shards to scale the number of documents_ - that your index can handle. - - - You cannot change the number of primary shards in an index, once the - index is created. - - - See also routing_ - -- - id: replica shard - text: > - Each primary shard_ can have zero or more replicas. - A replica is a copy of the primary shard, and has two purposes: +index + An index is like a 'database' in a relational database. It has a + :ref:`mapping ` which defines multiple + :ref:`types `. - # increase failover: a replica shard can be promoted - to a primary shard if the primary fails + An index is a logical namespace which maps to one or more + primary :ref:`shards ` and can have zero or more + replica :ref:`shards `. 
- # increase performance: get and search requests can be handled by - primary or replica shards. +.. _glossary-mapping: +mapping + A mapping is like a 'schema definition' in a relational database. + Each :ref:`index ` has a mapping, which defines each + :ref:`type ` within the index, plus a number of + index-wide settings. - By default, each primary shard has one replica, but the number - of replicas can be changed dynamically on an existing index. - A replica shard will never be started on the same node as its primary - shard. - -- - id: routing - text: > - When you index a document, it is stored on a single - primary shard_. That shard is chosen by hashing - the **routing** value. By default, the **routing** value is derived - from the ID of the document or, if the document has a specified - parent document, from the ID of the parent document (to ensure - that child and parent documents are stored on the same shard). + A mapping can either be defined explicitly, or it will be generated + automatically when a document is indexed. +.. _glossary-node: - This value can be overridden by specifying a **routing** value at index - time, or a :ref:`routing field ` in the mapping_. +node + A node is a running instance of elasticsearch which belongs to a + :ref:`cluster `. Multiple nodes can be started on a single + server for testing purposes, but usually you should have one node + per server. -- - id: shard - text: > - A shard is a single Lucene instance. It is a low-level "worker" unit - which is managed automatically by elasticsearch. An index - is a logical namespace which points to primary_ - and replica_ shards. + At startup, a node will use unicast (or multicast, if specified) + to discover an existing cluster with the same cluster name and will + try to join that cluster. +.. _glossary-primary-shard: - Other than defining the number of primary and replica shards that - an index should have, you never need to refer to shards directly. - Instead, your code should deal only with an index. +primary shard + Each document is stored in a single primary :ref:`shard `. When you + index a document, it is indexed first on the primary shard, then + on all :ref:`replicas ` of the primary shard. + By default, an :ref:`index ` has 5 primary shards. You can specify fewer + or more primary shards to scale the number of :ref:`documents ` + that your index can handle. - Elasticsearch distributes shards amongst all nodes_ in - the cluster_, and can be move shards automatically from - one node to another in the case of node failure, or the addition - of new nodes. + You cannot change the number of primary shards in an index, once the + index is created. -- - id: source field - text: > - By default, the JSON document that you index will be stored in the - **_source** field and will be returned by all get and search requests. - This allows you access to the original object directly from search - results, rather than requiring a second step to retrieve the object - from an ID. + See also :ref:`routing ` - Note: the exact JSON string that you indexed will be returned to you, - even if it contains invalid JSON. The contents of this field do not - indicate anything about how the data in the object has been indexed. -- - id: term - text: > - A term is an exact value that is indexed in elasticsearch. The terms - **foo**, **Foo**, **FOO are NOT equivalent. Terms (ie exact values) can - be searched for using 'term' queries. +.. _glossary-replica-shard: - See also text_ and analysis_. 
-- - id: text - text: > - Text (or full text) is ordinary unstructured text, such as this - paragraph. By default, text will by :ref:`analyzed ` into - terms_, which is what is actually stored in the index. +replica shard + Each primary :ref:`shard ` can have zero or more replicas. + A replica is a copy of the primary shard, and has two purposes: + # increase failover: a replica shard can be promoted + to a primary shard if the primary fails - Text fields_ need to be analyzed at index time in order to - be searchable as full text, and keywords in full text queries must - be analyzed at search time to produce (and search for) the same - terms that were generated at index time. + # increase performance: get and search requests can be handled by + primary or replica shards. + By default, each primary shard has one replica, but the number + of replicas can be changed dynamically on an existing index. + A replica shard will never be started on the same node as its primary + shard. - See also term_ and analysis_. -- - id: type - text: > - A type is like a 'table' in a relational database. Each type has - a list of fields_ that can be specified for - documents_ of that type. The - mapping_ defines how each field in the document - is analyzed. +.. _glossary-routing: +routing + When you index a document, it is stored on a single + primary :ref:`shard `. That shard is chosen by hashing + the **routing** value. By default, the **routing** value is derived + from the ID of the document or, if the document has a specified + parent document, from the ID of the parent document (to ensure + that child and parent documents are stored on the same shard). + This value can be overridden by specifying a **routing** value at index + time, or a :ref:`routing field ` in the :ref:`mapping `. +.. _glossary-shard: + +shard + A shard is a single Lucene instance. It is a low-level "worker" unit + which is managed automatically by elasticsearch. An index + is a logical namespace which points to :ref:`primary ` + and :ref:`replica ` shards. + + Other than defining the number of primary and replica shards that + an index should have, you never need to refer to shards directly. + Instead, your code should deal only with an index. + + Elasticsearch distributes shards amongst all :ref:`nodes ` in + the :ref:`cluster `, and can be move shards automatically from + one node to another in the case of node failure, or the addition + of new nodes. + +.. _glossary-source-field: + +source field + By default, the JSON document that you index will be stored in the + **_source** field and will be returned by all get and search requests. + This allows you access to the original object directly from search + results, rather than requiring a second step to retrieve the object + from an ID. + + + Note: the exact JSON string that you indexed will be returned to you, + even if it contains invalid JSON. The contents of this field do not + indicate anything about how the data in the object has been indexed. + +.. _glossary-term: + +term + A term is an exact value that is indexed in elasticsearch. The terms + **foo**, **Foo**, **FOO** are NOT equivalent. Terms (ie exact values) can + be searched for using 'term' queries. + + See also :ref:`text ` and :ref:`analysis `. + +.. _glossary-text: + +text + Text (or full text) is ordinary unstructured text, such as this + paragraph. By default, text will by :ref:`analyzed ` into + :ref:`terms `, which is what is actually stored in the index. 
+
+   Text :ref:`fields <glossary-field>` need to be analyzed at index time in order to
+   be searchable as full text, and keywords in full text queries must
+   be analyzed at search time to produce (and search for) the same
+   terms that were generated at index time.
+
+   See also :ref:`term <glossary-term>` and :ref:`analysis <glossary-analysis>`.
+
+.. _glossary-type:
+
+type
+   A type is like a 'table' in a relational database. Each type has
+   a list of :ref:`fields <glossary-field>` that can be specified for
+   :ref:`documents <glossary-document>` of that type. The
+   :ref:`mapping <glossary-mapping>` defines how each field in the document
+   is analyzed.
diff --git a/docs/index.rst b/docs/index.rst
index 4a3417d6..17903bc5 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,6 +14,7 @@ Contents:
    links
    guide/reference/index
    guide/appendix/index
+   guide/appendix/glossary
 
 Indices and tables
diff --git a/docs/manual/connections.rst b/docs/manual/connections.rst
index 85217c6e..41444fbc 100644
--- a/docs/manual/connections.rst
+++ b/docs/manual/connections.rst
@@ -1,3 +1,5 @@
+.. _pyes-connections:
+
 Connections
 ===========
 
@@ -16,12 +18,16 @@ For thrift:
 
     >>> conn = pyes.ES() # Defaults to connecting to the server at '127.0.0.1:9500'
     >>> conn = pyes.ES(['127.0.0.1:9500'])
+    >>> conn = pyes.ES(("thrift", "127.0.0.1", "9500"))
+    >>> conn = pyes.ES([("thrift", "127.0.0.1", "9500"), ("thrift", "192.168.1.1", "9500"),])
 
 For http:
 
 .. code-block:: python
 
     >>> conn = pyes.ES(['127.0.0.1:9200'])
+    >>> conn = pyes.ES(("http", "127.0.0.1", "9200"))
+    >>> conn = pyes.ES([("http", "127.0.0.1", "9200"), ("http", "192.168.1.1", "8000"),])
 
 Connections are robust to server failures. Upon a disconnection, it will attempt to connect to each server in the
 list in turn. If no server is available, it will raise a NoServerAvailable exception.
diff --git a/docs/manual/index.rst b/docs/manual/index.rst
index cbc032b7..3c6ecaa9 100644
--- a/docs/manual/index.rst
+++ b/docs/manual/index.rst
@@ -11,4 +11,6 @@
    installation
    usage
    connections
-   queries
\ No newline at end of file
+   models
+   queries
+   resultset
diff --git a/docs/manual/models.rst b/docs/manual/models.rst
new file mode 100644
index 00000000..b0a68e81
--- /dev/null
+++ b/docs/manual/models.rst
@@ -0,0 +1,65 @@
+.. _pyes-models:
+
+Models
+======
+
+DotDict
+-------
+
+DotDict is the base model. It is a dict that also allows attribute-style (dot notation) access to its keys.
+
+.. code-block:: python
+
+    >>> dotdict = DotDict(foo="bar")
+    >>> dotdict2 = deepcopy(dotdict)
+    >>> dotdict2["foo"] = "baz"
+    >>> dotdict.foo = "bar"
+    >>> dotdict2.foo == "baz"
+    True
+
+ElasticSearchModel
+------------------
+
+It extends DotDict, adding methods for common operations.
+
+Every search returns ElasticSearchModel objects as results: iterating over a ResultSet, you iterate over ElasticSearchModel objects.
+
+You can create a new one with the factory, or obtain one from the search/get methods.
+
+.. code-block:: python
+
+    obj = self.conn.factory_object(self.index_name, self.document_type, {"name": "test", "val": 1})
+    assert obj.name == "test"
+
+You can change values via dot notation or dictionary access.
+
+.. code-block:: python
+
+    obj.name = "aaa"
+    assert obj.name == "aaa"
+    assert obj.val == 1
+
+You can change the ES metadata via the ._meta property or the get_meta call.
+
+.. code-block:: python
+
+    assert obj._meta.id is None
+    obj._meta.id = "dasdas"
+    assert obj._meta.id == "dasdas"
+
+Remember that it works as a dict object.
+
+.. code-block:: python
+
+    assert sorted(obj.keys()) == ["name", "val"]
+
+You can save it.
+
+.. code-block:: python
+
+    obj.save()
+    obj.name = "test2"
+    obj.save()
+
+    reloaded = self.conn.get(self.index_name, self.document_type, obj._meta.id)
+    assert reloaded.name == "test2"
diff --git a/docs/manual/queries.rst b/docs/manual/queries.rst
index 0dfd14d0..4b450f59 100644
--- a/docs/manual/queries.rst
+++ b/docs/manual/queries.rst
@@ -1,3 +1,5 @@
+.. _pyes-queries:
+
 Queries
 =======
 
diff --git a/docs/manual/resultset.rst b/docs/manual/resultset.rst
new file mode 100644
index 00000000..cfae2122
--- /dev/null
+++ b/docs/manual/resultset.rst
@@ -0,0 +1,40 @@
+.. _pyes-resultset:
+
+ResultSet
+=========
+
+This object is returned as the result of a query. It is lazy.
+
+.. code-block:: python
+
+    >>> resultset = self.conn.search(Search(MatchAllQuery(), size=20), self.index_name, self.document_type)
+
+It contains the matched records, limited by the requested size: very useful for pagination.
+
+.. code-block:: python
+
+    >>> len([p for p in resultset])
+    20
+
+The total number of matched results is available in the total property.
+
+.. code-block:: python
+
+    >>> resultset.total
+    1000
+
+You can slice it.
+
+.. code-block:: python
+
+    >>> resultset = self.conn.search(Search(MatchAllQuery(), size=10), self.index_name, self.document_type)
+    >>> len([p for p in resultset[:10]])
+    10
+
+Remember that results are ElasticSearchModel objects by default.
+
+.. code-block:: python
+
+    >>> resultset[10].uuid
+    "11111"
+
diff --git a/docs/manual/usage.rst b/docs/manual/usage.rst
index 37c29572..f3692fe2 100644
--- a/docs/manual/usage.rst
+++ b/docs/manual/usage.rst
@@ -1,12 +1,12 @@
 Usage
 =====
 
-Creating a connection:
+Creating a connection (see :ref:`pyes-connections` for more details):
 
 .. code-block:: python
 
     >>> from pyes import *
-    >>> conn = ES('127.0.0.1:9200')
+    >>> conn = ES('127.0.0.1:9200')  # for HTTP
 
 Deleting an index:
 
@@ -17,7 +17,7 @@ Deleting an index:
     >>> except:
     >>> pass
 
-(an exception is fored if the index is not present)
+(an exception is raised if the index is not present)
 
 Create an index:
 
@@ -25,7 +25,7 @@
 
     >>> conn.create_index("test-index")
 
-Creating a mapping:
+Creating a mapping via dictionary:
 
 .. code-block:: python
 
@@ -52,6 +52,29 @@ Creating a mapping:
     >>> 'type': u'string'}}
     >>> conn.put_mapping("test-type", {'properties':mapping}, ["test-index"])
 
+Creating a mapping via objects:
+
+.. code-block:: python
+
+    >>> from pyes.mappings import *
+    >>> docmapping = DocumentObjectField(name=self.document_type)
+    >>> docmapping.add_property(
+    >>>     StringField(name="parsedtext", store=True, term_vector="with_positions_offsets", index="analyzed"))
+    >>> docmapping.add_property(
+    >>>     StringField(name="name", store=True, term_vector="with_positions_offsets", index="analyzed"))
+    >>> docmapping.add_property(
+    >>>     StringField(name="title", store=True, term_vector="with_positions_offsets", index="analyzed"))
+    >>> docmapping.add_property(IntegerField(name="position", store=True))
+    >>> docmapping.add_property(StringField(name="uuid", store=True, index="not_analyzed"))
+    >>> nested_object = NestedObject(name="nested")
+    >>> nested_object.add_property(StringField(name="name", store=True))
+    >>> nested_object.add_property(StringField(name="value", store=True))
+    >>> nested_object.add_property(IntegerField(name="num", store=True))
+    >>> docmapping.add_property(nested_object)
+    >>> settings.add_mapping(docmapping)
+    >>> conn.ensure_index(self.index_name, settings)
+
+
 Index some documents:
 
 .. code-block:: python
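Indexed documents can also be fetched back individually and removed. A short sketch between the hunks above and below, using illustrative index/type names and ids (conn.get is the same call used in the Models examples, and conn.delete takes the same index/type/id triple):

.. code-block:: python

    >>> doc = conn.get("test-index", "test-type", 1)   # returns an ElasticSearchModel
    >>> doc.name                                       # dot-notation access to the stored fields
    >>> conn.delete("test-index", "test-type", 2)      # remove a single document by id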
@@ -63,15 +86,18 @@ Refresh an index:
 
 .. code-block:: python
 
+    >>> conn.refresh("test-index")
     >>> conn.refresh(["test-index"])
 
-Execute a query
+Execute a query (see :ref:`pyes-queries`):
 
 .. code-block:: python
 
     >>> q = TermQuery("name", "joe")
     >>> results = conn.search(query = q)
 
+results is a ResultSet (see :ref:`pyes-resultset`); you can iterate over it. It caches some results and fetches them in pages. The returned objects are ElasticSearchModel instances by default (see :ref:`pyes-models`).
+
 Iterate on results:
 
 .. code-block:: python
 
@@ -79,4 +105,4 @@ Iterate on results:
     >>> for r in results:
     >>> print r
 
-For more examples looks at the tests.
+The tests directory contains many more examples of the available functionality.
diff --git a/pyes/__init__.py b/pyes/__init__.py
index f892347b..de1b9e3b 100644
--- a/pyes/__init__.py
+++ b/pyes/__init__.py
@@ -14,9 +14,9 @@
 
 
 def is_stable_release():
-    if len(VERSION) > 3 and isinstance(VERSION[3], basestring):
+    if len(VERSION) > 3:
         return False
-    return not VERSION[1] % 2
+    return True
 
 
 def version_with_meta():
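The new is_stable_release() above simply treats any VERSION tuple longer than three elements as a pre-release. A tiny sketch of that behaviour with assumed example tuples (not actual pyes version metadata):

.. code-block:: python

    # Assumed example tuples, for illustration only.
    VERSION = (0, 19, 1)

    def is_stable_release():
        if len(VERSION) > 3:
            return False
        return True

    print is_stable_release()    # True: three numeric components only

    VERSION = (0, 19, 2, "a1")   # a fourth element marks a pre-release
    print is_stable_release()    # False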