nv-array-section ============ - nv-array-section
Lightweight utility for input validation and data extraction in Turf.js. Ensures GeoJSON inputs are in the correct format and extracts specific components like coordinates or geometries.
An ini encoder/decoder for node
Array manipulation, ordering, searching, summarizing, etc.
Takes a grid of values (GeoJSON format) and a set of threshold ranges. It outputs polygons that group areas within those ranges, effectively creating filled contour isobands.
Float64Array.
output coverage reports using Node.js' built in coverage
Uint8Array.
Float32Array.
Uint32Array.
Uint16Array.
Test if a value is an array.
Create a filled generic array.
Return a function which tests if every element in an array passes a test condition.
Create a zero-filled generic array.
Generate contour lines from a grid of data.
Maximum length for a typed array.
Buffer.
Maximum length for a generic array.
Finds the nearest point on a line to a given point
Like front-matter, but supports multiple sections in a document.
Uint8ClampedArray.
Int32Array.
Int8Array.
Collapse allows a hash with arrays as values to merge and union common arrays instead of overwriting them.
Simplistic parsing of INI style files. Data within sections can be handled as raw lines or as an array split on spaces.
:title: The Ruby API :section: PYAPNS::Client There's python in my ruby! This is a class used to send notifications, provision applications and retrieve feedback using the Apple Push Notification Service. PYAPNS is a multi-application APS provider, meaning it is possible to send notifications to any number of different applications from the same application and same server. It is also possible to scale the client to any number of processes and servers, simply balanced behind a simple web proxy. It may seem like overkill for such a bare interface - after all, the APS service is rather simplistic. However, PYAPNS takes no shortcuts when it comes to completeness/compliance with the APNS protocol and allows the user many optimization and scaling vectors not possible with other libraries. No bandwidth is wasted, connections are persistent and the server is asynchronous therefore notifications are delivered immediately. PYAPNS takes after the design of 3rd party push notification service that charge a fee each time you push a notification, and charge extra for so-called 'premium' service which supposedly gives you quicker access to the APS servers. However, PYAPNS is free, as in beer and offers more scaling opportunities without the financial draw. :section: Provisioning To add your app to the PYAPNS server, it must be `provisioned` at least once. Normally this is done once upon the start-up of your application, be it a web service, desktop application or whatever... It must be done at least once to the server you're connecting to. Multiple instances of PYAPNS will have to have their applications provisioned individually. To provision an application manually use the `PYAPNS::Client#provision` method. require 'pyapns' client = PYAPNS::Client.configure client.provision :app_id => 'cf', :cert => '/home/ss/cert.pem', :env => 'sandbox', :timeout => 15 This basically says "add an app reference named 'cf' to the server and start a connection using the certification, and if it can't within 15 seconds, raise a `PYAPNS::TimeoutException` That's all it takes to get started. Of course, this can be done automatically by using PYAPNS::ClientConfiguration middleware. `PYAPNS::Client` is a singleton class that is configured using the class method `PYAPNS::Client#configure`. It is sensibly configured by default, but can be customized by specifying a hash See the docs on `PYAPNS::ClientConfiguration` for a list of available configuration parameters (some of these are important, and you can specify initial applications) to be configured by default. :section: Sending Notifications Once your client is configured, and application provisioned (again, these should be taken care of before you write notification code) you can begin sending notifications to users. If you're wondering how to acquire a notification token, you've come to the wrong place... I recommend using google. However, if you want to send hundreds of millions of notifications to users, here's how it's done, one at a time... The `PYAPNS::Client#notify` is a sort of polymorphic method which can notify any number of devices at a time. It's basic form is as follows: client.notify 'cf', 'long ass app token', {:aps=> {:alert => 'hello?'}} However, as stated before, it is sort of polymorphic: client.notify 'cf', ['token', 'token2', 'token3'], [alert, alert2, alert3] client.notify :app_id => 'cf', :tokens => 'mah token', :notifications => alertHash client.notify 'cf', 'token', PYAPNS::Notification('hello tits!') As you can see, the method accepts paralell arrays of tokens and notifications meaning any number of notifications can be sent at once. Hashes will be automatically converted to `PYAPNS::Notification` objects so they can be optimized for the wire (nil values removed, etc...), and you can pass `PYAPNS::Notification` objects directly if you wish. :section: Retrieving Feedback The APS service offers a feedback functionality that allows application servers to retrieve a list of device tokens it deems to be no longer in use, and the time it thinks they stopped being useful (the user uninstalled your app, better luck next time...) Sounds pretty straight forward, and it is. Apple recommends you do this at least once an hour. PYAPNS will return a list of 2-element lists with the date and the token: feedbacks = client.feedback 'cf' :section: Asynchronous Calls PYAPNS::Client will, by default, perform no funny stuff and operate entirely within the calling thread. This means that certain applications may hang when, say, sending a notification, if only for a fraction of a second. Obviously not a desirable trait, all `provision`, `feedback` and `notify` methods also take a block, which indicates to the method you want to call PYAPNS asynchronously, and it will be done so handily in another thread, calling back your block with a single argument when finished. Note that `notify` and `provision` return absolutely nothing (nil, for you rub--wait you are ruby developers!). It is probably wise to always use this form of operation so your calling thread is never blocked (especially important in UI-driven apps and asynchronous servers) Just pass a block to provision/notify/feedback like so: PYAPNS::Client.instance.feedback do |feedbacks| feedbacks.each { |f| trim_token f } end :section: PYAPNS::ClientConfiguration A middleware class to make `PYAPNS::Client` easy to use in web contexts Automates configuration of the client in Rack environments using a simple confiuration middleware. To use `PYAPNS::Client` in Rack environments with the least code possible `use PYAPNS::ClientConfiguration` (no, really, in some cases, that's all you need!) middleware with an optional hash specifying the client variables. Options are as follows: use PYAPNS::ClientConfiguration( :host => 'http://localhost/' :port => 7077, :initial => [{ :app_id => 'myapp', :cert => '/home/myuser/apps/myapp/cert.pem', :env => 'sandbox', :timeout => 15 }]) Where the configuration variables are defined: :host String the host where the server can be found :port Number the port to which the client should connect :initial Array OPTIONAL - an array of INITIAL hashes INITIAL HASHES: :app_id String the id used to send messages with this certification can be a totally arbitrary value :cert String a path to the certification or the certification file as a string :env String the environment to connect to apple with, always either 'sandbox' or 'production' :timoeut Number The timeout for the server to use when connecting to the apple servers :section: PYAPNS::Notification An APNS Notification You can construct notification objects ahead of time by using this class. However unnecessary, it allows you to programmatically generate a Notification like so: note = PYAPNS::Notification.new 'alert text', 9, 'flynn.caf', {:extra => 'guid'} -- or -- note = PYAPNS::Notification.new 'alert text' These can be passed to `PYAPNS::Client#notify` the same as hashes
In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It provides near-constant-time operations (bounded by the inverse Ackermann function) to add new sets, to merge existing sets, and to determine whether elements are in the same set. In addition to many other uses (see the Applications section), disjoint-sets play a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph. A disjoint-set forest consists of a number of elements each of which stores an id, a parent pointer, and, in efficient algorithms, a value called the "rank". The parent pointers of elements are arranged to form one or more trees, each representing a set. If an element's parent pointer points to no other element, then the element is the root of a tree and is the representative member of its set. A set may consist of only a single element. However, if the element has a parent, the element is part of whatever set is identified by following the chain of parents upwards until a representative element (one without a parent) is reached at the root of the tree. Forests can be represented compactly in memory as arrays in which parents are indicated by their array index. Disjoint-set data structures model the partitioning of a set, for example to keep track of the connected components of an undirected graph. This model can then be used to determine whether two vertices belong to the same component, or whether adding an edge between them would result in a cycle. The Union–Find algorithm is used in high-performance implementations of unification. This data structure is used by the Boost Graph Library to implement its Incremental Connected Components functionality. It is also a key component in implementing Kruskal's algorithm to find the minimum spanning tree of a graph. Note that the implementation as disjoint-set forests doesn't allow the deletion of edges, even without path compression or the rank heuristic. Sharir and Agarwal report connections between the worst-case behavior of disjoint-sets and the length of Davenport–Schinzel sequences, a combinatorial structure from computational geometry.
ALPHA Alert -- just uploaded initial release. Linux inotify is a means to receive events describing file system activity (create, modify, delete, close, etc). Sinotify was derived from aredridel's package (http://raa.ruby-lang.org/project/ruby-inotify/), with the addition of Paul Boon's tweak for making the event_check thread more polite (see http://www.mindbucket.com/2009/02/24/ruby-daemons-verifying-good-behavior/) In sinotify, the classes Sinotify::PrimNotifier and Sinotify::PrimEvent provide a low level wrapper to inotify, with the ability to establish 'watches' and then listen for inotify events using one of inotify's synchronous event loops, and providing access to the events' masks (see 'man inotify' for details). Sinotify::PrimEvent class adds a little semantic sugar to the event in to the form of 'etypes', which are just ruby symbols that describe the event mask. If the event has a raw mask of (DELETE_SELF & IS_DIR), then the etypes array would be [:delete_self, :is_dir]. In addition to the 'straight' wrapper in inotify, sinotify provides an asynchronous implementation of the 'observer pattern' for notification. In other words, Sinotify::Notifier listens in the background for inotify events, adapting them into instances of Sinotify::Event as they come in and immediately placing them in a concurrent queue, from which they are 'announced' to 'subscribers' of the event. [Sinotify uses the 'cosell' implementation of the Announcements event notification framework, hence the terminology 'subscribe' and 'announce' rather then 'listen' and 'trigger' used in the standard event observer pattern. See the 'cosell' package on github for details.] A variety of 'knobs' are provided for controlling the behavior of the notifier: whether a watch should apply to a single directory or should recurse into subdirectores, how fast it should broadcast queued events, etc (see Sinotify::Notifier, and the example in the synopsis section below). An event 'spy' can also be setup to log all Sinotify::PrimEvents and Sinotify::Events. Sinotify::Event simplifies inotify's muddled event model, sending events only for those files/directories that have changed. That's not to say you can't setup a notifier that recurses into subdirectories, just that any individual event will apply to a single file, and not to its children. Also, event types are identified using words (in the form of ruby :symbols) instead of inotify's event masks. See Sinotify::Event for more explanation. The README for inotify: http://www.kernel.org/pub/linux/kernel/people/rml/inotify/README Selected quotes from the README for inotify: * "Rumor is that the 'd' in 'dnotify' does not stand for 'directory' but for 'suck.'" * "The 'i' in inotify does not stand for 'suck' but for 'inode' -- the logical choice since inotify is inode-based." (The 's' in 'sinotify' does in fact stand for 'suck.')
ALPHA Alert -- just uploaded initial release. Linux inotify is a means to receive events describing file system activity (create, modify, delete, close, etc). Sinotify was derived from aredridel's package (http://raa.ruby-lang.org/project/ruby-inotify/), with the addition of Paul Boon's tweak for making the event_check thread more polite (see http://www.mindbucket.com/2009/02/24/ruby-daemons-verifying-good-behavior/) In sinotify, the classes Sinotify::PrimNotifier and Sinotify::PrimEvent provide a low level wrapper to inotify, with the ability to establish 'watches' and then listen for inotify events using one of inotify's synchronous event loops, and providing access to the events' masks (see 'man inotify' for details). Sinotify::PrimEvent class adds a little semantic sugar to the event in to the form of 'etypes', which are just ruby symbols that describe the event mask. If the event has a raw mask of (DELETE_SELF & IS_DIR), then the etypes array would be [:delete_self, :is_dir]. In addition to the 'straight' wrapper in inotify, sinotify provides an asynchronous implementation of the 'observer pattern' for notification. In other words, Sinotify::Notifier listens in the background for inotify events, adapting them into instances of Sinotify::Event as they come in and immediately placing them in a concurrent queue, from which they are 'announced' to 'subscribers' of the event. [Sinotify uses the 'cosell' implementation of the Announcements event notification framework, hence the terminology 'subscribe' and 'announce' rather then 'listen' and 'trigger' used in the standard event observer pattern. See the 'cosell' package on github for details.] A variety of 'knobs' are provided for controlling the behavior of the notifier: whether a watch should apply to a single directory or should recurse into subdirectores, how fast it should broadcast queued events, etc (see Sinotify::Notifier, and the example in the synopsis section below). An event 'spy' can also be setup to log all Sinotify::PrimEvents and Sinotify::Events. Sinotify::Event simplifies inotify's muddled event model, sending events only for those files/directories that have changed. That's not to say you can't setup a notifier that recurses into subdirectories, just that any individual event will apply to a single file, and not to its children. Also, event types are identified using words (in the form of ruby :symbols) instead of inotify's event masks. See Sinotify::Event for more explanation. The README for inotify: http://www.kernel.org/pub/linux/kernel/people/rml/inotify/README Selected quotes from the README for inotify: * "Rumor is that the 'd' in 'dnotify' does not stand for 'directory' but for 'suck.'" * "The 'i' in inotify does not stand for 'suck' but for 'inode' -- the logical choice since inotify is inode-based." (The 's' in 'sinotify' does in fact stand for 'suck.')
# Overview This guide documents the InsightVM Application Programming Interface (API) Version 3. This API supports the Representation State Transfer (REST) design pattern. Unless noted otherwise this API accepts and produces the `application/json` media type. This API uses Hypermedia as the Engine of Application State (HATEOAS) and is hypermedia friendly. All API connections must be made to the security console using HTTPS. ## Versioning Versioning is specified in the URL and the base path of this API is: `https://<host>:<port>/api/3/`. ## Specification An <a target="_blank" href="https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md">OpenAPI v2</a> specification (also known as Swagger 2) of this API is available. Tools such as <a target="_blank" href="https://github.com/swagger-api/swagger-codegen">swagger-codegen</a> can be used to generate an API client in the language of your choosing using this specification document. <p class="openapi">Download the specification: <a class="openapi-button" target="_blank" download="" href="/api/3/json"> Download </a></p> ## Authentication Authorization to the API uses HTTP Basic Authorization (see <a target="_blank" href="https://www.ietf.org/rfc/rfc2617.txt">RFC 2617</a> for more information). Requests must supply authorization credentials in the `Authorization` header using a Base64 encoded hash of `"username:password"`. <!-- ReDoc-Inject: <security-definitions> --> ### 2FA This API supports two-factor authentication (2FA) by supplying an authentication token in addition to the Basic Authorization. The token is specified using the `Token` request header. To leverage two-factor authentication, this must be enabled on the console and be configured for the account accessing the API. ## Resources ### Naming Resource names represent nouns and identify the entity being manipulated or accessed. All collection resources are pluralized to indicate to the client they are interacting with a collection of multiple resources of the same type. Singular resource names are used when there exists only one resource available to interact with. The following naming conventions are used by this API: | Type | Case | | --------------------------------------------- | ------------------------ | | Resource names | `lower_snake_case` | | Header, body, and query parameters parameters | `camelCase` | | JSON fields and property names | `camelCase` | #### Collections A collection resource is a parent resource for instance resources, but can itself be retrieved and operated on independently. Collection resources use a pluralized resource name. The resource path for collection resources follow the convention: ``` /api/3/{resource_name} ``` #### Instances An instance resource is a "leaf" level resource that may be retrieved, optionally nested within a collection resource. Instance resources are usually retrievable with opaque identifiers. The resource path for instance resources follows the convention: ``` /api/3/{resource_name}/{instance_id}... ``` ## Verbs The following HTTP operations are supported throughout this API. The general usage of the operation and both its failure and success status codes are outlined below. | Verb | Usage | Success | Failure | | --------- | ------------------------------------------------------------------------------------- | ----------- | -------------------------------------------------------------- | | `GET` | Used to retrieve a resource by identifier, or a collection of resources by type. | `200` | `400`, `401`, `402`, `404`, `405`, `408`, `410`, `415`, `500` | | `POST` | Creates a resource with an application-specified identifier. | `201` | `400`, `401`, `404`, `405`, `408`, `413`, `415`, `500` | | `POST` | Performs a request to queue an asynchronous job. | `202` | `400`, `401`, `405`, `408`, `410`, `413`, `415`, `500` | | `PUT` | Creates a resource with a client-specified identifier. | `200` | `400`, `401`, `403`, `405`, `408`, `410`, `413`, `415`, `500` | | `PUT` | Performs a full update of a resource with a specified identifier. | `201` | `400`, `401`, `403`, `405`, `408`, `410`, `413`, `415`, `500` | | `DELETE` | Deletes a resource by identifier or an entire collection of resources. | `204` | `400`, `401`, `405`, `408`, `410`, `413`, `415`, `500` | | `OPTIONS` | Requests what operations are available on a resource. | `200` | `401`, `404`, `405`, `408`, `500` | ### Common Operations #### OPTIONS All resources respond to the `OPTIONS` request, which allows discoverability of available operations that are supported. The `OPTIONS` response returns the acceptable HTTP operations on that resource within the `Allow` header. The response is always a `200 OK` status. ### Collection Resources Collection resources can support the `GET`, `POST`, `PUT`, and `DELETE` operations. #### GET The `GET` operation invoked on a collection resource indicates a request to retrieve all, or some, of the entities contained within the collection. This also includes the optional capability to filter or search resources during the request. The response from a collection listing is a paginated document. See [hypermedia links](#section/Overview/Paging) for more information. #### POST The `POST` is a non-idempotent operation that allows for the creation of a new resource when the resource identifier is not provided by the system during the creation operation (i.e. the Security Console generates the identifier). The content of the `POST` request is sent in the request body. The response to a successful `POST` request should be a `201 CREATED` with a valid `Location` header field set to the URI that can be used to access to the newly created resource. The `POST` to a collection resource can also be used to interact with asynchronous resources. In this situation, instead of a `201 CREATED` response, the `202 ACCEPTED` response indicates that processing of the request is not fully complete but has been accepted for future processing. This request will respond similarly with a `Location` header with link to the job-oriented asynchronous resource that was created and/or queued. #### PUT The `PUT` is an idempotent operation that either performs a create with user-supplied identity, or a full replace or update of a resource by a known identifier. The response to a `PUT` operation to create an entity is a `201 Created` with a valid `Location` header field set to the URI that can be used to access to the newly created resource. `PUT` on a collection resource replaces all values in the collection. The typical response to a `PUT` operation that updates an entity is hypermedia links, which may link to related resources caused by the side-effects of the changes performed. #### DELETE The `DELETE` is an idempotent operation that physically deletes a resource, or removes an association between resources. The typical response to a `DELETE` operation is hypermedia links, which may link to related resources caused by the side-effects of the changes performed. ### Instance Resources Instance resources can support the `GET`, `PUT`, `POST`, `PATCH` and `DELETE` operations. #### GET Retrieves the details of a specific resource by its identifier. The details retrieved can be controlled through property selection and property views. The content of the resource is returned within the body of the response in the acceptable media type. #### PUT Allows for and idempotent "full update" (complete replacement) on a specific resource. If the resource does not exist, it will be created; if it does exist, it is completely overwritten. Any omitted properties in the request are assumed to be undefined/null. For "partial updates" use `POST` or `PATCH` instead. The content of the `PUT` request is sent in the request body. The identifier of the resource is specified within the URL (not the request body). The response to a successful `PUT` request is a `201 CREATED` to represent the created status, with a valid `Location` header field set to the URI that can be used to access to the newly created (or fully replaced) resource. #### POST Performs a non-idempotent creation of a new resource. The `POST` of an instance resource most commonly occurs with the use of nested resources (e.g. searching on a parent collection resource). The response to a `POST` of an instance resource is typically a `200 OK` if the resource is non-persistent, and a `201 CREATED` if there is a resource created/persisted as a result of the operation. This varies by endpoint. #### PATCH The `PATCH` operation is used to perform a partial update of a resource. `PATCH` is a non-idempotent operation that enforces an atomic mutation of a resource. Only the properties specified in the request are to be overwritten on the resource it is applied to. If a property is missing, it is assumed to not have changed. #### DELETE Permanently removes the individual resource from the system. If the resource is an association between resources, only the association is removed, not the resources themselves. A successful deletion of the resource should return `204 NO CONTENT` with no response body. This operation is not fully idempotent, as follow-up requests to delete a non-existent resource should return a `404 NOT FOUND`. ## Requests Unless otherwise indicated, the default request body media type is `application/json`. ### Headers Commonly used request headers include: | Header | Example | Purpose | | ------------------ | --------------------------------------------- | ---------------------------------------------------------------------------------------------- | | `Accept` | `application/json` | Defines what acceptable content types are allowed by the client. For all types, use `*/*`. | | `Accept-Encoding` | `deflate, gzip` | Allows for the encoding to be specified (such as gzip). | | `Accept-Language` | `en-US` | Indicates to the server the client's locale (defaults `en-US`). | | `Authorization ` | `Basic Base64("username:password")` | Basic authentication | | `Token ` | `123456` | Two-factor authentication token (if enabled) | ### Dates & Times Dates and/or times are specified as strings in the ISO 8601 format(s). The following formats are supported as input: | Value | Format | Notes | | --------------------------- | ------------------------------------------------------ | ----------------------------------------------------- | | Date | YYYY-MM-DD | Defaults to 12 am UTC (if used for a date & time | | Date & time only | YYYY-MM-DD'T'hh:mm:ss[.nnn] | Defaults to UTC | | Date & time in UTC | YYYY-MM-DD'T'hh:mm:ss[.nnn]Z | | | Date & time w/ offset | YYYY-MM-DD'T'hh:mm:ss[.nnn][+|-]hh:mm | | | Date & time w/ zone-offset | YYYY-MM-DD'T'hh:mm:ss[.nnn][+|-]hh:mm[<zone-id>] | | ### Timezones Timezones are specified in the regional zone format, such as `"America/Los_Angeles"`, `"Asia/Tokyo"`, or `"GMT"`. ### Paging Pagination is supported on certain collection resources using a combination of two query parameters, `page` and `size`. As these are control parameters, they are prefixed with the underscore character. The page parameter dictates the zero-based index of the page to retrieve, and the `size` indicates the size of the page. For example, `/resources?page=2&size=10` will return page 3, with 10 records per page, giving results 21-30. The maximum page size for a request is 500. ### Sorting Sorting is supported on paginated resources with the `sort` query parameter(s). The sort query parameter(s) supports identifying a single or multi-property sort with a single or multi-direction output. The format of the parameter is: ``` sort=property[,ASC|DESC]... ``` Therefore, the request `/resources?sort=name,title,DESC` would return the results sorted by the name and title descending, in that order. The sort directions are either ascending `ASC` or descending `DESC`. With single-order sorting, all properties are sorted in the same direction. To sort the results with varying orders by property, multiple sort parameters are passed. For example, the request `/resources?sort=name,ASC&sort=title,DESC` would sort by name ascending and title descending, in that order. ## Responses The following response statuses may be returned by this API. | Status | Meaning | Usage | | ------ | ------------------------ |------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `200` | OK | The operation performed without error according to the specification of the request, and no more specific 2xx code is suitable. | | `201` | Created | A create request has been fulfilled and a resource has been created. The resource is available as the URI specified in the response, including the `Location` header. | | `202` | Accepted | An asynchronous task has been accepted, but not guaranteed, to be processed in the future. | | `400` | Bad Request | The request was invalid or cannot be otherwise served. The request is not likely to succeed in the future without modifications. | | `401` | Unauthorized | The user is unauthorized to perform the operation requested, or does not maintain permissions to perform the operation on the resource specified. | | `403` | Forbidden | The resource exists to which the user has access, but the operating requested is not permitted. | | `404` | Not Found | The resource specified could not be located, does not exist, or an unauthenticated client does not have permissions to a resource. | | `405` | Method Not Allowed | The operations may not be performed on the specific resource. Allowed operations are returned and may be performed on the resource. | | `408` | Request Timeout | The client has failed to complete a request in a timely manner and the request has been discarded. | | `413` | Request Entity Too Large | The request being provided is too large for the server to accept processing. | | `415` | Unsupported Media Type | The media type is not supported for the requested resource. | | `500` | Internal Server Error | An internal and unexpected error has occurred on the server at no fault of the client. | ### Security The response statuses 401, 403 and 404 need special consideration for security purposes. As necessary, error statuses and messages may be obscured to strengthen security and prevent information exposure. The following is a guideline for privileged resource response statuses: | Use Case | Access | Resource | Permission | Status | | ------------------------------------------------------------------ | ------------------ |------------------- | ------------ | ------------ | | Unauthenticated access to an unauthenticated resource. | Unauthenticated | Unauthenticated | Yes | `20x` | | Unauthenticated access to an authenticated resource. | Unauthenticated | Authenticated | No | `401` | | Unauthenticated access to an authenticated resource. | Unauthenticated | Non-existent | No | `401` | | Authenticated access to a unauthenticated resource. | Authenticated | Unauthenticated | Yes | `20x` | | Authenticated access to an authenticated, unprivileged resource. | Authenticated | Authenticated | No | `404` | | Authenticated access to an authenticated, privileged resource. | Authenticated | Authenticated | Yes | `20x` | | Authenticated access to an authenticated, non-existent resource | Authenticated | Non-existent | Yes | `404` | ### Headers Commonly used response headers include: | Header | Example | Purpose | | -------------------------- | --------------------------------- | --------------------------------------------------------------- | | `Allow` | `OPTIONS, GET` | Defines the allowable HTTP operations on a resource. | | `Cache-Control` | `no-store, must-revalidate` | Disables caching of resources (as they are all dynamic). | | `Content-Encoding` | `gzip` | The encoding of the response body (if any). | | `Location` | | Refers to the URI of the resource created by a request. | | `Transfer-Encoding` | `chunked` | Specified the encoding used to transform response. | | `Retry-After` | 5000 | Indicates the time to wait before retrying a request. | | `X-Content-Type-Options` | `nosniff` | Disables MIME type sniffing. | | `X-XSS-Protection` | `1; mode=block` | Enables XSS filter protection. | | `X-Frame-Options` | `SAMEORIGIN` | Prevents rendering in a frame from a different origin. | | `X-UA-Compatible` | `IE=edge,chrome=1` | Specifies the browser mode to render in. | ### Format When `application/json` is returned in the response body it is always pretty-printed (indented, human readable output). Additionally, gzip compression/encoding is supported on all responses. #### Dates & Times Dates or times are returned as strings in the ISO 8601 'extended' format. When a date and time is returned (instant) the value is converted to UTC. For example: | Value | Format | Example | | --------------- | ------------------------------ | --------------------- | | Date | `YYYY-MM-DD` | 2017-12-03 | | Date & Time | `YYYY-MM-DD'T'hh:mm:ss[.nnn]Z` | 2017-12-03T10:15:30Z | #### Content In some resources a Content data type is used. This allows for multiple formats of representation to be returned within resource, specifically `"html"` and `"text"`. The `"text"` property returns a flattened representation suitable for output in textual displays. The `"html"` property returns an HTML fragment suitable for display within an HTML element. Note, the HTML returned is not a valid stand-alone HTML document. #### Paging The response to a paginated request follows the format: ```json { resources": [ ... ], "page": { "number" : ..., "size" : ..., "totalResources" : ..., "totalPages" : ... }, "links": [ "first" : { "href" : "..." }, "prev" : { "href" : "..." }, "self" : { "href" : "..." }, "next" : { "href" : "..." }, "last" : { "href" : "..." } ] } ``` The `resources` property is an array of the resources being retrieved from the endpoint, each which should contain at minimum a "self" relation hypermedia link. The `page` property outlines the details of the current page and total possible pages. The object for the page includes the following properties: - number - The page number (zero-based) of the page returned. - size - The size of the pages, which is less than or equal to the maximum page size. - totalResources - The total amount of resources available across all pages. - totalPages - The total amount of pages. The last property of the paged response is the `links` array, which contains all available hypermedia links. For paginated responses, the "self", "next", "previous", "first", and "last" links are returned. The "self" link must always be returned and should contain a link to allow the client to replicate the original request against the collection resource in an identical manner to that in which it was invoked. The "next" and "previous" links are present if either or both there exists a previous or next page, respectively. The "next" and "previous" links have hrefs that allow "natural movement" to the next page, that is all parameters required to move the next page are provided in the link. The "first" and "last" links provide references to the first and last pages respectively. Requests outside the boundaries of the pageable will result in a `404 NOT FOUND`. Paginated requests do not provide a "stateful cursor" to the client, nor does it need to provide a read consistent view. Records in adjacent pages may change while pagination is being traversed, and the total number of pages and resources may change between requests within the same filtered/queries resource collection. #### Property Views The "depth" of the response of a resource can be configured using a "view". All endpoints supports two views that can tune the extent of the information returned in the resource. The supported views are `summary` and `details` (the default). View are specified using a query parameter, in this format: ```bash /<resource>?view={viewName} ``` #### Error Any error responses can provide a response body with a message to the client indicating more information (if applicable) to aid debugging of the error. All 40x and 50x responses will return an error response in the body. The format of the response is as follows: ```json { "status": <statusCode>, "message": <message>, "links" : [ { "rel" : "...", "href" : "..." } ] } ``` The `status` property is the same as the HTTP status returned in the response, to ease client parsing. The message property is a localized message in the request client's locale (if applicable) that articulates the nature of the error. The last property is the `links` property. This may contain additional [hypermedia links](#section/Overview/Authentication) to troubleshoot. #### Search Criteria <a section="section/Responses/SearchCriteria"></a> Multiple resources make use of search criteria to match assets. Search criteria is an array of search filters. Each search filter has a generic format of: ```json { "field": "<field-name>", "operator": "<operator>", ["value": "<value>",] ["lower": "<value>",] ["upper": "<value>"] } ``` Every filter defines two required properties `field` and `operator`. The field is the name of an asset property that is being filtered on. The operator is a type and property-specific operating performed on the filtered property. The valid values for fields and operators are outlined in the table below. Every filter also defines one or more values that are supplied to the operator. The valid values vary by operator and are outlined below. ##### Fields The following table outlines the search criteria fields and the available operators: | Field | Operators | | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | | `alternate-address-type` | `in` | | `container-image` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` ` is-like` ` not-like` | | `container-status` | `is` ` is-not` | | `containers` | `are` | | `criticality-tag` | `is` ` is-not` ` is-greater-than` ` is-less-than` ` is-applied` ` is-not-applied` | | `custom-tag` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` ` is-applied` ` is-not-applied` | | `cve` | `is` ` is-not` ` contains` ` does-not-contain` | | `cvss-access-complexity` | `is` ` is-not` | | `cvss-authentication-required` | `is` ` is-not` | | `cvss-access-vector` | `is` ` is-not` | | `cvss-availability-impact` | `is` ` is-not` | | `cvss-confidentiality-impact` | `is` ` is-not` | | `cvss-integrity-impact` | `is` ` is-not` | | `cvss-v3-confidentiality-impact` | `is` ` is-not` | | `cvss-v3-integrity-impact` | `is` ` is-not` | | `cvss-v3-availability-impact` | `is` ` is-not` | | `cvss-v3-attack-vector` | `is` ` is-not` | | `cvss-v3-attack-complexity` | `is` ` is-not` | | `cvss-v3-user-interaction` | `is` ` is-not` | | `cvss-v3-privileges-required` | `is` ` is-not` | | `host-name` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` ` is-empty` ` is-not-empty` ` is-like` ` not-like` | | `host-type` | `in` ` not-in` | | `ip-address` | `is` ` is-not` ` in-range` ` not-in-range` ` is-like` ` not-like` | | `ip-address-type` | `in` ` not-in` | | `last-scan-date` | `is-on-or-before` ` is-on-or-after` ` is-between` ` is-earlier-than` ` is-within-the-last` | | `location-tag` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` ` is-applied` ` is-not-applied` | | `mobile-device-last-sync-time` | `is-within-the-last` ` is-earlier-than` | | `open-ports` | `is` ` is-not` ` in-range` | | `operating-system` | `contains` ` does-not-contain` ` is-empty` ` is-not-empty` | | `owner-tag` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` ` is-applied` ` is-not-applied` | | `pci-compliance` | `is` | | `risk-score` | `is` ` is-not` ` in-range` ` greater-than` ` less-than` | | `service-name` | `contains` ` does-not-contain` | | `site-id` | `in` ` not-in` | | `software` | `contains` ` does-not-contain` | | `vAsset-cluster` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `vAsset-datacenter` | `is` ` is-not` | | `vAsset-host-name` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `vAsset-power-state` | `in` ` not-in` | | `vAsset-resource-pool-path` | `contains` ` does-not-contain` | | `vulnerability-assessed` | `is-on-or-before` ` is-on-or-after` ` is-between` ` is-earlier-than` ` is-within-the-last` | | `vulnerability-category` | `is` ` is-not` ` starts-with` ` ends-with` ` contains` ` does-not-contain` | | `vulnerability-cvss-v3-score` | `is` ` is-not` | | `vulnerability-cvss-score` | `is` ` is-not` ` in-range` ` is-greater-than` ` is-less-than` | | `vulnerability-exposures` | `includes` ` does-not-include` | | `vulnerability-title` | `contains` ` does-not-contain` ` is` ` is-not` ` starts-with` ` ends-with` | | `vulnerability-validated-status` | `are` | ##### Enumerated Properties The following fields have enumerated values: | Field | Acceptable Values | | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------- | | `alternate-address-type` | 0=IPv4, 1=IPv6 | | `containers` | 0=present, 1=not present | | `container-status` | `created` `running` `paused` `restarting` `exited` `dead` `unknown` | | `cvss-access-complexity` | <ul><li><code>L</code> = Low</li><li><code>M</code> = Medium</li><li><code>H</code> = High</li></ul> | | `cvss-integrity-impact` | <ul><li><code>N</code> = None</li><li><code>P</code> = Partial</li><li><code>C</code> = Complete</li></ul> | | `cvss-confidentiality-impact` | <ul><li><code>N</code> = None</li><li><code>P</code> = Partial</li><li><code>C</code> = Complete</li></ul> | | `cvss-availability-impact` | <ul><li><code>N</code> = None</li><li><code>P</code> = Partial</li><li><code>C</code> = Complete</li></ul> | | `cvss-access-vector` | <ul><li><code>L</code> = Local</li><li><code>A</code> = Adjacent</li><li><code>N</code> = Network</li></ul> | | `cvss-authentication-required` | <ul><li><code>N</code> = None</li><li><code>S</code> = Single</li><li><code>M</code> = Multiple</li></ul> | | `cvss-v3-confidentiality-impact` | <ul><li><code>L</code> = Local</li><li><code>L</code> = Low</li><li><code>N</code> = None</li><li><code>H</code> = High</li></ul> | | `cvss-v3-integrity-impact` | <ul><li><code>L</code> = Local</li><li><code>L</code> = Low</li><li><code>N</code> = None</li><li><code>H</code> = High</li></ul> | | `cvss-v3-availability-impact` | <ul><li><code>N</code> = None</li><li><code>L</code> = Low</li><li><code>H</code> = High</li></ul> | | `cvss-v3-attack-vector` | <ul><li><code>N</code> = Network</li><li><code>A</code> = Adjacent</li><li><code>L</code> = Local</li><li><code>P</code> = Physical</li></ul> | | `cvss-v3-attack-complexity` | <ul><li><code>L</code> = Low</li><li><code>H</code> = High</li></ul> | | `cvss-v3-user-interaction` | <ul><li><code>N</code> = None</li><li><code>R</code> = Required</li></ul> | | `cvss-v3-privileges-required` | <ul><li><code>N</code> = None</li><li><code>L</code> = Low</li><li><code>H</code> = High</li></ul> | | `host-type` | 0=Unknown, 1=Guest, 2=Hypervisor, 3=Physical, 4=Mobile | | `ip-address-type` | 0=IPv4, 1=IPv6 | | `pci-compliance` | 0=fail, 1=pass | | `vulnerability-validated-status` | 0=present, 1=not present | ##### Operator Properties <a section="section/Responses/SearchCriteria/OperatorProperties"></a> The following table outlines which properties are required for each operator and the appropriate data type(s): | Operator | `value` | `lower` | `upper` | | ----------------------|-----------------------|-----------------------|-----------------------| | `are` | `string` | | | | `contains` | `string` | | | | `does-not-contain` | `string` | | | | `ends with` | `string` | | | | `in` | `Array[ string ]` | | | | `in-range` | | `numeric` | `numeric` | | `includes` | `Array[ string ]` | | | | `is` | `string` | | | | `is-applied` | | | | | `is-between` | | `numeric` | `numeric` | | `is-earlier-than` | `numeric` | | | | `is-empty` | | | | | `is-greater-than` | `numeric` | | | | `is-on-or-after` | `string` (yyyy-MM-dd) | | | | `is-on-or-before` | `string` (yyyy-MM-dd) | | | | `is-not` | `string` | | | | `is-not-applied` | | | | | `is-not-empty` | | | | | `is-within-the-last` | `numeric` | | | | `less-than` | `string` | | | | `like` | `string` | | | | `not-contains` | `string` | | | | `not-in` | `Array[ string ]` | | | | `not-in-range` | | `numeric` | `numeric` | | `not-like` | `string` | | | | `starts-with` | `string` | | | #### Discovery Connection Search Criteria <a section="section/Responses/DiscoverySearchCriteria"></a> Dynamic sites make use of search criteria to match assets from a discovery connection. Search criteria is an array of search filters. Each search filter has a generic format of: ```json { "field": "<field-name>", "operator": "<operator>", ["value": "<value>",] ["lower": "<value>",] ["upper": "<value>"] } ``` Every filter defines two required properties `field` and `operator`. The field is the name of an asset property that is being filtered on. The list of supported fields vary depending on the type of discovery connection configured for the dynamic site (e.g vSphere, ActiveSync, etc.). The operator is a type and property-specific operating performed on the filtered property. The valid values for fields outlined in the tables below and are grouped by the type of connection. Every filter also defines one or more values that are supplied to the operator. See <a href="#section/Responses/SearchCriteria/OperatorProperties">Search Criteria Operator Properties</a> for more information on the valid values for each operator. ##### Fields (ActiveSync) This section documents search criteria information for ActiveSync discovery connections. The discovery connections must be one of the following types: `"activesync-ldap"`, `"activesync-office365"`, or `"activesync-powershell"`. The following table outlines the search criteria fields and the available operators for ActiveSync connections: | Field | Operators | | --------------------------------- | ------------------------------------------------------------- | | `last-sync-time` | `is-within-the-last` ` is-earlier-than` | | `operating-system` | `contains` ` does-not-contain` | | `user` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | ##### Fields (AWS) This section documents search criteria information for AWS discovery connections. The discovery connections must be the type `"aws"`. The following table outlines the search criteria fields and the available operators for AWS connections: | Field | Operators | | ----------------------- | ------------------------------------------------------------- | | `availability-zone` | `contains` ` does-not-contain` | | `guest-os-family` | `contains` ` does-not-contain` | | `instance-id` | `contains` ` does-not-contain` | | `instance-name` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `instance-state` | `in` ` not-in` | | `instance-type` | `in` ` not-in` | | `ip-address` | `in-range` ` not-in-range` ` is` ` is-not` | | `region` | `in` ` not-in` | | `vpc-id` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | ##### Fields (DHCP) This section documents search criteria information for DHCP discovery connections. The discovery connections must be the type `"dhcp"`. The following table outlines the search criteria fields and the available operators for DHCP connections: | Field | Operators | | --------------- | ------------------------------------------------------------- | | `host-name` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `ip-address` | `in-range` ` not-in-range` ` is` ` is-not` | | `mac-address` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | ##### Fields (Sonar) This section documents search criteria information for Sonar discovery connections. The discovery connections must be the type `"sonar"`. The following table outlines the search criteria fields and the available operators for Sonar connections: | Field | Operators | | ------------------- | -------------------- | | `search-domain` | `contains` ` is` | | `ip-address` | `in-range` ` is` | | `sonar-scan-date` | `is-within-the-last` | ##### Fields (vSphere) This section documents search criteria information for vSphere discovery connections. The discovery connections must be the type `"vsphere"`. The following table outlines the search criteria fields and the available operators for vSphere connections: | Field | Operators | | -------------------- | ------------------------------------------------------------------------------------------ | | `cluster` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `data-center` | `is` ` is-not` | | `discovered-time` | `is-on-or-before` ` is-on-or-after` ` is-between` ` is-earlier-than` ` is-within-the-last` | | `guest-os-family` | `contains` ` does-not-contain` | | `host-name` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | | `ip-address` | `in-range` ` not-in-range` ` is` ` is-not` | | `power-state` | `in` ` not-in` | | `resource-pool-path` | `contains` ` does-not-contain` | | `last-time-seen` | `is-on-or-before` ` is-on-or-after` ` is-between` ` is-earlier-than` ` is-within-the-last` | | `vm` | `is` ` is-not` ` contains` ` does-not-contain` ` starts-with` | ##### Enumerated Properties (vSphere) The following fields have enumerated values: | Field | Acceptable Values | | ------------- | ------------------------------------ | | `power-state` | `poweredOn` `poweredOff` `suspended` | ## HATEOAS This API follows Hypermedia as the Engine of Application State (HATEOAS) principals and is therefore hypermedia friendly. Hyperlinks are returned in the `links` property of any given resource and contain a fully-qualified hyperlink to the corresponding resource. The format of the hypermedia link adheres to both the <a target="_blank" href="http://jsonapi.org">{json:api} v1</a> <a target="_blank" href="http://jsonapi.org/format/#document-links">"Link Object"</a> and <a target="_blank" href="http://json-schema.org/latest/json-schema-hypermedia.html">JSON Hyper-Schema</a> <a target="_blank" href="http://json-schema.org/latest/json-schema-hypermedia.html#rfc.section.5.2">"Link Description Object"</a> formats. For example: ```json "links": [{ "rel": "<relation>", "href": "<href>" ... }] ``` Where appropriate link objects may also contain additional properties than the `rel` and `href` properties, such as `id`, `type`, etc. See the [Root](#tag/Root) resources for the entry points into API discovery.
README ====== This is a simple API to evaluate information retrieval results. It allows you to load ranked and unranked query results and calculate various evaluation metrics (precision, recall, MAP, kappa) against a previously loaded gold standard. Start this program from the command line with: retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix> The options are outlined when you pass no arguments and just call retreval You will find further information in the RDOC documentation and the HOWTO section below. If you want to see an example, use this command: retreval -l example/gold_standard.yml -q example/query_results.yml -f yaml -v INSTALLATION ============ If you have RubyGems, just run gem install retreval You can manually download the sources and build the Gem from there by `cd`ing to the folder where this README is saved and calling gem build retreval.gemspec This will create a gem file called which you just have to install with `gem install <file>` and you're done. HOWTO ===== This API supports the following evaluation tasks: - Loading a Gold Standard that takes a set of documents, queries and corresponding judgements of relevancy (i.e. "Is this document relevant for this query?") - Calculation of the _kappa measure_ for the given gold standard - Loading ranked or unranked query results for a certain query - Calculation of _precision_ and _recall_ for each result - Calculation of the _F-measure_ for weighing precision and recall - Calculation of _mean average precision_ for multiple query results - Calculation of the _11-point precision_ and _average precision_ for ranked query results - Printing of summary tables and results Typically, you will want to use this Gem either standalone or within another application's context. Standalone Usage ================ Call parameters --------------- After installing the Gem (see INSTALLATION), you can always call `retreval` from the commandline. The typical call is: retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix> Where you have to define the following options: - `gold-standard-file` is a file in a specified format that includes all the judgements - `query-results` is a file in a specified format that includes all the query results in a single file - `format` is the format that the files will use (either "yaml" or "plain") - `output-prefix` is the prefix of output files that will be created Formats ------- Right now, we focus on the formats you can use to load data into the API. Currently, we support YAML files that must adhere to a special syntax. So, in order to load a gold standard, we need a file in the following format: * "query" denotes the query * "documents" these are the documents judged for this query * "id" the ID of the document (e.g. its filename, etc.) * "judgements" an array of judgements, each one with: * "relevant" a boolean value of the judgment (relevant or not) * "user" an optional identifier of the user Example file, with one query, two documents, and one judgement: - query: 12th air force germany 1957 documents: - id: g5701s.ict21311 judgements: [] - id: g5701s.ict21313 judgements: - relevant: false user: 2 So, when calling the program, specify the format as `yaml`. For the query results, a similar format is used. Note that it is necessary to specify whether the result sets are ranked or not, as this will heavily influence the calculations. You can specify the score for a document. By "score" we mean the score that your retrieval algorithm has given the document. But this is not necessary. The documents will always be ranked in the order of their appearance, regardless of their score. Thus in the following example, the document with "07" at the end is the first and "25" is the last, regardless of the score. --- query: 12th air force germany 1957 ranked: true documents: - score: 0.44034874 document: g5701s.ict21307 - score: 0.44034874 document: g5701s.ict21309 - score: 0.44034874 document: g5701s.ict21311 - score: 0.44034874 document: g5701s.ict21313 - score: 0.44034874 document: g5701s.ict21315 - score: 0.44034874 document: g5701s.ict21317 - score: 0.44034874 document: g5701s.ict21319 - score: 0.44034874 document: g5701s.ict21321 - score: 0.44034874 document: g5701s.ict21323 - score: 0.44034874 document: g5701s.ict21325 --- query: 1612 ranked: true documents: - score: 1.0174774 document: g3290.np000144 - score: 0.763108 document: g3201b.ct000726 - score: 0.763108 document: g3400.ct000886 - score: 0.6359234 document: g3201s.ct000130 --- **Note**: You can also use the `plain` format, which will load the gold standard in a different way (but not the results): my_query my_document_1 false my_query my_document_2 true See that every query/document/relevancy pair is separated by a tabulator? You can also add the user's ID in the fourth column if necessary. Running the evaluation ----------------------- After you have specified the input files and the format, you can run the program. If needed, the `-v` switch will turn on verbose messages, such as information on how many judgements, documents and users there are, but this shouldn't be necessary. The program will first load the gold standard and then calculate the statistics for each result set. The output files are automatically created and contain a YAML representation of the results. Calculations may take a while depending on the amount of judgements and documents. If there are a thousand judgements, always consider a few seconds for each result set. Interpreting the output files ------------------------------ Two output files will be created: - `output_avg_precision.yml` - `output_statistics.yml` The first lists the average precision for each query in the query result file. The second file lists all supported statistics for each query in the query results file. For example, for a ranked evaluation, the first two entries of such a query result statistic look like this: --- 12th air force germany 1957: - :precision: 0.0 :recall: 0.0 :false_negatives: 1 :false_positives: 1 :true_negatives: 2516 :true_positives: 0 :document: g5701s.ict21313 :relevant: false - :precision: 0.0 :recall: 0.0 :false_negatives: 1 :false_positives: 2 :true_negatives: 2515 :true_positives: 0 :document: g5701s.ict21317 :relevant: false You can see the precision and recall for that specific point and also the number of documents for the contingency table (true/false positives/negatives). Also, the document identifier is given. API Usage ========= Using this API in another ruby application is probably the more common use case. All you have to do is include the Gem in your Ruby or Ruby on Rails application. For details about available methods, please refer to the API documentation generated by RDoc. **Important**: For this implementation, we use the document ID, the query and the user ID as the primary keys for matching objects. This means that your documents and queries are identified by a string and thus the strings should be sanitized first. Loading the Gold Standard ------------------------- Once you have loaded the Gem, you will probably start by creating a new gold standard. gold_standard = GoldStandard.new Then, you can load judgements into this standard, either from a file, or manually: gold_standard.load_from_yaml_file "my-file.yml" gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => John There is a nice shortcut for the `add_judgement` method. Both lines are essentially the same: gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => John gold_standard << :document => doc_id, :query => query_string, :relevant => boolean, :user => John Note the usage of typical Rails hashes for better readability (also, this Gem was developed to be used in a Rails webapp). Now that you have loaded the gold standard, you can do things like: gold_standard.contains_judgement? :document => "a document", :query => "the query" gold_standard.relevant? :document => "a document", :query => "the query" Loading the Query Results ------------------------- Now we want to create a new `QueryResultSet`. A query result set can contain more than one result, which is what we normally want. It is important that you specify the gold standard it belongs to. query_result_set = QueryResultSet.new :gold_standard => gold_standard Just like the Gold Standard, you can read a query result set from a file: query_result_set.load_from_yaml_file "my-results-file.yml" Alternatively, you can load the query results one by one. To do this, you have to create the results (either ranked or unranked) and then add documents: my_result = RankedQueryResult.new :query => "the query" my_result.add_document :document => "test_document 1", :score => 13 my_result.add_document :document => "test_document 2", :score => 11 my_result.add_document :document => "test_document 3", :score => 3 This result would be ranked, obviously, and contain three documents. Documents can have a score, but this is optional. You can also create an Array of documents first and add them altogether: documents = Array.new documents << ResultDocument.new :id => "test_document 1", :score => 20 documents << ResultDocument.new :id => "test_document 2", :score => 21 my_result = RankedQueryResult.new :query => "the query", :documents => documents The same applies to `UnrankedQueryResult`s, obviously. The order of ranked documents is the same as the order in which they were added to the result. The `QueryResultSet` will now contain all the results. They are stored in an array called `query_results`, which you can access. So, to iterate over each result, you might want to use the following code: query_result_set.query_results.each_with_index do |result, index| # ... end Or, more simply: for result in query_result_set.query_results # ... end Calculating statistics ---------------------- Now to the interesting part: Calculating statistics. As mentioned before, there is a conceptual difference between ranked and unranked results. Unranked results are much easier to calculate and thus take less CPU time. No matter if unranked or ranked, you can get the most important statistics by just calling the `statistics` method. statistics = my_result.statistics In the simple case of an unranked result, you will receive a hash with the following information: * `precision` - the precision of the results * `recall` - the recall of the results * `false_negatives` - number of not retrieved but relevant items * `false_positives` - number of retrieved but nonrelevant * `true_negatives` - number of not retrieved and nonrelevantv items * `true_positives` - number of retrieved and relevant items In case of a ranked result, you will receive an Array that consists of _n_ such Hashes, depending on the number of documents. Each Hash will give you the information at a certain rank, e.g. the following to lines return the recall at the fourth rank. statistics = my_ranked_result.statistics statistics[3][:recall] In addition to the information mentioned above, you can also get for each rank: * `document` - the ID of the document that was returned at this rank * `relevant` - whether the document was relevant or not Calculating statistics with missing judgements ---------------------------------------------- Sometimes, you don't have judgements for all document/query pairs in the gold standard. If this happens, the results will be cleaned up first. This means that every document in the results that doesn't appear to have a judgement will be removed temporarily. As an example, take the following results: * A * B * C * D Our gold standard only contains judgements for A and C. The results will be cleaned up first, thus leading to: * A * C With this approach, we can still provide meaningful results (for precision and recall). Other statistics ---------------- There are several other statistics that can be calculated, for example the **F measure**. The F measure weighs precision and recall and has one parameter, either "alpha" or "beta". Get the F measure like so: my_result.f_measure :beta => 1 If you don't specify either alpha or beta, we will assume that beta = 1. Another interesting measure is **Cohen's Kappa**, which tells us about the inter-agreement of assessors. Get the kappa statistic like this: gold_standard.kappa This will calculate the average kappa for each pairwise combination of users in the gold standard. For ranked results one might also want to calculate an **11-point precision**. Just call the following: my_ranked_result.eleven_point_precision This will return a Hash that has indices at the 11 recall levels from 0 to 1 (with steps of 0.1) and the corresponding precision at that recall level.
Ame Ame provides a simple command-line interface API for Ruby¹. It can be used to provide both simple interfaces like that of ‹rm›² and complex ones like that of ‹git›³. It uses Ruby’s own classes, methods, and argument lists to provide an interface that is both simple to use from the command-line side and from the Ruby side. The provided command-line interface is flexible and follows commond standards for command-line processing. ¹ See http://ruby-lang.org/ ² See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/rm.html ³ See http://git-scm.com/docs/ § Usage Let’s begin by looking at two examples, one where we mimic the POSIX¹ command-line interface to the ‹rm› command. Looking at the entry² in the standard, ‹rm› takes the following options: = -f. = Do not prompt for confirmation. = -i. = Prompt for confirmation. = -R. = Remove file hierarchies. = -r. = Equivalent to /-r/. It also takes the following arguments: = FILE. = A pathname or directory entry to be removed. And actually allows one or more of these /FILE/ arguments to be given. We also note that the ‹rm› command is described as a command to “remove directory entries”. ¹ See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/contents.html ² See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/rm.html Let’s turn this specification into one using Ame’s API. We begin by adding a flag for each of the options listed above: class Rm < Ame::Root flag 'f', '', false, 'Do not prompt for confirmation' flag 'i', '', nil, 'Prompt for confirmation' do |options| options['f'] = false end flag 'R', '', false, 'Remove file hierarchies' flag 'r', '', nil, 'Equivalent to -R' do |options| options['r'] = true end A flag¹ is a boolean option that doesn’t take an argument. Each flag gets a short and long name, where an empty name means that there’s no corresponding short or long name for the flag, a default value (true, false, or nil), and a description of what the flag does. Each flag can also optionally take a block that can do further processing. In this case we use this block to modify the Hash that maps option names to their values passed to the block to set other flags’ values than the ones that the block is associated with. As these flags (‘i’ and ‘r’) aren’t themselves of interest, their default values have been set to nil, which means that they won’t be included in the Hash that maps option names to their values when passed to the method. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#flag-class-method There are quite a few other kinds of options besides flags that can be defined using Ame, but flags are all that are required for this example. We’ll get to the other kinds in later examples. Next we add a “splus” argument. splus 'FILE', String, 'File to remove' A splus¹ argument is like a Ruby “splat”, that is, an Array argument at the end of the argument list to a method preceded by a star, except that a splus requires at least one argument. A splus argument gets a name for the argument (‹FILE›), the type of argument it represents (String), and a description. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#splus-class-method Then we add a description of the command (method) itself: description 'Remove directory entries' Descriptions¹ will be used in help output to assist the user in using the command. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#description-class-method Finally, we add the Ruby method that’ll implement the command (all preceding code included here for completeness): class Rm < Ame::Root version '1.0.0' flag 'f', '', false, 'Do not prompt for confirmation' flag 'i', '', nil, 'Prompt for confirmation' do |options| options['f'] = false end flag 'R', '', false, 'Remove file hierarchies' flag 'r', '', nil, 'Equivalent to -R' do |options| options['r'] = true end splus 'FILE', String, 'File to remove' description 'Remove directory entries' def rm(files, options = {}) require 'fileutils' FileUtils.send options['R'] ? :rm_r : :rm, [first] + rest, :force => options['f'] end end Actually, another bit of code was also added, namely version '1.0.0' This sets the version¹ String of the command. This information is used when the command is invoked with the “‹--version›” flag. This flag is automatically added, so you don’t need to add it yourself. Another flag, “‹--help›”, is also added automatically. When given, this flag’ll make Ame output usage information of the command. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#version-class-method To actually run the command, all you need to do is invoke Rm.process This’ll invoke the command using the command-line arguments stored in ‹ARGV›, but you can also specify other ones if you want to: Rm.process 'rm', %w[-r /tmp/*] The first argument to #process¹ is the name of the method to invoke, which defaults to ‹File.basename($0)›, and the second argument is an Array of Strings that should be processed as command-line arguments passed to the command. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#process-class-method If you’d store the complete ‹Rm› class defined above in a file called ‹rm› and add ‹#! /usr/bin/ruby -w› at the beginning and ‹Rm.process› at the end, you’d have a fully functional ‹rm› command (after making it executable). Let’s see it in action: % rm --help Usage: rm [OPTIONS]... FILE... Remove directory entries Arguments: FILE... File to remove Options: -R Remove file hierarchies -f Do not prompt for confirmation --help Display help for this method -i Prompt for confirmation -r Equivalent to -R --version Display version information % rm --version rm 1.0.0 Some commands are more complex than ‹rm›. For example, ‹git›¹ has a rather complex command-line interface. We won’t mimic it all here, but let’s introduce the rest of the Ame API using a fake ‹git› clone as an example. ¹ See http://git-scm.com/docs/ ‹Git› uses sub-commands to achieve most things. Implementing sub-commands with Ame is done using a “dispatch”. We’ll discuss dispatches in more detail later, but suffice it to say that a dispatch delegates processing to a child class that’ll handle the sub-command in question. We begin by defining our main ‹git› command using a class called ‹Git› under the ‹Git::CLI› namespace: module Git end class Git::CLI < Ame::Root version '1.0.0' class Git < Ame::Class description 'The stupid content tracker' def initialize; end We’re setting things up to use the ‹Git› class as a dispatch in the ‹Git::CLI› class. The description on the ‹initialize› method will be used as a description of the ‹git› dispatch command itself. Next, let’s add the ‹format-patch›¹ sub-command: description 'Prepare patches for e-mail submission' flag ?n, 'numbered', false, 'Name output in [PATCH n/m] format' flag ?N, 'no-numbered', nil, 'Name output in [PATCH] format' do |options| options['numbered'] = false end toggle ?s, 'signoff', false, 'Add Signed-off-by: line to the commit message' switch '', 'thread', 'STYLE', nil, Ame::Types::Enumeration[:shallow, :deep], 'Controls addition of In-Reply-To and References headers' flag '', 'no-thread', nil, 'Disables addition of In-Reply-To and Reference headers' do |options, _| options.delete 'thread' end option '', 'start-number', 'N', 1, 'Start numbering the patches at N instead of 1' multioption '', 'to', 'ADDRESS', String, 'Add a To: header to the email headers' optional 'SINCE', 'N/A', 'Generate patches for commits after SINCE' def format_patch(since = '', options = {}) p since, options end ¹ See http://git-scm.com/docs/git-format-patch/ We’re using quite a few new Ame commands here. Let’s look at each in turn: toggle ?s, 'signoff', false, 'Add Signed-off-by: line to the commit message' A “toggle”¹ is a flag that also has an inverse. Beyond the flags ‘s’ and “signoff”, the toggle also defines “no-signoff”, which will set “signoff” to false. This is useful if you want to support configuration files that set “signoff”’s default to true, but still allow it to be overridden on the command line. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#toggle-class-method When using the short form of a toggle (and flag and switch), multiple ones may be juxtaposed after the initial one. For example, “‹-sn›” is equivalent to “‹-s -n›” to “git format-patch›”. switch '', 'thread', 'STYLE', nil, Ame::Types::Enumeration[:shallow, :deep], 'Controls addition of In-Reply-To and References headers' A “switch”¹ is an option that takes an optional argument. This allows you to have separate defaults for when the switch isn’t present on the command line and for when it’s given without an argument. The third argument to a switch is the name of the argument. We’re also introducing a new concept here in ‹Ame::Types::Enumeration›. An enumeration² allows you to limit the allowed input to a set of Symbols. An enumeration also has a default value in the first item to its constructor (which is aliased as ‹.[]›). In this case, the “thread” switch defaults to nil, but, when given, will default to ‹:shallow› if no argument is given. If an argument is given it must be either “shallow” or “deep”. A switch isn’t required to take an enumeration as its argument default and can take any kind of default value for its argument that Ame knows how to handle. We’ll look at this in more detail later, but know that the type of the default value will be used to inform Ame how to parse a command-line argument into a Ruby value. An argument to a switch must be given, in this case, as “‹--thread=deep›” on the command line. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#switch-class-method ² See http://disu.se/software/ame-1.0/api/user/Ame/Types/Enumeration/ option '', 'start-number', 'N', 1, 'Start numbering the patches at N instead of 1' An “option”¹ is an option that takes an argument. The argument must always be present and may be given, in this case, as “‹--start-number=2›” or “‹--start-number 2›” on the command line. For a short-form option, anything that follows the option is seen as an argument, so assuming that “start-number” also had a short name of ‘S’, “‹-S2›” would be equivalent to “‹-S 2›”, which would be equivalent to “‹--start-number 2›”. Note that “‹-snS2›” would still work as expected. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#option-class-method multioption '', 'to', 'ADDRESS', String, 'Add a To: header to the email headers' A “multioption”¹ is an option that takes an argument and may be repeated any number of times. Each argument will be added to an Array stored in the Hash that maps option names to their values. Instead of taking a default argument, it takes a type for the argument (String, in this case). Again, types are used to inform Ame how to parse command-line arguments into Ruby values. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#multioption-class-method optional 'SINCE', 'N/A', 'Generate patches for commits after SINCE' An “optional”¹ argument is an argument that isn’t required. If it’s not present on the command line it’ll get its default value (the String ‹'N/A'›, in this case). ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#optional-class-method We’ve now covered all kinds of options and one new kind of argument. There are three more types of argument (one that we’ve already seen and two new) that we’ll look into now: “argument”, “splat”, and “splus”. description 'Annotate file lines with commit information' argument 'FILE', String, 'File to annotate' def annotate(file) p file end An “argument”¹ is an argument that’s required. If it’s not present on the command line, an error will be raised (and by default reported to the terminal). As it’s required, it doesn’t take a default, but rather a type. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#argument-class-method description 'Add file contents to the index' splat 'PATHSPEC', String, 'Files to add content from' def add(paths) p paths end A “splat”¹ is an argument that’s not required, but may be given any number of times. The type of a splat is the type of one argument and the type of a splat as a whole is an Array of values of that type. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#splat-class-method description 'Display gitattributes information' splus 'PATHNAME', String, 'Files to list attributes of' def check_attr(paths) p paths end A “splus”¹ is an argument that’s required, but may also be given any number of times. The type of a splus is the type of one argument and the type of a splus as a whole is an Array of values of that type. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#splus-class-method Now that we’ve seen all kinds of options and arguments, let’s look on an additional tool at our disposal, the dispatch¹. class Remote < Ame::Class description 'Manage set of remote repositories' def initialize; end description 'Shows a list of existing remotes' flag 'v', 'verbose', false, 'Show remote URL after name' def list(options = {}) p options end description 'Adds a remote named NAME for the repository at URL' argument 'name', String, 'Name of the remote to add' argument 'url', String, 'URL to the repository of the remote to add' def add(name, url) p name, url end end ¹ See http://disu.se/software/ame-1.0/api/user/Ame/Class#dispatch-class-method Here we’re defining a child class to Git::CLI::Git called “Remote” that doesn’t introduce anything new. Then we set up the dispatch: dispatch Remote, :default => 'list' This adds a method called “remote” to Git::CLI::Git that will dispatch processing of the command line to an instance of the Remote class when “‹git remote›” is seen on the command line. The “remote” method expects an argument that’ll be used to decide what sub-command to execute. Here we’ve specified that in the absence of such an argument, the “list” method should be invoked. We add the same kind of dispatch to Git under Git::CLI: dispatch Git and then we’re done. Here’s all the previous code in its entirety: module Git end class Git::CLI < Ame::Root version '1.0.0' class Git < Ame::Class description 'The stupid content tracker' def initialize; end description 'Prepare patches for e-mail submission' flag ?n, 'numbered', false, 'Name output in [PATCH n/m] format' flag ?N, 'no-numbered', nil, 'Name output in [PATCH] format' do |options| options['numbered'] = false end toggle ?s, 'signoff', false, 'Add Signed-off-by: line to the commit message' switch '', 'thread', 'STYLE', nil, Ame::Types::Enumeration[:shallow, :deep], 'Controls addition of In-Reply-To and References headers' flag '', 'no-thread', nil, 'Disables addition of In-Reply-To and Reference headers' do |options, _| options.delete 'thread' end option '', 'start-number', 'N', 1, 'Start numbering the patches at N instead of 1' multioption '', 'to', 'ADDRESS', String, 'Add a To: header to the email headers' optional 'SINCE', 'N/A', 'Generate patches for commits after SINCE' def format_patch(since = '', options = {}) p since, options end description 'Annotate file lines with commit information' argument 'FILE', String, 'File to annotate' def annotate(file) p file end description 'Add file contents to the index' splat 'PATHSPEC', String, 'Files to add content from' def add(paths) p paths end description 'Display gitattributes information' splus 'PATHNAME', String, 'Files to list attributes of' def check_attr(paths) p paths end class Remote < Ame::Class description 'Manage set of remote repositories' def initialize; end description 'Shows a list of existing remotes' flag 'v', 'verbose', false, 'Show remote URL after name' def list(options = {}) p options end description 'Adds a remote named NAME for the repository at URL' argument 'name', String, 'Name of the remote to add' argument 'url', String, 'URL to the repository of the remote to add' def add(name, url) p name, url end end dispatch Remote, :default => 'list' end dispatch Git end If we put this code in a file called “git” and add ‹#! /usr/bin/ruby -w› at the beginning and ‹Git::CLI.process› at the end, you’ll have a very incomplete git command-line interface on your hands. Let’s look at what some of its ‹--help› output looks like: % git --help Usage: git [OPTIONS]... METHOD [ARGUMENTS]... The stupid content tracker Arguments: METHOD Method to run [ARGUMENTS]... Arguments to pass to METHOD Options: --help Display help for this method --version Display version information Methods: add Add file contents to the index annotate Annotate file lines with commit information check-attr Display gitattributes information format-patch Prepare patches for e-mail submission remote Manage set of remote repositories % git format-patch --help Usage: git format-patch [OPTIONS]... [SINCE] Prepare patches for e-mail submission Arguments: [SINCE=N/A] Generate patches for commits after SINCE Options: -N, --no-numbered Name output in [PATCH] format --help Display help for this method -n, --numbered Name output in [PATCH n/m] format --no-thread Disables addition of In-Reply-To and Reference headers -s, --signoff Add Signed-off-by: line to the commit message --start-number=N Start numbering the patches at N instead of 1 --thread[=STYLE] Controls addition of In-Reply-To and References headers --to=ADDRESS* Add a To: header to the email headers % git remote --help Usage: git remote [OPTIONS]... [METHOD] [ARGUMENTS]... Manage set of remote repositories Arguments: [METHOD=list] Method to run [ARGUMENTS]... Arguments to pass to METHOD Options: --help Display help for this method Methods: add Adds a remote named NAME for the repository at URL list Shows a list of existing remotes § API The previous section gave an introduction to the whole user API in an informal and introductory way. For an indepth reference to the user API, see the {user API documentation}¹. ¹ See http://disu.se/software/ame-1.0/api/user/Ame/ If you want to extend the API or use it in some way other than as a command-line-interface writer, see the {developer API documentation}¹. ¹ See http://disu.se/software/ame-1.0/api/developer/Ame/ § Financing Currently, most of my time is spent at my day job and in my rather busy private life. Please motivate me to spend time on this piece of software by donating some of your money to this project. Yeah, I realize that requesting money to develop software is a bit, well, capitalistic of me. But please realize that I live in a capitalistic society and I need money to have other people give me the things that I need to continue living under the rules of said society. So, if you feel that this piece of software has helped you out enough to warrant a reward, please PayPal a donation to now@disu.se¹. Thanks! Your support won’t go unnoticed! ¹ Send a donation: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now@disu.se&item_name=Ame § Reporting Bugs Please report any bugs that you encounter to the {issue tracker}¹. ¹ See https://github.com/now/ame/issues § Authors Nikolai Weibull wrote the code, the tests, the documentation, and this README. § Licensing Ame is free software: you may redistribute it and/or modify it under the terms of the {GNU Lesser General Public License, version 3}¹ or later², as published by the {Free Software Foundation}³. ¹ See http://disu.se/licenses/lgpl-3.0/ ² See http://gnu.org/licenses/ ³ See http://fsf.org/