normalize-code-similarity — package search across npm, PyPI, crates.io, RubyGems, Go, Maven & NuGet

RubyGems matches

5 matches · Ruby

ruby-duplicatesv0.1.0

A small duplicate-code metric for Ruby that compares normalized Ripper syntax fingerprints with Jaccard similarity. Inspired by Uncle Bob's dry4clj.

Maintained. Niche but maintained, actively maintained.

uv1.0.2

RubyGems

U U extends Ruby’s Unicode support. It provides a string class called U::String with an interface that mimics that of the String class in Ruby 2.0, but that can also be used from both Ruby 1.8. This interface also has more complete Unicode support and never modifies the receiver. Thus, a U::String is an immutable value object. U comes with complete and very accurate documentation¹. The documentation can realistically also be used as a reference to the Ruby String API and may actually be preferable, as it’s a lot more explicit and complete than the documentation that comes with Ruby. ¹ See http://disu.se/software/u-1.0/api/ § Installation Install u with % gem install u § Usage Usage is basically the following: require 'u-1.0' a = 'äbc' a.upcase # ⇒ 'äBC' a.u.upcase # ⇒ 'ÄBC' That is, you require the library, then you invoke #u on a String. This’ll give you a U::String that has much better Unicode support than a normal String. It’s important to note that U only uses UTF-8, which means that #u will try to #encode the String as such. This shouldn’t be an issue in most cases, as UTF-8 is now more or less the universal encoding – and rightfully so. As U::Strings¹ are immutable value objects, there’s also a U::Buffer² available for building U::Strings efficiently. See the API³ for more complete usage information. The following sections will only cover the extensions and differences that U::String exhibit from Ruby’s built-in String class. ¹ See http://disu.se/software/u-1.0/api/U/String/ ² See http://disu.se/software/u-1.0/api/U/Buffer/ ³ See http://disu.se/software/u-1.0/api/ § Unicode Properties There are quite a few property-checking interrogators that let you check if all characters in a U::String have the given Unicode property: • #alnum?¹ • #alpha?² • #assigned?³ • #case_ignorable?⁴ • #cased?⁵ • #cntrl?⁶ • #defined?⁷ • #digit?⁸ • #graph?⁹ • #newline?¹⁰ • #print?¹¹ • #punct?¹² • #soft_dotted?¹³ • #space?¹⁴ • #title?¹⁵ • #valid?¹⁶ • #wide?¹⁷ • #wide_cjk?¹⁸ • #xdigit?¹⁹ • #zero_width?²⁰ ¹ See http://disu.se/software/u-1.0/api/U/String/#alnum-p-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#alpha-p-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#assigned-p-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#case_ignorable-p-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#cased-p-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#cntrl-p-instance-method ⁷ See http://disu.se/software/u-1.0/api/U/String/#defined-p-instance-method ⁸ See http://disu.se/software/u-1.0/api/U/String/#digit-p-instance-method ⁹ See http://disu.se/software/u-1.0/api/U/String/#graph-p-instance-method ¹⁰ See http://disu.se/software/u-1.0/api/U/String/#newline-p-instance-method ¹¹ See http://disu.se/software/u-1.0/api/U/String/#print-p-instance-method ¹² See http://disu.se/software/u-1.0/api/U/String/#punct-p-instance-method ¹³ See http://disu.se/software/u-1.0/api/U/String/#soft_dotted-p-instance-method ¹⁴ See http://disu.se/software/u-1.0/api/U/String/#space-p-instance-method ¹⁵ See http://disu.se/software/u-1.0/api/U/String/#title-p-instance-method ¹⁶ See http://disu.se/software/u-1.0/api/U/String/#valid-p-instance-method ¹⁷ See http://disu.se/software/u-1.0/api/U/String/#wide-p-instance-method ¹⁸ See http://disu.se/software/u-1.0/api/U/String/#wide_cjk-p-instance-method ¹⁹ See http://disu.se/software/u-1.0/api/U/String/#xdigit-p-instance-method ²⁰ See http://disu.se/software/u-1.0/api/U/String/#zero_width-p-instance-method Similar to these methods are • #folded?¹ • #lower?² • #upper?³ which check whether a ‹U::String› has been cased in a given manner. ¹ See http://disu.se/software/u-1.0/api/U/String/#folded-p-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#lower-p-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#upper-p-instance-method There’s also a #normalized?¹ method that checks whether a ‹U::String› has been normalized on a given form. ¹ See http://disu.se/software/u-1.0/api/U/String/#normalized-p-instance-method You can also access certain Unicode properties of the characters of a U::String: • #canonical_combining_class¹ • #general_category² • #grapheme_break³ • #line_break⁴ • #script⁵ • #word_break⁶ ¹ See http://disu.se/software/u-1.0/api/U/String/#canonical_combining_class-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#general_category-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_break-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#line_break-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#script-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#word_break-instance-method § Locale-specific Comparisons Comparisons of U::Strings respect the current locale (and also allow you to specify a locale to use): ‹#<=>›¹, #casecmp², and #collation_key³. ¹ See http://disu.se/software/u-1.0/api/U/String/#comparison-operator ² See http://disu.se/software/u-1.0/api/U/String/#casecmp-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#collation_key-instance-method § Additional Enumerators There are a couple of additional enumerators in #each_grapheme_cluster¹ and #each_word² (along with aliases #grapheme_clusters³ and #words⁴). ¹ See http://disu.se/software/u-1.0/api/U/String/#each_grapheme_cluster-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#each_word-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_clusters-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#words-instance-method § Unicode-aware Sub-sequence Removal #Chomp¹, #chop², #lstrip³, #rstrip⁴, and #strip⁵ all look for Unicode newline and space characters, rather than only ASCII ones. ¹ See http://disu.se/software/u-1.0/api/U/String/#chomp-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#chop-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#lstrip-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#rstrip-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#strip-instance-method § Unicode-aware Conversions Case-shifting methods #downcase¹ and #upcase² do proper Unicode casing and the interface is further augmented by #foldcase³ and #titlecase⁴. #Mirror⁵ and #normalize⁶ do conversions similar in nature to the case-shifting methods. ¹ See http://disu.se/software/u-1.0/api/U/String/#downcase-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#upcase-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#foldcase-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#titlecase-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#mirror-instance-method ⁶ See http://disu.se/software/u-1.0/api/U/String/#normalize-instance-method § Width Calculations #Width¹ will return the number of cells on a terminal that a U::String will occupy. #Center², #ljust³, and #rjust⁴ deal in width rather than length, making them much more useful for generating terminal output. #%⁵ (and its alias #format⁶) similarly deal in width. ¹ See http://disu.se/software/u-1.0/api/U/String/#width-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#center-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#ljust-instance-method ⁴ See http://disu.se/software/u-1.0/api/U/String/#rjust-instance-method ⁵ See http://disu.se/software/u-1.0/api/U/String/#modulo-operator ⁶ See http://disu.se/software/u-1.0/api/U/String/#format-instance-method § Extended Type Conversions Finally, #hex¹, #oct², and #to_i³ use Unicode alpha-numerics for their respective conversions. ¹ See http://disu.se/software/u-1.0/api/U/String/#hex-instance-method ² See http://disu.se/software/u-1.0/api/U/String/#oct-instance-method ³ See http://disu.se/software/u-1.0/api/U/String/#to_i-instance-method § News § 1.0.0 Initial public release! § Financing Currently, most of my time is spent at my day job and in my rather busy private life. Please motivate me to spend time on this piece of software by donating some of your money to this project. Yeah, I realize that requesting money to develop software is a bit, well, capitalistic of me. But please realize that I live in a capitalistic society and I need money to have other people give me the things that I need to continue living under the rules of said society. So, if you feel that this piece of software has helped you out enough to warrant a reward, please PayPal a donation to now@disu.se¹. Thanks! Your support won’t go unnoticed! ¹ Send a donation: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now@disu.se&item_name=U § Reporting Bugs Please report any bugs that you encounter to the {issue tracker}¹. ¹ See https://github.com/now/u/issues § Authors Nikolai Weibull wrote the code, the tests, the documentation, and this README. § Licensing U is free software: you may redistribute it and/or modify it under the terms of the {GNU Lesser General Public License, version 3}¹ or later², as published by the {Free Software Foundation}³. ¹ See http://disu.se/licenses/lgpl-3.0/ ² See http://gnu.org/licenses/ ³ See http://fsf.org/

Abandoned. Last published 10 years ago.

xqueryv0.2.1

RubyGems

# XQuery [![Join the chat at https://gitter.im/JelF/xquery](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/JelF/xquery?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status](https://travis-ci.org/JelF/xquery.svg?branch=master)](https://travis-ci.org/JelF/xquery) [![Code Climate](https://codeclimate.com/github/JelF/xquery/badges/gpa.svg)](https://codeclimate.com/github/JelF/xquery) [![Test Coverage](https://codeclimate.com/github/JelF/xquery/badges/coverage.svg)](https://codeclimate.com/github/JelF/xquery/coverage) [![Issue Count](https://codeclimate.com/github/JelF/xquery/badges/issue_count.svg)](https://codeclimate.com/github/JelF/xquery) XQuery is designed to replace boring method call chains and allow to easier convert it in a builder classes ## Usage of `XQuery` function `XQuery` is a shortcat to `XQuery::Generic.with` ``` r = XQuery(''.html_safe) do |q| # similar to tap q << 'bla bla bla' q << 'bla bla bla' # using truncate q.truncate(15) # real content (q.send(:query)) mutated q << '!' end r # => "bla bla blab...!" ``` ## Usage of `XQuery::Abstract` I designed this gem to help me with `ActiveRecord` Queries, so i inherited `XQuery::Abstract` and used it's powers. It provides the following features ### `wrap_method` and `wrap_methods` when you call each of this methods they became automatically wrapped (`XQuery::Abstract` basically wraps all methods query `#respond_to?`) It means, that there are instance methods with same name defined and will change a `#query` to their call result. ``` self.query = query.foo(x) # is basically the same as foo(x) # when `wrap_method :foo` called ``` You can also specify new name using `wrap_method :foo, as: :bar` syntax ### `q` object `q` is a proxy object which holds all of wrapped methods, but not methods you defined inside your class. E.g. i have defined `wrap_method(:foo)`, but also delegated `#foo` to some another object. If i call `q.foo`, i will get wrapped method. Note, that if you redefine `#__foo` method, q.foo will call it instead of normal work. You can add additional methods to `q` using something like `alias_on_q :foo`. I used it with `kaminary` and it was useful ``` def page=(x) apply { |query| query.page(x) } end alias_on_q :page= def page query.current_page end alias_on_q :page ``` ### `query_superclass` You should specify `query_superclass` class_attribute to inherit `XQuery::Abstract`. Whenever `query.is_a?(query_superclass)` evaluate to false, you will get `XQuery::QuerySuperclassChanged` exception. It can save you much time when your class misconfigured. E.g. you are using `select!` and it returns `nil`, because why not? ### `#apply` method `#apply` does exact what it source tells ``` # yields query inside block # @param block [#to_proc] # @return [XQuery::Abstract] self def apply(&block) self.query = block.call(query) self end ``` It is usefull to merge different queries. ### `with` class method You can get XQuery functionality even you have not defined a specific class (You are still have to inherit XQuery::Abstract to use it) You can see it in this document when i described `XQuery` function. Note, that it yields a class instance, not `q` object. It accepts any arguments, they will be passed to a constructor (except block) ### `execute` method Preferred way to call public instance methods. Resulting query would be returned

Abandoned. Last published 9 years ago.

rake-toolkit_programv0.1.2

RubyGems

# Rake::ToolkitProgram Create toolkit programs easily with `Rake` and `OptionParser` syntax. Bash completions and usage help are baked in. ## Installation Add this line to your application's Gemfile: ```ruby gem 'rake-toolkit_program' ``` And then execute: $ bundle Or install it yourself as: $ gem install rake-toolkit_program ## Quickstart * Shebang it up (in a file named `awesome_tool.rb`) ```ruby #!/usr/bin/env ruby ``` * Require the library ```ruby require 'rake/toolkit_program' ``` * Make your life easier ```ruby Program = Rake::ToolkitProgram ``` * Define your command tasks ```ruby Program.command_tasks do desc "Build it" task 'build' do # Ruby code here end desc "Test it" task 'test' => ['build'] do # Rake syntax ↑↑↑↑↑↑↑ for dependencies # Ruby code here end end ``` You can use `Program.args` in your tasks to access the other arguments on the command line. For argument parsing integrated into the help provided by the program, see the use of `Rake::Task(Rake::ToolkitProgram::TaskExt)#parse_args` below. * Wire the mainline ```ruby Program.run(on_error: :exit_program!) if $0 == __FILE__ ``` * In the shell, prepare to run the program (UNIX/Linux systems only) ```console $ chmod +x awesome_tool.rb $ ./awesome_tool.rb --install-completions Completions installed in /home/rtweeks/.bashrc Source /home/rtweeks/.bash-complete/awesome_tool.rb-completions for immediate availability. $ source /home/rtweeks/.bash-complete/awesome_tool.rb-completions ``` * Ask for help ```console $ ./awesome_tool.rb help *** ./awesome_tool.rb Toolkit Program *** . . . ``` ## Usage Let's look at a short sample toolkit program -- put this in `awesome.rb`: ```ruby #!/usr/bin/env ruby require 'rake/toolkit_program' require 'ostruct' ToolkitProgram = Rake::ToolkitProgram ToolkitProgram.title = "My Awesome Toolkit of Awesome" ToolkitProgram.command_tasks do desc <<-END_DESC.dedent Fooing myself I'm not sure what I'm doing, but I'm definitely fooing! END_DESC task :foo do a = ToolkitProgram.args puts "I'm fooed#{' on a ' if a.implement}#{a.implement}" end.parse_args(into: OpenStruct.new) do |parser, args| parser.no_positional_args! parser.on('-i', '--implement IMPLEMENT', 'An implement on which to be fooed') do |val| args.implement = val end end end if __FILE__ == $0 ToolkitProgram.run(on_error: :exit_program!) end ``` Make sure to `chmod +x awesome.rb`! What does this support? $ ./awesome.rb foo I'm fooed $ ./awesome.rb --help *** My Awesome Toolkit of Awesome *** Usage: ./awesome.rb COMMAND [OPTION ...] Avaliable options vary depending on the command given. For details of a particular command, use: ./awesome.rb help COMMAND Commands: foo Fooing myself help Show a list of commands or details of one command Use help COMMAND to get more help on a specific command. $ ./awesome.rb help foo *** My Awesome Toolkit of Awesome *** Usage: ./awesome.rb foo [OPTION ...] Fooing myself I'm not sure what I'm doing, but I'm definitely fooing! Options: -i, --implement IMPLEMENT An implement on which to be fooed $ ./awesome.rb --install-completions Completions installed in /home/rtweeks/.bashrc Source /home/rtweeks/.bash-complete/awesome.rb-completions for immediate availability. $ source /home/rtweeks/.bash-complete/awesome.rb-completions $ ./awesome.rb <tab><tab> foo help $ ./awesome.rb f<tab> ↳ ./awesome.rb foo $ ./awesome.rb foo <tab> ↳ ./awesome.rb foo -- $ ./awesome.rb foo --<tab><tab> --help --implement $ ./awesome.rb foo --i<tab> ↳ ./awesome.rb foo --implement $ ./awesome.rb foo --implement <tab><tab> --help awesome.rb $ ./awesome.rb foo --implement spoon I'm fooed on a spoon ### Defining Toolkit Commands Just define tasks in the block of `Rake::ToolkitProgram.command_tasks` with `task` (i.e. `Rake::DSL#task`). If `desc` is used to provide a description, the task will become visible in help and completions. When a command task is initially defined, positional arguments to the command are available as an `Array` through `Rake::ToolkitProgram.args`. ### Option Parsing This gem extends `Rake::Task` with a `#parse_args` method that creates a `Rake::ToolkitProgram::CommandOptionParser` (derived from the standard library's `OptionParser`) and an argument accumulator and `yield`s them to its block. * The arguments accumulated through the `Rake::ToolkitProgram::CommandOptionParser` are available to the task in `Rake::ToolkitProgram.args`, replacing the normal `Array` of positional arguments. * Use the `into:` keyword of `#parse_args` to provide a custom argument accumulator object for the associated command. The default argument accumulator constructor can be defined with `Rake::ToolkitProgram.default_parsed_args`. Without either of these, the default accumulator is a `Hash`. * Options defined using `OptionParser#on` (or any of the variants) will print in the help for the associated command. ### Positional Arguments Accessing positional arguments given after the command name depends on whether or not `Rake::Task(Rake::ToolkitProgram::TaskExt)#parse_args` has been called on the command task. If this method is not called, positional arguments will be an `Array` accessible through `Rake::ToolkitProgram.args`. When `Rake::Task(Rake::ToolkitProgram::TaskExt)#parse_args` is used: * `Rake::ToolkitProgram::CommandOptionParser#capture_positionals` can be used to define how positional arguments are accumulated. * If the argument accumulator is a `Hash`, the default (without calling this method) is to assign the `Array` of positional arguments to the `nil` key of the `Hash`. * For other types of accumulators, the positional arguments are only accessible if `Rake::ToolkitProgram::CommandOptionParser#capture_positionals` is used to define how they are captured. * If a block is given to this method, the block of the method will receive the `Array` of positional arguments. If it is passed an argument value, that value is used as the key under which to store the positional arguments if the argument accumulator is a `Hash`. * `Rake::ToolkitProgram::CommandOptionParser#expect_positional_cardinality` can be used to set a rule for the count of positional arguments. This will affect the _usage_ presented in the help for the associated command. * `Rake::ToolkitProgram::CommandOptionParser#map_positional_args` may be used to transform (or otherwise process) positional arguments one at a time and in the context of options and/or arguments appearing earlier on the command line. ### Convenience Methods * `Rake::Task(Rake::ToolkitProgram::TaskExt)#prohibit_args` is a quick way, for commands that accept no options or positional arguments, to declare this so the help and bash completions reflect this. It is equivalent to using `#parse_args` and telling the parser `parser.expect_positional_cardinality(0)`. * `Rake::ToolkitProgram::CommandOptionParser#no_positional_args!` is a shortcut for calling `#expect_positional_cardinality(0)` on the same object. * `Rake::Task(Rake::ToolkitProgram::TaskExt)#invalid_args!` and `Rake::ToolkitProgram::CommandOptionParser#invalid_args!` are convenient ways to raise `Rake::ToolkitProgram::InvalidCommandLine` with a message. ## OptionParser in Rubies Before and After v2.4 The `OptionParser` class was extended in Ruby 2.4 to simplify capturing options into a `Hash` or other container implementing `#[]=` in a similar way. This gem supports that, but it means that behavior varies somewhat between the pre-2.4 era and the 2.4+ era. To have consistent behavior across that version change, the recommendation is to use a `Struct`, `OpenStruct`, or custom class to hold program options rather than `Hash`. ## Development After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment. To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org). To run the tests, use `rake`, `rake test`, or `rspec spec`. Tests can only be run on systems that support `Kernel#fork`, as this is used to present a pristine and isolated environment for setting up the tool. If run using Ruby 2.3 or earlier, some tests will be pending because functionality expects Ruby 2.4's `OptionParser`. ## Contributing Bug reports and pull requests are welcome on GitHub at https://github.com/PayTrace/rake-toolkit_program. For further details on contributing, see [CONTRIBUTING.md](./CONTRIBUTING.md).

Abandoned. Last published 6 years ago.

retrevalv0.1.2

RubyGems

README ====== This is a simple API to evaluate information retrieval results. It allows you to load ranked and unranked query results and calculate various evaluation metrics (precision, recall, MAP, kappa) against a previously loaded gold standard. Start this program from the command line with: retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix> The options are outlined when you pass no arguments and just call retreval You will find further information in the RDOC documentation and the HOWTO section below. If you want to see an example, use this command: retreval -l example/gold_standard.yml -q example/query_results.yml -f yaml -v INSTALLATION ============ If you have RubyGems, just run gem install retreval You can manually download the sources and build the Gem from there by `cd`ing to the folder where this README is saved and calling gem build retreval.gemspec This will create a gem file called which you just have to install with `gem install <file>` and you're done. HOWTO ===== This API supports the following evaluation tasks: - Loading a Gold Standard that takes a set of documents, queries and corresponding judgements of relevancy (i.e. "Is this document relevant for this query?") - Calculation of the _kappa measure_ for the given gold standard - Loading ranked or unranked query results for a certain query - Calculation of _precision_ and _recall_ for each result - Calculation of the _F-measure_ for weighing precision and recall - Calculation of _mean average precision_ for multiple query results - Calculation of the _11-point precision_ and _average precision_ for ranked query results - Printing of summary tables and results Typically, you will want to use this Gem either standalone or within another application's context. Standalone Usage ================ Call parameters --------------- After installing the Gem (see INSTALLATION), you can always call `retreval` from the commandline. The typical call is: retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix> Where you have to define the following options: - `gold-standard-file` is a file in a specified format that includes all the judgements - `query-results` is a file in a specified format that includes all the query results in a single file - `format` is the format that the files will use (either "yaml" or "plain") - `output-prefix` is the prefix of output files that will be created Formats ------- Right now, we focus on the formats you can use to load data into the API. Currently, we support YAML files that must adhere to a special syntax. So, in order to load a gold standard, we need a file in the following format: * "query" denotes the query * "documents" these are the documents judged for this query * "id" the ID of the document (e.g. its filename, etc.) * "judgements" an array of judgements, each one with: * "relevant" a boolean value of the judgment (relevant or not) * "user" an optional identifier of the user Example file, with one query, two documents, and one judgement: - query: 12th air force germany 1957 documents: - id: g5701s.ict21311 judgements: [] - id: g5701s.ict21313 judgements: - relevant: false user: 2 So, when calling the program, specify the format as `yaml`. For the query results, a similar format is used. Note that it is necessary to specify whether the result sets are ranked or not, as this will heavily influence the calculations. You can specify the score for a document. By "score" we mean the score that your retrieval algorithm has given the document. But this is not necessary. The documents will always be ranked in the order of their appearance, regardless of their score. Thus in the following example, the document with "07" at the end is the first and "25" is the last, regardless of the score. --- query: 12th air force germany 1957 ranked: true documents: - score: 0.44034874 document: g5701s.ict21307 - score: 0.44034874 document: g5701s.ict21309 - score: 0.44034874 document: g5701s.ict21311 - score: 0.44034874 document: g5701s.ict21313 - score: 0.44034874 document: g5701s.ict21315 - score: 0.44034874 document: g5701s.ict21317 - score: 0.44034874 document: g5701s.ict21319 - score: 0.44034874 document: g5701s.ict21321 - score: 0.44034874 document: g5701s.ict21323 - score: 0.44034874 document: g5701s.ict21325 --- query: 1612 ranked: true documents: - score: 1.0174774 document: g3290.np000144 - score: 0.763108 document: g3201b.ct000726 - score: 0.763108 document: g3400.ct000886 - score: 0.6359234 document: g3201s.ct000130 --- **Note**: You can also use the `plain` format, which will load the gold standard in a different way (but not the results): my_query my_document_1 false my_query my_document_2 true See that every query/document/relevancy pair is separated by a tabulator? You can also add the user's ID in the fourth column if necessary. Running the evaluation ----------------------- After you have specified the input files and the format, you can run the program. If needed, the `-v` switch will turn on verbose messages, such as information on how many judgements, documents and users there are, but this shouldn't be necessary. The program will first load the gold standard and then calculate the statistics for each result set. The output files are automatically created and contain a YAML representation of the results. Calculations may take a while depending on the amount of judgements and documents. If there are a thousand judgements, always consider a few seconds for each result set. Interpreting the output files ------------------------------ Two output files will be created: - `output_avg_precision.yml` - `output_statistics.yml` The first lists the average precision for each query in the query result file. The second file lists all supported statistics for each query in the query results file. For example, for a ranked evaluation, the first two entries of such a query result statistic look like this: --- 12th air force germany 1957: - :precision: 0.0 :recall: 0.0 :false_negatives: 1 :false_positives: 1 :true_negatives: 2516 :true_positives: 0 :document: g5701s.ict21313 :relevant: false - :precision: 0.0 :recall: 0.0 :false_negatives: 1 :false_positives: 2 :true_negatives: 2515 :true_positives: 0 :document: g5701s.ict21317 :relevant: false You can see the precision and recall for that specific point and also the number of documents for the contingency table (true/false positives/negatives). Also, the document identifier is given. API Usage ========= Using this API in another ruby application is probably the more common use case. All you have to do is include the Gem in your Ruby or Ruby on Rails application. For details about available methods, please refer to the API documentation generated by RDoc. **Important**: For this implementation, we use the document ID, the query and the user ID as the primary keys for matching objects. This means that your documents and queries are identified by a string and thus the strings should be sanitized first. Loading the Gold Standard ------------------------- Once you have loaded the Gem, you will probably start by creating a new gold standard. gold_standard = GoldStandard.new Then, you can load judgements into this standard, either from a file, or manually: gold_standard.load_from_yaml_file "my-file.yml" gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => John There is a nice shortcut for the `add_judgement` method. Both lines are essentially the same: gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => John gold_standard << :document => doc_id, :query => query_string, :relevant => boolean, :user => John Note the usage of typical Rails hashes for better readability (also, this Gem was developed to be used in a Rails webapp). Now that you have loaded the gold standard, you can do things like: gold_standard.contains_judgement? :document => "a document", :query => "the query" gold_standard.relevant? :document => "a document", :query => "the query" Loading the Query Results ------------------------- Now we want to create a new `QueryResultSet`. A query result set can contain more than one result, which is what we normally want. It is important that you specify the gold standard it belongs to. query_result_set = QueryResultSet.new :gold_standard => gold_standard Just like the Gold Standard, you can read a query result set from a file: query_result_set.load_from_yaml_file "my-results-file.yml" Alternatively, you can load the query results one by one. To do this, you have to create the results (either ranked or unranked) and then add documents: my_result = RankedQueryResult.new :query => "the query" my_result.add_document :document => "test_document 1", :score => 13 my_result.add_document :document => "test_document 2", :score => 11 my_result.add_document :document => "test_document 3", :score => 3 This result would be ranked, obviously, and contain three documents. Documents can have a score, but this is optional. You can also create an Array of documents first and add them altogether: documents = Array.new documents << ResultDocument.new :id => "test_document 1", :score => 20 documents << ResultDocument.new :id => "test_document 2", :score => 21 my_result = RankedQueryResult.new :query => "the query", :documents => documents The same applies to `UnrankedQueryResult`s, obviously. The order of ranked documents is the same as the order in which they were added to the result. The `QueryResultSet` will now contain all the results. They are stored in an array called `query_results`, which you can access. So, to iterate over each result, you might want to use the following code: query_result_set.query_results.each_with_index do |result, index| # ... end Or, more simply: for result in query_result_set.query_results # ... end Calculating statistics ---------------------- Now to the interesting part: Calculating statistics. As mentioned before, there is a conceptual difference between ranked and unranked results. Unranked results are much easier to calculate and thus take less CPU time. No matter if unranked or ranked, you can get the most important statistics by just calling the `statistics` method. statistics = my_result.statistics In the simple case of an unranked result, you will receive a hash with the following information: * `precision` - the precision of the results * `recall` - the recall of the results * `false_negatives` - number of not retrieved but relevant items * `false_positives` - number of retrieved but nonrelevant * `true_negatives` - number of not retrieved and nonrelevantv items * `true_positives` - number of retrieved and relevant items In case of a ranked result, you will receive an Array that consists of _n_ such Hashes, depending on the number of documents. Each Hash will give you the information at a certain rank, e.g. the following to lines return the recall at the fourth rank. statistics = my_ranked_result.statistics statistics[3][:recall] In addition to the information mentioned above, you can also get for each rank: * `document` - the ID of the document that was returned at this rank * `relevant` - whether the document was relevant or not Calculating statistics with missing judgements ---------------------------------------------- Sometimes, you don't have judgements for all document/query pairs in the gold standard. If this happens, the results will be cleaned up first. This means that every document in the results that doesn't appear to have a judgement will be removed temporarily. As an example, take the following results: * A * B * C * D Our gold standard only contains judgements for A and C. The results will be cleaned up first, thus leading to: * A * C With this approach, we can still provide meaningful results (for precision and recall). Other statistics ---------------- There are several other statistics that can be calculated, for example the **F measure**. The F measure weighs precision and recall and has one parameter, either "alpha" or "beta". Get the F measure like so: my_result.f_measure :beta => 1 If you don't specify either alpha or beta, we will assume that beta = 1. Another interesting measure is **Cohen's Kappa**, which tells us about the inter-agreement of assessors. Get the kappa statistic like this: gold_standard.kappa This will calculate the average kappa for each pairwise combination of users in the gold standard. For ranked results one might also want to calculate an **11-point precision**. Just call the following: my_ranked_result.eleven_point_precision This will return a Hash that has indices at the 11 recall levels from 0 to 1 (with steps of 0.1) and the corresponding precision at that recall level.

Abandoned. Last published 14 years ago.