Ethics in everyday life

Unifying the world is necessary for it to survive into the future, and ethics is the unifying factor. Environment and teachings combine to mold an individual's ethical behavior. Without challenging the spirituality of religions or the codes of ethics that come with each of them, it may be useful to examine where ethics prevails in the everyday actions of individuals.

Searchable Text Database

Open Source Options

  1. Full Text Search
    1. http://en.wikipedia.org/wiki/Full_text_search#Software
    2. http://www.mediawiki.org/wiki/Fulltext_search_engines
  2. Interesting Search Engines in My Opinion
    1. Sphinx
    2. MySQL Full-Text Search
    3. SQL Server Full-Text Search
    4. Lucene, and Elasticsearch built on top of Lucene
  3. Full Text search comparison
    1. http://full-text-search.findthebest.com/
    2. A very nice comparison http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
    3. http://taschenorakel.de/mathias/2012/04/18/fulltext-search-benchmarks/
    4. http://www.dbbest.com/blog/lucene-vs-sql-server-fts/
    5. http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight/
  4. Sphinx http://sphinxsearch.com/
    1. I have personally used Sphinx in a Ruby on Rails project: installing Sphinx as a background service, installing a gem to interact with it, defining which attributes in the model file to index and how searching is done, and using a delta index to speed up the process. A delayed-job gem would make a local copy of each change, and when the full index was rebuilt after some period (sometimes days, sometimes a week, done through a cron job) the delta changes were moved into the full index. I found the Sphinx server easy to use once I got the hang of it. The delta index is normally much smaller than the full index and usually holds only the most recent changes that have not yet been integrated into the full index. It is used to avoid updating the whole index, as re-indexing is a time-consuming process that can take a long time depending on the index size. (A hedged query sketch follows this list.)
    2. http://en.wikipedia.org/wiki/Sphinx_(search_engine)
    3. Can be used stand-alone or with MySQL, MariaDB, and PostgreSQL, or via ODBC with ODBC-compliant DBMSs
    4. Sphinx latest release download http://sphinxsearch.com/downloads/release/
    5. Documentation 
    6. Supports integration with many programming languages and is highly scalable.
    7. Has many features related to natural language processing, such as stopwords, tokenization, etc.
    8. Note that the original contents of the fields are not stored in the Sphinx index. The text that you send to Sphinx gets processed, and a full-text index (a special data structure that enables quick searches for a keyword) gets built from that text. The original text contents are then simply discarded; Sphinx assumes that you store those contents elsewhere anyway.
    9. There are multiple modes of searching (matching modes), described in the Sphinx documentation.
    10. http://stackoverflow.com/questions/737275/comparison-of-full-text-search-engine-lucene-sphinx-postgresql-mysql
  5. MySQL Full Text Search
    1. Modes of search (a hedged query sketch for each mode follows this list):
      1. A boolean search interprets the search string using the rules of a special query language.
      2. A natural language search interprets the search string as a phrase in natural human language.
      3. A query expansion search is a modification of a natural language search: terms from the most relevant results of an initial natural language search are added to the search string and the search is run again.
  6. SQL Server Full Text Search http://technet.microsoft.com/en-us/library/ms142571.aspx
    1. The beginning of the article gives an overview of full-text search, its functionality, architecture, and modes of searching.
    2. An interesting section on this page is the list of related tasks at the end, which gives more detail on how exactly to do the search. The most helpful article is the first one, on how to get started with full-text search: http://technet.microsoft.com/en-us/library/ms142497.aspx
  7. Apache Solr/Lucene http://lucene.apache.org/solr/
    1. REST API
    2. Stand-alone
    3. Tutorial http://lucene.apache.org/solr/4_6_0/tutorial.html
  8. Interesting Project on top of Lucene http://www.elasticsearch.org/overview/
    1. Interesting because it supports real-time analytics and real-time search; it is document-oriented, RESTful (like Solr), and provides full-text search. (A hedged indexing/search sketch follows this list.)
  9. BaseX http://basex.org/
    1. XML database with full-text search, using XPath for search.
  10. Datapark Search http://www.dataparksearch.org/ for search within a website, a group of websites, or an intranet
    1. Documentation http://www.dataparksearch.org/index.en.html
  11. ht://Dig http://www.htdig.org/
  12. Apache Lucy http://lucy.apache.org/
    1. Loose C port of Lucene (the Java search engine)
    2. Full Text Search
  13. Lemur Project http://www.lemurproject.org/
  14. Search for Websites http://www.searchdaimon.com/
  15. http://swish-e.org/ Swish-e
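
A minimal query sketch for the Sphinx item above (item 4), assuming a searchd instance is running with the SphinxQL (MySQL-protocol) listener on its default port 9306, and assuming a main index named recipes_core plus a delta index named recipes_delta; all names here are made up for illustration, not taken from an actual project.

    import pymysql

    # SphinxQL speaks the MySQL wire protocol, so a regular MySQL client
    # library can usually talk to searchd directly (default SphinxQL port: 9306).
    conn = pymysql.connect(host="127.0.0.1", port=9306)
    cur = conn.cursor()

    # Query the main index and the delta index together; the delta index holds
    # only the rows changed since the last full re-index.
    cur.execute(
        "SELECT id FROM recipes_core, recipes_delta WHERE MATCH(%s) LIMIT 10",
        ("chicken curry",),
    )
    doc_ids = [row[0] for row in cur.fetchall()]

    # Sphinx does not store the original text, so the matching documents are
    # fetched back from the primary database by id (here we just print the ids).
    print(doc_ids)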
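
A minimal sketch of the three MySQL full-text modes listed under item 5, assuming a table named articles with a FULLTEXT index on (title, body); the table, credentials, and search strings are made up for illustration.

    import pymysql

    conn = pymysql.connect(host="localhost", user="demo", password="demo", database="demo")
    cur = conn.cursor()

    # Natural language mode (the default): the string is treated as a plain phrase.
    cur.execute(
        "SELECT id, title FROM articles "
        "WHERE MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE)",
        ("tomato soup recipe",),
    )

    # Boolean mode: operators such as + (must contain) and - (must not contain).
    cur.execute(
        "SELECT id, title FROM articles "
        "WHERE MATCH(title, body) AGAINST (%s IN BOOLEAN MODE)",
        ("+tomato -cream",),
    )

    # Query expansion: a second search is run using terms taken from the most
    # relevant results of an initial natural language search.
    cur.execute(
        "SELECT id, title FROM articles "
        "WHERE MATCH(title, body) AGAINST (%s WITH QUERY EXPANSION)",
        ("tomato",),
    )
    print(cur.fetchall())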
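
And a minimal sketch for the Elasticsearch item (item 8), assuming a local node on the default port and a recent version of the official Python client; the index name and document are made up for illustration.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Elasticsearch is document-oriented: JSON documents are indexed over its REST API.
    es.index(index="recipes", id=1, document={"title": "Chicken curry", "cuisine": "Indian"})

    # Full-text search over the indexed documents.
    resp = es.search(index="recipes", query={"match": {"title": "curry"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["title"])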

Oct 12-13 2013

  1. The domains for nData Consulting and nDataAnalytics, and the nDataConsulting website, are about to expire in a month or so. Need to renew them.
  2. Discuss the visualization status, the progress made, and the next steps.

September 28

  1. Discuss the project meeting about Recipe client.
  2. Discuss the current progress on the visualizations by Asad.

September 21 – September 22 2013

  1. Discuss the report covering the architecture used in the analysis, the future steps for the visualization, and the steps used in the generation of the clusters. The file is here. The original iWork Pages file is here. The doc file is . I will add the PDF file for all to see at the time of the meeting or when the document is finished, whichever comes first, as I am still writing the document.
  2. Will need to discuss whether the direction in the previous point is correct, and the next steps. Also need to come up with some visualization for the company, as we can't display 2,500 nodes and 90,000 edges.
  3. Discuss the angular.js project. (STATUS: I turned down the project, as neither Asad nor I can fulfill its requirements. The requirement is a straight 8 hours; I can't commit to that because I work in multiple periods of the day, need to go for prayers, and can't work 8 hours online straight. Also, the learning curve of angular.js is right now too high to learn within a week and start work straight away … Asad and I will explore angular.js, but we can't commit to this project right now.)
  4. From Jawad: Just an update that on Monday I have a meeting with a client I have worked with in the past, on a mobile development project related to the Ruby on Rails recipes project I worked on previously. I will give an update after Monday on what the client says.

 

August 31 – September 1st 2013

  • Asad: Need to discuss the work done on the visualization for the Venture Capital dataset
    • Show the visualizations
    • Discuss the generation of data
    • Discuss integration with MongoDB using the Spring architecture.
    • Discuss modifications on bubble graph
    • Discuss modifications to the basic graph generated when clicking on a firm.
    • Discuss the main visualization for displaying the companies invested in when a firm is clicked.
    • Discuss whether anything should be done when a company is clicked, and if so, what exactly.
    • Discuss next week goals/work
  • Asad: Discuss the progress on the demo prepared for the d3.js client and any future developments.
  • Fawad bhai: Discuss if there is anything to discuss.
  • Jawad: Need to discuss the shared article on rubric-based assessment with personalized learning recommendations (LINK to article)
    • I have read the article and, to some extent, understood the architecture of the recommendation system. How do we go forward from here?
    • What are the next steps?
  • Jawad: Share progress related to clustering
    • Discuss the MasterProjectRA.pdf which contains clustering algorithms
    • Discuss progress on the MasterProjectRA
    • Discuss the next steps for the clustering of the venture capital data.

Venture Capital – Startup Network

Databases In Review

Can the new NoSQL database formats, such as key-value stores, graph databases, column-family datastores, and document-oriented databases, compete with already highly optimized relational databases like Oracle, MySQL, etc.?

Traditional relational databases have little room for improvement: they are highly optimized and already in place in a large number of applications, but they don't scale well, and that is where the different types of NoSQL databases come in. NoSQL databases drop some things, like transactions, or compromise on some features, but they are built to be fast and scalable.

Also, with the fast pace of change in applications and the need to adapt a database quickly to changing requirements and a changing structure, traditional databases are more difficult to change. Changing a traditional database means changing the whole database structure and every application that uses it, which makes it hard to accommodate change. In less critical areas, such as social network data, where change is normal, traditional databases are too difficult to use and to scale; NoSQL provides the flexibility to change the structure for new data and to merge different formats of data without changing existing applications. Sharding and replication also work well with large databases.

With the growth of the internet, more and more data is collected by every organization, and existing databases fail to accommodate Big Data. Accommodating it requires technologies like NoSQL databases, Hadoop, MapReduce, and similar techniques that break a problem into smaller chunks, together with cloud computing, to do what is no longer possible in traditional databases.
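
A minimal sketch of the schema-flexibility point above, using a document-oriented store as the example; it assumes a local MongoDB instance and the pymongo driver, and the database, collection, and field names are made up for illustration.

    from pymongo import MongoClient

    # Connect to a local MongoDB instance (assumed to be running on the default port).
    client = MongoClient("mongodb://localhost:27017/")
    companies = client["vc_demo"]["companies"]  # hypothetical database/collection names

    # Older documents were stored with only a name and a funding amount.
    companies.insert_one({"name": "Acme Analytics", "funding_usd": 1500000})

    # Newer documents can carry extra fields (e.g. a list of investors) without
    # any schema migration and without touching applications that only read the
    # older fields.
    companies.insert_one({
        "name": "Widget Labs",
        "funding_usd": 3000000,
        "investors": ["Example Ventures", "Sample Capital"],
    })

    # Existing queries keep working for both document shapes.
    for doc in companies.find({"funding_usd": {"$gte": 1000000}}):
        print(doc["name"], doc.get("investors", []))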

In the past if you had a lot of data with a lot of columns and based on the columns you wanted to find a pattern between the variables and the output we are interested to analyze you would use statistical analysis. These statistical models are too difficult to use when the data approaches a large scale i.e. Big data. Big data makes the statistical models slow to use and impossible to use. So in order to use them there is a need to make some kind of algorithms which distribute the data in buckets, uses hadoop and map-reduce to apply some kind of calculation we are interested in and apply them to smaller problems, finding the result and merging them to get the result we want. This involves now use of cloud computing.