Mining patterns using the ROC curve: use cases in economics and biometrics

The Receiver Operating Characteristic (ROC) curve is used in data mining and machine learning. From the area under the ROC curve (AUC) you can calculate the Gini coefficient. I have made an Excel template.

[Image: Excel template, 2013-05-12 18.58.03]

An example to show how it is calculated.

If AUC is the area under the ROC curve, then the Gini coefficient G is

G = 2*AUC - 1
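To make the formula concrete, here is a minimal Python sketch. It assumes the ROC curve is given as a handful of (false positive rate, true positive rate) points and approximates AUC with the trapezoid rule. The function names and the sample points are made up for illustration and are not taken from the Excel template.

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under an ROC curve with the trapezoid rule.

    fpr and tpr are equal-length lists of points sorted by increasing FPR.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area


def gini_from_auc(auc):
    """Gini coefficient from AUC: G = 2*AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical ROC points, from (0, 0) to (1, 1).
fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]

auc = auc_trapezoid(fpr, tpr)
gini = gini_from_auc(auc)
print(auc, gini)  # 0.875 0.75
```

A classifier no better than chance has AUC = 0.5, which gives G = 0.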

The Gini coefficient is among the most watched coefficients in economics these days.
I wrote an article comparing different countries of the world using the data available:

http://sandyclassic.wordpress.com/2013/02/06/watch-gini-coefficient-only-show-income-distribution-not-lowhigh-income-distribution/

The Gini coefficient derived from AUC carries some component of noise, which has raised the question of better measures. Machine learning uses DeltaP' (informedness) and the Matthews correlation coefficient, each suited to its own field. Informedness = 1 shows perfect performance, while -1 represents perverse or negative performance, that is, predictions that are consistently wrong. In economics, a Gini coefficient of zero shows perfect equality.
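As a rough illustration of the two measures, the sketch below computes them from a binary confusion matrix using their standard definitions: informedness = sensitivity + specificity - 1, and the Matthews correlation coefficient (MCC). The function names and the counts are hypothetical.

```python
import math


def informedness(tp, fn, fp, tn):
    """Informedness (Youden's J) = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def matthews_corrcoef(tp, fn, fp, tn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: perfect, chance-level and perverse classifiers.
print(informedness(50, 0, 0, 50))    # 1.0
print(informedness(25, 25, 25, 25))  # 0.0
print(informedness(0, 50, 50, 0))    # -1.0
print(matthews_corrcoef(50, 0, 0, 50))  # 1.0
```

Both measures run from -1 to 1, with 0 meaning the predictions carry no information about the true class.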

So the measures keep improving. There is no end result, and there cannot be one: as our understanding increases we arrive at better measures, and change is constant. What is truth today was mystery or magic to the ancients and will be a kind of half-truth for the future. But the subjects are interconnected; the branching of knowledge areas has been going on for the last 250 years. In the time of Socrates there was no engineering, and everything came under philosophy. Socrates rightly said that you cannot say anything with absolute certainty. But you can make informed decisions, and that is what informedness quantifies: how informed your decisions are.

See a case from Biometrics:

Data Science, Master Data Management, Hadoop and Informatica

MDM: what does it do?

MDM seeks to ensure that an organization does not use multiple, potentially inconsistent versions of the same master data in different parts of its operations, which can occur in large organizations. CRM, DW/BI, sales, production and finance each have their own way of representing things.

There are a lot of products in the MDM space. Ones that have a good presence in the market are:

TIBCO, a leader in information collaboration tools:

Collaborative Information Manager

– works to standardize data across ERP, CRM, DW and PLM

– cleansing and aggregation

– distributes ownership to the natural business users of the data (sales, logistics, finance, HR, publishing)

– automated business processes to collaborate on maintaining the information asset and data governance policy

– built-in data models that can be extended (industry templates, validation rules)

– built-in processes to manage change, eliminating confusion and establishing a clear audit and governance trail for reporting

– syncs the relevant subset of information to downstream applications, trading partners and exchanges, using SOA to pass data as a web service to composite applications

IBM InfoSphere MDM Server

This list is still incomplete; I will continue to add to it.

Product detail (informatica.com)

source: (http://www.biia.com/wp-content/uploads/2012/01/White-Paper-1601_big_data_wp.pdf)

The short notes below are taken from the source, plus my comments on them.

Informatica MDM capabilities:

Informatica 9.1 supplies master data management (MDM) and data quality technologies to enable your organization to achieve better business outcomes by delivering authoritative, trusted data to business processes, applications, and analytics, regardless of the diversity or scope of Big Data.

Single platform for all MDM architectural styles and data domains: Universal MDM capabilities in Informatica 9.1 enable your organization to manage, consolidate, and reconcile all master data, no matter its type or location, in a single, unified solution. Universal MDM is defined by four characteristics:

• Multi-domain: Master data on customers, suppliers, products, assets, locations, can be managed, consolidated, and accessed.

• Multi-style: A flexible solution may be used in any style: registry, analytical, transactional, or co-existence.

• Multi-deployment: The solution may be used as a single-instance hub, or in federated, cloud, or service architectures.

• Multi-use: The MDM solution interoperates seamlessly with data integration and data quality technologies as part of a single platform.

Universal MDM eliminates the risk of standalone, single MDM instances, which are in effect a set of data silos meant to solve problems with other data silos.

• Flexibly adapt to different data architectures and changing business needs

• Start small in a single domain and extend the solution to other enterprise domains, using any style

• Cost-effectively reuse skill sets and data logic by repurposing the MDM solution

“No data is discarded anymore! U.S. xPress leverages a large scale of transaction data and a diversity of interaction data, now extended to perform big data processing like Hadoop with Informatica 9.1. We assess driver performance with image files and pick up customer behaviors from texts by customer service reps. U.S. xPress saved millions of dollars per year by reducing fuels and optimizing routes augmenting our enterprise data with sensor, meter, RFID tags, and geospatial data.” Tim Leonard, Chief Technology Officer, U.S. xPress

Source: Big Data Unleashed: Turning Big Data into Big Opportunities with the Informatica 9.1 Platform.

Reusable data quality policies across all project types: Interoperability among the MDM, data quality, and data integration capabilities in Informatica 9.1 ensures that data quality rules can be reused and applied to all data throughout an implementation lifecycle, across both MDM and data integration projects (see Figure 3).

• Seamlessly and efficiently apply data quality rules regardless of project type, improving data accuracy

• Maximize reuse of skills and resources while increasing ROI on existing investments

• Centrally author, implement, and maintain data quality rules within source applications and propagate downstream

Proactive data quality assurance: Informatica 9.1 delivers technology that enables both business and IT users to proactively monitor and profile data as it becomes available, from internal applications or external Big Data sources. You can continuously check for completeness, conformity, and anomalies and receive alerts via multiple channels when data quality issues are found.

• Receive “early warnings” and proactively identify and correct data quality problems before they happen

• Prevent data quality problems from affecting downstream applications and business processes

• Shorten testing cycles by as much as 80 percent
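To give a feel for what checking for completeness and conformity means in practice, here is a small plain-Python sketch. It is not Informatica's API; the field names, the e-mail pattern and the sample records are all assumptions made for illustration.

```python
import re

# Deliberately loose, hypothetical pattern for what counts as an e-mail.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_records(records, required_fields):
    """Return (record_index, field, problem) tuples for each issue found."""
    issues = []
    for index, record in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        for field in required_fields:
            if not record.get(field):
                issues.append((index, field, "missing"))
        # Conformity: a present e-mail must match the expected pattern.
        email = record.get("email")
        if email and not EMAIL_PATTERN.match(email):
            issues.append((index, "email", "nonconforming"))
    return issues


records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "", "email": "bob@example.com"},
    {"name": "Cy Roe", "email": "not-an-email"},
]
issues = profile_records(records, ["name", "email"])
print(issues)  # [(1, 'name', 'missing'), (2, 'email', 'nonconforming')]
```

In a real deployment the list of issues would feed the alerting channels rather than a print statement.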

Putting Authoritative and Trustworthy Data to Work

The diversity and complexity of Big Data can worsen the data quality problems that exist in many organizations. Standalone, ad hoc data quality tools are ill equipped to handle large-scale streams from multiple sources and cannot generate the reliable, accurate data that enterprises need. Bad data inevitably means bad business. In fact, according to a CIO Insight report, 46 percent of survey respondents say they’ve made an inaccurate business decision based on bad or outdated data.

MDM and data quality are prerequisites for making the most of the Big Data opportunity. Here are two examples:

Using social media data to attract and retain customers: For some organizations, tapping social media data to enrich customer profiles can be putting the cart before the horse. Many companies lack a single, complete view of their customers, ranging from reliable and consistent names and contact information to the products and services in place. Customer data is often fragmented across CRM, ERP, marketing automation, service, and other applications. Informatica 9.1 MDM and data quality enable you to build a complete customer profile from multiple sources. With that authoritative view in place, you’re poised to augment it with the intelligence you glean from social media.

Data-driven response to business issues: Let’s say you’re a Fortune 500 manufacturer and a supplier informs you that a part it sold you is faulty and needs to be replaced. You need answers fast to critical questions: In which products did we use the faulty part? Which customers bought those products and where are they? Do we have substitute parts in stock? Do we have an alternate supplier?

But the answers are sprawled across multiple domains of your enterprise: your procurement system, CRM, inventory, ERP, maybe others in multiple countries. How can you respond swiftly and precisely to a problem that could escalate into a business crisis? Business issues often span multiple domains, exerting a domino effect across the enterprise and confounding an easy solution. Addressing them depends on seamlessly orchestrating interdependent processes and the data that drives them.

With the universal MDM capabilities in Informatica 9.1, our manufacturer could quickly locate reliable, authoritative master data to answer its pressing business questions, regardless of where the data resided or whether multiple MDM styles and deployments were in place.

Self-Service

Big Data’s value is limited if the business depends on IT to deliver it. Informatica 9.1 enables your organization to go beyond business/IT collaboration to empower business analysts, data stewards, and project owners to do more themselves without IT involvement, with the following capabilities:

Analysts and data stewards can assume a greater role in defining specifications, promoting a better understanding of the data, and improving productivity for business and IT.

• Empower business users to access data based on business terms and semantic metadata

• Accelerate data integration projects through reuse, automation, and collaboration

• Minimize errors and ensure consistency by accurately translating business requirements into data integration mappings and quality rules

Application-aware accelerators for project owners: Informatica 9.1 empowers project owners to rapidly understand and access data for data warehousing, data migration, test data management, and other projects. Project owners can source business entities within applications instead of specifying individual tables that require deep knowledge of the data models and relational schemas.

• Reduce data integration project delivery time

• Ensure data is complete and maintains referential integrity

• Adapt to meet business-specific and compliance requirements

Informatica 9.1 introduces complex event processing (CEP) technology into data quality and integration monitoring to alert business users and IT of issues in real time. For instance, it will notify an analyst if a data quality key performance indicator exceeds a threshold, or if integration processes differ from the norm by a predefined percentage.

• Enable business users to define monitoring criteria by using prebuilt templates

• Alert business users on data quality and integration issues as they arise

• Identify and correct problems before they impact performance and operational systems
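The two alert conditions described above (a KPI crossing a threshold, and a process deviating from its norm by a set percentage) can be sketched as a simple stateless check. This is only an illustration in plain Python, not a CEP engine and not Informatica's API; the function name, parameters and numbers are assumptions.

```python
def monitoring_alerts(kpi_value, kpi_threshold, observed, norm, max_deviation_pct):
    """Return a list of alert messages for the two conditions in the text."""
    alerts = []
    # Condition 1: a data quality KPI exceeds its threshold.
    if kpi_value > kpi_threshold:
        alerts.append("data quality KPI exceeds threshold")
    # Condition 2: an integration metric differs from the norm by more
    # than the predefined percentage.
    deviation_pct = abs(observed - norm) / norm * 100.0
    if deviation_pct > max_deviation_pct:
        alerts.append("integration process deviates from norm")
    return alerts


# Hypothetical numbers: 7% rejected rows against a 5% limit, and a load
# that took 130 minutes against a norm of 100 with 20% tolerance.
print(monitoring_alerts(0.07, 0.05, 130, 100, 20))
```

A real CEP engine would evaluate such rules continuously over event streams and time windows; the sketch only shows the rule logic itself.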

• Speeding and strengthening business effectiveness: Informatica 9.1 makes “MDM-aware” everyday business applications such as Salesforce.com, Oracle, Siebel, SAP for CRM, ERP, and others by presenting reconciled master data directly within those applications. For example, Informatica’s MDM solution will advise a salesperson creating a new account for “John Jones” that a customer named Jonathan Jones, with the same address, already exists. Through the Salesforce interface, the user can access complete, reliable customer information that Informatica MDM has consolidated from disparate applications.

She can see the products and services that John has in place and that he follows her company’s Twitter tweets and is a Facebook fan. She has visibility into his household and business relationships and can make relevant cross-sell offers. In both B2B and B2C scenarios, MDM-aware applications spare the sales force from hunting for data or engaging IT while substantially increasing productivity.
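The “John Jones” versus “Jonathan Jones” example is a case of duplicate detection. The sketch below shows one simple way such a check could work, using Python's standard `difflib` for name similarity together with an exact match on the normalized address. The real matching logic in Informatica MDM is not described in the source, so the threshold, the field names and the sample addresses here are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_probable_duplicate(new_record, existing_record, name_threshold=0.8):
    """Flag a duplicate when addresses match and names are similar enough."""
    same_address = normalize(new_record["address"]) == normalize(
        existing_record["address"]
    )
    name_similarity = SequenceMatcher(
        None, normalize(new_record["name"]), normalize(existing_record["name"])
    ).ratio()
    return same_address and name_similarity >= name_threshold


# Hypothetical records for the example in the text.
existing = {"name": "Jonathan Jones", "address": "12 High Street, Springfield"}
new_account = {"name": "John Jones", "address": "12 High  Street, springfield"}

print(is_probable_duplicate(new_account, existing))  # True
```

Production matching usually adds nickname tables, phonetic keys and address standardization; a single string-similarity threshold is only the simplest starting point.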

• Giving business users a hands-on role in data integration and quality: Long delays and high costs are typical when the business attempts to communicate data specifications to IT in spreadsheets. Part of the problem has been the lack of tools that promote business/IT collaboration and make data integration and quality accessible to the business user.

As Big Data unfolds, Informatica 9.1 gives analysts and data stewards a hands-on role. Let’s say your company has acquired a competitor and needs to migrate and merge new Big Data into your operational systems. A data steward can browse a data quality scorecard and identify anomalies in how certain customers were identified and share a sample specification with IT. Once validated, the steward can propagate the specification across affected applications. A role-based interface also enables the steward to view data integration logic in semantic terms and create data integration mappings that can be readily understood and reused by other business users or IT.

Collaboration Management Systems and their relation to Analytics and Data Science

An integrated offering of collaboration tools (coarse-grained integration using integration tools like TIBCO and Oracle BPEL). Components to be integrated:
1. Content management systems (CMS) such as SharePoint, Joomla and Drupal, and
2. Document management systems such as Liferay, Documentum and IBM FileNet, which can be integrated using flexible integration tools.

3. Communication platforms like Windows Communication Foundation and IBM Lotus Notes, integrated with the mail client and with social networks through the Facebook API, LinkedIn API, Twitter API and Skype API, both as direct plug-ins and for data analysis of the unstructured social networking data captured from the collaboration on project discussions.
A soft-phone using Skype, offering a facility to record conversations for later use.

http://sandyclassic.wordpress.com/2013/06/19/how-to-do-social-media-analysis/

Oracle WebCenter:
http://sandyclassic.wordpress.com/2011/11/04/new-social-computing-war-oracle-web-centre/
4. Project-specific wiki/SharePoint/other CMS pages integrated with PMO site artefacts and Enterprise Architecture artefacts.
5. Seamless integration with enterprise search using Endeca or Microsoft FAST for discovery of documents, information and answers from an indexed, tagged repository of data.
6. Structured and unstructured data: unstructured data hosted on Hadoop clusters, using the MapReduce algorithm to analyse the data, Hadoop Hive and HBase to consolidate it, and the data mining library in Mahout to discover hidden information.
Structured data kept in RDBMS clusters like Oracle RAC (Real Application Clusters).
http://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/


http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/
7. The communication, collaboration, discovery and search layer integrated with domain-specific Enterprise Resource Planning (ERP) packages.
8. All integrated with a mashup architecture providing real-time information maps of where resources are located and information on the nearest help.
9. Messaging and communication layer integrated with all online company software.
10. Process orchestration and integration using a Business Process Management (BPM) tool such as PEGA BPM, JBoss BPM or Windows Workflow Foundation, depending on the landscape used.
11. Private cloud integration using Oracle Cloud, Microsoft Azure, Eucalyptus or OpenNebula, integrated with web APIs and the rest of the web platform landscape.
http://sandyclassic.wordpress.com/2011/10/20/infrastructure-as-service-iaas-offerings-and-tools-in-market-trends/
12. Integrated BI system with real-time information access through tools like TIBCO Spotfire, which can analyse real-time data flowing between the integrated systems.
Data centre APIs and virtualisation platforms can also feed data into the Hadoop cluster for analysis.
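Item 6 in the list above relies on the Map-reduce model to analyse unstructured collaboration data. The idea can be sketched in plain Python without a Hadoop cluster: a map step emits (key, value) pairs, a shuffle step groups them by key, and a reduce step aggregates each group. The word-count task and the sample "project discussion" strings are made up for illustration; on Hadoop the same three phases would run distributed across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical snippets of project discussion text.
documents = ["project plan review", "plan budget", "review plan"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'project': 1, 'plan': 3, 'review': 2, 'budget': 1}
```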
External links for reference: http://www.sap.com/index.epx, http://www.oracle.com, http://www.tibco.com/, http://spotfire.tibco.com/, http://scn.sap.com/thread/1228659

SAP XI: http://help.sap.com/saphelp_nw04/helpdata/en/9b/821140d72dc442e10000000a1550b0/content.htm

Oracle WebCenter: http://www.oracle.com/technetwork/middleware/webcenter/suite/overview/index.html

CMS: http://www.joomla.org/, http://www.liferay.com/, http://www-03.ibm.com/software/products/us/en/filecontmana/

Hadoop: http://hadoop.apache.org/

MapReduce: http://hadoop.apache.org/docs/stable/mapred_tutorial.html

Facebook API: https://developers.facebook.com/docs/reference/apis/

LinkedIn API: http://developer.linkedin.com/apis

Twitter API: https://dev.twitter.com/