Reflecting on Metadata and the Semantic web: May 2008

Friday, 23 May 2008

Digital images

I've started writing my report and I'm using a digital photo to illustrate the point. While thinking about it, I was reminded of some of Lev Manovich's writing, and his crazy digital world.

I was writing about digital data, and how it allows us to have a way of synthesizing images. We simply reverse the photo capturing process and we can form any image we want.

I'm trying to find some article by Manovich to illustrate this point better.

Thursday, 22 May 2008

Distinguishing between essence data and metadata

I need a good example to demonstrate the differences between these two classes of data, and I'm going to use a digital photograph, tagged on Flickr: the pixel values being examples of essence data, and the contextual tags as metadata. I think it illustrates the point well.
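To make the distinction concrete for myself, here is a minimal sketch of how I picture a tagged photo. The `TaggedPhoto` class, its field names and the sample tags are all my own illustrative assumptions; this is not how Flickr actually stores anything.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedPhoto:
    """A photo split into essence data (pixels) and metadata (context).

    Hypothetical structure for illustration only.
    """
    width: int
    height: int
    pixels: list                                # essence data: one (r, g, b) tuple per pixel
    tags: list = field(default_factory=list)    # metadata: contextual tags
    title: str = ""                             # metadata: human-readable title


# A 2x2 image. The pixel values alone say nothing about what is depicted.
photo = TaggedPhoto(
    width=2,
    height=2,
    pixels=[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)],
    tags=["holiday", "beach"],
    title="Example photo",
)

# Editing the metadata leaves the essence data untouched.
photo.tags.append("sunset")
print(len(photo.pixels), photo.tags)
```

The point of keeping the two apart is that the tags can be added, corrected or removed without ever touching the image itself.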

Tuesday, 20 May 2008

Trust in C2C e-commerce

Jones K., Leonard L. N. K., "Trust in consumer-to-consumer electronic commerce", Information & Management, 45 (2008) 88-95

Jones and Leonard present their findings from a study of trust in C2C e-commerce, using students as their test sample. They provide a definition of trust by referencing [1], in which McKnight et al. define trust using a model based on three factors:

disposition to trust
institution-based trust
trusting beliefs

Disposition to trust is based on the attitude of the trustor towards others (the trustees). This attitude is based on their individual faith in humanity (a general feeling that others can be trusted) and their trusting stance (the belief that good things happen as a result of doing good).

Institution-based trust provides reassurance. It relies on the trustor feeling that a system is in place to protect them in making a decision. This is normally a promise, guarantee or legal solution. But in order to gain this assurance, the trustor has to feel that everything is normal: that normal rules apply and that each condition of the trust still applies. They must have confidence in the assurance being given.

Trusting beliefs are also based on the confidence of the trustor, but here it is the confidence the trustor has in the trustee. This is affected by qualities which we as humans can judge and deem to be representative of a person's nature, such as integrity, competence and benevolence.

With this proposed definition of trust, Jones and Leonard proposed four potential influences on trust for a C2C e-commerce application:

natural propensity to trust
perception of website quality
the extent to which others trust the trustee
recognition from third parties

Interestingly, from the above list, Jones and Leonard conclude that only perception of website quality and recognition from third parties had any significant effect on a user's trust of a website, although they also go on to suggest there may be other factors to consider.

Speaking as a student myself, I feel the conclusions drawn by Jones and Leonard in this article are a direct result of the views of today's students in society. We so-called 'Gen Yers' have grown up with technology, and therefore with e-commerce. We are familiar with it, and therefore comfortable. Our natural propensity to trust is already great, as this is what we've always been doing. It's what we have had to do in order to survive online and take advantage of the services being provided. The extent to which others trust the trustee may affect our own willingness to trust: indeed, in many instances I have used public, subject-based forums and blogs to form an informed opinion on a subject. But I have also learnt not to trust these sources if I already know a great deal about the subject myself. I don't know why I have this double standard. Perhaps there is some threshold at which I suddenly become confident and trust my own opinion over someone else's.

I've enjoyed diving into the inner workings of the word "trust", and the definition will certainly be useful when I'm considering it in the context of Web 3.0. I feel each of the four proposed influences on trust will affect how users feel about a website, and I will be certain to consider all of them. The result presented here may be valid for the sample used in the experiment, but for wider, more diverse markets the two significant factors alone will not suffice. A website which builds trust will be successful, as users will return and share their success stories with their friends.

[1] McKnight D. H., Choudhury V., Kacmar C., "Developing and validating trust measures for e-commerce: an integrative typology", Information Systems Research, 13 (2002) 334-359

Thoughts so far

So far I have only really been reading journal articles. I think this is because they are what I'm most used to and comfortable with when doing research, due to my scientific background. Considering our lecture today, I feel this is quite a broad resource that features many detailed articles on specific subjects, but it doesn't provide the depth needed for a dissertation. I need to consider case studies, and any observations I can make in real time, in order to produce a meaningful and distinction-worthy manuscript.

Despite this however, the articles have helped me to narrow down my topic and think more about human behavior on the web, and how this behavior may be affected by the coming Web 3.0 revolution. I seem to be thinking about trust. I think it could be a major factor in determining why people still don't trust the web as a real service, why it is still impersonal and hence, why people have their reservations about using it.

In my final practical project I aim to provide a service through the web. In order to do this, I need to investigate how I can build users' trust, and this is how I regard the relationship between my final project and dissertation.

Thursday, 15 May 2008

Trust on the Web

My research has led me to think about trust on the web. Many of today's internet applications rely on trust: eBay and PayPal are only two examples. It seems internet users are completely divided on the matter. I was at an event recently and I was speaking to a mother about how she uses computers. She told me she didn't shop online at all because she couldn't trust it. It was an interesting attitude, because I personally feel more secure about making payments online. I know they are near instant, I can make them from whichever computer I happen to be sitting at, and I don't need to leave my seat in order to do it.

The Semantic Web provides context to data. Context provides the means for the data to be checked and validated. But the problem of fake data hasn't disappeared; it may simply shift to fake metadata instead. I wonder if Semantic markup will build up the trust of the internet user? How will it do this? What information could it provide to reassure users that they are secure? Certainly in many situations it may have the opposite effect: machine-readable context will allow machines to go off and bring back relevant data automatically. To an unsuspecting user, this could be quite an unnerving experience, especially if the machine knew what extra data to provide based on what other people wanted when searching for similar things. I'm getting a bit carried away here, but I'm thinking about the changes Semantics will make to our experiences on the web, and I think there will have to be a lot of trust before the technology can take off.

Of course, this is only true for internet users like that mother at the party. There are internet users (myself included) who will jump at the opportunity of having a computer provide them with relevant data before they themselves have even realised they want it. In my limited experience, younger generations will almost certainly take this attitude, and so there will be plenty of scope for the Semantic Web or "Web 3.0" to survive.

Zempod: Semantic podcasting

Celma Ò., Raimond Y., 'Zempod: A semantic web approach to podcasting', J. Web Semantics: Science, Services and Agents on the World Wide Web, 6 (2008) 162-169.

This article outlines a process for the automatic extraction of metadata from a given podcast. The system, called "Zempod", can identify segments of speech and segments of music within a podcast. It then has workflows designed to collect metadata from each type of segment: music segments have a unique fingerprint which can be matched with a record in a database, and therefore used to identify the track and provide its metadata. Similarly, speech segments can be put through speech recognition to produce a transcript and keywords. These methods provide searchable podcasts which can be related to other objects on the internet through Semantic markup.
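To get my head around the workflow, I tried sketching it as a toy pipeline. Everything here is my own assumption rather than anything taken from Zempod: the segments arrive already labelled as speech or music, the fingerprint "database" is a small dictionary (`TRACK_DB`), and the keyword step is just a word count over a ready-made transcript, standing in for real speech recognition.

```python
from collections import Counter

# Hypothetical fingerprint database: fingerprint -> track metadata.
TRACK_DB = {"fp-001": {"artist": "Example Artist", "title": "Example Track"}}

# A tiny, made-up stopword list for the keyword step.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "this", "about"}


def keywords(transcript, top_n=3):
    """Return the most frequent non-stopword words in a transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def annotate(segments):
    """Attach metadata to each segment according to its kind."""
    annotated = []
    for seg in segments:
        if seg["kind"] == "music":
            # Music: look the fingerprint up; unknown tracks get no metadata.
            meta = TRACK_DB.get(seg["fingerprint"], {})
        else:
            # Speech: derive keywords from the (assumed) transcript.
            meta = {"keywords": keywords(seg["transcript"])}
        annotated.append({"start": seg["start"], "kind": seg["kind"], "metadata": meta})
    return annotated


podcast = [
    {"start": 0, "kind": "speech",
     "transcript": "Welcome to this podcast about metadata. Metadata is data about data."},
    {"start": 30, "kind": "music", "fingerprint": "fp-001"},
]
result = annotate(podcast)
for segment in result:
    print(segment)
```

The hard parts (segmenting the audio, computing fingerprints, recognising speech) are exactly the parts I have stubbed out, but the sketch shows how each segment type ends up with its own kind of metadata.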

What I found particularly interesting in this article was that Celma and Raimond use Semantic markup in order to solve a problem, as opposed to simply trying it out. Web 3.0 philosophies are incorporated specifically to provide a solution to the problem of not knowing what a podcast is about. Some information is provided about podcasts, such as a brief title and/or description, but this comes back to an earlier thought: how can we trust what's written?

Health Care and life sciences data mashup using Web 2.0/3.0

Cheung K-H et al., 'HCLS 2.0/3.0: Health care and life sciences data mashup using Web 2.0/3.0', J. Biomed. Inform. (2008), doi: 10.1016/j.jbi.2008.04.001

In the above article, Cheung et al. attempt to 'mashup' or combine two independent sets of data from various sources on the internet. They do so using various web services which automatically combine the data sets based on some programmed rules. They did this in order to increase the productivity and workflow of a research study involving a spotted microarray. They begin with a description of the current state of the web, Web 2.0, and describe the various internet tools used to perform the data mashup. They then apply their principles to real data and produce an interesting heat map based on cancer rates in the States and water pollution. The article then goes on to describe a Semantic Web (or Web 3.0), and summarises how it is applied now to health care and life sciences (HCLS) data by describing a typical scenario which demonstrates the benefits that Semantics can provide.
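At its simplest, I imagine a mashup as joining two data sets on a shared key. The sketch below is only that idea in miniature: the region names and every number are made-up placeholders, not figures from the article, and the authors used web services rather than hand-written code like this.

```python
# Two independent, made-up data sets keyed by region (placeholder values only).
cancer_rate = {"Region A": 4.1, "Region B": 5.3, "Region C": 3.8}
pollution_index = {"Region B": 0.7, "Region C": 0.2, "Region D": 0.9}


def mashup(left, right):
    """Combine two data sets, keeping only the regions present in both."""
    shared = sorted(left.keys() & right.keys())
    return [
        {"region": region, "cancer_rate": left[region], "pollution_index": right[region]}
        for region in shared
    ]


combined = mashup(cancer_rate, pollution_index)
for row in combined:
    print(row)
```

The "programmed rule" here is simply "match on region name"; a combined table like this is the sort of thing a heat map could then be drawn from.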

Cheung et al. have used inexpensive, publicly-available tools to combine interesting sets of data to provide a new visual form of each set. Whereas this is quite an achievement in itself, I feel it's worth noting that the point here was not so much about the data that this new visual map displays, but that such a map is possible with today's technology and resources. The conclusion of Cheung et al. supports this by explaining how simple data mashups can be, and that programming experience is not at all necessary. This will be important in the future as it will become easier for people to get the specific data they need and filter out the rest.

My project title is focused on metadata - the data about data. The semantic markup in web applications is metadata, and this article has demonstrated the use of Semantics in web applications. We have seen how this metadata will facilitate searches in the future and improve the internet experience for the user. But how difficult will the incorporation of Semantics into today's Web 2.0 applications be? It seems the schemes in place to provide these Semantics are drawn-out and complicated. How will this affect the transition to Web 3.0?

Tuesday, 13 May 2008

Social Networking and Semantic Web Technology in Software Engineering

Dietrich, J., Jones, N., Wright, J., 'Using Social Networking and Semantic Web Technology in Software Engineering - Use Cases, Patterns, and a Case Study', The Journal of Systems and Software (2008), doi: 10.1016/j.jss.2008.03.060

The above article is the first piece of literature I've read on the subject of Semantic Web. It attempts to apply well-established social network methodologies to the Software Engineering community in order to share, discuss and rate patterns of code.

This article provides a new service for the community. The idea of the service in this particular instance is new, and so too is its implementation, but its foundational ideas and concepts are standard practice in today's social network applications. Despite this lack of creativity, the authors did consider the 'trustworthiness' of the commodities they were dealing with. In the article, they attempt to gauge the trustworthiness of a resource using certain measurable properties, such as average user rating and number of inbound links. This is interesting because, although these properties are already used to assess a given object, I like that they are being used to judge trustworthiness instead of, more commonly, popularity. It seems today's applications are too caught up with including a social aspect, and forget to consider the consequences of doing so. Open-access, collective contributions may raise questions which simply aren't answered now:

What is this person's real background?
How can I trust what this person is saying/doing?
Why do they want my response?
What do they want me to do with my response?

I believe these questions demonstrate a natural inquisitive response for any slightly skeptical human being. They are real questions people may develop, and they are rooted in people's sense of security on the net. I think internet security will be another interesting issue to investigate, and it may provide some insight into the fears of internet users in the future. These fears could be used to drive the development of software that provides some kind of reassurance to surfers, and that could prove successful.
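Going back to the measurable properties mentioned above (average user rating and number of inbound links), here is a rough sketch of how they might be folded into a single trust score. The formula is entirely my own assumption and not the one used by the authors: the equal weighting, the 5-star scale, the logarithmic damping of link counts and the `link_scale` cut-off are all hypothetical choices.

```python
import math


def trust_score(avg_rating, inbound_links, max_rating=5.0, link_scale=100, rating_weight=0.5):
    """Blend an average rating and an inbound-link count into a score in [0, 1].

    Hypothetical formula for illustration only.
    """
    # Normalise the rating to [0, 1].
    rating_part = max(0.0, min(avg_rating / max_rating, 1.0))
    # Log scaling damps very large link counts; `link_scale` links maps to 1.0.
    link_part = min(math.log1p(inbound_links) / math.log1p(link_scale), 1.0)
    return rating_weight * rating_part + (1 - rating_weight) * link_part


# A well-rated but rarely linked resource versus a well-rated, widely linked one.
print(trust_score(4.5, 2))
print(trust_score(4.5, 80))
```

Of course, a number like this only measures what other people think of a resource. It does nothing to answer the questions above about who those people really are.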

Progression so far

I'm beginning this module with a literature review of the use of 'semantic web' in current applications. I want to get some sense of how the shapers of today feel that these technologies will affect internet applications in the future. This approach should provide me with a well-rounded foundation upon which to build my own view.

Monday, 12 May 2008

My Attitude towards this blog

Before I begin this blog of reflection, I wanted to explain how I will use it to log my thoughts on the research I'm doing. The project I'm involved with requires me to write roughly 5 posts of 200 words each. Although this is an approximate guide, I don't think it will suit my methods of thinking. I like to have ideas, write them down and move onto the next one. This will show much more of a progression in my thinking rather than just the conclusions of my thoughts.

I aim to keep the total word count as close to 1000 words as possible, but I will post often and on very specific topics. I will recap and summarise when I'm recapping and summarising in my head. That way I hope the reader can take the journey of research with me.

Thank you
